Coding Agents III: Sandboxing & Best Practices

Pi has no built-in boundaries — it operates with full user permissions. Part 3 of our series shows how to run Coding Agents safely using Dev Containers, and shares hard-won best practices for productive agent workflows.
coding-agents
AI
DevOps
software-engineering
Author

Palaimon Team

Published

March 19, 2026

In the previous post, we explored how Coding Agents work under the hood — System Prompts, Tools, Skills, and the Agent Loop. We saw that Pi’s run_command tool gives the agent access to any CLI command on the system. That’s powerful. It’s also dangerous.

Pi, by default, has no boundaries. It operates with the full permissions of the user running it. If your user can delete system files, so can Pi. If your user can push to production, so can Pi. This is by design — Pi trusts the operator to set appropriate constraints. But it means that sandboxing is not optional; it’s essential.

Agent in a box

The Default Danger

Let’s be concrete about what can go wrong when you run an unbounded Coding Agent:

  • File system damage: The agent might overwrite or delete files outside the project directory — configuration files, SSH keys, or other projects.
  • Network access: The agent could make outbound requests to external services, potentially leaking source code or credentials.
  • Privilege escalation: If the agent runs under a user with sudo access, a misinterpreted instruction could lead to system-wide changes.
  • Git operations: The agent might push commits to the wrong branch, force-push, or modify .gitconfig.

These risks range from common (the agent writes to an unexpected file) to rare but catastrophic (the agent modifies system files). Sandboxing protects against the full spectrum.

Alternative Sandboxing Approaches

Dev Containers aren’t the only option. Here’s a technical comparison of the main approaches:

| Approach | Kernel Isolation | GPU Access | Filesystem Isolation | Startup Time | Agent Suitability |
|---|---|---|---|---|---|
| None (bare metal) | ✗ None | ✓ Native | ✗ None | 0s | ⚠ Dangerous |
| Bubblewrap | ◐ Namespace-level | ✗ Not by default | ✓ Bind-mount | <1s | ✓ Lightweight tasks |
| Docker Dev Container | ✓ Namespace + cgroups | ✓ nvidia-container-toolkit | ✓ Bind-mount | ~3 min (cold) / ~10s (warm) | ✓✓ Recommended |
| QEMU/KVM VM | ✓✓ Full hypervisor | ✓ VFIO passthrough | ✓✓ Full | ~30s–2 min | ✓ Heavy isolation |
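To make the Docker row concrete, a sandboxed agent session might be launched along these lines — a sketch, not Pi's actual configuration; the image name and mount paths are placeholders:

```shell
# Hypothetical agent sandbox via `docker run` (image name and paths are illustrative).
# --network none: no outbound network; relax this if the agent must install packages.
# --cap-drop ALL + no-new-privileges: no Linux capabilities, no setuid escalation.
# Only the project directory is bind-mounted, so the agent cannot touch $HOME.
docker run --rm -it \
  --network none \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --mount "type=bind,src=$PWD,dst=/workspace" \
  --workdir /workspace \
  my-dev-image:latest bash
```

The key design choice is the single bind-mount: everything outside `/workspace` is either read-only image content or simply absent, so even a badly misinterpreted instruction stays contained.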

Bubblewrap (bwrap) is worth a special mention. It’s a lightweight sandboxing tool that uses Linux namespaces to isolate the agent’s filesystem view — the same engine that powers Flatpak for desktop applications. It starts in under a second and requires no Docker daemon. GPU access requires explicit bind-mount (e.g., --dev-bind /dev/dri); the isolation is namespace-level only (not a full VM). For CPU-only coding tasks, it’s an excellent lightweight alternative.
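A minimal bwrap invocation for a CPU-only agent session could look like this — a sketch with illustrative paths; the exact read-only binds (e.g. `/lib64`, `/etc`) vary by distribution:

```shell
# Hypothetical bwrap sandbox: read-only system dirs, one writable project bind,
# a fresh /tmp, and all namespaces unshared (including network).
bwrap \
  --ro-bind /usr /usr \
  --symlink usr/bin /bin \
  --symlink usr/lib /lib \
  --proc /proc \
  --dev /dev \
  --tmpfs /tmp \
  --bind "$PWD" /workspace \
  --chdir /workspace \
  --unshare-all \
  bash
```

Because this is pure namespaces, it starts in milliseconds — but note that `--unshare-all` also cuts the network; drop that flag (or add explicit binds such as `--dev-bind /dev/dri` for the GPU) when the task needs more.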

Note

Want the deep dive on Bubblewrap — how Pi configures it, why it’s safer than Docker, and its Flatpak heritage? Read our dedicated Bubblewrap post.

QEMU/KVM VMs provide the strongest isolation — a full hypervisor boundary — and support GPU passthrough via VFIO. We covered this approach in detail in our post on escaping CUDA dependency hell. For Coding Agents, VMs are overkill for most use cases, but they’re the right choice when you need the agent to work with kernel-level tools or when regulatory requirements demand full isolation.
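For completeness, booting such a VM can be as simple as the following sketch — `disk.qcow2` stands in for a pre-provisioned dev image, and the flags are illustrative, not a hardened configuration:

```shell
# Hypothetical QEMU/KVM launch for a fully isolated agent VM.
# -enable-kvm + -cpu host: hardware virtualization at near-native speed.
# NAT networking via the user-mode stack; no host filesystem is shared.
qemu-system-x86_64 \
  -enable-kvm \
  -cpu host \
  -smp 4 \
  -m 8G \
  -drive file=disk.qcow2,if=virtio \
  -nic user,model=virtio-net-pci \
  -nographic
```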

Best Practices for Productive Agent Workflows

Sandboxing keeps you safe. But safety without productivity is just bureaucracy. Through extensive use of Coding Agents at Palaimon, we’ve found the following practices effective:

1. Break Tasks into Small, Reviewable Steps

Don’t ask the agent to “implement the entire authentication module.” Instead, break it down:

  • “Create the auth/models.py file with the User model”
  • “Add login and logout views to auth/views.py”
  • “Write tests for the login flow in auth/tests.py”
  • “Update the URL configuration to include the auth routes”

Smaller tasks mean smaller diffs, easier reviews, and faster course correction when the agent goes off track.

2. Review Every Generated Change

The agent is fast, but it’s not infallible. Before accepting any change:

  • Read the diff carefully
  • Run the test suite
  • Check for unintended side effects (extra files, modified configs)
  • Verify that the change matches the intent of your instruction

Treat agent-generated code with the same scrutiny you’d apply to a junior developer’s pull request. Beware of review fatigue: as you review more agent output, the temptation to rubber-stamp increases. Combat this by occasionally reviewing the final code without the diff view — read it as if a human wrote it.
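The checklist above maps to a few ordinary commands — a sketch, assuming a git repository with a pytest suite (substitute your own test runner):

```shell
# Inspect the agent's changes before accepting anything.
git diff --stat      # quick overview: which files changed, and by how much
git diff             # then read the full diff carefully
git status --short   # catch unintended extra or untracked files
pytest               # run the test suite (illustrative; use your project's runner)
```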

3. Clear Working Memory Between Features

Pi maintains a working memory (conversation context) across the session. As the context grows, the agent’s performance can degrade — it may confuse earlier instructions with current ones, or lose track of the current task.

After completing a feature, start a fresh conversation or clear the context. This gives the agent a clean slate for the next task and prevents cross-contamination between features.

Pi’s status bar shows the current context size — use it as a gauge. The principle (clear context between features) is universal even if the mechanism differs by agent.

4. Avoid Full-Spectrum Development

“Full-spectrum development” — asking the agent to design, implement, test, and deploy a feature in one go — sounds appealing but has proven expensive and unreliable in practice. The agent’s context window fills up, errors compound, and the resulting code is harder to review because the diff is enormous.

Instead, use an iterative, supervised workflow:

  1. Define the task clearly
  2. Let the agent implement one step
  3. Review the result
  4. Provide feedback or move to the next step
  5. Repeat

This may seem to contradict the autonomy we celebrated in Part 1 — and in a sense, it does. The promise of Coding Agents is autonomous execution, but practice shows that supervised autonomy outperforms unsupervised autonomy. The agent is most powerful when you direct its autonomy toward well-scoped tasks. This is slower in wall-clock time per step, but faster in reaching correct, production-ready code — because you catch errors early and keep the agent on track.

Note: The Iteration Payoff

Think of it as a control system: frequent feedback keeps the agent’s output close to the desired trajectory. Without feedback, small errors accumulate — and by the time you review, the agent may have built an entire edifice on a flawed foundation. Short iterations are your error correction signal.

The Bottom Line

Coding Agents are powerful tools, but power without boundaries is a liability. Dev Containers provide the right balance of isolation and usability for most development workflows. Combined with disciplined task decomposition, regular reviews, and context management, they enable you to harness the agent’s capabilities without putting your system at risk.

You know what Coding Agents are, how they work, and how to run them safely. But for companies, the real question is strategic: ban them, buy them, or build internal expertise? In the next post, we’ll explore the three paths enterprises can take.