# sandboxed Documentation
> Reference for the self-hosted Docker control plane (sandboxd + Traefik) that provisions isolated dev sandboxes, coding-agent tasks, and preview URLs for AI app-builder backends.
## Context Links
- [Agent index](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/llms.txt)
- [Human interactive docs](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0)
- [GitHub repository](https://github.com/tastyeffectco/sandboxes)
## Repository Metadata
- Repository: tastyeffectco/sandboxes
- Generated: 2026-06-04T22:47:33.412Z
- Updated: 2026-06-04T23:08:46.182Z
- Runtime: Grok CLI
- Format: Documentation
- Pages: 24
## Page Index
- 01. [Overview](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/01-overview.md) - What sandboxed exposes (sandboxd API, Traefik previews, runtimed tasks), runtime assumptions (Docker, Linux, SQLite), and the shortest create → task → preview path.
- 02. [Installation](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/02-installation.md) - Prerequisites (Docker Engine + Compose on Linux), ./install.sh steps, .env bootstrap, base-image and control-plane build, compose up, and healthz/readyz verification.
- 03. [Quickstart](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/03-quickstart.md) - Copy-paste flow: POST /sandbox with ports, POST /v1/sandboxes/{id}/tasks, stream SSE events, open s-{id}-{port}.preview.{domain}, and optional env injection for provider keys.
- 04. [Sandbox lifecycle](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/04-sandbox-lifecycle.md) - SQLite-backed status machine (creating, running, stopped, error), container naming (s-{ulid}), reconcile-on-boot, and destroy vs purge semantics.
- 05. [Preview routing](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/05-preview-routing.md) - Traefik Docker labels, Host rules (s-{id}-{port}.preview.{domain}), router priority 100 vs wake catch-all priority 1, PREVIEW_DOMAIN/ENTRYPOINT/TLS, and sandboxed.managed constraint.
- 06. [Wake, idle, and pressure](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/06-wake-idle-and-pressure.md) - Stop-on-idle (SANDBOXD_IDLE_THRESHOLD_SECONDS), wake-on-preview (catch-all → sandboxd), memory admission/refusal, pressure reaper, keepalive, and warming-page behavior.
- 07. [Workspaces and isolation](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/07-workspaces-and-isolation.md) - Per-sandbox bind mounts under SANDBOXED_DATA_DIR/workspaces, skeleton seeding, read-only rootfs and caps, memory/PID limits, userns=host default, and v1 storage trade-offs.
- 08. [Run coding agents](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/08-run-coding-agents.md) - Submit tasks via POST /v1/sandboxes/{id}/tasks (prompt, agent default opencode), wake-on-submit, SSE on /events, env injection at create, and runtimed socket contract.
- 09. [Manage sandboxes](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/09-manage-sandboxes.md) - Operational workflows: create (ports, env, template), exec, keepalive, POST /v1/sandboxes/{id}/stop, DELETE vs POST purge, claim, and external-user purge hooks.
- 10. [API authentication](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/10-api-authentication.md) - Service-token auth (SANDBOXD_API_TOKENS, Authorization: Bearer), SANDBOXD_API_AUTH_DISABLED rollback, SIGHUP env reload, loopback exemptions, and LAN exposure of SANDBOXED_API_BIND.
- 11. [Private previews](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/11-private-previews.md) - visibility=private sandboxes, Traefik forwardAuth to /forward-auth, preview tokens (SANDBOXD_PREVIEW_TOKEN_SECRETS), /preview-auth redirect flow, and deny modes.
- 12. [Production deployment](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/12-production-deployment.md) - Wildcard DNS, traefik websecure + cert resolver, PREVIEW_TLS=true, enable API auth, hardening checklist (isolation, egress, disk), and scaling boundaries from README.
- 13. [Control plane API (legacy)](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/13-control-plane-api-legacy.md) - Internal /sandbox* routes: create/list/get, exec, keepalive, wake JSON, per-sandbox snapshots, purge/claim, healthz/readyz, metrics, GET /llm.txt integrator contract.
- 14. [v1 API reference](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/14-v1-api-reference.md) - Public /v1/sandboxes and /v1/snapshots: request/response shapes, error envelope (code, message, retryable), files CRUD, export zip, task lifecycle states, and template spin-up.
- 15. [Configuration reference](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/15-configuration-reference.md) - Compose-backed env keys: preview domain/ports, SANDBOXED_DATA_DIR, API bind, auth tokens, idle/reaper/memory wake tuning, templates/library paths, and advanced cgroup toggles.
- 16. [Preview URL reference](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/16-preview-url-reference.md) - Hostname pattern s-{ulid}-{port}.preview.{PREVIEW_DOMAIN}, HTTP_PORT suffix rules, localhost vs production HTTPS, and Traefik router/service naming.
- 17. [runtimed reference](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/17-runtimed-reference.md) - In-sandbox supervisor HTTP over Unix socket: GET /status, POST /tasks, GET /tasks/{id}/events (SSE), POST /tasks/{id}/cancel; workspace paths and sandboxd runtime.Client bridge.
- 18. [Health and metrics](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/18-health-and-metrics.md) - GET /healthz and /readyz semantics, Prometheus GET /metrics labels, audit/access logging paths, and docker compose logs for sandboxd and Traefik.
- 19. [Build a todo app with an agent](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/19-build-a-todo-app-with-an-agent.md) - End-to-end recipe: create sandbox on port 3000, submit opencode task prompt, stream task events, verify preview URL, optional ANTHROPIC_API_KEY via env at create.
- 20. [Exec a dev server preview](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/20-exec-a-dev-server-preview.md) - Recipe without tasks API: POST /sandbox/{id}/exec to start a server on an exposed port, wake stopped sandboxes via preview hit, and curl with Host header locally.
- 21. [Troubleshooting](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/21-troubleshooting.md) - readyz/docker socket failures, port 80 conflicts (HTTP_PORT), ULID validation, warming-page stalls, userns-remap seed errors, preview spin-up, and compose log probes.
- 22. [Control plane development](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/22-control-plane-development.md) - Go 1.22+ build/test/vet in control-plane/, CGO sqlite note, compose --build loop, package map (docker, store, reaper, wake, api), and image build cache behavior.
- 23. [Uninstall and maintenance](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/23-uninstall-and-maintenance.md) - uninstall.sh flags (--images, --data, --all), managed-container cleanup, workspace retention defaults, docker compose ps/logs/restart sandboxd, and backup paths for SQLite and workspaces.
- 24. [Contributing](https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/24-contributing.md) - Project layout, design constraints (Docker-only core, sqlite truth, docker CLI shell-out), issue report fields, and extension boundaries for integrators.
## Source File Index
- `.env.example`
- `AGENTS.md`
- `ARCHITECTURE.md`
- `CONTRIBUTING.md`
- `control-plane/cmd/runtimed/server.go`
- `control-plane/cmd/runtimed/task.go`
- `control-plane/cmd/sandboxd/main.go`
- `control-plane/Dockerfile`
- `control-plane/go.mod`
- `control-plane/internal/api/api.go`
- `control-plane/internal/api/external_purge.go`
- `control-plane/internal/api/forward_auth.go`
- `control-plane/internal/api/handlers.go`
- `control-plane/internal/api/llmtxt.go`
- `control-plane/internal/api/preview_auth.go`
- `control-plane/internal/api/taskwatch.go`
- `control-plane/internal/api/v1_files_write.go`
- `control-plane/internal/api/v1_files.go`
- `control-plane/internal/api/v1_snapshots.go`
- `control-plane/internal/api/v1_tasks.go`
- `control-plane/internal/api/v1.go`
- `control-plane/internal/audit/audit.go`
- `control-plane/internal/auth/config.go`
- `control-plane/internal/auth/middleware.go`
- `control-plane/internal/auth/preview_token.go`
- `control-plane/internal/auth/token.go`
- `control-plane/internal/docker/docker.go`
- `control-plane/internal/egress/nftables.go`
- `control-plane/internal/loopback/loopback.go`
- `control-plane/internal/metrics/metrics.go`
- `control-plane/internal/reaper/idle.go`
- `control-plane/internal/reaper/pressure.go`
- `control-plane/internal/reconcile/reconcile.go`
- `control-plane/internal/runtime/client.go`
- `control-plane/internal/store/store.go`
- `control-plane/internal/store/writer.go`
- `control-plane/internal/traefik/traefik.go`
- `control-plane/internal/wake/handler.go`
- `control-plane/migrations/0001_init.sql`
- `control-plane/migrations/0005_tasks.sql`
- `control-plane/migrations/0009_snapshots.sql`
- `control-plane/README.md`
- `docker-compose.yml`
- `image/build.sh`
- `image/Dockerfile`
- `image/HOME_LAYOUT.md`
- `image/README.md`
- `image/skel/.profile`
- `install.sh`
- `LICENSE`
- `README.md`
- `traefik/dynamic/auth.yml`
- `traefik/dynamic/wake.yml`
- `traefik/traefik.yml`
- `uninstall.sh`
---
## 01. Overview
> What sandboxed exposes (sandboxd API, Traefik previews, runtimed tasks), runtime assumptions (Docker, Linux, SQLite), and the shortest create → task → preview path.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/01-overview.md
- Generated: 2026-06-04T22:41:42.156Z
### Source Files
- `README.md`
- `ARCHITECTURE.md`
- `docker-compose.yml`
- `AGENTS.md`
- `control-plane/cmd/sandboxd/main.go`
- `control-plane/internal/api/api.go`
---
title: "Overview"
description: "What sandboxed exposes (sandboxd API, Traefik previews, runtimed tasks), runtime assumptions (Docker, Linux, SQLite), and the shortest create → task → preview path."
---
sandboxed is a single-host stack: **sandboxd** (Go control plane) shells out to the Docker daemon, **Traefik** routes preview hostnames to sandbox containers, and each **sandbox** is a sibling container built from `sandboxed-base:1.0.0` with **runtimed** as its main process. SQLite at `${SANDBOXED_DATA_DIR}/state/sandboxd.db` is the source of truth; workspaces persist under `${SANDBOXED_DATA_DIR}/workspaces/<id>/`. The default API bind is `127.0.0.1:9090` (container listens on `:9000`).
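The on-disk layout described above can be sketched as a small shell helper. This is an illustration only: `sandbox_paths` is a hypothetical name, and the socket location assumes the `workspaces/<id>/.runtimed/sock` path is relative to `SANDBOXED_DATA_DIR`.

```bash
# Hypothetical helper: print the host paths for one sandbox id.
# Falls back to the documented default data directory.
sandbox_paths() {
  id="$1"
  data_dir="${SANDBOXED_DATA_DIR:-/var/lib/sandboxed}"
  printf 'db=%s\n' "$data_dir/state/sandboxd.db"
  printf 'workspace=%s\n' "$data_dir/workspaces/$id"
  # Assumption: the runtimed socket lives inside the workspace directory.
  printf 'socket=%s\n' "$data_dir/workspaces/$id/.runtimed/sock"
}

sandbox_paths "EXAMPLEID"
```

Replace `EXAMPLEID` with a real sandbox id when inspecting a host.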
## What you integrate against
Three surfaces matter for product backends and agents:
| Surface | Default reachability | Role |
|---|---|---|
| **sandboxd HTTP API** | `http://127.0.0.1:9090` (`SANDBOXED_API_BIND`) | Create sandboxes, exec, files, stop/destroy, tasks (v1), health |
| **Traefik preview URLs** | `http://s-<id>-<port>.preview.<domain>[:HTTP_PORT]` | Browser traffic to dev servers inside sandboxes; wake when stopped |
| **runtimed (in-sandbox)** | Unix socket at `workspaces/<id>/.runtimed/sock` | Task supervisor; reached by sandboxd via `internal/runtime.Client`, not directly from the host API |
Legacy routes (`POST /sandbox`, `GET /sandboxes`, …) and the public **v1** layer (`POST /v1/sandboxes`, `POST /v1/sandboxes/{id}/tasks`, …) share one listener. Service-token auth wraps the API mux only; the Traefik wake catch-all stays unauthenticated (private sandboxes gate inside the wake handler).
<Info>
Auth is open by default (`SANDBOXD_API_AUTH_DISABLED=true`). Loopback callers are exempt and keep working when tokens are required; enable tokens before binding the API on a LAN.
</Info>
## Runtime assumptions
| Requirement | Detail |
|---|---|
| **Host OS** | Linux (install and compose target a Linux Docker host) |
| **Container runtime** | Docker Engine + Compose plugin (`docker compose`) |
| **Network** | Shared bridge `${SANDBOXED_NETWORK:-sandboxed_net}`; sandboxes join it so Traefik can route |
| **State** | SQLite WAL at `state/sandboxd.db`; boot **reconciler** converges Docker to DB rows |
| **Data dir** | Absolute `SANDBOXED_DATA_DIR` (default `/var/lib/sandboxed`), bind-mounted host:container symmetrically |
| **Preview DNS** | `PREVIEW_DOMAIN=localhost` works locally (`*.localhost` → 127.0.0.1); production uses a wildcard domain + optional TLS |
Infra containers (`traefik`, `sandboxd`) use `userns_mode: host` so workspace ownership stays deterministic when the daemon uses `userns-remap`. Sandboxes default to `SANDBOXED_USERNS=host` for the same reason.
## Stack layout
```mermaid
flowchart TB
subgraph host["Host — Docker daemon"]
subgraph edge["Edge"]
Traefik["traefik:v3\nDocker + file providers"]
end
subgraph cp["Control plane"]
sandboxd["sandboxd\n:9000 internal\npublished as SANDBOXED_API_BIND"]
SQLite[("sandboxd.db\nSQLite WAL")]
end
subgraph sb["Per-sandbox (runtime)"]
Container["s-{ulid}\nsandboxed-base:1.0.0"]
runtimed["runtimed\n.runtimed/sock"]
WS["workspace bind mount\n/home/sandbox"]
end
Data["SANDBOXED_DATA_DIR\nworkspaces/ + state/"]
end
Browser --> Traefik
API["API / CLI"] --> sandboxd
Traefik -->|"priority 100 Host rule"| Container
Traefik -->|"priority 1 catch-all → sandboxd"| sandboxd
sandboxd --> SQLite
sandboxd -->|"docker CLI"| Container
sandboxd -->|"runtime.Client unix"| runtimed
Container --> WS
Data --- WS
Data --- SQLite
```
**sandboxd** owns lifecycle (create, exec, stop, destroy, purge), workspace seeding from `/opt/sandbox-skel`, Traefik label emission, idle and memory **reapers**, and the **wake** path. **runtimed** supervises dev servers and runs coding tasks (OpenCode is the supported v1 agent). **Traefik** scopes routing to containers labeled `sandboxed.managed=true`.
## Sandbox status model
Rows in the `sandbox` table use:
| Status | Meaning |
|---|---|
| `creating` | Provision in flight; `container_id` may be NULL |
| `running` | Container up; preview routers at Traefik priority 100 |
| `stopped` | `docker stop` succeeded; workspace retained; wake on next preview or task |
| `error` | Last failure recorded in `error_message`; needs operator attention |
Tasks have their own lifecycle (`running` → `succeeded` | `failed` | `cancelled`) in SQLite, independent of sandbox stop.
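A client that polls status values may want to branch on them. The sketch below encodes the two status sets from this section; `task_is_terminal` and `sandbox_hint` are hypothetical helper names, and the hint strings are paraphrases of the table, not API output.

```bash
# Returns 0 when a task status is terminal (no further transitions expected).
task_is_terminal() {
  case "$1" in
    succeeded|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints what a client can expect for a given sandbox status.
sandbox_hint() {
  case "$1" in
    creating) echo "provisioning in flight" ;;
    running)  echo "preview routed to the container" ;;
    stopped)  echo "next preview hit or task submit wakes it" ;;
    error)    echo "see error_message; needs operator attention" ;;
    *)        echo "unknown status: $1"; return 1 ;;
  esac
}

sandbox_hint stopped
```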
## Shortest path: create → task → preview
<Steps>
<Step title="Install the stack">
On a Linux host with Docker: clone the repo, run `./install.sh`, then verify:
```bash
curl -s http://127.0.0.1:9090/healthz # ok
curl -s http://127.0.0.1:9090/readyz # ready
```
</Step>
<Step title="Create a sandbox with an exposed port">
<RequestExample>
```bash
curl -s -XPOST http://127.0.0.1:9090/sandbox \
-H 'content-type: application/json' \
-d '{"ports":[3000]}'
```
</RequestExample>
Omit `id` to auto-generate a ULID. Optional `env` injects variables (e.g. provider API keys) into the container for agents and shells. The response includes the sandbox `id`; export it as `ID` for the steps that follow.
</Step>
<Step title="Submit a coding task">
<RequestExample>
```bash
curl -s -XPOST http://127.0.0.1:9090/v1/sandboxes/$ID/tasks \
-H 'content-type: application/json' \
-d '{"prompt":"create a Vite todo app and run it on port 3000","agent":"opencode"}'
```
</RequestExample>
<ResponseExample>
```json
{
  "id": "<taskId>",
  "sandbox_id": "<id>",
  "status": "running",
  "agent": "opencode",
  "events_url": "/v1/sandboxes/<id>/tasks/<taskId>/events"
}
```
</ResponseExample>
If the sandbox is `stopped`, sandboxd wakes it first (**wake-on-task-submit**) through the internal wake path, then calls runtimed. The agent defaults to `opencode` when omitted.
</Step>
<Step title="Stream task progress (optional)">
```bash
curl -N http://127.0.0.1:9090/v1/sandboxes/$ID/tasks/$TASK_ID/events
```
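The endpoint streams Server-Sent Events; `TASK_ID` is the `id` returned by the task submission. `curl -N` turns off curl's output buffering, and the API mux flushes streaming responses so output arrives live.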
</Step>
<Step title="Open the preview URL">
```
http://s-<id>-3000.preview.localhost
```
Add `:${HTTP_PORT}` when `HTTP_PORT` is not `80`. First hit to a stopped sandbox may show the warming page until Traefik switches from the catch-all (priority 1) to the container router (priority 100).
</Step>
</Steps>
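When piping the event stream into other tools, it can help to keep only the payload lines. The filter below is a minimal sketch that relies on the standard SSE `data:` field framing; the payload format of sandboxd task events is not specified here, so treat the output as opaque text. `sse_data` is a hypothetical helper name.

```bash
# Keep only SSE "data:" lines and strip the field name plus one optional space.
sse_data() {
  sed -n 's/^data: \{0,1\}//p'
}

# Usage against a live stack (not executed here):
#   curl -N "http://127.0.0.1:9090/v1/sandboxes/$ID/tasks/$TASK_ID/events" | sse_data
```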
```mermaid
sequenceDiagram
participant Client as API client
participant SD as sandboxd
participant DB as SQLite
participant RT as runtimed
participant TR as Traefik
participant BR as Browser
Client->>SD: POST /sandbox {"ports":[3000]}
SD->>DB: insert row (creating → running)
SD-->>Client: id, status
Client->>SD: POST /v1/sandboxes/{id}/tasks
alt sandbox stopped
SD->>SD: wake (docker start)
end
SD->>RT: POST /tasks (unix socket)
SD->>DB: task row running
SD-->>Client: task id, events_url
Client->>SD: GET .../events (SSE)
SD->>RT: stream events
BR->>TR: GET s-{id}-3000.preview.localhost
TR->>SD: catch-all if stopped
SD->>SD: docker start, warming page
TR->>RT: proxy to dev server :3000
```
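Scripting the sequence above means carrying the ids from one response into the next request. The following sketch extracts a string field from a flat JSON response with `sed`, so it works without extra tools; `json_field` is a hypothetical helper, it only handles simple string values, and a real JSON parser such as `jq` is the sturdier choice where available.

```bash
# Extract a top-level string field from a JSON document on stdin.
# $1 = field name. Only suitable for flat responses with string values.
json_field() {
  tr -d '\n' | sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage against a live stack (not executed here):
#   ID="$(curl -s -XPOST http://127.0.0.1:9090/sandbox \
#           -H 'content-type: application/json' -d '{"ports":[3000]}' | json_field id)"
```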
<Tip>
You can skip the tasks API and use `POST /sandbox/{id}/exec` to start a dev server, then open the same preview hostname. See the exec-based example page.
</Tip>
## Control-plane API map (high level)
| Method & path | Purpose |
|---|---|
| `POST /sandbox` or `POST /v1/sandboxes` | Create sandbox (`ports`, optional `env`, optional `id`) |
| `POST /v1/sandboxes/{id}/tasks` | Run coding agent via runtimed |
| `GET /v1/sandboxes/{id}/tasks/{taskId}/events` | SSE task stream |
| `POST /sandbox/{id}/exec` | Non-interactive command in container |
| `POST /v1/sandboxes/{id}/stop` | Stop now (free RAM); wake on next preview |
| `DELETE /sandbox/{id}` | Destroy container, keep workspace |
| `POST /sandbox/{id}/purge` | Destroy container and delete workspace |
| `GET /healthz`, `GET /readyz` | Liveness / readiness (ready checks Docker reachability) |
v1 responses use a structured error envelope (`code`, `message`, `retryable`) for integrators; legacy routes remain for internal and operator tooling.
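The difference between stop, destroy, and purge is easy to mix up in scripts. As a sketch, the helper below maps each action to the method and path from the table; `sandbox_request` is a hypothetical name and nothing here is sent over the network.

```bash
# Map a lifecycle action to "METHOD PATH" as listed in the API map.
sandbox_request() {
  action="$1"; id="$2"
  case "$action" in
    stop)    echo "POST /v1/sandboxes/$id/stop" ;;   # frees RAM, wakes on next preview
    destroy) echo "DELETE /sandbox/$id" ;;           # removes container, keeps workspace
    purge)   echo "POST /sandbox/$id/purge" ;;       # removes container and workspace
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
}

# Usage against a live stack (not executed here):
#   set -- $(sandbox_request destroy "$ID")
#   curl -s -X "$1" "http://127.0.0.1:9090$2"
```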
## Preview routing in one glance
- **Hostname pattern:** `s-{ulid}-{port}.preview.{PREVIEW_DOMAIN}`
- **Running sandbox:** Docker labels from sandboxd register a Traefik router at **priority 100** with `Host(...)` matching that name.
- **Stopped sandbox:** No priority-100 router → file-provider catch-all (`traefik/dynamic/wake.yml`, **priority 1**) forwards to `http://sandboxd:9000`, which starts the container and serves a warming page until the app port is ready.
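The hostname pattern can be assembled in a few lines of shell. This sketch covers the plain-HTTP case only and appends the port when `HTTP_PORT` is not `80`; `preview_url` is a hypothetical helper name, and TLS deployments would need an `https` scheme instead.

```bash
# Build the preview URL for a sandbox id and an exposed port.
preview_url() {
  id="$1"; port="$2"
  domain="${PREVIEW_DOMAIN:-localhost}"
  http_port="${HTTP_PORT:-80}"
  url="http://s-$id-$port.preview.$domain"
  # The host port is only part of the URL when Traefik is not published on 80.
  if [ "$http_port" != "80" ]; then
    url="$url:$http_port"
  fi
  echo "$url"
}

preview_url "EXAMPLEID" 3000
```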
## Persistence and isolation (summary)
| Class | Location | Survives stop? |
|---|---|---|
| Workspace files | `workspaces/<id>/` bind-mounted at `/home/sandbox` | Yes |
| Control-plane state | `state/sandboxd.db` | Yes |
| Container writable layer | None (`--read-only` rootfs, tmpfs for `/tmp`) | No |
Sandboxes run with hardened defaults: cap-drop ALL, `no-new-privileges`, read-only rootfs, memory and PID limits. The v1 threat model targets **authenticated, accountable users** on a dedicated host. It does not cover anonymous, hostile multi-tenant code, which needs stronger isolation (VM, gVisor, etc.).
## Default configuration knobs
| Variable | Default | Effect |
|---|---|---|
| `PREVIEW_DOMAIN` | `localhost` | Preview hostname suffix |
| `HTTP_PORT` | `80` | Host port Traefik publishes |
| `SANDBOXED_API_BIND` | `127.0.0.1:9090` | Published control-plane API |
| `SANDBOXED_DATA_DIR` | `/var/lib/sandboxed` | Workspaces + SQLite |
| `SANDBOXD_IDLE_THRESHOLD_SECONDS` | `2100` | Idle stop threshold (35 min) |
| `SANDBOXD_API_AUTH_DISABLED` | `true` | API tokens optional locally |
## Related pages
<CardGroup>
<Card title="Installation" href="/installation">
Prerequisites, install.sh, compose up, and healthz/readyz checks.
</Card>
<Card title="Quickstart" href="/quickstart">
Copy-paste create, task, SSE, and preview flow.
</Card>
<Card title="Run coding agents" href="/run-coding-agents">
Tasks API, wake-on-submit, env injection, and runtimed contract.
</Card>
<Card title="Preview routing" href="/preview-routing">
Traefik priorities, Host rules, and wake catch-all.
</Card>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
Status machine, reconcile-on-boot, destroy vs purge.
</Card>
</CardGroup>
---
## 02. Installation
> Prerequisites (Docker Engine + Compose on Linux), ./install.sh steps, .env bootstrap, base-image and control-plane build, compose up, and healthz/readyz verification.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/02-installation.md
- Generated: 2026-06-04T22:41:28.343Z
### Source Files
- `install.sh`
- `.env.example`
- `docker-compose.yml`
- `image/build.sh`
- `image/Dockerfile`
- `control-plane/Dockerfile`
---
title: "Installation"
description: "Prerequisites (Docker Engine + Compose on Linux), ./install.sh steps, .env bootstrap, base-image and control-plane build, compose up, and healthz/readyz verification."
---
`./install.sh` brings up the full sandboxed stack on a single Linux host: it validates Docker and Compose, bootstraps `.env`, builds the `sandboxed-base` image and the `sandboxd` control-plane image, creates the data directory, and runs `docker compose up -d` for Traefik plus `sandboxd`. The control-plane API is published at `SANDBOXED_API_BIND` (default `127.0.0.1:9090`); preview traffic enters Traefik on `HTTP_PORT` (default `80`).
## Prerequisites
| Requirement | Details |
|---|---|
| Host OS | Linux with a standard Docker Engine daemon |
| Compose | Compose v2 plugin (`docker compose`) preferred; `docker-compose` standalone is accepted as a fallback |
| Docker access | Your user can run `docker info`, or `install.sh` falls back to `sudo docker` |
| Git checkout | Clone the repository; the installer only modifies files inside the repo and under `SANDBOXED_DATA_DIR` |
<Warning>
sandboxed targets a **single Docker host**. The control plane shells out to the host daemon over `/var/run/docker.sock` (mounted into both Traefik and `sandboxd`). macOS or Windows Docker Desktop is not the supported install path in this repository.
</Warning>
No Go toolchain is required on the host: the sandbox base image compiles `runtimed` inside Docker, and the control-plane image compiles `sandboxd` with CGO for SQLite.
## Install command
<Steps>
<Step title="Clone the repository">
```bash
git clone https://github.com/tastyeffectco/sandboxes.git
cd sandboxes
```
</Step>
<Step title="Run the installer">
```bash
./install.sh
```
The script is idempotent: re-running leaves an existing `.env` untouched, rebuilds images as needed, and restarts the compose stack.
</Step>
<Step title="Verify health endpoints">
```bash
curl -s http://127.0.0.1:9090/healthz
curl -s http://127.0.0.1:9090/readyz
```
<Check>
Expected responses: `ok` from `/healthz` and `ready` from `/readyz` (both HTTP 200). Adjust the host/port if you changed `SANDBOXED_API_BIND` in `.env`.
</Check>
</Step>
</Steps>
On success, the installer prints the API bind address, preview URL pattern, a sample `POST /sandbox` command, and `docker compose logs -f sandboxd` for follow-up.
## What install.sh does
`install.sh` runs six phases in order:
1. **Docker / Compose detection** — Probes `docker info`; if the current user cannot reach the daemon, retries with `sudo docker`. Selects `docker compose` when available, otherwise `docker-compose`.
2. **`.env` bootstrap** — Copies `.env.example` → `.env` only when `.env` is missing; never overwrites an existing file.
3. **Environment load** — Sources `.env` to resolve `SANDBOXED_DATA_DIR`, `SANDBOXED_LOG_DIR`, and `SANDBOXED_IMAGE` (default `sandboxed-base:1.0.0`).
4. **Data directory** — Creates `SANDBOXED_DATA_DIR` and `SANDBOXED_LOG_DIR` (with `sudo` when the parent is not writable). Sets log dir mode `0777` so Traefik can write the access log.
5. **Base image build** — Invokes `image/build.sh` with the image tag suffix (for example `1.0.0`). First build typically takes several minutes; later runs use the Docker layer cache.
6. **Stack build and start** — Runs `docker compose build` then `docker compose up -d`.
<Note>
The installer never deletes workspaces or SQLite state. It only creates the configured data paths and starts compose services.
</Note>
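The `.env` bootstrap phase is the part most likely to matter on a re-run. The sketch below shows the copy-only-when-missing behavior described above; it is an illustration with a hypothetical `bootstrap_env` function, not the installer's actual code.

```bash
# Copy .env.example to .env only when .env does not exist yet.
# $1 = repository directory (defaults to the current directory).
bootstrap_env() {
  dir="${1:-.}"
  if [ -f "$dir/.env" ]; then
    echo "keeping existing $dir/.env"
  else
    cp "$dir/.env.example" "$dir/.env"
    echo "created $dir/.env from .env.example"
  fi
}
```

Running it twice leaves local edits to `.env` untouched, which is the property that makes re-running the installer safe.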
## Stack layout after compose up
```mermaid
flowchart TB
subgraph host["Linux host — Docker daemon"]
subgraph compose["docker compose stack"]
traefik["traefik:v3\nHTTP :80 → host HTTP_PORT"]
sandboxd["sandboxd\nAPI :9000 → host SANDBOXED_API_BIND"]
end
sock["/var/run/docker.sock"]
data["SANDBOXED_DATA_DIR\nworkspaces/ · state/ · log/"]
traefik --> sock
sandboxd --> sock
sandboxd --> data
traefik --> data
end
browser["Browser / API client"] --> traefik
browser --> sandboxd
sandboxd -.->|"docker run (runtime)"| sandbox["s-{ulid} containers\nfrom SANDBOXED_IMAGE"]
sandbox --> traefik
```
| Service | Image / build | Host port mapping | Key mounts |
|---|---|---|---|
| `traefik` | `traefik:v3` | `${HTTP_PORT:-80}:80` | Docker socket (ro), `./traefik/`, `${SANDBOXED_LOG_DIR}` |
| `sandboxd` | `build: ./control-plane` → `sandboxed-control-plane:1.0.0` | `${SANDBOXED_API_BIND:-127.0.0.1:9090}:9000` | Docker socket, `${SANDBOXED_DATA_DIR}`, `${SANDBOXED_LOG_DIR}` |
Both services set `userns_mode: host` so infrastructure containers keep working when the daemon uses `userns-remap`. Per-sandbox containers are launched at runtime by `sandboxd`, not declared as compose services.
Sandboxes join the `${SANDBOXED_NETWORK:-sandboxed_net}` bridge network so Traefik can route preview hostnames to them.
## `.env` bootstrap
`install.sh` creates `.env` from `.env.example` when absent. All keys have defaults; an empty `.env` is valid.
Edit `.env` **before** re-running install (or restart the stack with `docker compose up -d` after changes). Keys most often adjusted on first install:
| Variable | Default | Effect |
|---|---|---|
| `HTTP_PORT` | `80` | Host port published for Traefik HTTP previews |
| `SANDBOXED_API_BIND` | `127.0.0.1:9090` | Where `sandboxd` is reachable on the host |
| `PREVIEW_DOMAIN` | `localhost` | Hostname suffix for preview URLs (`s-{id}-{port}.preview.{domain}`) |
| `SANDBOXED_DATA_DIR` | `/var/lib/sandboxed` | Workspaces, SQLite (`state/sandboxd.db`), and logs |
| `SANDBOXED_IMAGE` | `sandboxed-base:1.0.0` | Image tag passed to `image/build.sh` and sandbox `docker run` |
| `SANDBOXD_API_AUTH_DISABLED` | `true` | API open on loopback by default; set `false` + tokens for LAN exposure |
<ParamField body="SANDBOXED_DATA_DIR" type="absolute path" required>
Must be an absolute path. Compose bind-mounts the same host path into `sandboxd` so workspace paths written by the control plane resolve correctly when sibling sandboxes are created on the host daemon.
</ParamField>
Additional keys (`PREVIEW_ENTRYPOINT`, `PREVIEW_TLS`, `SANDBOXD_IDLE_THRESHOLD_SECONDS`, auth tokens, and cgroup toggles) are documented on the configuration reference page.
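Because a relative `SANDBOXED_DATA_DIR` breaks path resolution for sibling containers, a pre-flight check can catch it before `docker compose up`. A minimal sketch, assuming the documented default and a hypothetical `resolve_data_dir` helper:

```bash
# Print the effective data directory, or fail when it is not absolute.
resolve_data_dir() {
  dir="${SANDBOXED_DATA_DIR:-/var/lib/sandboxed}"
  case "$dir" in
    /*) echo "$dir" ;;
    *)  echo "SANDBOXED_DATA_DIR must be absolute: $dir" >&2; return 1 ;;
  esac
}
```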
## Image builds
### Sandbox base image (`sandboxed-base`)
`image/build.sh` builds from the **repository root** (not `image/` alone) so the Dockerfile can compile `runtimed` from `control-plane/`:
```bash
DOCKER="${DOCKER:-docker}" SANDBOXED_IMAGE="${SANDBOXED_IMAGE:-sandboxed-base:1.0.0}" \
bash image/build.sh 1.0.0
```
The Dockerfile uses two stages:
- **Stage 1** — `golang:1.22-bookworm` builds a static `runtimed` binary (`CGO_ENABLED=0`).
- **Stage 2** — `debian:stable-slim` installs Node, pnpm, uv, bun, Claude Code, OpenCode, and sets `CMD ["/usr/local/bin/runtimed"]` under `tini`.
Native architecture is built automatically (arm64 and amd64 hosts are supported). Override the tag with `SANDBOXED_IMAGE` or the version argument to `build.sh`.
### Control plane image (`sandboxd`)
`docker compose build` uses `control-plane/Dockerfile`:
- **Stage 1** — `CGO_ENABLED=1` build of `sandboxd` (SQLite via `mattn/go-sqlite3`).
- **Stage 2** — `debian:stable-slim` plus `docker-ce-cli`; migrations copied to `/usr/local/share/sandboxd/migrations/`.
Published compose image name: `sandboxed-control-plane:1.0.0`. Container listens on **9000** internally; the host mapping comes from `SANDBOXED_API_BIND`.
<Info>
You do not need Go installed on the host for either image. Rebuild only the control plane after API changes with `docker compose build sandboxd && docker compose up -d sandboxd`.
</Info>
## Health and readiness
After the stack is up, probe the control plane on the configured API bind:
:::endpoint GET /healthz
Liveness: process is serving HTTP. No dependency checks.
<ResponseExample>
```http
HTTP/1.1 200 OK
ok
```
</ResponseExample>
:::
:::endpoint GET /readyz
Readiness: SQLite is reachable **and** `docker info` succeeds against the mounted host socket.
<ResponseExample>
```http
HTTP/1.1 200 OK
ready
```
</ResponseExample>
On failure returns HTTP 503 with a JSON error body (`sqlite ping: …` or `docker info: …`). Auth middleware exempts both endpoints.
:::
<CodeGroup>
```bash title="healthz"
curl -s -w "\nHTTP %{http_code}\n" http://127.0.0.1:9090/healthz
```
```bash title="readyz"
curl -s -w "\nHTTP %{http_code}\n" http://127.0.0.1:9090/readyz
```
</CodeGroup>
| Endpoint | Pass signal | Typical failure |
|---|---|---|
| `/healthz` | Body `ok`, status 200 | Stack not started or wrong `SANDBOXED_API_BIND` |
| `/readyz` | Body `ready`, status 200 | Docker socket not mounted, daemon down, or SQLite path not writable |
<Warning>
`/healthz` succeeding while `/readyz` returns 503 usually means `sandboxd` is running but cannot talk to Docker or open its database under `SANDBOXED_DATA_DIR`. Check `docker compose logs sandboxd` and that `/var/run/docker.sock` is mounted in the `sandboxd` service.
</Warning>
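A deployment script can gate on readiness instead of probing by hand. The following is a minimal sketch, assuming the default bind `http://127.0.0.1:9090`; the helper names `wait_for` and `readyz_ok` and the `API` variable are illustrative and not part of the repo.

```bash
# wait_for TRIES DELAY CMD...: run CMD until it succeeds or TRIES attempts are used up.
wait_for() {
  local tries="$1" delay="$2"
  shift 2
  while [ "$tries" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# readyz_ok: succeeds only on a 2xx answer, because curl -f fails on
# HTTP responses >= 400 (such as the 503 that /readyz returns on failure).
readyz_ok() {
  curl -fsS -o /dev/null "${API:-http://127.0.0.1:9090}/readyz"
}

# Example: poll every 2 s for up to 30 attempts, then show logs on failure.
# wait_for 30 2 readyz_ok || docker compose logs sandboxd
```

Passing the probe as a command keeps the retry loop reusable for `/healthz` or any other check.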
## Post-install URLs
With defaults from `.env.example`:
| Surface | URL pattern |
|---|---|
| Control-plane API | `http://127.0.0.1:9090` |
| Preview (port 3000 example) | `http://s-{id}-3000.preview.localhost` (append `:${HTTP_PORT}` when `HTTP_PORT` ≠ `80`) |
Browsers resolve `*.localhost` to `127.0.0.1` without extra DNS. For a first sandbox, omit `id` in `POST /sandbox` to receive an auto-generated ULID.
## Common install adjustments
<Tabs>
<Tab title="Port 80 in use">
Set `HTTP_PORT=8088` (or another free port) in `.env`, then:
```bash
docker compose up -d
```
Preview URLs must include the port suffix, for example `http://s-{id}-3000.preview.localhost:8088`.
</Tab>
<Tab title="Expose API on LAN">
Set `SANDBOXED_API_BIND=0.0.0.0:9090` and enable token auth:
```env
SANDBOXD_API_AUTH_DISABLED=false
SANDBOXD_API_TOKENS=myapp:your-secret-token
```
Clients must send `Authorization: Bearer your-secret-token` on protected routes.
</Tab>
<Tab title="Custom data directory">
Set `SANDBOXED_DATA_DIR` and a matching `SANDBOXED_LOG_DIR` in `.env` before install. The path must be absolute and the directory writable (the installer uses `sudo` when needed).
</Tab>
</Tabs>
## Manual compose (without install.sh)
Equivalent steps if you manage the process yourself:
```bash
cp .env.example .env # if missing
bash image/build.sh 1.0.0
docker compose build
docker compose up -d
```
Before `up`, ensure that `SANDBOXED_DATA_DIR` and `SANDBOXED_LOG_DIR` exist and that the log directory has permissions that let Traefik write to it.
## Install troubleshooting
| Symptom | Likely cause | Action |
|---|---|---|
| `Docker is not available` | Daemon stopped or not installed | Start Docker Engine; add user to `docker` group or rely on installer's `sudo docker` path |
| `Docker Compose not found` | Missing Compose plugin | Install `docker-compose-plugin` (Compose v2) |
| `/readyz` → `docker info: exit status 1` | Socket not reachable from `sandboxd` container | Confirm `docker compose ps`, daemon running, socket mount in `docker-compose.yml` |
| Preview URLs time out locally | Wrong port or domain | Match `HTTP_PORT` and `PREVIEW_DOMAIN`; use `Host` header when curling Traefik directly |
| Base image build slow | Cold cache | Expected on first run; subsequent `install.sh` runs reuse layers |
| Workspace seed errors on create | `userns-remap` on daemon | Keep default `SANDBOXED_USERNS=host` (compose passes it to `sandboxd`) |
```bash
docker compose logs -f sandboxd
docker compose ps
```
## Related pages
<CardGroup>
<Card title="Overview" href="/overview">
What the stack exposes after install: API, Traefik previews, and the shortest create → task → preview path.
</Card>
<Card title="Quickstart" href="/quickstart">
Copy-paste sandbox creation, agent task submission, SSE events, and opening a preview URL.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
Full `.env` / compose environment keys beyond the install-time subset.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Deeper diagnosis for readyz failures, port conflicts, ULID validation, and warming-page stalls.
</Card>
<Card title="Uninstall and maintenance" href="/uninstall-maintenance">
`uninstall.sh` flags, workspace retention, and stack maintenance commands.
</Card>
</CardGroup>
---
## 03. Quickstart
> Copy-paste flow: POST /sandbox with ports, POST /v1/sandboxes/{id}/tasks, stream SSE events, open s-{id}-{port}.preview.{domain}, and optional env injection for provider keys.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/03-quickstart.md
- Generated: 2026-06-04T22:41:36.208Z
### Source Files
- `README.md`
- `AGENTS.md`
- `control-plane/internal/api/api.go`
- `control-plane/internal/api/v1_tasks.go`
- `control-plane/internal/traefik/traefik.go`
- `traefik/dynamic/wake.yml`
---
title: "Quickstart"
description: "Copy-paste flow: POST /sandbox with ports, POST /v1/sandboxes/{id}/tasks, stream SSE events, open s-{id}-{port}.preview.{domain}, and optional env injection for provider keys."
---
The default integration path is three HTTP calls against `sandboxd` on `SANDBOXED_API_BIND` (default `http://127.0.0.1:9090`): create an isolated sandbox with exposed ports via `POST /sandbox`, submit a headless coding task via `POST /v1/sandboxes/{id}/tasks`, and stream task progress from `GET /v1/sandboxes/{id}/tasks/{taskId}/events`. While the task runs, open the Traefik-registered preview host `s-{id}-{port}.preview.{PREVIEW_DOMAIN}`.
<Note>
This flow requires a **Linux host** with **Docker Engine** and the **Compose plugin**. Run `./install.sh` from the repo root before the commands below.
</Note>
## Prerequisites
| Requirement | Detail |
|---|---|
| Host OS | Linux with Docker Engine + `docker compose` |
| Install | `./install.sh` — idempotent; copies `.env.example` → `.env`, builds `sandboxed-base:1.0.0`, starts the compose stack |
| API base URL | `http://127.0.0.1:9090` unless you changed `SANDBOXED_API_BIND` in `.env` |
| Auth (local) | `SANDBOXD_API_AUTH_DISABLED=true` by default — no `Authorization` header required |
## Verify the stack
```bash
curl -s http://127.0.0.1:9090/healthz # ok
curl -s http://127.0.0.1:9090/readyz # ready
```
If `readyz` does not return `ready`, the control plane cannot reach Docker or its SQLite database. See [Troubleshooting](/troubleshooting).
## End-to-end flow
```mermaid
sequenceDiagram
participant Client
participant sandboxd
participant Docker
participant runtimed
participant Traefik
Client->>sandboxd: POST /sandbox {"ports":[3000]}
sandboxd->>Docker: create container s-{id}
Docker-->>Traefik: Docker labels Host(s-{id}-3000.preview.{domain})
Client->>sandboxd: POST /v1/sandboxes/{id}/tasks
alt sandbox stopped
sandboxd->>sandboxd: wake via POST /wake/{id}
end
sandboxd->>runtimed: StartTask (Unix socket)
Client->>sandboxd: GET .../tasks/{taskId}/events (SSE)
runtimed-->>Client: status, message, tool, build, done events
Client->>Traefik: GET s-{id}-3000.preview.{domain}
Traefik->>Docker: proxy to sandbox :3000
```
<Steps>
<Step title="Set API and create a sandbox">
Expose the port your dev server will listen on (commonly `3000`). Omit `id` to auto-generate a ULID.
```bash
API=http://127.0.0.1:9090
ID=$(curl -s -XPOST "$API/sandbox" \
-H 'content-type: application/json' \
-d '{"ports":[3000]}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
echo "sandbox=$ID"
```
</Step>
<Step title="Submit a coding task">
`POST /v1/sandboxes/{id}/tasks` runs **OpenCode** headlessly inside the sandbox via `runtimed`. If the sandbox is **stopped**, `sandboxd` wakes it first (wake-on-task-submit).
```bash
TASK_JSON=$(curl -s -XPOST "$API/v1/sandboxes/$ID/tasks" \
-H 'content-type: application/json' \
-d '{
"prompt": "create a Vite app that shows a todo list and run it on port 3000",
"agent": "opencode"
}')
echo "$TASK_JSON"
TASK_ID=$(echo "$TASK_JSON" | sed -E 's/.*"id":"([^"]+)".*/\1/')
```
</Step>
<Step title="Stream task events (SSE)">
Follow the `events_url` from the task response. Use `curl -N` so the connection stays open.
```bash
curl -N "$API/v1/sandboxes/$ID/tasks/$TASK_ID/events"
```
Resume after disconnect with `Last-Event-ID` or `?since=<index>`.
</Step>
<Step title="Open the live preview">
Once a process listens on the exposed port, Traefik routes the preview hostname to the sandbox container.
| Setting | Default | Preview URL |
|---|---|---|
| `PREVIEW_DOMAIN` | `localhost` | `http://s-{id}-3000.preview.localhost` |
| `HTTP_PORT` | `80` | Append `:{HTTP_PORT}` when not `80` (e.g. `:8088`) |
Modern browsers resolve `*.localhost` to `127.0.0.1` with no DNS setup. A stopped sandbox **wakes** on the first preview request (Traefik catch-all → `sandboxd`).
</Step>
</Steps>
## Create sandbox (`POST /sandbox`)
:::endpoint POST /sandbox
Create an isolated Linux sandbox container with optional preview ports and injected environment variables.
:::
<ParamField body="ports" type="integer[]">
TCP ports to expose via Traefik. Each port gets a router `s-{id}-{port}` with `Host(\`s-{id}-{port}.preview.{PREVIEW_DOMAIN}\`)` at priority **100** (above the wake catch-all at priority **1**).
</ParamField>
<ParamField body="id" type="string">
Optional ULID. Omit to auto-generate. Non-ULID values return `400` with `id must be a ULID`.
</ParamField>
<ParamField body="env" type="object">
Key/value map injected at container create via `docker run --env`. Visible to `runtimed` and agent processes (e.g. `ANTHROPIC_API_KEY`). Keys must be non-empty; values must not contain `=` or newlines.
</ParamField>
<RequestExample>
```bash
curl -s -XPOST http://127.0.0.1:9090/sandbox \
-H 'content-type: application/json' \
-d '{"ports":[3000]}'
```
</RequestExample>
<ResponseExample>
```json
{
"id": "01HX…",
"status": "running",
"image": "sandboxed-base:1.0.0",
"memory_high": "4G"
}
```
</ResponseExample>
Create is asynchronous: the row may show `creating` until the container and workspace are ready. Poll `GET /sandbox/{id}` if you need to wait before submitting a task.
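The polling described above can be sketched in shell. This is a sketch under assumptions: it presumes `GET /sandbox/{id}` returns the same `status` field shown in the create response, and the helpers `json_str_field` and `wait_status` are hypothetical names. The field extractor tolerates whitespace after the colon, which matters if the response is pretty-printed; `jq` is a sturdier choice when it is installed.

```bash
# json_str_field NAME: read JSON on stdin and print the first string value of "NAME".
json_str_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 | sed -E 's/.*"([^"]*)"$/\1/'
}

# wait_status WANT TRIES DELAY CMD...: run CMD (which must print sandbox JSON)
# until its status equals WANT. Returns 0 on match, 2 if the row reports
# "error", and 1 when the attempts run out.
wait_status() {
  local want="$1" tries="$2" delay="$3" status
  shift 3
  while [ "$tries" -gt 0 ]; do
    status=$("$@" | json_str_field status)
    if [ "$status" = "$want" ]; then return 0; fi
    if [ "$status" = "error" ]; then return 2; fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait up to 30 s for the sandbox to reach "running".
# wait_status running 30 1 curl -s "$API/sandbox/$ID"
```

Stopping early on `error` avoids waiting out the full timeout for a create that has already failed.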
## Submit task (`POST /v1/sandboxes/{id}/tasks`)
:::endpoint POST /v1/sandboxes/{id}/tasks
Start a headless coding agent task in the sandbox. Returns **202 Accepted** with `events_url`.
:::
<ParamField body="prompt" type="string" required>
Natural-language instruction for the agent.
</ParamField>
<ParamField body="agent" type="string">
Defaults to `opencode`. Only `opencode` is supported in this release; other values return `400 invalid_request`.
</ParamField>
<ResponseField name="id" type="string">
Task ULID.
</ResponseField>
<ResponseField name="status" type="string">
Initial value `running`.
</ResponseField>
<ResponseField name="events_url" type="string">
Relative path, e.g. `/v1/sandboxes/{id}/tasks/{taskId}/events`.
</ResponseField>
| Condition | HTTP | Code |
|---|---|---|
| Sandbox not found | 404 | `not_found` |
| Sandbox not `running` after wake attempt | 409 | `conflict` |
| Empty `prompt` | 400 | `invalid_request` |
| Task already in progress in runtimed | 409 | `task_in_progress` |
| runtimed unreachable | 502 | `sandbox_unavailable` |
Fetch the durable result later with `GET /v1/sandboxes/{id}/tasks/{taskId}` (works after the sandbox stops or is destroyed).
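A client usually wants to branch on the HTTP status of the submit call. The sketch below maps the statuses from the table above to a next step. The step labels (`retry-later` and so on) are a suggested client policy, not something `sandboxd` defines, and the output file name `task.json` is arbitrary. Both `conflict` and `task_in_progress` arrive as 409, so a client that needs to tell them apart must read the error code in the response body.

```bash
# classify_submit CODE: map the HTTP status of POST .../tasks to a next step.
classify_submit() {
  case "$1" in
    202) echo "accepted" ;;         # task started; follow events_url
    400) echo "fix-request" ;;      # empty prompt or unsupported agent
    404) echo "unknown-sandbox" ;;  # sandbox id not found
    409) echo "retry-later" ;;      # not running after wake, or task in progress
    502) echo "retry-later" ;;      # runtimed unreachable
    *)   echo "unexpected" ;;
  esac
}

# submit_task API ID BODY_JSON: send the request, save the response body to
# task.json, and print "<http_code> <next step>".
submit_task() {
  local code
  code=$(curl -s -o task.json -w '%{http_code}' -XPOST "$1/v1/sandboxes/$2/tasks" \
    -H 'content-type: application/json' -d "$3")
  printf '%s %s\n' "$code" "$(classify_submit "$code")"
}

# Example:
# submit_task "$API" "$ID" '{"prompt":"build a todo app","agent":"opencode"}'
```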
## Stream events (SSE)
:::endpoint GET /v1/sandboxes/{id}/tasks/{taskId}/events
Server-Sent Events stream proxied from in-sandbox `runtimed`. `Content-Type: text/event-stream`.
:::
Each event uses the form:
```text
id: <n>
event: <type>
data: <json>
```
| Event type | Role |
|---|---|
| `status` | Phase updates from runtimed |
| `message` | Agent text (provider-derived, best-effort) |
| `tool` | Tool invocations (best-effort) |
| `build` | Build step progress |
| `done` | Terminal event; `data` carries the canonical `TaskResult` |
Resume options:
- Header `Last-Event-ID: <n>` — resumes after event `n`
- Query `?since=<index>` — start at a given index
```bash
curl -N "http://127.0.0.1:9090/v1/sandboxes/$ID/tasks/$TASK_ID/events"
```
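To act on the stream rather than just print it, a script can parse the `id:` and `event:` fields and stop at the terminal `done` event. The following is a minimal sketch that relies on the standard `text/event-stream` framing, where a blank line ends each event; the function name `follow_events` is illustrative.

```bash
# follow_events: read an SSE stream on stdin, print "<id> <event>" for each
# event, and return 0 as soon as the terminal "done" event has been seen.
follow_events() {
  local id="" event="" line cr
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}                 # tolerate CRLF line endings
    case "$line" in
      id:*)    id=${line#id:};       id=${id# } ;;
      event:*) event=${line#event:}; event=${event# } ;;
      "")                              # blank line closes one event
        if [ -n "$event" ]; then
          printf '%s %s\n' "$id" "$event"
        fi
        if [ "$event" = "done" ]; then
          return 0
        fi
        event="" ;;
    esac
  done
  return 1                             # stream ended before "done"
}

# Example: resume after event 12 and follow until the task finishes.
# curl -sN -H "Last-Event-ID: 12" "$API/v1/sandboxes/$ID/tasks/$TASK_ID/events" | follow_events
```

The last printed id is the value to send as `Last-Event-ID` if the connection drops before `done`.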
Cancel an in-flight task: `POST /v1/sandboxes/{id}/tasks/{taskId}/cancel`.
## Preview URL
Hostname pattern (from Traefik Docker labels):
```text
s-{ulid}-{port}.preview.{PREVIEW_DOMAIN}
```
| Variable | Default | Effect |
|---|---|---|
| `PREVIEW_DOMAIN` | `localhost` | DNS suffix for preview hosts |
| `HTTP_PORT` | `80` | Host port Traefik binds; non-80 URLs need `:HTTP_PORT` |
| `PREVIEW_ENTRYPOINT` | `web` | Traefik entrypoint (`websecure` + `PREVIEW_TLS=true` in production) |
Local example (default `.env`):
```text
http://s-01HXABC-3000.preview.localhost
```
With `HTTP_PORT=8088`:
```text
http://s-01HXABC-3000.preview.localhost:8088
```
Test from the shell without a browser:
```bash
curl -s -H "Host: s-$ID-3000.preview.localhost" "http://127.0.0.1:${HTTP_PORT:-80}/"
```
<Warning>
A warming page (`Spinning up your app…`) appears when the sandbox is stopped and waking, or when nothing is listening on the requested port yet. Wait for the task to start the dev server, then reload.
</Warning>
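The hostname pattern and port suffix rule can be wrapped in a small helper. This sketch covers only the plain-HTTP local case described in this section; the function name `preview_url` and its argument order are illustrative, and a TLS deployment would need an `https` scheme instead.

```bash
# preview_url ID PORT [DOMAIN] [HTTP_PORT]: print the browser URL for one
# exposed port. DOMAIN defaults to localhost and HTTP_PORT to 80.
preview_url() {
  local id="$1" port="$2" domain="${3:-localhost}" http_port="${4:-80}"
  local suffix=""
  if [ "$http_port" != "80" ]; then
    suffix=":$http_port"               # non-80 host ports must appear in the URL
  fi
  printf 'http://s-%s-%s.preview.%s%s\n' "$id" "$port" "$domain" "$suffix"
}

# Example:
# preview_url "$ID" 3000 "${PREVIEW_DOMAIN:-localhost}" "${HTTP_PORT:-80}"
```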
## Inject provider keys at create
Inject API keys once at sandbox create so both the tasks API and any `exec` shell see them:
```bash
ID=$(curl -s -XPOST "$API/sandbox" \
-H 'content-type: application/json' \
-d '{
"ports": [3000],
"env": {"ANTHROPIC_API_KEY": "sk-ant-..."}
}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
curl -s -XPOST "$API/v1/sandboxes/$ID/tasks" \
-H 'content-type: application/json' \
-d '{"prompt":"build a Vite todo app and run it on port 3000","agent":"opencode"}'
```
OpenCode ships in the base image and can run on its default plan without a key; inject your own key to bill against your provider account. The design is provider-neutral — use whichever key names your chosen agent expects.
<Tip>
`POST /v1/sandboxes` is the multi-tenant create path (requires `project.id` and `project.user_id`) and always exposes port `3000`. For the shortest OSS quickstart, prefer `POST /sandbox` with explicit `ports`.
</Tip>
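Because `env` entries are validated at create (keys must be non-empty; values must not contain `=` or newlines), a client can check a pair before sending it. The sketch below mirrors only those documented rules; `valid_env_pair` is a hypothetical helper and the server may enforce further constraints.

```bash
# valid_env_pair KEY VALUE: return 0 when the pair satisfies the documented
# rules (non-empty key; value without "=" or newline), 1 otherwise.
valid_env_pair() {
  local key="$1" value="$2" nl
  nl=$(printf '\nx'); nl=${nl%x}       # a literal newline, kept past $(...) trimming
  if [ -z "$key" ]; then
    return 1
  fi
  case "$value" in
    *=*)      return 1 ;;
    *"$nl"*)  return 1 ;;
  esac
  return 0
}

# Example:
# valid_env_pair ANTHROPIC_API_KEY "$ANTHROPIC_API_KEY" || echo "rejecting env pair" >&2
```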
## Complete copy-paste script
```bash
API=http://127.0.0.1:9090
HTTP_PORT="${HTTP_PORT:-80}"
PREVIEW_DOMAIN="${PREVIEW_DOMAIN:-localhost}"
PORTSUFFIX=""
[ "$HTTP_PORT" != "80" ] && PORTSUFFIX=":$HTTP_PORT"
ID=$(curl -s -XPOST "$API/sandbox" -H 'content-type: application/json' \
-d '{"ports":[3000]}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
echo "sandbox=$ID"
TASK_JSON=$(curl -s -XPOST "$API/v1/sandboxes/$ID/tasks" -H 'content-type: application/json' -d '{
"prompt":"create a Vite app that shows a todo list and run it on port 3000",
"agent":"opencode"
}')
TASK_ID=$(echo "$TASK_JSON" | sed -E 's/.*"id":"([^"]+)".*/\1/')
echo "task=$TASK_ID"
echo "events: $API/v1/sandboxes/$ID/tasks/$TASK_ID/events"
echo "preview: http://s-$ID-3000.preview.$PREVIEW_DOMAIN$PORTSUFFIX"
curl -N "$API/v1/sandboxes/$ID/tasks/$TASK_ID/events"
```
## Related pages
<CardGroup>
<Card title="Installation" href="/installation">
Prerequisites, `./install.sh`, `.env` bootstrap, and health checks in depth.
</Card>
<Card title="Run coding agents" href="/run-coding-agents">
Task lifecycle, runtimed contract, wake-on-submit, and SSE semantics.
</Card>
<Card title="Preview URL reference" href="/preview-url-reference">
Hostname rules, port suffixes, and localhost vs production HTTPS.
</Card>
<Card title="Build a todo app with an agent" href="/example-agent-todo">
End-to-end recipe with verification steps.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
readyz failures, port 80 conflicts, warming-page stalls, and compose logs.
</Card>
</CardGroup>
---
## 04. Sandbox lifecycle
> SQLite-backed status machine (creating, running, stopped, error), container naming (s-{ulid}), reconcile-on-boot, and destroy vs purge semantics.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/04-sandbox-lifecycle.md
- Generated: 2026-06-04T22:41:38.707Z
### Source Files
- `control-plane/migrations/0001_init.sql`
- `control-plane/internal/store/writer.go`
- `control-plane/internal/reconcile/reconcile.go`
- `control-plane/internal/api/handlers.go`
- `control-plane/internal/docker/docker.go`
- `ARCHITECTURE.md`
---
title: "Sandbox lifecycle"
description: "SQLite-backed status machine (creating, running, stopped, error), container naming (s-{ulid}), reconcile-on-boot, and destroy vs purge semantics."
---
Every sandbox is a row in SQLite (`sandbox` table) keyed by a ULID. `sandboxd` shells out to Docker for a sibling container named `s-{id}`; Traefik preview hostnames use the same id (`s-{id}-{port}.preview.{domain}`). The database is the source of truth: the boot reconciler converges Docker to SQLite, not the reverse.
## Status machine
Four string statuses are stored on each row. There is no separate enum type in code — callers and SQL use these literals exactly.
| Status | Meaning | Typical `container_id` | Workspace on disk |
|--------|---------|------------------------|-------------------|
| `creating` | Row inserted; loopback provision and `docker run` in progress | `NULL` | Directory created or being seeded |
| `running` | Container up; row records short Docker id and cgroup path | Set (12-char prefix) | Bind-mounted into container |
| `stopped` | Container removed or `docker stop` succeeded; row kept for wake / id-reuse | Often preserved for audit | **Retained** under `workspaces/<id>/` |
| `error` | Create or reconcile failure; `error_message` set | May be partial | Left for operator review; reconciler does not auto-heal |
```mermaid
stateDiagram-v2
[*] --> creating: POST /sandbox\nStore.Create
creating --> running: MarkRunning\n(container up)
creating --> error: abort()\nMarkError
running --> stopped: idle/pressure reaper\nor POST /v1/.../stop
stopped --> running: wake\nMarkRunningWoke
running --> error: egress reconcile failure
creating --> error: reconcile\nstale > 5 min
stopped --> stopped: reconcile\ncontainer missing
error --> [*]: operator fixes\nor purge
stopped --> [*]: DELETE /sandbox\n(row gone, disk kept)
[*] --> [*]: POST .../purge\n(rows + disk gone)
```
### Transitions by component
| From | To | Writer | Trigger |
|------|-----|--------|---------|
| — | `creating` | `Store.Create` | `POST /sandbox` after validation |
| `creating` | `running` | `MarkRunning` | Successful `docker run` + optional egress install |
| `creating` | `error` | `MarkError` via `abort()` | Any failure after row insert (provision, run, inspect, egress) |
| `running` | `stopped` | `MarkStoppedAt` | Idle reaper, pressure reaper, or `POST /v1/sandboxes/{id}/stop` |
| `stopped` | `running` | `MarkRunningWoke` | Wake handler after `docker start` |
| `running` / `creating` / `stopped` | `stopped` | `MarkStopped` | Reconciler: container missing or not running |
| `creating` | `error` | `MarkError` | Reconciler: row in `creating` older than 5 minutes |
| `running` | `error` | `MarkError` | Reconciler: egress repopulation failed (when egress enabled) |
`MarkRunning` and `MarkRunningWoke` clear `error_message`. `MarkStoppedAt` also sets `stopped_at` (Unix seconds) for audit; the idle path uses this, while reconciler `MarkStopped` does not set `stopped_at`.
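The transition table can be encoded as a lookup, which is handy for asserting expectations in integration tests. This is an illustration of the documented table only, not code from `sandboxd`; the function name `can_transition` is hypothetical.

```bash
# can_transition FROM TO: return 0 when the table above lists FROM -> TO.
can_transition() {
  case "$1>$2" in
    "creating>running") return 0 ;;   # MarkRunning
    "creating>error")   return 0 ;;   # abort() or stale-creating reconcile
    "creating>stopped") return 0 ;;   # reconciler MarkStopped
    "running>stopped")  return 0 ;;   # reapers, explicit stop, reconciler
    "running>error")    return 0 ;;   # egress repopulation failure
    "stopped>running")  return 0 ;;   # wake
    "stopped>stopped")  return 0 ;;   # reconciler, container missing
    *)                  return 1 ;;
  esac
}

# Example:
# can_transition "$before" "$after" || echo "unexpected transition $before -> $after" >&2
```

Note that `error` has no outgoing entry: the table leaves such rows to the operator or to purge.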
<Note>
Wake refuses `creating` and `error` rows. For `running`, it returns success so the client can refresh onto the live Traefik route. Only `stopped` rows proceed through admission and `docker start`.
</Note>
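The wake rules in the note above reduce to a three-way decision on the row status. The sketch below restates them; the function name `wake_action` and the returned labels are illustrative, not identifiers from the codebase.

```bash
# wake_action STATUS: print what a wake request does for a row in STATUS.
wake_action() {
  case "$1" in
    stopped)        echo "start" ;;            # admission check, then docker start
    running)        echo "already-running" ;;  # success; client refreshes onto the live route
    creating|error) echo "refuse" ;;           # wake is rejected for these rows
    *)              echo "unknown-status"; return 1 ;;
  esac
}
```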
## Identity and container naming
<ParamField body="id" type="ULID" required>
Primary key on `sandbox.id`. Omit on create to auto-generate; if supplied, must parse as a ULID (`400` with `id must be a ULID` otherwise).
</ParamField>
Docker containers use a fixed prefix:
```text
Container --name / --hostname: s-{ulid}
Example: s-01ARZ3NDEKTSV4RRFFQ69G5FAV
Preview Host (port 3000): s-01ARZ3NDEKTSV4RRFFQ69G5FAV-3000.preview.localhost
```
`internal/docker.RunSpec.Name` and `Hostname` are both set to `s-` + id on create. All lifecycle paths (`inspect`, `exec`, `stop`, `rm`, wake `start`) use that name.
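A client can pre-check an id and derive the container name before calling the API. The following sketch validates against the canonical ULID form (26 Crockford Base32 characters, first character `0` to `7`); this is an assumption about strictness, and `sandboxd`'s own parser may accept forms this check rejects, such as lowercase. The helper names are hypothetical.

```bash
# is_ulid ID: return 0 when ID is a canonical 26-character ULID.
# Crockford Base32 excludes the letters I, L, O, and U.
is_ulid() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-7][0-9A-HJKMNP-TV-Z]{25}$'
}

# container_name ID: print the Docker name/hostname for a sandbox id.
container_name() {
  is_ulid "$1" || return 1
  printf 's-%s\n' "$1"
}

# Example:
# docker inspect "$(container_name 01ARZ3NDEKTSV4RRFFQ69G5FAV)"
```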
The row also stores `workspace_img` and `workspace_mnt` paths (historical loopback naming). In the OSS directory layout, both resolve to `SANDBOXED_DATA_DIR/workspaces/<id>/`.
## Create path
<Steps>
<Step title="Insert row">
`POST /sandbox` inserts `status='creating'` with `container_id` and `cgroup_path` NULL, plus `sandbox_port` rows for each exposed port.
</Step>
<Step title="Provision workspace">
`Loopback.Provision` (or `ProvisionFromTemplate`) creates `workspaces/<id>/` and seeds from the base image skeleton when the directory is empty. Idempotent for id-reuse.
</Step>
<Step title="Run container">
`docker run -d` with hardened flags, Traefik labels, env injection, and bind mount `workspaces/<id>:/home/sandbox`.
</Step>
<Step title="Mark running">
`MarkRunning` records container id and cgroup path; optional egress adds bridge IP to nftables and `container_ip` on the row. `BumpLastActive` seeds activity for the idle reaper.
</Step>
</Steps>
On any failure after the row exists, `abort()` runs: best-effort `docker rm s-{id}`, `Loopback.Release` (no-op for directory storage), and `MarkError` with a diagnostic message.
## Reconcile on boot
`reconcile.Once` runs **once** at process startup, before the HTTP listener accepts traffic. It implements: **SQLite is truth; Docker is converged to match.**
```text
sandboxd main
│
├─ store.Open + migrations
├─ BackfillRunningActivity (legacy last_active_at)
├─ reconcile.Once ◄── blocks listener until complete
└─ HTTP server
```
For each row in `running`, `creating`, or `stopped`:
1. **Loopback** — If workspace data exists and status ≠ `error`, `Provision` re-establishes storage (directory storage: ensure mount path exists; idempotent).
2. **Inspect `s-{id}`** — Missing container → `MarkStopped`, clear `container_ip`. Present but not running → `MarkStopped`. Running → `MarkRunning` + re-apply `memory.high` when enabled + refresh egress IP when configured.
3. **Stale `creating`** — Older than 5 minutes → `MarkError` with `interrupted while creating; reconciler timeout`; mount left for manual review.
Orphan detection **logs only** — it does not delete:
| Orphan type | Detection | Action |
|-------------|-----------|--------|
| Container `s-*` | No matching sandbox row | Warn + metric |
| Mount under workspaces | No matching row (legacy loopback builds) | Warn; OSS reports zero mounts |
| `workspace_owner` without workspace dir | `.img`/directory missing | Warn; manual disposition |
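Since the reconciler only logs orphans, an operator may want to list orphan containers independently. The sketch below compares container names against a file of known sandbox ids, one per line. How that file is produced is an assumption left to the operator (for example from the sandbox listing), and `orphan_containers` is a hypothetical helper.

```bash
# orphan_containers KNOWN_IDS_FILE: read container names on stdin (one per
# line) and print each s-{id} name whose id is not listed in KNOWN_IDS_FILE.
orphan_containers() {
  local known="$1" name id
  while IFS= read -r name; do
    case "$name" in
      s-*) id=${name#s-} ;;
      *)   continue ;;                 # docker's name filter is a substring match
    esac
    # -F fixed string, -x whole line, -q quiet: exact id lookup.
    grep -Fxq -- "$id" "$known" || printf '%s\n' "$name"
  done
}

# Example (known-ids.txt is an assumed file of current sandbox ids):
# docker ps -a --filter name=s- --format '{{.Names}}' | orphan_containers known-ids.txt
```

The output is a review list only; removing a container remains a manual `docker rm`.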
<Warning>
The reconciler never auto-recreates a missing container for a `stopped` row and never adopts orphan containers into SQLite. Wake or an explicit create path must bring compute back.
</Warning>
## Stop vs destroy vs purge
Stopping frees RAM but keeps metadata and disk. Destroy removes the container and DB row but keeps the workspace for id-reuse. Purge is irreversible end-to-end deletion.
| Operation | HTTP | Container | SQLite row | `workspace_owner` | Workspace dir | Snapshots dir |
|-----------|------|-----------|------------|-------------------|---------------|---------------|
| **Stop** | `POST /v1/sandboxes/{id}/stop` | `docker stop` (10s) | `running` → `stopped`, sets `stopped_at` | Unchanged | Kept | Kept |
| **Destroy (soft)** | `DELETE /sandbox/{id}` | `docker rm` | Row deleted (`Store.Delete`) | **Survives** | **Kept** | Kept |
| **Purge (hard)** | `POST /sandbox/{id}/purge` | stop + `docker rm` if present | `PurgeSandbox` (row + owner) | Deleted | `RemoveAll` workspace | `RemoveAll` `_snapshots/<id>/` if configured |
### Destroy — `DELETE /sandbox/{id}`
Audit action: `sandbox.destroy`. Response: `204 No Content`.
Order matters when egress is enabled: remove nftables source **before** `docker rm`. Then `Loopback.Release` (no-op for directory workspaces) and `Store.Delete`. Ports cascade via foreign key.
<Info>
**Id-reuse:** A workspace directory can exist without a row (for example after destroy). `POST /sandbox` with the same `id` is allowed if no row exists; the Phase 8 ownership check requires `external.user_id` to match `workspace_owner` before reattaching.
</Info>
### Purge — `POST /sandbox/{id}/purge`
Shared implementation: `purgeOne` (also used by `POST /external-users/{id}/purge` and `POST /external-projects/{id}/purge`).
<ResponseExample>
```json
{
"purged": true,
"freed_bytes": 12345678
}
```
</ResponseExample>
Purge holds the per-id lock for the whole operation. Scope purges (external user or project) return `500` on the first failure; sandboxes already purged stay purged, so the caller can retry.
### v1 `DELETE /v1/sandboxes/{id}`
Public destroy delegates to **purge**, not soft delete:
```text
DELETE /v1/sandboxes/{id} → POST /sandbox/{id}/purge → 204
```
In v1, deleting a project sandbox removes the disk and the ownership binding; it does not preserve the workspace for reuse.
## Persistence model
```text
SANDBOXED_DATA_DIR/
├── state/sandboxd.db ← SQLite (WAL), single-writer goroutine
└── workspaces/<ulid>/ ← bind mount; survives stop & soft destroy
```
| Column | Role while `creating` | While `running` | While `stopped` / `error` |
|--------|----------------------|-----------------|---------------------------|
| `container_id` | NULL | Short Docker id | Often retained |
| `cgroup_path` | NULL | Relative cgroup path | Often retained |
| `container_ip` | — | Bridge IP (egress builds) | Cleared on stop/reconcile |
| `error_message` | NULL until failure | NULL | Set on `error` |
| `last_active_at` | Bumped at create | Bumped by traffic/exec/wake | Used by idle logic |
| `stopped_at` | NULL | NULL | Set by explicit stop / reapers |
| `keepalive_until` | — | Idle reaper skips while > now | — |
`workspace_owner` (Phase 8) maps `sandbox_id` → upstream `external_user_id` and survives soft destroy so reattach and snapshot authorization stay consistent; only `PurgeSandbox` deletes it.
## API surface (lifecycle)
| Method | Path | Effect on lifecycle |
|--------|------|---------------------|
| `POST` | `/sandbox` | Create row → `creating` → `running` |
| `GET` | `/sandbox/{id}`, `/sandboxes` | Read row + optional live `docker inspect` |
| `DELETE` | `/sandbox/{id}` | Soft destroy (workspace kept) |
| `POST` | `/sandbox/{id}/purge` | Hard purge |
| `POST` | `/v1/sandboxes/{id}/stop` | `running` → `stopped` (409 if task active) |
| `DELETE` | `/v1/sandboxes/{id}` | Hard purge via delegation |
| `POST` | `/sandbox/{id}/keepalive` | Extends `keepalive_until` (idle exemption) |
Preview wake does not change status by itself until `docker start` succeeds; see wake and idle docs for admission and reaper interaction.
## Operator signals
| Symptom | Likely row state | What to check |
|---------|------------------|---------------|
| Preview shows “warming” forever | `stopped` waking, or nothing listening on port | `GET /sandbox/{id}`, container logs |
| `id must be a ULID` | — | Pass valid ULID or omit `id` |
| Create stuck | `creating` > 5 min → reconciler sets `error` | `error_message`, workspace dir, `docker ps -a --filter name=s-` |
| Row exists, create 409 | Prior row not destroyed | `DELETE /sandbox/{id}` or purge |
| Orphan `s-*` container | No SQLite row | Manual `docker rm`; reconciler will not remove it |
Startup logs include reconcile summary: `rows`, `reapplied`, `stopped`, `errored`, `orphan_containers`, `orphan_mounts`.
## Related pages
<CardGroup>
<Card title="Workspaces and isolation" href="/workspaces-persistence">
Per-sandbox directories, seeding, bind mounts, and what survives stop vs purge.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
How sandboxes move between `running` and `stopped` without deleting rows or disk.
</Card>
<Card title="Manage sandboxes" href="/sandbox-operations">
Operational workflows: create, exec, keepalive, stop, destroy, purge, and claim.
</Card>
<Card title="Control plane API (legacy)" href="/legacy-api-reference">
`/sandbox*` routes including destroy, purge, and health endpoints.
</Card>
<Card title="v1 API reference" href="/v1-api-reference">
`POST /v1/sandboxes/{id}/stop` and `DELETE /v1/sandboxes/{id}` (purge semantics).
</Card>
<Card title="Uninstall and maintenance" href="/uninstall-maintenance">
What uninstall scripts remove vs retained workspaces and SQLite state.
</Card>
</CardGroup>
---
## 05. Preview routing
> Traefik Docker labels, Host rules (s-{id}-{port}.preview.{domain}), router priority 100 vs wake catch-all priority 1, PREVIEW_DOMAIN/ENTRYPOINT/TLS, and sandboxed.managed constraint.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/05-preview-routing.md
- Generated: 2026-06-04T22:42:35.609Z
### Source Files
- `control-plane/internal/traefik/traefik.go`
- `traefik/traefik.yml`
- `traefik/dynamic/wake.yml`
- `traefik/dynamic/auth.yml`
- `docker-compose.yml`
- `.env.example`
---
title: "Preview routing"
description: "Traefik Docker labels, Host rules (s-{id}-{port}.preview.{domain}), router priority 100 vs wake catch-all priority 1, PREVIEW_DOMAIN/ENTRYPOINT/TLS, and sandboxed.managed constraint."
---
Traefik is the edge router: each sandbox container carries Docker labels that register one Host-matched router and service per exposed port, while a file-provider catch-all at priority 1 forwards stopped-sandbox preview traffic to `sandboxd` on the internal network. `sandboxd` builds the label set in `control-plane/internal/traefik` at create time and attaches it with `docker run --label`; Traefik’s docker provider only considers containers labelled `sandboxed.managed=true`.
## Preview hostname and router naming
Every exposed port gets a dedicated Traefik router and service whose names are identical: `s-{id}-{port}` (for example `s-01ARZ3NDEKTSV4RRFFQ69G5FAV-3000`). The Host rule targets the browser-facing hostname:
```text
s-{id}-{port}.preview.{PREVIEW_DOMAIN}
```
| Piece | Source | Example (defaults) |
| --- | --- | --- |
| Sandbox id | ULID from create (uppercase in DB) | `01ARZ3NDEKTSV4RRFFQ69G5FAV` |
| Port | `ports` array on `POST /sandbox` | `3000` |
| Domain | `PREVIEW_DOMAIN` on `sandboxd` | `localhost` |
| Full host | `Host(\`s-{id}-{port}.preview.{domain}\`)` | `s-01ARZ3NDEKTSV4RRFFQ69G5FAV-3000.preview.localhost` |
When `HTTP_PORT` is not `80`, append `:{HTTP_PORT}` to the URL (browsers still resolve `*.localhost` to `127.0.0.1`). The wake handler and forward-auth parsers accept an optional `:port` suffix on the Host header.
<Note>
If `ports` is empty or omitted at create, `traefik.Labels` returns `nil` and the sandbox has no Traefik exposure. Combined with `exposedByDefault: false`, the container is invisible to the edge until you recreate it with ports.
</Note>
## Docker provider: `sandboxed.managed` constraint
Static Traefik config scopes the docker provider so only stack-owned sandboxes become routes:
| Setting | Value | Effect |
| --- | --- | --- |
| `providers.docker.exposedByDefault` | `false` | Containers need `traefik.enable=true` |
| `providers.docker.constraints` | ``Label(`sandboxed.managed`,`true`)`` | Ignore unrelated labelled containers on the shared daemon |
| `providers.docker.network` | `sandboxed_net` (from `SANDBOXED_NETWORK`) | Backend targets use the compose network IP |
Every sandbox `docker run` emits:
```text
traefik.enable=true
sandboxed.managed=true
```
Sandboxes join `${SANDBOXED_NETWORK:-sandboxed_net}` so Traefik and the sandbox share L3 reachability on that network.
## Per-port label contract
`traefik.Labels(id, ports, domain, visibility, entrypoint, tls)` emits two base labels plus four labels per port, with one more per port when TLS is enabled and one more when `visibility=private`:
| Label key pattern | Value | Role |
| --- | --- | --- |
| `traefik.http.routers.{router}.rule` | `Host(\`s-{id}-{port}.preview.{domain}\`)` | Match preview hostname |
| `traefik.http.routers.{router}.entrypoints` | `PREVIEW_ENTRYPOINT` (default `web`) | `:80` or `:443` entry |
| `traefik.http.routers.{router}.priority` | `100` | Beats wake catch-all |
| `traefik.http.services.{router}.loadbalancer.server.port` | container port | Upstream port inside sandbox |
| `traefik.http.routers.{router}.tls` | `true` when `PREVIEW_TLS=true` | TLS on router; no per-router `certresolver` |
| `traefik.http.routers.{router}.middlewares` | `sandbox-preview-auth@file` | Only when `visibility=private` |
`entrypoint` defaults to `web` when empty. TLS routers rely on a single wildcard `*.preview.{domain}` certificate in Traefik’s default TLS store (operator-supplied in production), not per-host ACME on each sandbox.
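To see the label contract concretely, the per-port rows of the table can be rendered by a small function. This is a shell illustration of the documented output, not the Go implementation in `control-plane/internal/traefik`; the name `preview_labels`, its argument order, and the `public` default for visibility are assumptions.

```bash
# preview_labels ID PORT DOMAIN [ENTRYPOINT] [TLS] [VISIBILITY]:
# print the per-port Traefik labels, one "key=value" per line.
preview_labels() {
  local id="$1" port="$2" domain="$3"
  local entrypoint="${4:-web}" tls="${5:-false}" visibility="${6:-public}"
  local router="s-$id-$port"           # router and service share this name
  printf 'traefik.http.routers.%s.rule=Host(`%s.preview.%s`)\n' "$router" "$router" "$domain"
  printf 'traefik.http.routers.%s.entrypoints=%s\n' "$router" "$entrypoint"
  printf 'traefik.http.routers.%s.priority=100\n' "$router"
  printf 'traefik.http.services.%s.loadbalancer.server.port=%s\n' "$router" "$port"
  if [ "$tls" = "true" ]; then
    printf 'traefik.http.routers.%s.tls=true\n' "$router"
  fi
  if [ "$visibility" = "private" ]; then
    printf 'traefik.http.routers.%s.middlewares=sandbox-preview-auth@file\n' "$router"
  fi
}

# Example: compare against what a running sandbox actually carries.
# docker inspect --format '{{json .Config.Labels}}' "s-$ID"
```

The two base labels (`traefik.enable=true` and `sandboxed.managed=true`) are emitted once per container and are omitted here.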
<RequestExample>
```bash
# Create with port 3000 — labels attached at docker run
curl -s -X POST http://127.0.0.1:9090/sandbox \
-H 'content-type: application/json' \
-d '{"ports":[3000]}'
```
</RequestExample>
<ParamField body="PREVIEW_DOMAIN" type="string">
Hostname suffix after `.preview.`. Default `localhost` (no DNS). Production: real wildcard domain (for example `yourdomain.com`).
</ParamField>
<ParamField body="PREVIEW_ENTRYPOINT" type="string">
Traefik entrypoint name on preview routers. Default `web` (`:80`). Production TLS: `websecure` after enabling `:443` in `traefik/traefik.yml`.
</ParamField>
<ParamField body="PREVIEW_TLS" type="boolean">
When `true`, adds `traefik.http.routers.{router}.tls=true` on every preview router. Default `false`.
</ParamField>
## Priority 100 vs wake catch-all priority 1
Two router layers compete on the same Host shape:
| Router | Provider | Priority | When it matches |
| --- | --- | --- | --- |
| `s-{id}-{port}` | Docker (labels on running container) | `100` | Container running and Traefik has observed labels |
| `sandbox-wake` | File (`traefik/dynamic/wake.yml`) | `1` | No higher-priority router for that Host |
The catch-all rule is domain-agnostic:
```text
HostRegexp(`^s-[0-9A-Za-z]+-[0-9]+\.preview\..+$`)
```
It forwards to `http://sandboxd:9000` with `passHostHeader: true`. `sandboxd`’s `hostDispatch` middleware inspects the Host header: if it matches `^s-([0-9A-Za-z]+)-([0-9]+)\.preview\.{PREVIEW_DOMAIN}(?::\d+)?$`, the request goes to the wake handler (HTML warming page); otherwise it hits the API mux.
```mermaid
sequenceDiagram
participant Browser
participant Traefik
participant Sandbox as s-id container
participant Sandboxd as sandboxd:9000
Note over Browser,Sandboxd: Sandbox stopped — no Docker router
Browser->>Traefik: GET Host s-id-3000.preview.domain
Traefik->>Sandboxd: sandbox-wake priority 1
Sandboxd->>Sandbox: docker start + TCP ready
Sandboxd-->>Browser: 200 warming HTML meta-refresh
Note over Browser,Sandbox: Container running — labels published
Browser->>Traefik: refresh same Host
Traefik->>Sandbox: s-id-3000 priority 100
Sandbox-->>Browser: dev server on :3000
```
After `docker start`, Traefik’s docker provider typically registers the priority-100 route before the browser’s meta-refresh fires, so the second request proxies directly to the app.
<Warning>
`traefik/dynamic/wake.yml` hardcodes `entryPoints: [web]`. For HTTPS-only production, align the wake router’s entrypoints with `PREVIEW_ENTRYPOINT` (for example `websecure`) and publish `:443` in compose — otherwise stopped-sandbox wakes never reach `sandboxd`.
</Warning>
## Static Traefik stack layout
```text
traefik/
traefik.yml # entryPoints, docker+file providers, access log path
dynamic/
wake.yml # sandbox-wake catch-all → sandboxd:9000
auth.yml # sandbox-preview-auth forwardAuth (private sandboxes)
api.yml # optional api.preview.* → sandboxd (priority 100)
```
| File | Router | Priority | Backend |
| --- | --- | --- | --- |
| `wake.yml` | `sandbox-wake` | `1` | `http://sandboxd:9000` |
| `api.yml` | `sandbox-api` | `100` | `http://sandboxd:9000` (Host `api.preview.*`) |
| Docker labels | `s-{id}-{port}` | `100` | sandbox container port |
Access logs land at `${SANDBOXED_LOG_DIR}/traefik-access.log` (bind-mounted to both Traefik and `sandboxd`) so the idle reaper can bump `last_active_at` from `RequestHost`.
## Configuration wiring
Compose passes preview env into `sandboxd`; Traefik reads static YAML and watches `traefik/dynamic/`:
| Variable | Default | Consumed by |
| --- | --- | --- |
| `PREVIEW_DOMAIN` | `localhost` | Label Host rules; wake/forward-auth regex in `sandboxd` |
| `PREVIEW_ENTRYPOINT` | `web` | Label `entrypoints=` |
| `PREVIEW_TLS` | `false` | Label `tls=true` |
| `HTTP_PORT` | `80` | Host publish `:${HTTP_PORT}:80` on Traefik |
| `SANDBOXED_NETWORK` | `sandboxed_net` | `docker run --network` and Traefik `providers.docker.network` |
Production TLS (from README): enable `websecure` in `traefik/traefik.yml`, add a cert resolver or load a wildcard into the default TLS store, set `PREVIEW_DOMAIN`, `PREVIEW_ENTRYPOINT=websecure`, `PREVIEW_TLS=true`, and enable API auth.
## Private previews and forward-auth
Public sandboxes use only the label set above. `visibility=private` adds `traefik.http.routers.{router}.middlewares=sandbox-preview-auth@file`, defined in `traefik/dynamic/auth.yml` as forwardAuth to `http://sandboxd:9000/forward-auth`. Traefik calls that endpoint before proxying to the sandbox; a 2xx allows the request through.
Stopped private sandboxes still hit the wake catch-all; the wake handler runs the same preview-token check as forward-auth before `docker start`.
## Verify routing locally
<Steps>
<Step title="Confirm the edge is up">
```bash
curl -s http://127.0.0.1:9090/healthz
curl -s http://127.0.0.1:9090/readyz
```
</Step>
<Step title="Create a sandbox on port 3000">
```bash
API=http://127.0.0.1:9090
ID=$(curl -s -X POST "$API/sandbox" -H 'content-type: application/json' \
-d '{"ports":[3000]}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
```
</Step>
<Step title="Hit the preview with Host header">
```bash
curl -s -H "Host: s-${ID}-3000.preview.localhost" \
"http://127.0.0.1:${HTTP_PORT:-80}/"
```
Expect either the app (running) or the warming page (stopped, wake path).
</Step>
<Step title="Inspect labels on the container">
```bash
docker inspect "s-${ID}" --format '{{json .Config.Labels}}' | jq .
```
Look for `traefik.http.routers.s-${ID}-3000.rule` and `priority=100`.
</Step>
</Steps>
## Failure modes
| Symptom | Likely cause |
| --- | --- |
| Connection refused on preview URL | Nothing listening on the exposed port inside the sandbox yet |
| Endless warming page | Dev server not bound to the declared port, or Traefik has not registered the Docker router after wake |
| Preview works via API wake but not browser | Host id case mismatch — browser sends lowercase; wake normalizes to uppercase ULID |
| Traefik routes wrong container | Missing `sandboxed.managed=true` or constraint disabled |
| Wake never fires on HTTPS | `wake.yml` still on `web` while previews use `websecure` only |
## Related pages
<CardGroup>
<Card title="Preview URL reference" href="/preview-url-reference">
Hostname pattern, `HTTP_PORT` suffix rules, and localhost vs production HTTPS.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Stop-on-idle, wake admission, warming page behavior, and keepalive.
</Card>
<Card title="Private previews" href="/private-previews">
Forward-auth middleware, preview tokens, and deny modes.
</Card>
<Card title="Production deployment" href="/production-deployment">
Wildcard DNS, `websecure`, cert resolver, and `PREVIEW_TLS=true`.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
Full env key list for preview domain, ports, and network.
</Card>
</CardGroup>
---
## 06. Wake, idle, and pressure
> Stop-on-idle (SANDBOXD_IDLE_THRESHOLD_SECONDS), wake-on-preview (catch-all → sandboxd), memory admission/refusal, pressure reaper, keepalive, and warming-page behavior.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/06-wake-idle-and-pressure.md
- Generated: 2026-06-04T22:42:41.017Z
### Source Files
- `control-plane/internal/wake/handler.go`
- `control-plane/internal/reaper/idle.go`
- `control-plane/internal/reaper/pressure.go`
- `control-plane/cmd/sandboxd/main.go`
- `traefik/dynamic/wake.yml`
- `ARCHITECTURE.md`
---
title: "Wake, idle, and pressure"
description: "Stop-on-idle (SANDBOXD_IDLE_THRESHOLD_SECONDS), wake-on-preview (catch-all → sandboxd), memory admission/refusal, pressure reaper, keepalive, and warming-page behavior."
---
`sandboxd` frees host RAM by stopping idle sandboxes and reclaiming memory under pressure, then restarts containers on the next preview hit or programmatic wake. Traefik’s priority-1 catch-all (`traefik/dynamic/wake.yml`) forwards preview traffic for stopped sandboxes to `sandboxd:9000`, where `internal/wake` runs admission, `docker start`, optional TCP readiness, and HTML warming pages; `internal/reaper` runs the idle and pressure loops configured from `SANDBOXD_*` env vars in `control-plane/cmd/sandboxd/main.go`.
## Activity signals
Idle and pressure reapers only stop sandboxes whose SQLite row has `status='running'` and `last_active_at` older than the cutoff. Activity that postpones a stop comes from five signals:
| Signal | Mechanism | Effect on reapers |
|--------|-----------|-------------------|
| HTTP preview traffic | `activity.Tailer` bumps `last_active_at` from Traefik access logs | Rows stay above idle cutoff |
| WebSocket / SSE | `activity.Poller` (when `SANDBOXD_POLLER_METRIC_RE` is set) bumps `last_active_at` | Same; without poller, widen `SANDBOXD_WAKE_GRACE_SECONDS` |
| In-flight `exec` | `activity.InflightExec` per-id counter | Idle and pressure **skip** that id |
| Explicit keepalive | `keepalive_until` column via `POST /sandbox/{id}/keepalive` | Idle and pressure **skip** while `keepalive_until > now` |
| Running coding task | `SandboxHasRunningTask` | Idle reaper **skips** |
<Note>
The pressure reaper’s **<5% emergency** branch stops the heaviest-RSS cgroup **even if** the sandbox has an in-flight exec or active task. Advisory bands (10–15% and 5–10%) only stop the oldest *idle-running* candidate and never kill active work to defend a soft threshold.
</Note>
## Stop-on-idle
The idle reaper (`internal/reaper/idle.go`) ticks every `SANDBOXD_IDLE_REAP_INTERVAL_SECONDS` (default **30s**). Each tick calls `Store.ListIdleCandidates` with cutoff `now - SANDBOXD_IDLE_THRESHOLD_SECONDS` (default **2100s / 35 min**), ordered by `last_active_at ASC`.
For each candidate it applies skip rules (in-flight exec, `keepalive_until`, running task), then `docker stop` with a 10s grace, optional egress IP cleanup, and `MarkStoppedAt`. The workspace bind mount is untouched; the next preview or API wake runs `docker start` again.
Set `SANDBOXD_IDLE_REAP_INTERVAL_SECONDS` to **0** to disable the loop entirely.
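As a rough model of one idle tick, the candidate query and skip rules can be expressed as below. This is a sketch under stated assumptions: the `Row` shape, the function name `idle_stop_order`, and the use of `0` for "no keepalive" are hypothetical, and the real logic lives in Go in `internal/reaper/idle.go`.

```python
from dataclasses import dataclass

@dataclass
class Row:
    id: str
    status: str
    last_active_at: int        # unix seconds
    keepalive_until: int = 0   # assumption: 0 means no keepalive set

def idle_stop_order(rows, now, threshold=2100,
                    inflight_exec=frozenset(), running_task=frozenset()):
    """Return sandbox ids one tick would stop, oldest activity first."""
    cutoff = now - threshold
    # Candidate query: running rows whose last activity is older than the cutoff.
    candidates = sorted(
        (r for r in rows if r.status == "running" and r.last_active_at < cutoff),
        key=lambda r: r.last_active_at,
    )
    # Skip rules: in-flight exec, active keepalive, running coding task.
    return [
        r.id for r in candidates
        if r.id not in inflight_exec
        and r.keepalive_until <= now
        and r.id not in running_task
    ]
```

The ordering mirrors `last_active_at ASC`, so the sandbox that has been quiet the longest is stopped first.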
```bash
# Postpone automatic idle stop for 2 hours (unix seconds; GNU date syntax on Linux)
# On macOS/BSD use: date -u -v+2H +%s
curl -s -XPOST http://127.0.0.1:9090/sandbox/$ID/keepalive \
-H 'content-type: application/json' \
-d '{"until":'$(date -u -d '+2 hours' +%s)'}'
```
<ParamField body="until" type="integer" required>
Unix timestamp (seconds). Capped to `now + SANDBOXD_KEEPALIVE_MAX_SECONDS` (default **86400** / 24h). Must be in the future.
</ParamField>
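The cap on `until` amounts to a simple clamp. The sketch below assumes a hypothetical helper name `clamp_keepalive` and assumes a past timestamp is rejected with an error; how the server actually reports that case is not specified here.

```python
def clamp_keepalive(until, now, max_seconds=86400):
    """Clamp a requested keepalive deadline to now + max_seconds."""
    if until <= now:
        # Assumption: a non-future timestamp is treated as invalid input.
        raise ValueError("until must be in the future")
    return min(until, now + max_seconds)
```

With the default cap, a request for 48 hours is stored as 24 hours from the time of the call.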
## Wake-on-preview routing
Running sandboxes register Traefik Docker labels with **priority 100**. The file-provider catch-all uses **priority 1** so it only matches when no live per-sandbox router exists:
```text
Browser → s-{ULID}-{port}.preview.{PREVIEW_DOMAIN}
│
├─ container running → Traefik priority-100 router → dev server :port
│
└─ container stopped → catch-all (wake.yml) → sandboxd:9000
hostDispatch → ServeCatchAll
```
`hostDispatch` in `main.go` inspects the `Host` header with the same regex as `wake.Handler` (`^s-([0-9A-Za-z]+)-([0-9]+)\.preview\.{domain}`). Browsers lowercase hostnames, so the captured id is uppercased to the canonical ULID before the DB lookup.
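A minimal Python sketch of that Host parsing follows. The function name `parse_preview_host` is hypothetical, and the optional `:port` suffix and end anchor are assumptions about the full pattern.

```python
import re

def parse_preview_host(host, preview_domain="localhost"):
    """Return (ulid, port) for a preview Host header, or None if it does not match."""
    pattern = re.compile(
        r"^s-([0-9A-Za-z]+)-([0-9]+)\.preview\."
        + re.escape(preview_domain)
        + r"(?::\d+)?$"   # assumed: tolerate an explicit port suffix
    )
    match = pattern.match(host)
    if match is None:
        return None  # not a preview host; falls through to the API mux
    # Uppercase the captured id to recover the canonical ULID.
    return match.group(1).upper(), int(match.group(2))
```

A request for `s-01hzxabc-3000.preview.localhost` therefore resolves to the row keyed by `01HZXABC` and port `3000`.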
```mermaid
sequenceDiagram
participant Browser
participant Traefik
participant sandboxd
participant Docker
participant DevServer
Browser->>Traefik: GET preview Host (stopped sandbox)
Traefik->>sandboxd: catch-all priority 1
sandboxd->>sandboxd: Admit (memory)
sandboxd->>Docker: docker start s-{id}
sandboxd->>DevServer: TCP probe :port (optional)
sandboxd-->>Browser: 200 HTML meta-refresh (2s)
Docker-->>Traefik: labels → priority-100 router
Browser->>Traefik: refresh
Traefik->>DevServer: proxy to container
```
## Wake handler behavior
`internal/wake/handler.go` serves two entry points:
| Entry | Path / trigger | Response |
|-------|----------------|----------|
| Catch-all (HTML) | Any method; preview `Host` via Traefik | Warming / busy / error HTML |
| Programmatic (JSON) | `POST /wake/{id}` on API mux | JSON `{"id","status","wake_duration_ms"}` or error |
**Status handling:**
| Row status | Behavior |
|------------|----------|
| `running` | Success immediately (refresh or JSON) |
| `stopped` | Admission → `docker start` → optional TCP probe → `MarkRunningWoke` |
| `creating` | Error `creating` — do not start under half-built row |
| `error` | Error with `error_message` when set |
| not found | `not_found` |
**Concurrency:** Per-id inflight map deduplicates concurrent wakes; `idlock.Registry` excludes snapshot/restore/destroy races.
**Private sandboxes (HTML only):** Stopped `visibility=private` sandboxes run the same preview-token check as `/forward-auth` before start. Service/operator actors skip the cookie gate (enables wake-on-task-submit). JSON `POST /wake/{id}` is not cookie-gated.
**TCP readiness:** Catch-all path probes `bridgeIP:port` for up to `SANDBOXD_WAKE_TCP_READY_TIMEOUT_SECONDS` (default **8s**). Timeout increments metric `tcp_ready_timeout` but still returns the refresh page — the next browser retry may hit the live route.
## Memory admission and refusal
`wake.Admit` (`internal/wake/admit.go`) is shared by preview wake, `POST /wake/{id}`, and **`POST /sandbox` create**:
1. Read `/proc/meminfo` → `MemAvailable` percent.
2. `cost_pct = SANDBOXD_WAKE_COST_MB (800) / MemTotal × 100`.
3. If `(avail_pct - cost_pct) >= SANDBOXD_MEM_REFUSE_WAKES_PCT (10)` → admit.
4. Else if pressure reaper’s `Refused` flag is set → deny `wakes_refused` (no sync tick).
5. Else run one synchronous `pressure.Tick` (may stop oldest idle sandbox).
6. Re-read meminfo; if still below floor → deny `low_memory`; otherwise admit.
HTML denial serves the busy page with `Retry-After: 30` and `X-Retry-After-Reason`. JSON returns **503** with `mem_available_percent` and the same `Retry-After`.
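The admission steps can be modeled as follows. This is a sketch, not the Go `wake.Admit` code: the callables `read_meminfo` and `pressure_tick` are hypothetical injection points, and converting `SANDBOXD_WAKE_COST_MB` with 1 MB = 1024 kB is an assumption.

```python
def parse_meminfo(text):
    """Return (MemTotal_kB, MemAvailable_kB) from /proc/meminfo contents."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields["MemTotal"], fields["MemAvailable"]

def admit(read_meminfo, refused, pressure_tick, cost_mb=800, refuse_pct=10):
    """Return (admitted, reason) following the documented six steps."""
    def headroom_after_wake():
        total_kb, avail_kb = parse_meminfo(read_meminfo())
        avail_pct = avail_kb / total_kb * 100
        cost_pct = cost_mb * 1024 / total_kb * 100
        return avail_pct - cost_pct

    if headroom_after_wake() >= refuse_pct:
        return True, ""
    if refused:
        return False, "wakes_refused"   # no synchronous tick while refusing
    pressure_tick()                     # may stop the oldest idle sandbox
    if headroom_after_wake() >= refuse_pct:
        return True, ""
    return False, "low_memory"
```

On a 16 GiB host the default wake cost is roughly 4.9% of `MemTotal`, so a wake is admitted without a pressure tick only while `MemAvailable` is at or above about 14.9%.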
## Host memory pressure reaper
The pressure reaper ticks every `SANDBOXD_PRESSURE_INTERVAL_SECONDS` (default **10s**), reading `MemAvailable` from `/proc/meminfo` (not `MemFree`).
| MemAvailable % | Action |
|----------------|--------|
| ≥ `SANDBOXD_MEM_HEADROOM_PCT` (15) | No stop |
| 10–15 | Stop **one** oldest idle-running sandbox (`reason=memory_pressure`, band `10-15`) |
| 5–10 | Same + set `Refused` → new wakes denied; log warning |
| < 5 | **Emergency:** stop single heaviest `cgroup.memory.current` sandbox, active or idle |
Wake refusal uses hysteresis: `Refused` clears when availability rises to **`RefuseWakesPct + 2`** (default **12%**) to avoid flapping at 10%.
On startup, if `MemAvailable` is already below headroom, `main.go` runs one synchronous pressure tick before the HTTP listener accepts traffic.
```mermaid
stateDiagram-v2
[*] --> Healthy: avail >= 15%
Healthy --> Band1015: avail 10-15%
Band1015 --> Healthy: avail >= 15%
Band1015 --> Band510: avail < 10%
Band510 --> RefusingWakes: Refused=true
RefusingWakes --> Band1015: avail >= 12% clears Refused
Band510 --> Emergency: avail < 5%
Emergency --> [*]: stop heaviest RSS once per tick
```
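The bands and the refusal hysteresis can be summarized in one decision function. This is an illustrative sketch: the name `pressure_step` and the action strings are hypothetical, and keeping `Refused` set in the emergency band is an assumption not stated in the table.

```python
def pressure_step(avail_pct, refused, headroom=15, refuse=10, emergency=5):
    """Return (action, refused) for one pressure tick."""
    if avail_pct < emergency:
        # Emergency: heaviest sandbox is stopped, active or idle.
        return "stop_heaviest", True    # assumption: wakes stay refused here
    if avail_pct < refuse:
        return "stop_oldest_idle", True
    # Hysteresis: only clear the flag once availability reaches refuse + 2.
    if refused and avail_pct >= refuse + 2:
        refused = False
    if avail_pct < headroom:
        return "stop_oldest_idle", refused
    return "none", refused
```

At 11% availability with `Refused` already set, the flag stays set, which is what prevents flapping around the 10% boundary.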
<Warning>
Emergency stops can interrupt active agent work. Monitor `sandboxd_pressure_reaper_stops_total` and `sandboxd_wakes_refused_active` via Prometheus (`GET /metrics`).
</Warning>
## Warming and error pages
`internal/wake/html.go` returns white-label HTML (no sandbox id in body). Machine-readable reasons use response headers:
| Page | HTTP | User-visible title | Headers |
|------|------|-------------------|---------|
| Success / warming | 200 | “Spinning up your app!” | `meta refresh` every **2s** (`RefreshSeconds` in handler config) |
| Admission denied | 503 | “Almost ready…” | `Retry-After`, `X-Retry-After-Reason` (`wakes_refused`, `low_memory`) |
| Other failure | 503 | “We couldn't load your app” | `X-Wake-Error` (`not_found`, `start_failed`, `creating`, …) |
After the refresh, Traefik’s Docker provider usually observes the started container and the priority-100 router serves the dev server directly.
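A client that polls a stopped preview can branch on the status and headers from the table above. The sketch below assumes the response came from the wake handler (a `200` from the running app itself looks the same at this level), and the function name and returned dictionary shape are hypothetical.

```python
def classify_wake_response(status, headers):
    """Map a wake-handler response to a coarse client-side state."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if status == 200:
        return {"state": "warming", "retry_after": 2}
    if status == 503 and "x-retry-after-reason" in h:
        return {
            "state": "busy",
            "reason": h["x-retry-after-reason"],
            # Assumes Retry-After is sent as integer seconds.
            "retry_after": int(h.get("retry-after", "30")),
        }
    if status == 503:
        return {"state": "failed", "reason": h.get("x-wake-error", "unknown")}
    return {"state": "unexpected"}
```

A `busy` result is worth retrying after the indicated delay, while `failed` with `not_found` or `creating` usually is not.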
<Info>
If the warming page loops indefinitely, the dev server may not be listening on the exposed port yet, or wake failed — check `X-Wake-Error`, `sandboxd` logs (`component=wake`), and the troubleshooting page.
</Info>
## Programmatic wake and task submit
`POST /wake/{id}` returns JSON on success:
```json
{"id":"<ULID>","status":"running","wake_duration_ms":1234}
```
`POST /v1/sandboxes/{id}/tasks` delegates to the same wake path when `status=stopped` (service-token auth satisfies private-sandbox policy for API callers).
Manual stop without waiting for idle:
:::endpoint POST /v1/sandboxes/{id}/stop
Stop the container now (idempotent if already stopped). Rejects when a runtimed task is active. Next preview hit runs the wake path.
:::
## Configuration
| Variable | Default | Role |
|----------|---------|------|
| `SANDBOXD_IDLE_THRESHOLD_SECONDS` | `2100` | Idle window before `docker stop` |
| `SANDBOXD_IDLE_REAP_INTERVAL_SECONDS` | `30` | Idle reaper period; `0` disables |
| `SANDBOXD_PRESSURE_INTERVAL_SECONDS` | `10` | Pressure reaper period; `0` disables |
| `SANDBOXD_MEM_HEADROOM_PCT` | `15` | Healthy band floor |
| `SANDBOXD_MEM_REFUSE_WAKES_PCT` | `10` | Wake admission floor + refusal band |
| `SANDBOXD_MEM_EMERGENCY_PCT` | `5` | Emergency RSS kill band |
| `SANDBOXD_WAKE_COST_MB` | `800` | Estimated RAM per wake in admission |
| `SANDBOXD_WAKE_TCP_READY_TIMEOUT_SECONDS` | `8` | TCP probe timeout (catch-all only) |
| `SANDBOXD_WAKE_GRACE_SECONDS` | `60` | Activity grace when poller is in fallback |
| `SANDBOXD_KEEPALIVE_MAX_SECONDS` | `86400` | Max `keepalive` extension |
| `SANDBOXED_SET_MEMORY_HIGH` | `false` | Re-apply cgroup `memory.high` after wake |
Optional activity tuning: `SANDBOXD_ACCESS_LOG`, `SANDBOXD_TAILER_OFFSET`, `SANDBOXD_POLLER_METRIC_RE`, `SANDBOXD_POLLER_URL`, `SANDBOXD_POLLER_INTERVAL_SECONDS`.
<Steps>
<Step title="Verify idle stop">
Create a sandbox, wait longer than `SANDBOXD_IDLE_THRESHOLD_SECONDS` with no preview traffic, then `GET /sandbox/{id}` — expect `status: "stopped"`.
</Step>
<Step title="Verify wake-on-preview">
`curl -H "Host: s-$ID-3000.preview.localhost" http://127.0.0.1:${HTTP_PORT:-80}/` — first response is HTML warming; repeat until the app body appears.
</Step>
<Step title="Verify keepalive">
`POST /sandbox/{id}/keepalive` with a future `until`; confirm the sandbox stays `running` through the idle threshold.
</Step>
</Steps>
## Related pages
<CardGroup>
<Card title="Preview routing" href="/preview-routing">
Traefik priority 100 vs catch-all priority 1, `PREVIEW_DOMAIN`, and label constraints.
</Card>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
SQLite status machine (`running` / `stopped` / `creating` / `error`) and reconcile-on-boot.
</Card>
<Card title="Manage sandboxes" href="/sandbox-operations">
`keepalive`, `POST /v1/sandboxes/{id}/stop`, exec, and purge workflows.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
Full compose-backed env catalog including reaper and wake knobs.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Warming-page stalls, admission denial, and compose log probes.
</Card>
</CardGroup>
---
## 07. Workspaces and isolation
> Per-sandbox bind mounts under SANDBOXED_DATA_DIR/workspaces, skeleton seeding, read-only rootfs and caps, memory/PID limits, userns=host default, and v1 storage trade-offs.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/07-workspaces-and-isolation.md
- Generated: 2026-06-04T22:43:02.341Z
### Source Files
- `control-plane/internal/loopback/loopback.go`
- `image/HOME_LAYOUT.md`
- `image/skel/.profile`
- `control-plane/internal/docker/docker.go`
- `ARCHITECTURE.md`
- `control-plane/migrations/0001_init.sql`
---
title: "Workspaces and isolation"
description: "Per-sandbox bind mounts under SANDBOXED_DATA_DIR/workspaces, skeleton seeding, read-only rootfs and caps, memory/PID limits, userns=host default, and v1 storage trade-offs."
---
Each sandbox’s durable state lives in a host directory under `SANDBOXED_DATA_DIR/workspaces/<id>/`. `sandboxd` provisions it once (`internal/loopback`) and bind-mounts it into the container at `/home/sandbox`; the container itself runs with a hardened `docker run` flag set (read-only rootfs, dropped capabilities, memory and PID ceilings). SQLite still records `workspace_img` and `workspace_mnt` columns for compatibility; in the OSS directory-storage build both resolve to the same path.
## On-disk layout
The data root defaults to `/var/lib/sandboxed` (`SANDBOXED_DATA_DIR`). Per-sandbox trees and control-plane state sit beside each other:
```text
${SANDBOXED_DATA_DIR}/
├── workspaces/
│ └── <ulid>/ # bind-mounted → /home/sandbox in container s-<ulid>
├── state/
│ └── sandboxd.db # SQLite WAL — sandbox rows, ports, tasks
├── _snapshots/ # optional hourly zstd archives (legacy path naming)
├── templates/ # golden templates for fast spin-up
└── library/ # v1 snapshot images when configured
```
<Note>
`docker-compose.yml` bind-mounts `${SANDBOXED_DATA_DIR}` **symmetrically** host→`sandboxd` container. The path `sandboxd` writes must be the same absolute path the host Docker daemon uses for `docker run -v <path>:/home/sandbox`. Do not mount only one side to a different path.
</Note>
| Path | Owner | Survives `docker stop`? | Survives host reboot? |
|------|-------|-------------------------|----------------------|
| `workspaces/<id>/` | Host filesystem | Yes | Yes |
| `state/sandboxd.db` | Host filesystem | Yes | Yes |
| Container writable layer | — | No (`--read-only`) | No |
| `/tmp`, `/var/tmp` inside sandbox | tmpfs | No | No |
## Provisioning and skeleton seeding
`loopback.Manager.Provision` is **idempotent**:
1. `mkdir` `workspaces/<id>/` if missing.
2. If the directory is **empty** (ignoring `lost+found` from older loopback workspaces), run a one-shot seed container from `SANDBOXD_IMAGE` (default `sandboxed-base:1.0.0`) that copies `/opt/sandbox-skel/` into the workspace and `chown`s to `sandbox:sandbox`.
3. If the directory already has content, seeding is skipped — safe for id reuse and reconciler boot passes.
Seeding uses `--user 0` and, by default, `--userns host` on the seed container so root inside the seed maps predictably on hosts with `userns-remap` enabled.
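The idempotent provisioning check can be sketched like this. The helper names are hypothetical, and the exact seed command line (mount target and the `cp`/`chown` shell string) is an assumption for illustration; only `--user 0`, `--userns host`, the image default, and `/opt/sandbox-skel/` come from the description above.

```python
import os

def needs_seed(workspace_dir):
    """Create the workspace if missing; report whether it is effectively empty."""
    os.makedirs(workspace_dir, exist_ok=True)
    # lost+found from older loopback workspaces does not count as content.
    entries = [e for e in os.listdir(workspace_dir) if e != "lost+found"]
    return not entries

def seed_command(workspace_dir, image="sandboxed-base:1.0.0", userns="host"):
    """Build an illustrative one-shot seed container command."""
    cmd = ["docker", "run", "--rm", "--user", "0"]
    if userns:
        cmd += ["--userns", userns]
    cmd += [
        "-v", f"{workspace_dir}:/home/sandbox",
        image,
        "sh", "-c",
        "cp -a /opt/sandbox-skel/. /home/sandbox/ && chown -R sandbox:sandbox /home/sandbox",
    ]
    return cmd
```

Running `needs_seed` twice on a populated directory returns `False` both times, which is what makes repeated provisioning passes safe.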
**Template path:** `ProvisionFromTemplate` clones a populated directory with `cp -a` when the workspace is empty (templates under `templates/`, or library snapshots via the v1 API).
**Id reuse:** A workspace directory may exist without a SQLite row. `POST /sandbox` can attach to that directory if no row exists; Phase 8 checks `workspace_owner` so another `external_user_id` cannot resurrect the id.
## In-container home contract
Inside the running container, user state is entirely under `/home/sandbox` (the bind mount). Project code is expected at **`/home/sandbox/workspace`** (and often `workspace/app/` for agent flows).
```text
/home/sandbox/
├── workspace/ # user project — dev servers, git, builds
├── .bashrc, .profile
├── .config/ # agent/tool configs
├── .cache/ # pnpm, pip, uv, bun caches (persistent)
├── .local/, .bun/
└── .runtimed/ # runtimed Unix socket (platform-reserved)
```
`image/skel/.profile` sources `/etc/profile.d/sandbox-env.sh` and `~/.bashrc`. The runtime entrypoint does **not** re-seed; users restore dotfiles manually from `/opt/sandbox-skel/` if needed.
`runtimed` binds its control socket at `/home/sandbox/.runtimed/sock`, visible on the host at `workspaces/<id>/.runtimed/sock`. The v1 files API refuses writes under `.runtimed/` and `lost+found/`.
## Persistence operations
| API | Effect on container | Effect on `workspaces/<id>/` | SQLite row |
|-----|---------------------|------------------------------|------------|
| `POST /v1/sandboxes/{id}/stop` | `docker stop` | Kept | Kept (`stopped`) |
| `DELETE /sandbox/{id}` | `docker rm` | **Kept** | Deleted |
| `POST /sandbox/{id}/purge` | Stop + remove | **`RemoveAll`** | Purged (+ `_snapshots/<id>/`) |
Backup a workspace by copying its directory; backup control-plane state with `state/sandboxd.db` (WAL files alongside it if present).
<Warning>
`DELETE` removes the DB row but leaves disk. A later create with the same ULID can reattach only if ownership rules allow it. Use `purge` when you intend to free disk and erase tenant data.
</Warning>
## Container isolation flag set
`sandboxd` passes an explicit `docker.RunSpec` on create — no hidden defaults inside `internal/docker`:
| Docker flag | Value at create | Role |
|-------------|-----------------|------|
| `--read-only` | `true` | No persistent writes outside mounts/tmpfs |
| `--cap-drop` | `ALL` | Drop all capabilities |
| `--security-opt` | `no-new-privileges` | Block privilege escalation |
| `--memory` / `--memory-swap` | `10g` / `10g` | Hard RSS ceiling per sandbox |
| `--pids-limit` | `1024` | Process count cap |
| `--cpu-shares` | `100` | Relative CPU weight |
| `--ulimit` | `nofile=65536:65536` | FD limit |
| `--tmpfs` | `/tmp:size=512m`, `/var/tmp:size=128m` | Ephemeral writable dirs |
| `-v` | `<workspace>:/home/sandbox` | Only durable writable tree |
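For reference, the table maps onto a `docker run` argument list roughly as follows. This is a sketch with a hypothetical function name; the real values are assembled in Go via `docker.RunSpec`, and the ordering of flags here is arbitrary.

```python
def isolation_args(workspace_dir, userns="host"):
    """Return docker run flags matching the documented isolation table."""
    args = [
        "--read-only",
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--memory", "10g", "--memory-swap", "10g",
        "--pids-limit", "1024",
        "--cpu-shares", "100",
        "--ulimit", "nofile=65536:65536",
        "--tmpfs", "/tmp:size=512m",
        "--tmpfs", "/var/tmp:size=128m",
        "-v", f"{workspace_dir}:/home/sandbox",
    ]
    if userns:
        # An empty SANDBOXED_USERNS omits the flag entirely.
        args += ["--userns", userns]
    return args
```

Comparing this list against `docker inspect` output for a running sandbox is a quick way to confirm that no flag was dropped by a local override.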
<Info>
**Threat model:** Isolation targets **authenticated, accountable users running their own code**, not anonymous multi-tenant adversaries. Hardened containers mitigate misconfiguration and casual abuse; kernel escape against a determined attacker is a host-patch and trust-boundary problem — use VM-per-tenant or stronger runtimes if you need that bar.
</Info>
## Memory policy: hard ceiling vs soft throttle
Two layers apply:
| Layer | Default | Config | Behavior |
|-------|---------|--------|----------|
| Hard limit | `--memory=10g` | Fixed in create handler | OOM kill at ceiling |
| Soft throttle | `memory_high` = `4G` on row | Request field `memory_high`; DB default `4G` | cgroup v2 `memory.high` write |
`SANDBOXED_SET_MEMORY_HIGH=false` by default. When false, create/wake/reconcile skip writing `memory.high`; the 10g hard limit still applies. When true, `cgroup.SetMemoryHigh` discovers the path via `/proc/<pid>/cgroup` — failures are logged but do not fail create/wake.
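On cgroup v2 hosts, `/proc/<pid>/cgroup` contains a single `0::<path>` entry, and `memory.high` sits under that path in the unified hierarchy. The sketch below shows the path discovery only; the function names are hypothetical and the `/sys/fs/cgroup` mount point is the conventional default, assumed here.

```python
def cgroup_v2_path(proc_cgroup_text):
    """Extract the unified-hierarchy path from /proc/<pid>/cgroup contents."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy, controllers, path = parts
        # cgroup v2 entries have hierarchy id 0 and an empty controller list.
        if hierarchy == "0" and controllers == "":
            return path
    return None

def memory_high_file(proc_cgroup_text, root="/sys/fs/cgroup"):
    """Return the memory.high file path, or None when no v2 entry exists."""
    path = cgroup_v2_path(proc_cgroup_text)
    if path is None:
        return None
    return root + path.rstrip("/") + "/memory.high"
```

Returning `None` on cgroup v1 hosts matches the documented behavior that a failed `memory.high` write is logged rather than fatal.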
Host memory pressure and wake admission are separate concerns (idle/pressure reapers, `SANDBOXD_MEM_*` knobs).
## User namespaces (`SANDBOXED_USERNS`)
| Component | Default | Purpose |
|-----------|---------|---------|
| `sandboxd`, Traefik | `userns_mode: host` in compose | Infrastructure can access Docker socket and data dir on remapped daemons |
| Sandboxes + seed | `SANDBOXED_USERNS=host` → `--userns host` | Deterministic ownership on workspace bind mounts |
Set `SANDBOXED_USERNS=` to empty to omit `--userns` on sandbox containers and seed runs, opting back into the Docker daemon’s default user namespace mode.
On `userns-remap` hosts, keeping `host` avoids seed containers mapping root to a high subuid that cannot write host-owned workspace directories — a common source of seed permission errors (see troubleshooting).
## v1 storage trade-offs
sandboxed v1 optimizes for **single-host, Docker-only install** with no extra host modules:
| Area | v1 choice | Trade-off | Hardening direction |
|------|-----------|-----------|---------------------|
| Workspace storage | Plain directory per sandbox | **No per-workspace disk quota** — shared host filesystem | Filesystem quotas, dedicated volumes, sharding |
| Snapshots (hourly) | zstd of path `workspaces/<id>.img` in `internal/snapshot` | Legacy `.img` naming; expects a **file** at that path — mismatched with directory workspaces in OSS | Use directory-aware snapshot backend or v1 library snapshots |
| v1 `POST /v1/snapshots` | `cp --reflink=auto` workspace → `library/<snapId>.img` | Works with directory trees on reflink-capable FS; `.img` suffix is naming legacy | Prefer this path for templates; verify `SANDBOXD_LIBRARY_DIR` |
| Templates | `cp -a` clone into empty workspace | Fast cold start; golden template git untouched | — |
| Egress | Default allow (OSS) | No per-sandbox egress logging | Host firewall / proxy |
| Control plane | Docker socket access | Root-equivalent on host | Dedicated VM, auth, network isolation |
ARCHITECTURE.md documents these as conscious v1 compromises, not oversights.
## Writing into workspaces from outside
Integrators can inject files without `exec`:
:::endpoint PUT /v1/sandboxes/{id}/files?path=<relative>
Atomic write under the workspace mount root. Body: `{"path","content","append"}`. 25 MiB cap; path traversal and `.runtimed/` blocked; atomic rename; optional chown to mount owner uid/gid.
:::
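The guards listed for the files endpoint (size cap, traversal block, reserved directories, atomic rename) can be sketched as below. This is not the server implementation: the function names are hypothetical, only top-level reserved directories are checked, and symlink handling is out of scope for this sketch.

```python
import os
import posixpath
import tempfile

MAX_BYTES = 25 * 1024 * 1024
RESERVED = {".runtimed", "lost+found"}

def resolve_workspace_path(root, relative):
    """Validate a relative path and join it onto the workspace root."""
    if relative.startswith("/") or "\x00" in relative:
        raise ValueError("path must be relative")
    norm = posixpath.normpath(relative)
    parts = norm.split("/")
    if norm == "." or parts[0] == "..":
        raise ValueError("path escapes workspace")
    if parts[0] in RESERVED:
        raise ValueError("reserved path")
    return os.path.join(root, *parts)

def atomic_write(root, relative, content):
    """Write bytes to a temp file in the target directory, then rename over the target."""
    if len(content) > MAX_BYTES:
        raise ValueError("content exceeds 25 MiB")
    target = resolve_workspace_path(root, relative)
    directory = os.path.dirname(target)
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(content)
        os.replace(tmp, target)  # atomic on the same filesystem
    except BaseException:
        os.unlink(tmp)
        raise
    return target
```

Writing the temporary file in the same directory as the target keeps the rename on one filesystem, which is what makes it atomic.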
List/read use `GET /v1/sandboxes/{id}/files` and `.../files/content?path=`.
## Verification
<Steps>
<Step title="Inspect workspace on host">
```bash
ls -la "${SANDBOXED_DATA_DIR:-/var/lib/sandboxed}/workspaces/"
```
After create, a new ULID directory should exist containing `workspace/` and the seeded dotfiles; cache directories fill in after first use.
</Step>
<Step title="Confirm bind mount in container">
```bash
API="${SANDBOXED_API_BIND:-http://127.0.0.1:9090}"
ID=<your-ulid>
curl -s -XPOST "$API/sandbox/$ID/exec" -H 'content-type: application/json' \
-d '{"cmd":["bash","-lc","df -h /home/sandbox; mount | grep sandbox"]}'
```
`/home/sandbox` should show the host-bound filesystem, not the read-only root layer.
</Step>
<Step title="Confirm destroy keeps data">
```bash
curl -s -XDELETE "$API/sandbox/$ID"
test -d "${SANDBOXED_DATA_DIR:-/var/lib/sandboxed}/workspaces/$ID" && echo workspace retained
```
</Step>
</Steps>
## Related pages
<CardGroup>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
SQLite status machine, container naming, destroy vs purge semantics, and reconcile-on-boot.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Stop-on-idle preserves workspaces; wake admission uses host memory headroom.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
`SANDBOXED_DATA_DIR`, `SANDBOXED_USERNS`, `SANDBOXED_SET_MEMORY_HIGH`, and related compose env keys.
</Card>
<Card title="v1 API reference" href="/v1-api-reference">
Files CRUD, snapshots-as-templates, tasks, and error envelopes.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
userns-remap seed errors, warming-page stalls, and workspace permission failures.
</Card>
<Card title="Uninstall and maintenance" href="/uninstall-maintenance">
`--data` / `--all` flags, workspace retention defaults, and backup paths.
</Card>
</CardGroup>
---
## 08. Run coding agents
> Submit tasks via POST /v1/sandboxes/{id}/tasks (prompt, agent default opencode), wake-on-submit, SSE on /events, env injection at create, and runtimed socket contract.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/08-run-coding-agents.md
- Generated: 2026-06-04T22:42:58.008Z
### Source Files
- `control-plane/internal/api/v1_tasks.go`
- `control-plane/internal/api/taskwatch.go`
- `control-plane/cmd/runtimed/task.go`
- `control-plane/internal/runtime/client.go`
- `control-plane/migrations/0005_tasks.sql`
- `image/Dockerfile`
---
title: "Run coding agents"
description: "Submit tasks via POST /v1/sandboxes/{id}/tasks (prompt, agent default opencode), wake-on-submit, SSE on /events, env injection at create, and runtimed socket contract."
---
Headless coding agents run inside each sandbox through the v1 tasks API. `sandboxd` accepts `POST /v1/sandboxes/{id}/tasks`, wakes stopped sandboxes when needed, and proxies to in-container `runtimed` over a Unix socket at `<workspace>.mnt/.runtimed/sock`. It streams progress as Server-Sent Events and persists canonical results in SQLite, so `GET` still works after stop or destroy.
## How tasks fit the stack
Every sandbox container runs `runtimed` as its main process (under `tini`). `sandboxd` never talks to the agent directly; it uses `runtime.Client` to call `runtimed` HTTP routes on the workspace loopback socket. The public v1 layer in `control-plane/internal/api/v1_tasks.go` translates NDJSON from runtimed into SSE for integrators.
```mermaid
sequenceDiagram
participant Client
participant sandboxd as sandboxd (v1_tasks)
participant SQLite
participant runtimed as runtimed (UDS)
participant Agent as opencode
Client->>sandboxd: POST /v1/sandboxes/{id}/tasks
alt sandbox stopped
sandboxd->>sandboxd: POST /wake/{id}
end
sandboxd->>runtimed: POST /tasks (task_id, prompt, agent)
runtimed->>Agent: opencode run --format json
sandboxd->>SQLite: INSERT task (running)
sandboxd-->>Client: 202 Accepted + events_url
par SSE to client
Client->>sandboxd: GET .../tasks/{taskId}/events
sandboxd->>runtimed: GET /tasks/{id}/events?since=
runtimed-->>sandboxd: NDJSON events
sandboxd-->>Client: SSE (id, event, data)
and Background watcher
sandboxd->>runtimed: GET /tasks/{id}/events?since=0
runtimed-->>sandboxd: terminal done event
sandboxd->>SQLite: UPDATE result_json
end
```
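The NDJSON to SSE translation step in the diagram can be illustrated with a small generator. This is a sketch, not the `v1_tasks.go` code: the event field names `seq` and `type` are hypothetical placeholders for whatever sequence and kind fields `runtimed` actually emits.

```python
import json

def ndjson_to_sse(lines):
    """Yield one SSE frame (id, event, data) per NDJSON line."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank keep-alive lines
        event = json.loads(raw)
        seq = event.get("seq")               # hypothetical sequence field
        kind = event.get("type", "message")  # hypothetical event kind field
        frame = ""
        if seq is not None:
            frame += f"id: {seq}\n"
        frame += f"event: {kind}\n"
        frame += f"data: {json.dumps(event, separators=(',', ':'))}\n\n"
        yield frame
```

Carrying the sequence number in the SSE `id` field is what would let a reconnecting client resume from the last event it saw, in the same spirit as the `since=` parameter shown in the diagram.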
<Note>
Provider API keys are not passed on the task body. Inject them at sandbox **create** via `POST /sandbox` `env` so they land in the container environment; `opencode` inherits `os.Environ()` when the agent runs.
</Note>
## Prerequisites
- A running `sandboxd` stack (`GET /healthz` → `ok`, `GET /readyz` → `ready`).
- A sandbox with port **3000** exposed (default app template serves the Vite dev server there).
- For a custom model account, create the sandbox with `env` on `POST /sandbox` (see [Env injection at create](#env-injection-at-create)).
## Submit a task
<Steps>
<Step title="Create or resolve a sandbox">
Use `POST /sandbox` (legacy create) or an existing sandbox id. Expose the preview port your app will use (typically `3000`).
</Step>
<Step title="POST the task">
```bash
API=http://127.0.0.1:9090
ID=<sandbox-ulid>
curl -s -XPOST "$API/v1/sandboxes/$ID/tasks" \
-H 'Content-Type: application/json' \
-d '{
"prompt": "create a Vite todo app and run it on port 3000",
"agent": "opencode"
}'
```
Omit `agent` to use the default **`opencode`**. Only `opencode` is accepted today; any other value returns `400 invalid_request`.
</Step>
<Step title="Stream events">
Use the `events_url` from the response (or build `/v1/sandboxes/{id}/tasks/{taskId}/events`):
```bash
TASK_ID=<task-ulid>
curl -N "$API/v1/sandboxes/$ID/tasks/$TASK_ID/events"
```
</Step>
<Step title="Fetch the canonical result">
Poll until `status` is no longer `running`:
```bash
curl -s "$API/v1/sandboxes/$ID/tasks/$TASK_ID"
```
The full `TaskResult` (files changed, build check, tokens, preview state) is returned once the background watcher has persisted it.
</Step>
</Steps>
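
The polling step can be wrapped in a small helper. The following is a minimal Python sketch, not part of sandboxed: `wait_for_task`, the `fetch` callable, and the interval and timeout defaults are hypothetical. Only the `status` field and its `running` value come from the API described above.

```python
import time
from typing import Callable


def wait_for_task(fetch: Callable[[], dict], interval_s: float = 2.0,
                  timeout_s: float = 900.0, sleep=time.sleep,
                  clock=time.monotonic) -> dict:
    """Poll until the task leaves `running`, then return the last body.

    `fetch` is supplied by the caller; it is assumed to perform
    GET /v1/sandboxes/{id}/tasks/{taskId} and return the decoded JSON.
    """
    deadline = clock() + timeout_s
    while True:
        body = fetch()
        if body.get("status") != "running":
            return body                      # succeeded, failed, or cancelled
        if clock() >= deadline:
            raise TimeoutError("task still running after timeout")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the helper easy to exercise without a live `sandboxd`; in real use `fetch` could wrap `urllib.request.urlopen` on the task URL.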
### Wake-on-submit
If the sandbox row is **`stopped`**, `v1SubmitTask` calls the internal wake path (`POST /wake/{id}`) before submitting. Wake failures surface as v1 errors (for example `503 sandbox_capacity`). After wake, the sandbox must be **`running`**; other statuses yield `409 conflict`.
<Warning>
Unauthenticated callers cannot rely on wake-on-submit for **private** sandboxes, whose browser wake path requires a preview-token cookie. Service- and operator-authenticated API callers skip private wake gating, so their task submits can still wake private sandboxes.
</Warning>
## Agent selection
| Field | Default | Constraint |
|-------|---------|------------|
| `prompt` | — | Required non-empty string |
| `agent` | `opencode` | Only `opencode` is implemented |
Inside the container, `runtimed` drives `opencode run --format json --dangerously-skip-permissions`, maps stdout NDJSON into canonical `message` and `tool` events, and enforces **one active task per sandbox** (`409 task_in_progress` if another task is running).
The base image also installs **Claude Code** (`claude`); use `POST /sandbox/{id}/exec` for ad hoc CLI runs. The tasks API does not yet expose a `claude` agent adapter.
## Env injection at create
Task submit does not accept an `env` map. Credentials and provider configuration must be present in the **container environment** at create time:
<ParamField body="env" type="object">
Map of environment variables passed to `docker run --env`. Keys must be non-empty and must not contain `=` or newlines; values must not contain newlines. Visible to `runtimed` and any agent process it spawns.
</ParamField>
<RequestExample>
```bash
curl -s -XPOST "$API/sandbox" -H 'Content-Type: application/json' \
-d '{"ports":[3000],"env":{"ANTHROPIC_API_KEY":"sk-ant-..."}}'
```
</RequestExample>
`POST /v1/sandboxes` (project-scoped create) does not currently expose `env`; use legacy `POST /sandbox` when you need key injection before calling the v1 tasks API.
`runtimed`’s `StartTaskRequest` supports optional per-task `env`, but `sandboxd` does not forward it on v1 submit—the container env from create is the supported path.
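
Because a bad `env` map is rejected at create time, a client may want to check it before calling `POST /sandbox`. This is a hedged Python sketch of the key and value rules listed above; `validate_env` is a hypothetical helper, and treating every value as a string is an assumption, since the server-side implementation is not shown here.

```python
def validate_env(env: dict) -> list:
    """Return a list of problems; an empty list means the map passes."""
    problems = []
    for key, value in env.items():
        if not isinstance(key, str) or key == "":
            problems.append("key must be a non-empty string")
            continue
        if "=" in key or "\n" in key:
            problems.append(f"key {key!r} must not contain '=' or a newline")
        # Assumption: values are plain strings, as `docker run --env` expects.
        if not isinstance(value, str) or "\n" in value:
            problems.append(f"value for {key!r} must be a string without newlines")
    return problems
```

An empty result only means the map satisfies the documented constraints; `sandboxd` remains the authority on what it accepts.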
## v1 task endpoints
| Method | Path | Purpose |
|--------|------|---------|
| `POST` | `/v1/sandboxes/{id}/tasks` | Submit task; wake if stopped |
| `GET` | `/v1/sandboxes/{id}/tasks/{taskId}` | Canonical result (SQLite) |
| `GET` | `/v1/sandboxes/{id}/tasks/{taskId}/events` | Live SSE stream |
| `POST` | `/v1/sandboxes/{id}/tasks/{taskId}/cancel` | Cancel in-flight task |
:::endpoint POST /v1/sandboxes/{id}/tasks
Submit a headless coding task. Returns **202 Accepted** with task metadata.
<ParamField path="id" type="string" required>
Sandbox ULID.
</ParamField>
<ParamField body="prompt" type="string" required>
Natural-language instruction for the agent. The agent works in `~/workspace/app`.
</ParamField>
<ParamField body="agent" type="string">
Coding agent id. Defaults to `opencode`. Only `opencode` is supported.
</ParamField>
<ResponseExample>
```json
{
"id": "01JABCDEF...",
"sandbox_id": "01JXYZ...",
"status": "running",
"agent": "opencode",
"events_url": "/v1/sandboxes/01JXYZ.../tasks/01JABCDEF.../events"
}
```
</ResponseExample>
| HTTP | `error.code` | When |
|------|----------------|------|
| 400 | `invalid_request` | Unsupported `agent` value |
| 404 | `not_found` | Unknown sandbox |
| 409 | `conflict` | Sandbox not `running` after wake attempt |
| 409 | `task_in_progress` | Another task is active in runtimed |
| 502 | `sandbox_unavailable` | Cannot reach runtimed (socket down, etc.) |
| 503 | `sandbox_capacity` | Wake refused (memory admission) |
:::
:::endpoint GET /v1/sandboxes/{id}/tasks/{taskId}
Read the durable task outcome from SQLite. Works while the sandbox is running, after **stop**, and after **delete** (workspace gone; result retained).
While `status` is `running` or `result_json` is unset:
<ResponseExample>
```json
{
"id": "01JABCDEF...",
"sandbox_id": "01JXYZ...",
"status": "running"
}
```
</ResponseExample>
When finished, the response promotes fields from `runtime.TaskResult` (`status`, `files_changed`, `build_ok`, `agent_message_final`, `tokens`, `failure_reason`, `preview_status_after`, etc.).
:::
:::endpoint GET /v1/sandboxes/{id}/tasks/{taskId}/events
Server-Sent Events stream proxied from runtimed’s newline-delimited JSON event log.
**Resume:**
- `Last-Event-ID: <n>` — continue after event id `n` (`since = n + 1`).
- `?since=<n>` — start at event index `n` (query wins when both are set).
Each SSE record:
```text
id: <monotonic_index>
event: <type>
data: <json>
```
Event types from runtimed:
| `event` | Role |
|---------|------|
| `status` | Phase updates (`phase` in data) |
| `message` | Agent text (`role`: `agent`, `agent_error`, …) |
| `tool` | Tool progress (`name`, `status`, `path`) |
| `build` | Post-task `pnpm build` outcome |
| `done` | Terminal; `data` is the full `TaskResult` |
Requires a **running** sandbox and a reachable runtimed socket (`502 sandbox_unavailable` otherwise).
:::
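
The record layout and resume rules of the events stream can be illustrated with a small client-side parser. This Python sketch is an illustration under stated assumptions, not the sandboxd implementation: `parse_sse` and `resume_since` are hypothetical names, each record is assumed to carry a single `data:` line of JSON with an integer `id`, and falling back to index `0` when neither resume input is present is an assumption.

```python
import json
from typing import Iterable, Iterator, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE record built from id/event/data lines."""
    record = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # a blank line ends a record
            if record:
                yield record
                record = {}
            continue
        if line.startswith(":"):             # SSE comment line, ignored
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            record["id"] = int(value)
        elif field == "event":
            record["event"] = value
        elif field == "data":
            record["data"] = json.loads(value)
    if record:                               # stream ended without a blank line
        yield record


def resume_since(last_event_id: Optional[int], query_since: Optional[int]) -> int:
    """Start index: ?since= wins, else Last-Event-ID + 1, else 0 (assumed)."""
    if query_since is not None:
        return query_since
    if last_event_id is not None:
        return last_event_id + 1
    return 0
```

A client that reconnects after seeing event id `4` would therefore continue from index `5`, unless it also passes an explicit `?since=`.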
:::endpoint POST /v1/sandboxes/{id}/tasks/{taskId}/cancel
Ask runtimed to cancel the task (kills the agent process group). Idempotent at the runtimed layer.
<ResponseExample>
```json
{
"id": "01JABCDEF...",
"status": "cancelling"
}
```
</ResponseExample>
Cancel finalizes as `cancelled` with `failure_reason` `cancelled`. Timeout inside runtimed (default **10 minutes**) finalizes as `failed` / `agent_timeout`.
:::
## Task lifecycle inside runtimed
```mermaid
stateDiagram-v2
[*] --> queued: POST /tasks
queued --> checkpoint: runTask
checkpoint --> agent_running: git checkpoint
agent_running --> build_check: opencode finishes
build_check --> health_check: pnpm build
health_check --> done: preview probes
done --> [*]: emit done + result.json
agent_running --> cancelled: POST cancel
agent_running --> failed: agent_error / timeout
build_check --> failed: build failure path
```
Phases surfaced on `status` events include `starting`, `checkpoint`, `agent_running`, `build_check`, and `health_check`. The terminal `done` event carries the canonical `TaskResult`.
**Terminal `status` values:** `succeeded`, `failed`, `cancelled` (plus `running` while in flight).
**Common `failure_reason` values:** `agent_timeout`, `agent_error`, `cancelled`, `sandbox_unavailable`, `internal`.
Per-task artifacts under `.runtimed/tasks/<taskId>/`:
```text
.runtimed/tasks/<taskId>/
events.jsonl # append-only event log
result.json # canonical outcome (written at finish)
agent.log # agent stderr
```
## Durability and the background watcher
On accept, `sandboxd` inserts a row into the `task` table (`running`, `result_json` NULL) and starts `watchTask`, which tails runtimed’s event stream for up to **15 minutes** (three connect retries). When it sees `event: done`, it marshals `TaskResult` into SQLite via `FinishTask`.
<Info>
Clients do not need to stay connected to SSE for the result to be saved. A disconnected integrator can still `GET` the task once the watcher finishes.
</Info>
On `sandboxd` restart, `ReconcileTasks` finalizes orphaned `running` rows from `result.json`, re-attaches a watcher if the sandbox is still up, or marks `failed` / `sandbox_unavailable`.
The **idle reaper** skips sandboxes with a running task row so agents are not stopped mid-run; reaping resumes after the task ends.
**Retention trade-off:** SQLite keeps the canonical **result** after destroy. The full **event log** lives in the workspace and is **not** retained past sandbox destroy.
## runtimed socket contract
Transport is HTTP/1.1 over a Unix domain socket—no TCP port inside the sandbox.
| Property | Value |
|----------|--------|
| In-container path | `/home/sandbox/.runtimed/sock` (override `RUNTIMED_SOCKET`) |
| Host path | `<SANDBOXED_DATA_DIR>/workspaces/<id>.mnt/.runtimed/sock` |
| Client | `runtime.NewClient(socketPath)` — 5s timeout for control calls; unbounded for event streams |

Routes served by `runtimed` on the socket:

| runtimed route | Method | Purpose |
|----------------|--------|---------|
| `/status` | `GET` | `runtime.Status` — preview + `active_task` |
| `/tasks` | `POST` | Start task (`task_id`, `prompt`, `agent`, optional `env`, `timeout_s`) |
| `/tasks/{id}/events` | `GET` | NDJSON stream (`?since=` index) |
| `/tasks/{id}/cancel` | `POST` | Cancel task |
`sandboxd` assigns `task_id` (ULID) on v1 submit and passes it to runtimed so ids align across SQLite, SSE, and on-disk task dirs.
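
For debugging from the host, the same transport can be reached with a few lines of Python. This is a sketch of HTTP/1.1 over a Unix domain socket using only the standard library; it is not the Go `runtime.Client`. `UnixHTTPConnection`, `host_socket_path`, and `runtimed_status` are hypothetical names, and the 5 second timeout simply mirrors the control-call timeout in the table above.

```python
import http.client
import socket


def host_socket_path(data_dir: str, sandbox_id: str) -> str:
    """Host-side socket path, following the table above."""
    return f"{data_dir.rstrip('/')}/workspaces/{sandbox_id}.mnt/.runtimed/sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 client that connects to a Unix domain socket, not TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" is only a placeholder for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def runtimed_status(socket_path: str) -> tuple:
    """GET /status; returns (HTTP status code, raw body bytes)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/status")
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Nothing connects until `request` is called, so constructing the connection is cheap; reading the NDJSON event stream would need a longer or disabled timeout.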
## Operational constraints
- **One task at a time** per sandbox (enforced in runtimed and surfaced as `task_in_progress`).
- **`POST /v1/sandboxes/{id}/stop`** is rejected while runtimed reports an active task; cancel the task first.
- **Interrupted tasks** (sandbox stop, runtimed crash) are finalized as `failed` / `sandbox_unavailable`, not resumed.
- **Claude Code / Codex** adapters are not wired through the tasks API yet; only OpenCode runs headlessly via `/tasks`.
## Verify end-to-end
1. Create sandbox with `ports: [3000]` (and optional `env` for your provider).
2. `POST /v1/sandboxes/{id}/tasks` with a concrete app prompt.
3. `curl -N` the `events` URL until you see `event: done`.
4. `GET /v1/sandboxes/{id}/tasks/{taskId}` — expect `status: succeeded` and `build_ok: true` for a healthy template build.
5. Open `http://s-{id}-3000.preview.localhost` (add `:$HTTP_PORT` if not using port 80).
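
The preview URL in the last step follows the `s-{id}-{port}.preview.{domain}` host pattern. A minimal Python sketch of building it is shown here; `preview_url` is a hypothetical helper, and mapping its `domain` argument to the configured preview domain is an assumption.

```python
def preview_url(sandbox_id: str, port: int, domain: str = "localhost",
                http_port: int = 80) -> str:
    """Build the plain-HTTP preview URL for one exposed sandbox port."""
    host = f"s-{sandbox_id}-{port}.preview.{domain}"
    # Only append an explicit port when Traefik is not listening on 80.
    suffix = "" if http_port == 80 else f":{http_port}"
    return f"http://{host}{suffix}"
```

For example, `preview_url("01JXYZ", 3000, http_port=8080)` yields a URL with `:8080` appended to the host.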
## Related pages
<CardGroup>
<Card title="Quickstart" href="/quickstart">
Create → task → SSE → preview URL in one flow.
</Card>
<Card title="Example: agent todo app" href="/example-agent-todo">
End-to-end recipe with env injection and preview check.
</Card>
<Card title="runtimed reference" href="/runtimed-reference">
In-sandbox supervisor routes, env defaults, and protocol types.
</Card>
<Card title="v1 API reference" href="/v1-api-reference">
Full v1 request/response shapes and error envelope.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Stop-on-idle, wake-on-preview, and task-aware reaping.
</Card>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
Status machine, destroy vs purge, and reconcile-on-boot.
</Card>
</CardGroup>
---
## 09. Manage sandboxes
> Operational workflows: create (ports, env, template), exec, keepalive, POST /v1/sandboxes/{id}/stop, DELETE vs POST purge, claim, and external-user purge hooks.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/09-manage-sandboxes.md
- Generated: 2026-06-04T22:43:40.997Z
### Source Files
- `control-plane/internal/api/handlers.go`
- `control-plane/internal/api/v1.go`
- `control-plane/internal/api/external_purge.go`
- `AGENTS.md`
- `control-plane/internal/api/api.go`
- `control-plane/internal/store/store.go`
---
title: "Manage sandboxes"
description: "Operational workflows: create (ports, env, template), exec, keepalive, POST /v1/sandboxes/{id}/stop, DELETE vs POST purge, claim, and external-user purge hooks."
---
The `sandboxd` control plane exposes legacy `/sandbox*` routes and a narrower `/v1/sandboxes` layer over the same SQLite-backed lifecycle. Create provisions a workspace and starts container `s-{ulid}`; exec and keepalive influence idle reaping; stop frees RAM while preserving disk; destroy and purge differ in whether the workspace directory and `workspace_owner` row survive.
## Endpoint map
| Method | Path | Effect on container | Effect on workspace dir | Effect on DB row | Effect on `workspace_owner` |
| --- | --- | --- | --- | --- | --- |
| `POST` | `/sandbox` | Starts `s-{id}` | Creates or reuses under data dir | Inserts `sandbox` (+ owner on create) | Upsert on create |
| `POST` | `/sandbox/{id}/exec` | Runs command in running container | Unchanged | Bumps `last_active_at` | Unchanged |
| `POST` | `/sandbox/{id}/keepalive` | Unchanged | Unchanged | Sets `keepalive_until` | Unchanged |
| `POST` | `/v1/sandboxes/{id}/stop` | `docker stop` | Preserved | `status` → `stopped`, `stopped_at` | Unchanged |
| `DELETE` | `/sandbox/{id}` | `docker rm` | **Kept** on disk | Deletes `sandbox` row only | **Kept** |
| `POST` | `/sandbox/{id}/purge` | Stop + remove | **Deleted** (`RemoveAll`) | Deletes `sandbox` + owner | **Deleted** |
| `DELETE` | `/v1/sandboxes/{id}` | Same as purge | Same as purge | Same as purge | Same as purge |
| `POST` | `/sandbox/{id}/claim` | Unchanged | Unchanged | Updates external identity | Upsert owner |
| `POST` | `/external-users/{id}/purge` | Purge each owned sandbox | Delete each | Purge each | Delete each |
| `POST` | `/external-projects/{id}/purge` | Purge by project | Delete each | Purge each | Delete each |
<Note>
Loopback `Release` is a no-op in the OSS directory-storage build; workspace persistence is the bind-mounted directory under `SANDBOXED_DATA_DIR/workspaces/{id}/`, not a separate `.img` file.
</Note>
## Create a sandbox
`POST /sandbox` is the internal create path. `POST /v1/sandboxes` delegates to it after mapping `project.user_id` / `project.id` into `external` and pinning the exposed port to `3000`.
### Request body (`POST /sandbox`)
<ParamField body="ports" type="int[]">
TCP ports exposed for Traefik preview routing. Each port must be 1–65535.
</ParamField>
<ParamField body="id" type="string">
Optional ULID. Omitted → auto-generated. Non-ULID values return `400` with `id must be a ULID`. Existing row → `409` (`DELETE` first for id-reuse with a new row).
</ParamField>
<ParamField body="env" type="object">
Key/value pairs passed to `docker run --env`. Keys must be non-empty and must not contain `=` or newlines; values must not contain newlines. Injected at create (e.g. `ANTHROPIC_API_KEY` for coding agents).
</ParamField>
<ParamField body="template" type="string">
Golden template name → `{SANDBOXD_TEMPLATES_DIR}/{name}.img` clone. Lowercase `[a-z0-9-]`, max 64 chars. Requires templates dir configured; unknown template → `400`.
</ParamField>
<ParamField body="template_path" type="string">
Internal only: pre-resolved absolute path under `LibraryRoot` or `TemplatesDir` (v1 `from_snapshot` spin-up). Mutually exclusive with `template`.
</ParamField>
<ParamField body="external" type="object">
`user_id` (defaults to `"local"` for OSS quickstart), optional `project_id`, `workspace_id`. IDs must be at most 256 characters and contain no control characters or commas.
</ParamField>
<ParamField body="visibility" type="string">
`public` (default) or `private` (forward-auth previews).
</ParamField>
<ParamField body="memory_high" type="string">
Soft cgroup throttle target; default `4G`. Applied only when `SANDBOXED_SET_MEMORY_HIGH=true`.
</ParamField>
<ParamField body="git_remote_url" type="string">
Optional `https://` remote for auto-git-push on task finish.
</ParamField>
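
The field constraints above can be checked client-side before sending a create request. The following Python sketch is illustrative only: `check_create_body` is a hypothetical helper, the minimum template length of one character is an assumption, and "control characters" is interpreted here as code points below 32 plus 127.

```python
import re

# Assumption: at least one character; the docs only state the upper bound.
_TEMPLATE_RE = re.compile(r"[a-z0-9-]{1,64}")


def check_create_body(body: dict) -> list:
    """Return a list of constraint violations for a POST /sandbox body."""
    problems = []
    for port in body.get("ports", []):
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append(f"port {port!r} must be an integer in 1-65535")
    template = body.get("template")
    if template is not None and not _TEMPLATE_RE.fullmatch(template):
        problems.append("template must match [a-z0-9-] and be at most 64 chars")
    if template is not None and body.get("template_path") is not None:
        problems.append("template and template_path are mutually exclusive")
    for field, value in body.get("external", {}).items():
        bad = (not isinstance(value, str) or len(value) > 256 or "," in value
               or any(ord(c) < 32 or ord(c) == 127 for c in value))
        if bad:
            problems.append(f"external.{field} is too long or has a comma/control character")
    if body.get("visibility", "public") not in ("public", "private"):
        problems.append("visibility must be public or private")
    return problems
```

The server performs further checks that a client cannot reproduce, such as whether the template exists or whether `id` collides with an existing row.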
### Create flow
```mermaid
sequenceDiagram
participant Client
participant sandboxd
participant SQLite
participant Loopback
participant Docker
Client->>sandboxd: POST /sandbox
sandboxd->>sandboxd: admit check (memory floor)
sandboxd->>SQLite: Create row (creating)
sandboxd->>Loopback: Provision or ProvisionFromTemplate
sandboxd->>Docker: run s-{id} + Traefik labels
sandboxd->>SQLite: MarkRunning + BumpLastActive
sandboxd-->>Client: 201 sandbox row
```
<Warning>
**Id-reuse guard:** If a `workspace_owner` row already exists for `id` (workspace on disk, no active `sandbox` row), `external.user_id` on create must match `workspace_owner.external_user_id` or the API returns `409` with `workspace_owner_mismatch`.
</Warning>
**Capacity:** Low host memory can return `503` with `Retry-After: 30` and `mem_available_percent` before any row is created.
**v1 idempotency:** `POST /v1/sandboxes` returns `200` with the existing sandbox when `external_project_id` already has a non-`error` row.
<RequestExample>
```bash
API=http://127.0.0.1:9090
curl -s -XPOST "$API/sandbox" \
-H 'Content-Type: application/json' \
-d '{
"ports": [3000, 5173],
"env": {"ANTHROPIC_API_KEY": "sk-ant-..."},
"template": "react-standard",
"external": {"user_id": "alice", "project_id": "proj-42"}
}'
```
</RequestExample>
<ResponseExample>
```json
{
"id": "01JXXXXXXXXXXXXXXXXXXXXXXX",
"status": "running",
"ports": [3000, 5173],
"visibility": "public",
"external_user_id": "alice",
"external_project_id": "proj-42",
"keepalive_until": 0,
"last_active_at": 1717500000
}
```
</ResponseExample>
## Run commands (`POST /sandbox/{id}/exec`)
Non-interactive `docker exec` into `s-{id}`.
<ParamField body="cmd" type="string[]" required>
Argv passed to exec (e.g. `["bash","-lc","cd ~/workspace && npm test"]`).
</ParamField>
<ParamField body="stream" type="boolean">
When `true`, response is `200` chunked `text/plain` (stdout, optional `---stderr---`, trailing `exit_code: N`). Default JSON: `{stdout, stderr, exit_code}`.
</ParamField>
Exec registers in-flight activity, bumps `last_active_at` at start and end, and audit-logs only `cmd[0]` (not the full command line). Requires a running container; docker errors → `500`.
## Postpone idle stop (`POST /sandbox/{id}/keepalive`)
<ParamField body="until" type="int" required>
Unix timestamp (seconds). Must be in the future. Capped to `now + SANDBOXD_KEEPALIVE_MAX_SECONDS` (default **86400**, 24h).
</ParamField>
While `keepalive_until > now`, the idle reaper skips the sandbox. Response: `{id, keepalive_until}`.
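Idle stop itself is automatic: the reaper stops a sandbox once `SANDBOXD_IDLE_THRESHOLD_SECONDS` (default **2100**, 35 min) have passed since `last_active_at`, unless a keepalive, an active task, or an in-flight exec applies. Each exec also bumps `last_active_at`, which restarts the window.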
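
The keepalive cap and the idle decision reduce to simple timestamp arithmetic. This Python sketch restates the rules above under assumptions: `clamp_keepalive` and `idle_stop_due` are hypothetical names, and treating "exactly at the threshold" as due is a guess about the boundary, not documented behavior.

```python
def clamp_keepalive(until: int, now: int, max_seconds: int = 86400) -> int:
    """Apply the keepalive rules: future-only, capped at now + max_seconds."""
    if until <= now:
        raise ValueError("until must be in the future")
    return min(until, now + max_seconds)


def idle_stop_due(last_active_at: int, now: int, keepalive_until: int = 0,
                  task_running: bool = False, exec_in_flight: bool = False,
                  threshold: int = 2100) -> bool:
    """True when the idle reaper would be expected to stop the sandbox."""
    if keepalive_until > now or task_running or exec_in_flight:
        return False                          # one of the skip rules applies
    # Assumption: the boundary is inclusive.
    return now - last_active_at >= threshold
```

With the defaults, a request for a keepalive ten days out is trimmed to 24 hours from `now`.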
## Stop without deleting workspace
| Route | When to use |
| --- | --- |
| `POST /v1/sandboxes/{id}/stop` | Explicit stop; returns v1 sandbox object |
| Idle / pressure reapers | Automatic `docker stop` when thresholds hit |
`v1StopSandbox` behavior:
- Idempotent: returns `200` if the sandbox is already `stopped`
- `409 conflict` for any other status that is not `running`
- `409 task_in_progress` if runtimed reports an active task (cancel task first)
- `docker stop` with 10s timeout, then `MarkStoppedAt`
Workspace files and `workspace_owner` remain. The next preview request wakes the sandbox (see wake routing).
<RequestExample>
```bash
curl -s -XPOST "$API/v1/sandboxes/$ID/stop"
```
</RequestExample>
## Destroy vs purge
```text
DELETE /sandbox/{id}          POST /sandbox/{id}/purge
        |                             |
        v                             v
docker rm s-{id}              purgeOne():
(workspace dir KEPT)            stop + rm container
DELETE sandbox row              RemoveAll workspace dir
workspace_owner SURVIVES        Remove _snapshots/{id}/
                                PurgeSandbox (row + owner)
```
### `DELETE /sandbox/{id}` — soft destroy
- Holds per-id lock for the operation
- Removes container, no-ops `Loopback.Release`, deletes **only** the `sandbox` table row
- Response: **`204 No Content`**
- **Id-reuse:** `POST /sandbox` with the same `id` reattaches the existing workspace if `workspace_owner` matches
Use before manual snapshot/restore when the sandbox must not be `running` (`409` if running).
### `POST /sandbox/{id}/purge` — hard teardown
`purgeOne` performs, in order:
1. Resolve `workspace_owner.external_user_id` for audit (before row delete)
2. Stop and remove `s-{id}` if present
3. Egress rule cleanup
4. `Loopback.Release` (no-op in OSS)
5. `os.RemoveAll` workspace directory
6. `os.RemoveAll` `{SnapshotsRoot}/{id}/` when configured
7. `Store.PurgeSandbox` — deletes `sandbox` and `workspace_owner`
<ResponseExample>
```json
{
"purged": true,
"freed_bytes": 104857600
}
```
</ResponseExample>
Audit action: `sandbox.purge` with `freed_bytes` in detail.
### `DELETE /v1/sandboxes/{id}`
Delegates to `handlePurgeSandbox`, not soft `DELETE`. Integrators destroying a project sandbox should use this route; it returns **`204`** on success.
## Claim upstream identity (`POST /sandbox/{id}/claim`)
Reassigns a legacy or backfilled sandbox to a real upstream tenant. Updates `sandbox.external_*` and upserts `workspace_owner` in one transaction.
<ParamField body="external_user_id" type="string" required>
New owner user id (same validation as create external ids).
</ParamField>
<ParamField body="external_project_id" type="string">
Optional; empty leaves project unchanged on existing owner row.
</ParamField>
<ParamField body="external_workspace_id" type="string">
Optional; empty leaves workspace unchanged on existing owner row.
</ParamField>
<ResponseExample>
```json
{
"id": "01JXXXXXXXXXXXXXXXXXXXXXXX",
"external_user_id": "tenant-user-9",
"claimed": true
}
```
</ResponseExample>
Audit: `sandbox.claim` with `prior_external_user_id` / `new_external_user_id` in detail. Requires a `sandbox` row (`404` if missing).
<Tip>
Set `SANDBOXD_API_AUTH_DISABLED=false` and configure bearer tokens before exposing claim and purge on any non-loopback API bind; the default local bind `127.0.0.1:9090` bypasses the middleware as operator loopback.
</Tip>
## External-user and external-project purge
Bulk teardown for tenant offboarding. Both routes call `purgeScope`, which looks up sandbox IDs from `workspace_owner` and runs `purgeOne` per id.
:::endpoint POST /external-users/{external_user_id}/purge
Purges every sandbox whose `workspace_owner.external_user_id` matches. Per-sandbox `sandbox.purge` audit rows plus summary `external_user.purge`.
:::
:::endpoint POST /external-projects/{external_project_id}/purge
Purges every sandbox whose `workspace_owner.external_project_id` matches. Summary action `external_project.purge`.
:::
<ResponseExample>
```json
{
"purged_count": 3,
"freed_bytes": 314572800
}
```
</ResponseExample>
<Warning>
On the first per-sandbox failure, the handler returns `500` and stops (partial purge may have already completed). Retry to finish remaining sandboxes.
</Warning>
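
Since a bulk purge stops at the first per-sandbox failure, callers typically wrap it in a bounded retry. The following Python sketch shows one way to do that; `purge_with_retry`, the `purge` callable, and the linear backoff are hypothetical choices, not behavior provided by `sandboxd`.

```python
import time
from typing import Callable


def purge_with_retry(purge: Callable[[], tuple], max_attempts: int = 5,
                     backoff_s: float = 1.0, sleep=time.sleep) -> tuple:
    """Repeat a bulk purge while it answers 500; return the last (status, body).

    `purge` is supplied by the caller; it is assumed to POST to one of the
    purge routes above and return (http_status, decoded_json_body).
    """
    status, body = None, None
    for attempt in range(1, max_attempts + 1):
        status, body = purge()
        if status != 500:
            return status, body               # finished, or a non-retryable error
        if attempt < max_attempts:
            sleep(backoff_s * attempt)        # earlier sandboxes may already be gone
    return status, body
```

Note that `purged_count` and `freed_bytes` in the final response cover only what that last call removed, not the sandboxes purged by earlier partial attempts.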
## List and inspect
| Route | Notes |
| --- | --- |
| `GET /sandboxes` | All rows, newest first |
| `GET /sandboxes?external_user_id=&external_project_id=` | Filtered list |
| `GET /sandbox/{id}` | Row + optional `live_state` (docker inspect) + `runtime` (runtimed status) |
## Operational checklist
<Steps>
<Step title="Create with ports and env">
`POST /sandbox` or `POST /v1/sandboxes`; record `id` from response.
</Step>
<Step title="Drive work">
Tasks API, `exec`, or preview traffic (bumps activity).
</Step>
<Step title="Extend idle window if needed">
`POST /sandbox/{id}/keepalive` with future `until`.
</Step>
<Step title="Stop to free RAM">
`POST /v1/sandboxes/{id}/stop` or wait for idle reaper.
</Step>
<Step title="Tear down">
Soft: `DELETE /sandbox/{id}` (keep files for reuse). Hard: `POST /sandbox/{id}/purge` or `DELETE /v1/sandboxes/{id}`.
</Step>
<Step title="Tenant purge">
`POST /external-users/{id}/purge` or `POST /external-projects/{id}/purge`.
</Step>
</Steps>
## v1 vs legacy quick reference
| Goal | Legacy | v1 |
| --- | --- | --- |
| Create | `POST /sandbox` | `POST /v1/sandboxes` |
| Get | `GET /sandbox/{id}` | `GET /v1/sandboxes/{id}` |
| Stop | — (use reaper or implement via docker) | `POST /v1/sandboxes/{id}/stop` |
| Destroy, keep workspace | `DELETE /sandbox/{id}` | — |
| Destroy + delete workspace | `POST /sandbox/{id}/purge` | `DELETE /v1/sandboxes/{id}` |
v1 errors use `{error: {code, message, retryable}}`; legacy routes use `{error: "message"}`.
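
Clients that call both route families may find it convenient to normalize the two error shapes. This is a hedged Python sketch; `normalize_error` is a hypothetical helper, and reporting `code` and `retryable` as `None` for legacy errors is an illustrative choice because the legacy envelope does not carry them.

```python
def normalize_error(body: dict) -> dict:
    """Map a v1 or legacy error body onto one {code, message, retryable} shape."""
    err = body.get("error")
    if isinstance(err, dict):                 # v1: {error: {code, message, retryable}}
        return {
            "code": err.get("code"),
            "message": err.get("message", ""),
            "retryable": err.get("retryable"),
        }
    if isinstance(err, str):                  # legacy: {error: "message"}
        return {"code": None, "message": err, "retryable": None}
    raise ValueError("body has no recognizable error field")
```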
## Related pages
<CardGroup>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
Status machine, reconcile-on-boot, and destroy vs purge semantics in depth.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Idle threshold, keepalive skip rules, wake-on-preview, and memory admission.
</Card>
<Card title="Control plane API (legacy)" href="/legacy-api-reference">
Full `/sandbox*` route inventory including snapshots and wake JSON.
</Card>
<Card title="v1 API reference" href="/v1-api-reference">
Public shapes, error envelope, tasks, and files CRUD.
</Card>
<Card title="API authentication" href="/api-authentication">
Bearer tokens, loopback exemption, and privileged-route exposure.
</Card>
<Card title="Workspaces and isolation" href="/workspaces-persistence">
Bind-mount layout, seeding, templates, and directory storage trade-offs.
</Card>
</CardGroup>
---
## 10. API authentication
> Service-token auth (SANDBOXD_API_TOKENS, Authorization: Bearer), SANDBOXD_API_AUTH_DISABLED rollback, SIGHUP env reload, loopback exemptions, and LAN exposure of SANDBOXED_API_BIND.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/10-api-authentication.md
- Generated: 2026-06-04T22:43:32.405Z
### Source Files
- `control-plane/internal/auth/config.go`
- `control-plane/internal/auth/middleware.go`
- `control-plane/internal/auth/token.go`
- `.env.example`
- `control-plane/cmd/sandboxd/main.go`
- `README.md`
---
title: "API authentication"
description: "Service-token auth (SANDBOXD_API_TOKENS, Authorization: Bearer), SANDBOXD_API_AUTH_DISABLED rollback, SIGHUP env reload, loopback exemptions, and LAN exposure of SANDBOXED_API_BIND."
---
The `sandboxd` control plane gates programmatic API access with named service bearer tokens (`SANDBOXD_API_TOKENS`), an emergency open-API rollback (`SANDBOXD_API_AUTH_DISABLED`), and a loopback operator path that never checks a token. External callers (Traefik-routed or any non-loopback TCP peer) must present `Authorization: Bearer <secret>` when auth is enabled; on-host scripts hitting `SANDBOXED_API_BIND` on loopback skip the gate.
## Trust model
Two caller classes are wired in `control-plane/internal/auth`:
| Trust class | How requests arrive | Token required |
|---|---|---|
| **Operator** | Direct TCP to the published API bind with a loopback `RemoteAddr` and **no** `X-Forwarded-For` | No |
| **Service (upstream backend)** | Traefik edge (`api.preview.*`), LAN bind (`0.0.0.0:9090`), or any forwarded client | Yes, when auth is enabled |
End users do not call `sandboxd` directly. Browser preview traffic uses the wake catch-all and optional preview JWTs (`/forward-auth`, `/preview-auth`) — a separate model documented on the private previews page.
```mermaid
sequenceDiagram
participant Op as On-host operator
participant Svc as Upstream service
participant Traefik as Traefik
participant Auth as auth.Middleware
participant API as api.Server
Op->>API: GET /sandboxes (loopback, no XFF)
Note over Op,API: Actor: operator / loopback
Svc->>Traefik: POST /sandbox + Bearer
Traefik->>Auth: forwarded (X-Forwarded-For set)
Auth->>Auth: MatchToken constant-time
Auth->>API: Actor: service / token name
API-->>Svc: 200 JSON
Svc->>Traefik: POST /sandbox (no Bearer)
Traefik->>Auth: forwarded
Auth-->>Svc: 401 {"error":"unauthorized"}
```
## Configuration
Auth keys live in `.env` (copied from `.env.example` on install) and are passed into the `sandboxd` container via `docker-compose.yml`.
| Variable | Default (OSS compose) | Effect |
|---|---|---|
| `SANDBOXD_API_AUTH_DISABLED` | `true` | When **true** (or any value other than `false` / `0` / `no` / empty), all external API routes run without a bearer check. Set to `false` for production. |
| `SANDBOXD_API_TOKENS` | *(empty)* | Comma-separated `name=secret` pairs. The **name** is audit metadata; the **secret** is the bearer value. |
| `SANDBOXED_API_BIND` | `127.0.0.1:9090` | Host port published to container `:9000`. Controls who can reach the API on the host network. |
| `SANDBOXD_ENV_FILE` | `/etc/sandboxed/sandboxd.env` | File re-read on `SIGHUP` for token rotation (systemd-style deployments). |
<ParamField body="SANDBOXD_API_TOKENS" type="string">
Comma-separated list of `name=secret` entries. Whitespace around names and secrets is trimmed. Entries without `=` are skipped. Example: `backend=super-secret,ci-runner=other-secret`.
</ParamField>
<ParamField body="SANDBOXD_API_AUTH_DISABLED" type="boolean string">
Parsed case-insensitively. Values `false`, `0`, `no`, or empty → auth **enforced** on external paths. Any other value (including `true`, `1`, `yes`) → auth **disabled** (rollback / local dev).
</ParamField>
<Warning>
With `SANDBOXD_API_AUTH_DISABLED=false` and an empty `SANDBOXD_API_TOKENS`, every external request returns **401**; loopback still works. `sandboxd` logs a startup warning in that configuration.
</Warning>
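
The parsing rules for the two auth variables can be restated as a short Python sketch. This is not the Go `auth.ParseConfig`; `auth_disabled` and `parse_tokens` are hypothetical names, and three details are assumptions: trimming whitespace on the boolean, splitting each entry on the first `=` only, and dropping entries whose name or secret is empty.

```python
_OFF_VALUES = {"false", "0", "no", ""}


def auth_disabled(raw: str) -> bool:
    """True unless the value is one of the documented 'off' literals."""
    return raw.strip().lower() not in _OFF_VALUES


def parse_tokens(raw: str) -> dict:
    """Parse 'name=secret,name2=secret2' into {name: secret}."""
    tokens = {}
    for entry in raw.split(","):
        if "=" not in entry:
            continue                          # entries without '=' are skipped
        name, secret = entry.split("=", 1)    # assumption: first '=' separates
        name, secret = name.strip(), secret.strip()
        if name and secret:                   # assumption: empty halves dropped
            tokens[name] = secret
    return tokens
```

The inverted default is the detail most likely to surprise: any unrecognized value, such as a typo, leaves auth disabled rather than enforced.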
## Enable service-token auth
<Steps>
<Step title="Set tokens and turn auth on">
Edit `.env`:
```bash
SANDBOXD_API_AUTH_DISABLED=false
SANDBOXD_API_TOKENS=my-backend=replace-with-long-random-secret
```
</Step>
<Step title="Recreate the control plane">
```bash
docker compose up -d sandboxd
```
Compose reads `.env` at container start. Startup logs include `api_tokens`, `preview_secrets`, and `auth_disabled` counts.
</Step>
<Step title="Verify from loopback (no token)">
```bash
curl -s "http://127.0.0.1:9090/healthz"
```
Loopback health checks succeed without a bearer.
</Step>
<Step title="Verify external path requires Bearer">
From a non-loopback client (or simulate Traefik by adding `X-Forwarded-For`):
<RequestExample>
```bash
curl -s -o /dev/null -w "%{http_code}" \
-H "X-Forwarded-For: 203.0.113.1" \
http://127.0.0.1:9090/sandboxes
```
</RequestExample>
<ResponseExample>
```text
401
```
</ResponseExample>
With a valid token:
<RequestExample>
```bash
curl -s -H "Authorization: Bearer replace-with-long-random-secret" \
-H "X-Forwarded-For: 203.0.113.1" \
http://127.0.0.1:9090/sandboxes
```
</RequestExample>
</Step>
</Steps>
## Bearer token format
Send the secret in the standard header:
```text
Authorization: Bearer <secret>
```
Matching is case-insensitive on the `Bearer ` prefix. The middleware compares the presented string against every configured token using `crypto/subtle.ConstantTimeCompare` and does not short-circuit on first match, so comparison timing does not leak which token index matched.
On success, the request context carries `Actor{Kind: "service", Name: <token-name>, IP: <client-ip>}`. Handlers use that for audit rows (`auditAction` in the API package).
On failure:
<ResponseExample>
```json
{"error":"unauthorized"}
```
</ResponseExample>
HTTP status **401**. Failed attempts are audit-logged as `auth.token_invalid` with the client IP (no token value is stored).
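
The matching strategy described in this section, comparing against every configured token without stopping at the first hit, can be sketched in Python. Here `hmac.compare_digest` stands in for Go's `crypto/subtle.ConstantTimeCompare`; `match_token` is a hypothetical function, and trimming whitespace around the presented secret is an assumption.

```python
import hmac
from typing import Optional


def match_token(authorization: str, tokens: dict) -> Optional[str]:
    """Return the matching token name, or None when nothing matches."""
    scheme, _, presented = authorization.partition(" ")
    presented = presented.strip()             # assumption: surrounding spaces ignored
    if scheme.lower() != "bearer" or not presented:
        return None
    matched = None
    for name, secret in tokens.items():
        # Every configured secret is compared; the loop never breaks early.
        equal = hmac.compare_digest(presented.encode(), secret.encode())
        if equal and matched is None:
            matched = name
    return matched
```

The returned name corresponds to the audit metadata half of each `name=secret` pair; the secret itself is never returned or logged.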
## Loopback operator path
A request is treated as **operator** (no bearer) when:
1. `X-Forwarded-For` is **absent**, and
2. `RemoteAddr` parses to a loopback IP (`127.0.0.1`, `::1`, etc.).
<Info>
Traefik always sets `X-Forwarded-For`, so traffic through the edge never qualifies as loopback even if it hits the same host port. That prevents forwarded clients from bypassing service-token auth.
</Info>
Typical local workflows (`curl http://127.0.0.1:9090/...` from the host, `install.sh` examples, `AGENTS.md` runbooks) use this path and need no token while auth is enabled.
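
The two operator conditions can be expressed as a small predicate. This Python sketch is an illustration, not the middleware's code: `is_operator` is a hypothetical name, `remote_addr` is assumed to arrive in `host:port` or `[ipv6]:port` form, and any `X-Forwarded-For` header, even an empty one, is treated as present.

```python
import ipaddress


def is_operator(remote_addr: str, headers: dict) -> bool:
    """True when the request qualifies for the loopback operator path."""
    if any(name.lower() == "x-forwarded-for" for name in headers):
        return False                          # forwarded traffic never qualifies
    host = remote_addr.rsplit(":", 1)[0].strip("[]")
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False                          # unparseable peers are not trusted
```

This is also why adding `X-Forwarded-For` to a local `curl` is enough to exercise the external path from the host.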
## Routes outside service-token auth
These paths are reachable on the **external** path without a bearer. They still pass through the middleware, which assigns `Actor{Kind: "system"}`:
| Path | Why exempt |
|---|---|
| `GET /healthz` | Liveness |
| `GET /readyz` | Readiness |
| `GET /preview-auth` | Sets preview cookie; validates preview JWTs internally |
| `GET /forward-auth` | Traefik forwardAuth for private sandboxes |
| `GET /llm.txt` | Public integrator contract |
`GET /metrics` is the inverse: **loopback only**. Non-loopback callers receive **404 Not Found** (not 401), so Prometheus is not exposed on the Traefik edge.
All other API routes (`/sandbox`, `/v1/sandboxes`, legacy handlers, etc.) require a valid bearer when auth is enabled.
The auth middleware wraps **only** the API mux. Preview wake traffic (`hostDispatch` → catch-all) is not bearer-gated; private sandboxes enforce preview tokens inside the wake handler instead.
## Emergency rollback: `SANDBOXD_API_AUTH_DISABLED`
Set `SANDBOXD_API_AUTH_DISABLED=true` (or `1`, `yes`, or any value other than the “off” literals) to disable bearer checks on external paths. External requests run with `Actor{Name: "auth-disabled"}`.
<Warning>
Use only for break-glass recovery. Every external caller can invoke the full control-plane API while rollback is active.
</Warning>
## Token rotation via SIGHUP
On startup, `sandboxd` loads auth config from the process environment (`auth.ParseConfig(os.Getenv)`). The environment is read only once, so changing container environment variables after startup does not rotate tokens in the running process.
Sending **SIGHUP** reloads auth atomically from `SANDBOXD_ENV_FILE` (default `/etc/sandboxed/sandboxd.env`):
1. Parse the file (`KEY=value`, `#` comments, optional quotes stripped).
2. Build a new `auth.Config` via `ParseConfig`.
3. `authMw.Reload(nc)` swaps the config without restarting the listener.
If the file cannot be read, the previous config is kept and an error is logged.
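The reload steps above can be sketched as follows. This is a simplified Python model, not the Go implementation: `parse_env_text` and `AuthHolder` are hypothetical names, the config is represented as a plain dict, and inline trailing comments are not handled.

```python
import threading


def parse_env_text(text):
    """Parse KEY=value lines; skip blanks and # comments; strip matching quotes."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


class AuthHolder:
    """Holds the active config; a reload swaps it in one step under a lock."""

    def __init__(self, config):
        self._lock = threading.Lock()
        self._config = config

    def current(self):
        with self._lock:
            return self._config

    def reload_from(self, path):
        """Return True on swap; keep the previous config if the file is unreadable."""
        try:
            with open(path, encoding="utf-8") as fh:
                new_config = parse_env_text(fh.read())
        except OSError as exc:
            print(f"auth reload failed, keeping previous config: {exc}")
            return False
        with self._lock:
            self._config = new_config
        return True
```

Building the new config fully before taking the lock keeps request handlers from ever seeing a half-parsed file.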
<Tip>
Docker Compose deployments usually rotate by editing `.env` and running `docker compose up -d sandboxd`. SIGHUP reload is aimed at systemd installs that mount a persistent `sandboxd.env` and signal the daemon (`kill -HUP <pid>`).
</Tip>
## `SANDBOXED_API_BIND` and LAN exposure
Compose publishes the API as:
```
"${SANDBOXED_API_BIND:-127.0.0.1:9090}:9000"
```
| Bind | Exposure |
|---|---|
| `127.0.0.1:9090` (default) | Only processes on the host can open the API port. Loopback `curl` works; remote LAN clients cannot connect. |
| `0.0.0.0:9090` | API listens on all host interfaces — any machine on the LAN can reach `:9090`. |
Inside the stack, Traefik also reaches `http://sandboxd:9000` on the internal network (`traefik/dynamic/api.yml` optional router `api.preview.*`). That path is **external** from the middleware’s perspective (forwarded client IP), so bearer auth applies when enabled.
<Check>
Before binding `0.0.0.0`, set `SANDBOXD_API_AUTH_DISABLED=false`, configure strong `SANDBOXD_API_TOKENS`, and restrict host firewall access to trusted backends only.
</Check>
## Traefik API edge
When `traefik/dynamic/api.yml` is present, the control plane is also reachable at `http://api.preview.<PREVIEW_DOMAIN>` (for example `http://api.preview.localhost`). The same service-token rules apply: enable auth and pass `Authorization: Bearer` from your upstream service. Health, preview-auth, forward-auth, and `llm.txt` stay open as listed above.
## Audit and actor kinds
| `Actor.Kind` | When set |
|---|---|
| `operator` | Loopback, no token |
| `service` | Valid bearer, or auth-disabled rollback |
| `system` | Exempt path |
| `unknown` | Default when context has no actor |
Privileged API actions record `ActorKind`, `ActorName`, and `ActorIP` in the SQLite audit log.
## Related pages
<CardGroup>
<Card title="Configuration reference" href="/configuration-reference">
All compose-backed env keys, including preview domain, idle tuning, and auth-related variables.
</Card>
<Card title="Production deployment" href="/production-deployment">
Wildcard DNS, TLS, and the checklist to enable API auth before exposing the host.
</Card>
<Card title="Private previews" href="/private-previews">
Preview JWTs, `/forward-auth`, and `/preview-auth` — separate from service bearer tokens.
</Card>
<Card title="v1 API reference" href="/v1-api-reference">
Public `/v1/sandboxes` shapes and error envelope once auth succeeds.
</Card>
<Card title="Legacy API reference" href="/legacy-api-reference">
Internal `/sandbox*` routes and health endpoints.
</Card>
</CardGroup>
---
## 11. Private previews
> visibility=private sandboxes, Traefik forwardAuth to /forward-auth, preview tokens (SANDBOXD_PREVIEW_TOKEN_SECRETS), /preview-auth redirect flow, and deny modes.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/11-private-previews.md
- Generated: 2026-06-04T22:44:13.372Z
### Source Files
- `traefik/dynamic/auth.yml`
- `control-plane/internal/api/forward_auth.go`
- `control-plane/internal/api/preview_auth.go`
- `control-plane/internal/auth/preview_token.go`
- `control-plane/internal/traefik/traefik.go`
- `control-plane/internal/api/handlers.go`
---
title: "Private previews"
description: "visibility=private sandboxes, Traefik forwardAuth to /forward-auth, preview tokens (SANDBOXD_PREVIEW_TOKEN_SECRETS), /preview-auth redirect flow, and deny modes."
---
Private previews gate browser traffic to sandboxes created with `visibility: "private"`. Traefik attaches the file-provider middleware `sandbox-preview-auth@file`, which calls `sandboxd` at `GET /forward-auth` on every preview request. Access requires a valid `sandbox_preview` cookie minted by `GET /preview-auth` from an upstream-signed HS256 JWT. Public sandboxes (the default) never attach this middleware; knowing the preview URL is sufficient for them.
## Public vs private
| `visibility` | Default | Preview gate | Traefik middleware |
|---|---|---|---|
| `public` | yes | URL only | none |
| `private` | no | forward-auth + preview cookie | `sandbox-preview-auth@file` on each port router |
Set visibility at create time on `POST /sandbox` or `POST /v1/sandboxes`. Omitting the field defaults to `public`. Any other value returns `400` with `visibility must be 'public' or 'private'`.
<RequestExample>
```bash
curl -s -X POST http://127.0.0.1:9090/sandbox \
-H 'Content-Type: application/json' \
-d '{
"ports": [3000],
"visibility": "private",
"external": { "user_id": "user-alice", "project_id": "proj-1" }
}'
```
</RequestExample>
The v1 create path forwards the same field:
```json
{ "project": { "id": "proj-1", "user_id": "user-alice" }, "visibility": "private" }
```
Private auth ties viewers to `workspace_owner.external_user_id` when that row exists (inserted on create from `external.user_id`). The OSS quickstart defaults `external.user_id` to `"local"` when omitted.
## Traefik wiring
`traefik/dynamic/auth.yml` defines the forward-auth middleware:
```yaml
sandbox-preview-auth:
forwardAuth:
address: "http://sandboxd:9000/forward-auth"
trustForwardHeader: true
authResponseHeaders:
- X-Sandbox-External-User-Id
```
When `visibility == "private"`, per-port Traefik labels add:
```
traefik.http.routers.s-<id>-<port>.middlewares=sandbox-preview-auth@file
```
Public sandboxes must not carry that label (enforced in tests). The middleware is inert for the default public workflow but must be present in the dynamic config before any private sandbox is created.
<Note>
`GET /forward-auth` and `GET /preview-auth` are exempt from API bearer-token auth. They validate their own JWTs. `/healthz` and `/readyz` are also exempt.
</Note>
## End-to-end browser flow
```mermaid
sequenceDiagram
participant Browser
participant Traefik
participant sandboxd
participant Upstream as Upstream app
Browser->>Traefik: GET https://s-{id}-{port}.preview.{domain}/
Traefik->>sandboxd: GET /forward-auth (X-Forwarded-Host, X-Forwarded-Uri)
alt no valid sandbox_preview cookie
sandboxd-->>Traefik: 302 Location: SANDBOXD_AUTH_REDIRECT_URL
Traefik-->>Browser: redirect to upstream sign-in
Upstream->>Browser: redirect with ?token=...&return=...
Browser->>sandboxd: GET /preview-auth?token=...&return=...
sandboxd->>sandboxd: Verify HS256 JWS, Set-Cookie sandbox_preview
sandboxd-->>Browser: 302 to allowlisted return URL
Browser->>Traefik: retry preview (cookie present)
end
sandboxd-->>Traefik: 200 + X-Sandbox-External-User-Id
Traefik->>Browser: proxied app response
```
<Steps>
<Step title="Configure secrets and redirect URL">
Set `SANDBOXD_PREVIEW_TOKEN_SECRETS` (comma-separated `kid=secret` pairs) and `SANDBOXD_AUTH_REDIRECT_URL` on `sandboxd`. The redirect template uses `{sandbox_id}` and `{return}` placeholders (URL-encoded on substitution). Reload via SIGHUP after editing `SANDBOXD_ENV_FILE` (default `/etc/sandboxed/sandboxd.env`).
</Step>
<Step title="Create a private sandbox">
`POST /sandbox` or `POST /v1/sandboxes` with `"visibility": "private"` and a stable `external.user_id` for the owner.
</Step>
<Step title="Mint a preview token upstream">
Your backend signs an HS256 compact JWS using the shared secret for the chosen `kid`. Claims must include `aud: "sandbox-preview"`, `sandbox_id`, `sub` (viewer id), and a future `exp`.
</Step>
<Step title="Send the user through /preview-auth">
Redirect the browser to your auth UI, then back to sandboxd:
`GET /preview-auth?token=<jws>&return=<https://s-{id}-{port}.preview.{domain}/...>`
On success, sandboxd sets `sandbox_preview` on domain `.preview.{PREVIEW_DOMAIN}` (`HttpOnly`, `Secure`, `SameSite=Lax`) and 302s to `return`.
</Step>
<Step title="Open the preview URL">
Subsequent requests hit `/forward-auth` with the cookie; Traefik forwards `X-Sandbox-External-User-Id` from the validated `sub` claim.
</Step>
</Steps>
## Preview token format
Upstream mints a standard compact JWS: `header.payload.signature` (base64url, no padding).
| JWS header | Required value |
|---|---|
| `alg` | `HS256` |
| `kid` | key id present in `SANDBOXD_PREVIEW_TOKEN_SECRETS` |
| Claim | Role |
|---|---|
| `aud` | must be `sandbox-preview` (`PreviewAudience`) |
| `sandbox_id` | must match the sandbox being viewed |
| `sub` | viewer identity; must match `workspace_owner.external_user_id` when that row exists |
| `exp` | Unix seconds; cookie `Max-Age` derived from remaining lifetime |
| `iss`, `iat` | stored; not used for authorization beyond signature/exp/aud |
Example claims shape (from tests):
```json
{
"iss": "upstream-prod",
"iat": 1710000000,
"exp": 1710003600,
"aud": "sandbox-preview",
"sub": "user-alice",
"sandbox_id": "01HXANYZ000000000000000000"
}
```
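An upstream backend could mint a token of this shape with the Python standard library alone. The sketch below is an assumption-laden example: `mint_preview_token` and its parameters are hypothetical, and the header carries only the two required fields from the table above.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data):
    """base64url without padding, as used by compact JWS."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_preview_token(secret, kid, sandbox_id, sub,
                       ttl_seconds=3600, iss="upstream-prod", now=None):
    """Build header.payload.signature signed with HMAC-SHA256."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "kid": kid}
    claims = {
        "iss": iss,
        "iat": now,
        "exp": now + ttl_seconds,
        "aud": "sandbox-preview",
        "sub": sub,
        "sandbox_id": sandbox_id,
    }
    signing_input = (
        b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + b64url(json.dumps(claims, separators=(",", ":")).encode())
    )
    signature = hmac.new(secret.encode(), signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)
```

The `secret` passed here must be the value configured for the same `kid` on the `sandboxd` side, otherwise the signature check fails.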
<ParamField body="SANDBOXD_PREVIEW_TOKEN_SECRETS" type="string">
Comma-separated `kid=secret` list (whitespace trimmed). Maps JWS `kid` to HMAC secret. Supports rotation by listing multiple kids. Parsed in `auth.ParseConfig`; reloadable on SIGHUP.
</ParamField>
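A minimal sketch of parsing that value, assuming malformed entries are rejected outright (this page does not say how `auth.ParseConfig` treats them). `parse_preview_secrets` is a hypothetical name.

```python
def parse_preview_secrets(raw):
    """Turn 'kid1=secret1, kid2=secret2' into {kid: secret}."""
    secrets = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' so secrets may themselves contain '='.
        kid, sep, secret = pair.partition("=")
        kid, secret = kid.strip(), secret.strip()
        if not sep or not kid or not secret:
            # Assumption: fail loudly; the message avoids echoing the secret.
            raise ValueError("malformed kid=secret entry")
        secrets[kid] = secret
    return secrets
```

During rotation, list both the old and new `kid` until every outstanding token signed with the old key has expired.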
<Warning>
The shipped `docker-compose.yml` does not pass `SANDBOXD_PREVIEW_TOKEN_SECRETS` or `SANDBOXD_AUTH_REDIRECT_URL` by default. Private previews will not work in the stock compose deployment until you add both to `.env` and extend the `sandboxd` service `environment` block (or supply them through `SANDBOXD_ENV_FILE`).
</Warning>
## GET /forward-auth
Traefik invokes this on every request to a private sandbox router.
**Inputs (headers):**
| Header | Use |
|---|---|
| `X-Forwarded-Host` | Parsed as `s-<id>-<port>.preview.<PREVIEW_DOMAIN>` |
| `X-Forwarded-Uri` | Rebuilt into `return` on deny |
| `Cookie: sandbox_preview` | HS256 JWS from upstream |
**Outcomes:**
| HTTP | Meaning |
|---|---|
| `200` | Allow; sets `X-Sandbox-External-User-Id` to `claims.Sub` |
| `302` | Deny (default mode): redirect to `SANDBOXD_AUTH_REDIRECT_URL` |
| `401` | Deny (`meta-refresh` mode): `Location` + HTML meta refresh to same target |
| `401` | Unparseable `X-Forwarded-Host` (no redirect) |
| `404` | Sandbox id not found (HTML error page) |
| `500` | Store error |
If the sandbox row exists but `visibility != "private"`, the handler returns `200` as a safety net (public sandboxes should not route through forward-auth).
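The `X-Forwarded-Host` parsing from the inputs table can be illustrated with a small Python helper. `parse_preview_host` is hypothetical; stripping an optional `:port` suffix and the port range check are assumptions, and a `None` result corresponds to the unparseable-host `401` row above.

```python
import re


def parse_preview_host(host, preview_domain):
    """Return (sandbox_id, port) for s-<id>-<port>.preview.<domain>, else None."""
    host = host.split(":", 1)[0]  # assumption: ignore an optional :port suffix
    pattern = (r"s-([0-9A-Za-z]+)-([0-9]+)\.preview\."
               + re.escape(preview_domain))
    match = re.fullmatch(pattern, host)
    if match is None:
        return None
    port = int(match.group(2))
    if not 1 <= port <= 65535:
        return None
    return match.group(1), port
```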
### Denial reasons
`auth.CheckPreviewAccess` returns a machine-readable reason audited as `preview.access_denied`:
| Reason | Cause |
|---|---|
| `no_cookie` | Missing `sandbox_preview` |
| `bad_signature` | Malformed JWS, wrong alg/kid, bad sig, or wrong `aud` |
| `expired` | `exp` in the past |
| `wrong_sandbox` | `sandbox_id` claim ≠ host-derived id |
| `wrong_user` | `sub` ≠ `workspace_owner.external_user_id` (when owner row exists) |
If `workspace_owner` is absent (legacy private sandbox), signature and sandbox-id checks still apply; the owner check is skipped.
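The reasons in the table can be modeled as an ordered series of checks. The Python sketch below is not `auth.CheckPreviewAccess`: the order of checks, the treatment of a non-numeric `exp` as expired, and every function name are assumptions made for illustration. `sign` exists only so the example can produce its own cookies.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(part):
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign(header, claims, secret):
    """Helper for the example: build a compact HS256 JWS."""
    body = (_b64url(json.dumps(header).encode()) + "."
            + _b64url(json.dumps(claims).encode()))
    sig = hmac.new(secret.encode(), body.encode(), hashlib.sha256).digest()
    return body + "." + _b64url(sig)


def check_preview_access(cookie, host_sandbox_id, owner_user_id, secrets, now=None):
    """Return (True, sub) on allow, or (False, reason) on deny."""
    if not cookie:
        return False, "no_cookie"
    try:
        h64, p64, s64 = cookie.split(".")
        header = json.loads(_b64url_decode(h64))
        claims = json.loads(_b64url_decode(p64))
        if header["alg"] != "HS256":
            raise ValueError("unsupported alg")
        secret = secrets[header["kid"]]
        expected = hmac.new(secret.encode(), f"{h64}.{p64}".encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(_b64url_decode(s64), expected):
            raise ValueError("signature mismatch")
        if claims.get("aud") != "sandbox-preview":
            raise ValueError("wrong audience")
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed JWS, wrong alg/kid, bad signature, or wrong aud.
        return False, "bad_signature"
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False, "expired"
    if claims.get("sandbox_id") != host_sandbox_id:
        return False, "wrong_sandbox"
    # The owner check is skipped when no workspace_owner row exists (None here).
    if owner_user_id is not None and claims.get("sub") != owner_user_id:
        return False, "wrong_user"
    return True, claims.get("sub")
```

Passing `owner_user_id=None` models the legacy case described above, where signature and sandbox-id checks still apply but the owner comparison does not.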
## GET /preview-auth
Landing endpoint after upstream authentication.
<ParamField query="token" type="string" required>
Upstream-signed HS256 JWS (same format as the cookie value).
</ParamField>
<ParamField query="return" type="string" required>
HTTPS URL to redirect after cookie issuance. Allowlisted patterns only:
- `https://s-<id>-<port>.preview.<domain>(/…)?` — `id` must match JWT `sandbox_id`
- `https://api.preview.<domain>(/…)?`
Any other `return` → `400` `return url not allowed for this sandbox`.
</ParamField>
On invalid token, the handler does not expose validation details; it audits `preview_auth_token_invalid` and 302s to `SANDBOXD_AUTH_REDIRECT_URL` (or `401` if redirect URL unset). On success it audits `preview.session_issued` and sets the cookie on `.preview.{PREVIEW_DOMAIN}`.
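The `return` allowlist can be approximated with two anchored patterns. This Python sketch assumes the id and port character classes match those used elsewhere in these docs; `return_url_allowed` is a hypothetical helper, not the handler's code.

```python
import re


def return_url_allowed(return_url, token_sandbox_id, preview_domain):
    """True when `return` matches one of the two allowlisted shapes."""
    domain = re.escape(preview_domain)
    sandbox = re.fullmatch(
        r"https://s-([0-9A-Za-z]+)-[0-9]+\.preview\." + domain + r"(/.*)?",
        return_url,
    )
    if sandbox is not None:
        # The id in the host must equal the JWT's sandbox_id claim.
        return sandbox.group(1) == token_sandbox_id
    api = re.fullmatch(r"https://api\.preview\." + domain + r"(/.*)?", return_url)
    return api is not None
```

Anchoring the whole string and requiring `/` or end-of-string right after the domain is what stops lookalike hosts such as `preview.example.com.evil.test` from passing.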
## Redirect URL template
<ParamField body="SANDBOXD_AUTH_REDIRECT_URL" type="string">
Template for sending unauthenticated users to your sign-in UI. Substitutions: `{sandbox_id}` → query-escaped id, `{return}` → query-escaped full preview URL.
Example from tests:
```
https://app/x?sandbox_id={sandbox_id}&return={return}
```
→ `https://app/x?sandbox_id=sb1&return=https%3A%2F%2Fs-sb1-3000.preview.example.com%2F`
If empty, deny paths return plain `401` without redirect (forward-auth and wake).
</ParamField>
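The substitution can be reproduced in a few lines of Python. `build_redirect_url` is a hypothetical stand-in for the `BuildRedirectURL` helper named on this page, and `urllib.parse.quote_plus` is used here as the closest standard-library match for query escaping.

```python
from urllib.parse import quote_plus


def build_redirect_url(template, sandbox_id, return_url):
    """Fill {sandbox_id} and {return}; None means 'no redirect configured'."""
    if not template:
        return None
    return (
        template
        .replace("{sandbox_id}", quote_plus(sandbox_id))
        .replace("{return}", quote_plus(return_url))
    )
```

With the template from the tests and `sandbox_id="sb1"`, this yields the same URL shown in the example above. A `None` result corresponds to the plain `401` deny path.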
## Deny modes
<ParamField body="SANDBOXD_FORWARD_AUTH_DENY_MODE" type="string">
Default `redirect`. Only other supported value: `meta-refresh`.
</ParamField>
| Mode | forward-auth deny | wake deny (private, HTML) |
|---|---|---|
| `redirect` (default) | `302` to auth URL | `302` to auth URL |
| `meta-refresh` | `401` + `Location` + HTML `<meta http-equiv="refresh">` | same pattern |
Use `meta-refresh` when Traefik builds do not pass `3xx` from the auth service through forward-auth cleanly. Both modes set `Location` to the same `BuildRedirectURL` target.
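The two modes differ only in status code and body, which a short sketch makes concrete. `deny_response` is a hypothetical function; the exact HTML body and the fallback of unknown mode values to `redirect` are assumptions.

```python
import html


def deny_response(mode, redirect_url):
    """Return (status, headers, body) for a denied preview request."""
    if not redirect_url:
        # No SANDBOXD_AUTH_REDIRECT_URL configured: plain 401, no redirect.
        return 401, {}, ""
    if mode == "meta-refresh":
        target = html.escape(redirect_url, quote=True)
        body = f'<!doctype html><meta http-equiv="refresh" content="0; url={target}">'
        headers = {"Location": redirect_url,
                   "Content-Type": "text/html; charset=utf-8"}
        return 401, headers, body
    # Default mode ("redirect").
    return 302, {"Location": redirect_url}, ""
```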
## Stopped private sandboxes and wake
The catch-all wake path runs the same `CheckPreviewAccess` before `docker start` when:
- the request is HTML (browser preview hit), and
- `visibility == "private"`, and
- the actor is not `service` or `operator` (API bearer / loopback).
Service- and operator-authenticated callers skip the cookie gate so `POST /v1/sandboxes/{id}/tasks` can wake a stopped private sandbox without a browser cookie. End-user preview URLs still require the cookie flow.
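Expressed as a predicate, the gate looks roughly like this. `wake_requires_cookie` is a hypothetical name and the actor kinds are the strings used on the API authentication page.

```python
def wake_requires_cookie(is_html_request, visibility, actor_kind):
    """True when the wake path must run the preview-cookie check first."""
    return (
        is_html_request
        and visibility == "private"
        and actor_kind not in ("service", "operator")
    )
```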
Metrics: `sandboxd_preview_access_total{result="allowed|denied"}`, `sandboxd_forward_auth_duration_seconds`, wake counter `auth_denied` when gated.
## Operations checklist
<Check>
- `traefik/dynamic/auth.yml` mounted (compose mounts `./traefik/dynamic`)
- `SANDBOXD_PREVIEW_TOKEN_SECRETS` configured and matches upstream signing `kid`
- `SANDBOXD_AUTH_REDIRECT_URL` points at your token-minting UI
- `external.user_id` on create matches the `sub` you put in preview tokens
- For TLS previews, use `https://` in `return` URLs; sandboxd always sets the cookie's `Secure` attribute
- Optional: `SANDBOXD_FORWARD_AUTH_DENY_MODE=meta-refresh` if redirects fail through Traefik
</Check>
<Tip>
Prometheus counter `sandboxd_preview_access_total` and histogram `sandboxd_forward_auth_duration_seconds` instrument the forward-auth hot path (phase-9 capacity target: p95 under 50 ms).
</Tip>
## Related pages
<CardGroup>
<Card title="Preview routing" href="/preview-routing">
Host rules, router priority 100, and how Traefik discovers sandbox routers.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Catch-all wake path, private wake gating, and warming-page behavior.
</Card>
<Card title="API authentication" href="/api-authentication">
Service tokens vs preview endpoints exempt from bearer auth.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
Env keys for preview domain, TLS, auth, and forward-auth deny mode.
</Card>
<Card title="Production deployment" href="/production-deployment">
Wildcard DNS, TLS, and hardening when exposing previews on the internet.
</Card>
</CardGroup>
---
## 12. Production deployment
> Wildcard DNS, traefik websecure + cert resolver, PREVIEW_TLS=true, enable API auth, hardening checklist (isolation, egress, disk), and scaling boundaries from README.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/12-production-deployment.md
- Generated: 2026-06-04T22:44:24.233Z
### Source Files
- `README.md`
- `traefik/traefik.yml`
- `.env.example`
- `docker-compose.yml`
- `ARCHITECTURE.md`
- `control-plane/internal/egress/nftables.go`
---
title: "Production deployment"
description: "Wildcard DNS, traefik websecure + cert resolver, PREVIEW_TLS=true, enable API auth, hardening checklist (isolation, egress, disk), and scaling boundaries from README."
---
Production moves the stack off `*.localhost` plain HTTP to a real wildcard preview domain: Traefik terminates TLS on `websecure`, `sandboxd` emits `tls=true` preview routers from `PREVIEW_ENTRYPOINT` / `PREVIEW_TLS`, and the control-plane API must run with service-token auth before you widen `SANDBOXED_API_BIND` beyond loopback.
```mermaid
flowchart TB
subgraph edge["Host — Traefik v3"]
DNS["*.preview.yourdomain.com"]
WEB["entrypoint web :80"]
SEC["entrypoint websecure :443"]
WAKE["file: sandbox-wake priority 1"]
APIR["file: sandbox-api optional"]
end
subgraph cp["sandboxd container"]
SQLITE[(SQLite WAL)]
AUTH[auth middleware]
WAKEH[wake handler]
end
subgraph sandboxes["Per-sandbox containers s-{ulid}"]
RT[runtimed + dev server]
LBL[Traefik Docker labels priority 100]
end
Browser --> DNS
DNS --> SEC
SEC --> LBL
SEC --> WAKE
WAKE --> WAKEH
WAKEH --> RT
APIR --> AUTH
AUTH --> SQLITE
LBL --> RT
```
## Wildcard DNS
Preview hostnames are literal per sandbox and port:
`s-{ulid}-{port}.preview.{PREVIEW_DOMAIN}`
For `PREVIEW_DOMAIN=yourdomain.com` and port `3000`, browsers hit `https://s-01HXANYZ-3000.preview.yourdomain.com`. Point a **wildcard** record at the host running Traefik:
| Record | Target |
|--------|--------|
| `*.preview.yourdomain.com` | Host public IP or load balancer in front of Traefik |
The wake catch-all in `traefik/dynamic/wake.yml` matches `HostRegexp(\`^s-[0-9A-Za-z]+-[0-9]+\\.preview\\..+$\`)`, so you do not edit that file when changing `PREVIEW_DOMAIN` — `sandboxd` validates the exact domain on wake.
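To see why the rule is domain-agnostic, the same expression can be exercised in Python, whose `re` syntax agrees with Go's for this pattern. `matches_wake_catch_all` is a hypothetical helper for illustration only.

```python
import re

# Same pattern as the HostRegexp rule in traefik/dynamic/wake.yml.
WAKE_HOST = re.compile(r"^s-[0-9A-Za-z]+-[0-9]+\.preview\..+$")


def matches_wake_catch_all(host):
    """True when Traefik's wake catch-all would match this Host."""
    return WAKE_HOST.match(host) is not None
```

Any suffix after `.preview.` matches, which is why the exact-domain check has to happen in `sandboxd` rather than in the Traefik rule.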
<Note>
Local dev uses `PREVIEW_DOMAIN=localhost`; browsers resolve `*.localhost` to `127.0.0.1` with no DNS or certificates. Production replaces only the domain and TLS path.
</Note>
## Traefik: websecure, certificates, and dynamic routes
The shipped `traefik/traefik.yml` exposes plain HTTP on entrypoint `web` (`:80`). For production:
1. Uncomment the `websecure` entrypoint (`:443`) in `traefik/traefik.yml`.
2. Add a certificate resolver (README recommends **Let's Encrypt DNS-01** so one wildcard cert covers every preview host and you avoid per-host ACME rate limits).
3. In `docker-compose.yml`, publish `443` (uncomment `"${HTTPS_PORT:-443}:443"`).
4. Align **file-provider** routers with TLS: `traefik/dynamic/wake.yml` and `traefik/dynamic/api.yml` default to `entryPoints: [web]` — change them to `[websecure]` when previews and optional API routing use HTTPS only.
`sandboxd` does **not** set a per-router `certresolver`. Preview routers get `traefik.http.routers.<name>.tls=true` when `PREVIEW_TLS=true`; Traefik serves them from the **default TLS store** using one shared wildcard `*.preview.<domain>` certificate (no per-sandbox ACME orders). Supply that cert via your resolver output or a file under `traefik/dynamic/` (Traefik watches that directory).
Traefik's Docker provider is constrained to `Label(\`sandboxed.managed\`,\`true\`)` so only stack-owned sandboxes are routed. Running sandboxes register priority **100** routers; the wake catch-all stays at priority **1**.
## sandboxd preview and compose env
Set these in `.env` (see `.env.example`) before `docker compose up -d`:
| Variable | Production value | Effect |
|----------|------------------|--------|
| `PREVIEW_DOMAIN` | `yourdomain.com` | Host rule suffix `*.preview.yourdomain.com` |
| `PREVIEW_ENTRYPOINT` | `websecure` | Docker label `entrypoints=websecure` |
| `PREVIEW_TLS` | `true` | Docker label `tls=true` on preview routers |
| `HTTP_PORT` | `80` (optional) | Host map for Traefik `web` |
| `HTTPS_PORT` | `443` | Host map for Traefik `websecure` when uncommented in compose |
| `SANDBOXED_API_BIND` | `127.0.0.1:9090` default | Loopback API; use `0.0.0.0:9090` only with auth enabled |
| `SANDBOXD_API_AUTH_DISABLED` | `false` | Require bearer tokens on external API paths |
| `SANDBOXD_API_TOKENS` | `name1:secret1,name2:secret2` | Service-token pairs |
Compose passes `PREVIEW_DOMAIN`, `PREVIEW_ENTRYPOINT`, `PREVIEW_TLS`, and auth vars into the `sandboxd` service environment.
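As an illustration of how these variables shape a preview router, the sketch below assembles the labels mentioned across these pages. It is an assumption-based partial set: `preview_labels` is hypothetical, the label keys follow Traefik's standard Docker-label naming, and `sandboxd` may emit additional labels that are not shown here.

```python
def preview_labels(sandbox_id, port, domain,
                   entrypoint="web", tls=False, visibility="public"):
    """Build Traefik Docker labels for one sandbox port (partial, illustrative)."""
    router = f"s-{sandbox_id}-{port}"
    prefix = f"traefik.http.routers.{router}"
    labels = {
        "sandboxed.managed": "true",
        f"{prefix}.rule": f"Host(`{router}.preview.{domain}`)",
        f"{prefix}.entrypoints": entrypoint,   # PREVIEW_ENTRYPOINT
        f"{prefix}.priority": "100",           # above the wake catch-all (1)
    }
    if tls:                                    # PREVIEW_TLS=true
        labels[f"{prefix}.tls"] = "true"
    if visibility == "private":
        labels[f"{prefix}.middlewares"] = "sandbox-preview-auth@file"
    return labels
```

Note that no `certresolver` label appears: the shared wildcard certificate comes from Traefik's default TLS store, as described in the previous section.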
<Steps>
<Step title="Configure DNS and Traefik TLS">
Point `*.preview.yourdomain.com` at the host. Enable `websecure`, add a cert resolver or mount a wildcard cert into `traefik/dynamic/`, publish port 443, and set file-provider `entryPoints` to `websecure` where needed.
</Step>
<Step title="Set production .env">
```bash
PREVIEW_DOMAIN=yourdomain.com
PREVIEW_ENTRYPOINT=websecure
PREVIEW_TLS=true
SANDBOXD_API_AUTH_DISABLED=false
SANDBOXD_API_TOKENS=prod:your-long-random-secret
SANDBOXED_API_BIND=127.0.0.1:9090
```
</Step>
<Step title="Redeploy the stack">
```bash
docker compose up -d
curl -s https://s-<id>-3000.preview.yourdomain.com/
```
</Step>
<Step title="Call the API with a bearer token">
```bash
curl -s -XPOST http://127.0.0.1:9090/sandbox \
-H 'Authorization: Bearer your-long-random-secret' \
-H 'content-type: application/json' \
-d '{"ports":[3000]}'
```
Loopback requests without `X-Forwarded-For` bypass auth (operator path). Traefik-forwarded calls must send `Authorization: Bearer <secret>`.
</Step>
</Steps>
## Enable API authentication
Default install keeps the API open (`SANDBOXD_API_AUTH_DISABLED=true` in `.env.example`) for local integration. Production should set `SANDBOXD_API_AUTH_DISABLED=false` and non-empty `SANDBOXD_API_TOKENS`.
<ParamField body="SANDBOXD_API_TOKENS" type="string">
Comma-separated `name:secret` pairs. The middleware matches the bearer token against secrets; the token **name** is recorded for audit.
</ParamField>
<ParamField body="SANDBOXD_API_AUTH_DISABLED" type="boolean">
When set to anything other than `false` / `0` / `no`, bearer checks are skipped and every external request runs as the `auth-disabled` actor. Intended as an emergency rollback only.
</ParamField>
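Both settings can be modeled in a few lines of Python. The helper names are hypothetical, and three behaviors are assumptions not confirmed by this page: an unset or empty flag leaves auth enabled, the off literals are matched case-insensitively, and malformed token entries are rejected.

```python
OFF_LITERALS = {"false", "0", "no"}


def auth_disabled(raw):
    """True when SANDBOXD_API_AUTH_DISABLED holds anything but an off literal."""
    if raw is None or raw.strip() == "":
        return False  # assumption: unset or empty keeps auth enabled
    return raw.strip().lower() not in OFF_LITERALS


def parse_api_tokens(raw):
    """Turn 'name1:secret1,name2:secret2' into {name: secret}."""
    tokens = {}
    for pair in (raw or "").split(","):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, secret = pair.partition(":")
        name, secret = name.strip(), secret.strip()
        if not sep or not name or not secret:
            raise ValueError("malformed name:secret entry")
        tokens[name] = secret
    return tokens
```

An empty result from `parse_api_tokens` combined with `auth_disabled(...) == False` is the misconfiguration the warning below describes: every external call is rejected with `401`.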
Exempt paths (no bearer required): `/healthz`, `/readyz`, `/preview-auth`, `/forward-auth`, `/llm.txt`. `/metrics` is **loopback-only** and returns 404 externally.
Optional edge exposure: `traefik/dynamic/api.yml` routes `api.preview.<domain>` to `sandboxd`; the same token rules apply. Delete that file to keep the API off Traefik.
Send tokens on every integrator call:
```http
Authorization: Bearer <secret>
```
<Warning>
If `SANDBOXD_API_AUTH_DISABLED=false` but `SANDBOXD_API_TOKENS` is empty, `sandboxd` logs a startup warning and every external API call receives `401`.
</Warning>
<Info>
Send `SIGHUP` to `sandboxd` to reload auth config from `SANDBOXD_ENV_FILE` (default `/etc/sandboxed/sandboxd.env` in non-compose deployments) without restarting the process — token rotation path.
</Info>
## Hardening checklist
v1 optimizes for a single Docker host that comes up with one command. The README and `ARCHITECTURE.md` call out what to tighten for real users and revenue.
### Isolation
Each sandbox is created with hardened `docker run` flags:
| Control | Value |
|---------|--------|
| Capabilities | `--cap-drop=ALL` |
| Privilege escalation | `--security-opt=no-new-privileges` |
| Root filesystem | `--read-only` + `tmpfs` on `/tmp` and `/var/tmp` |
| Memory | hard `--memory=10g`, `--memory-swap=10g` |
| PIDs | `--pids-limit=1024` |
| Writable disk | bind-mounted workspace only (`/home/sandbox`) |
Default `memory_high` on create is **4G** (cgroup soft throttle); applying it requires `SANDBOXED_SET_MEMORY_HIGH=true` and host cgroup visibility from the control-plane container.
Threat model: **authenticated, accountable users running their own code** — not anonymous hostile multi-tenancy. For untrusted strangers' code, README recommends **VM-per-tenant** or stronger runtimes (gVisor, Kata, Firecracker).
`SANDBOXED_USERNS` defaults to `host` so workspace ownership stays deterministic with or without daemon `userns-remap`. Clear it to use the daemon default for sandboxes.
The idle and **host memory pressure** reapers run `docker stop` on sandboxes to free RAM; wake admission refuses starts when host memory is low.
### Egress
OSS `sandboxd` sets `egress.Manager` to **nil** — default-allow outbound network, no connection logging. The `control-plane/internal/egress` package implements nftables `sandbox_sources_v4` membership for a future host-level policy (metadata blocks, SMTP, abuse lists, RFC1918, cross-sandbox rules per rule comments). That path requires host `nft`, journald, and systemd timers not present in plain `docker compose`.
<Tip>
To harden egress today: add host firewall rules or an egress proxy; treat open egress as a known v1 trade-off in `ARCHITECTURE.md`.
</Tip>
### Disk and data
| Asset | Location | Quota |
|-------|----------|-------|
| Workspaces | `SANDBOXED_DATA_DIR/workspaces/<id>/` | **No** per-workspace hard quota (plain directories) |
| State | `SANDBOXED_DATA_DIR/state/sandboxd.db` | Host filesystem shared |
| Logs | `SANDBOXED_LOG_DIR/traefik-access.log` | Shared access log for activity tailing |
Back up workspaces by copying directories; back up SQLite for control-plane truth. Plan filesystem or volume quotas and multi-host sharding before heavy multi-tenant load.
### Previews and host trust
| Risk | v1 behavior | Mitigation |
|------|-------------|------------|
| Public preview URLs | Anyone with the link can load the app | Create with `visibility=private` + forward-auth (`traefik/dynamic/auth.yml`) |
| Control plane power | `sandboxd` mounts the Docker socket (root-equivalent on host) | Dedicated host, patching, no unrelated secrets co-located |
| API exposure | Open by default | Auth + bind API to loopback unless Traefik + tokens protect the edge |
## Scaling boundaries
| Dimension | v1 limit | Beyond v1 |
|-----------|----------|-----------|
| Hosts | **One server**, one Docker socket | Shard by tenant or region; control plane is a thin `docker` CLI boundary (K8s noted as interface swap in README, not shipped) |
| Density | Many stopped sandboxes, fewer running; idle stop + wake | Tune `SANDBOXD_IDLE_THRESHOLD_SECONDS` (default 2100s); monitor pressure reaper |
| State | Single SQLite (WAL) on disk | Backup `sandboxd.db`; no built-in HA |
| Previews | Traefik on same host as sandboxes | Wildcard cert + DNS; optional `api.preview.*` router |
| Snapshots/templates | API present; directory storage experimental | Prefer workspace copies until snapshot backend matures |
The README's scaling summary for fast growth: (1) stronger isolation if code is untrusted, (2) **API auth + host lockdown**, (3) plan **more than one machine** — other items are configuration or operational layers, not a rewrite of the create → build → preview loop.
## Verification
| Check | Command / signal |
|-------|------------------|
| Control plane live | `curl -s http://127.0.0.1:9090/healthz` → `ok` |
| Ready for sandboxes | `curl -s http://127.0.0.1:9090/readyz` → `ready` |
| TLS preview | Browser or `curl` to `https://s-<id>-<port>.preview.<PREVIEW_DOMAIN>/` |
| Auth enforced | API without bearer → `401` JSON `{"error":"unauthorized"}` |
| Managed sandboxes only | `docker ps --filter label=sandboxed.managed=true` |
## Related pages
<CardGroup>
<Card title="Installation" href="/installation">
Prerequisites, `install.sh`, and first health checks before going public.
</Card>
<Card title="Preview routing" href="/preview-routing">
Host rules, router priorities, and `PREVIEW_DOMAIN` / entrypoint behavior.
</Card>
<Card title="API authentication" href="/api-authentication">
Bearer tokens, loopback exemptions, SIGHUP reload, and LAN bind guidance.
</Card>
<Card title="Private previews" href="/private-previews">
`visibility=private`, forward-auth, and preview tokens for sensitive apps.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
Full `.env` keys including idle, memory, and advanced toggles.
</Card>
<Card title="Workspaces and isolation" href="/workspaces-persistence">
Bind mounts, seeding, caps, and storage trade-offs.
</Card>
</CardGroup>
---
## 13. Control plane API (legacy)
> Internal /sandbox* routes: create/list/get, exec, keepalive, wake JSON, per-sandbox snapshots, purge/claim, healthz/readyz, metrics, GET /llm.txt integrator contract.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/13-control-plane-api-legacy.md
- Generated: 2026-06-04T22:44:54.159Z
### Source Files
- `control-plane/internal/api/api.go`
- `control-plane/internal/api/handlers.go`
- `control-plane/internal/api/llmtxt.go`
- `AGENTS.md`
- `control-plane/README.md`
- `control-plane/internal/api/v1_snapshots.go`
---
title: "Control plane API (legacy)"
description: "Internal /sandbox* routes: create/list/get, exec, keepalive, wake JSON, per-sandbox snapshots, purge/claim, healthz/readyz, metrics, GET /llm.txt integrator contract."
---
The `sandboxd` HTTP server in `control-plane/internal/api` registers legacy `/sandbox*` routes on the same listener as `/v1/*`, wake catch-all preview traffic, and ops endpoints. Requests hit `hostDispatch` first (preview `Host` → wake HTML); everything else goes to the API mux, wrapped in optional service-token auth and per-route Prometheus counters (`sandboxd_api_requests_total`, `sandboxd_api_request_duration_seconds`). Legacy handlers return a flat JSON error envelope `{"error":"<message>"}` (or richer maps for a few conflict cases); integrators building new features should prefer `/v1/sandboxes` (structured `code` / `message` / `retryable`).
## Base URL and routing
| Surface | Default bind | Typical caller |
| --- | --- | --- |
| Control plane API | `127.0.0.1:9090` (`SANDBOXED_API_BIND`) | `curl`, automation on the host |
| Preview wake (HTML) | Traefik `HTTP_PORT` with `Host: s-{ulid}-{port}.preview.{PREVIEW_DOMAIN}` | Browser |
```text
Client sandboxd (one listener)
| |
| Host matches preview? |
+----------------------------->| hostDispatch -> wake.ServeCatchAll (HTML)
| |
| else |
+----------------------------->| auth.Wrap -> api.Server.Handler()
```
<Note>
Loopback clients (`RemoteAddr` is loopback and no `X-Forwarded-For`) skip bearer auth. Traefik-forwarded calls are treated as external and require `Authorization: Bearer <secret>` when `SANDBOXD_API_AUTH_DISABLED=false`.
</Note>
## Legacy route inventory
| Method | Path | Role |
| --- | --- | --- |
| `POST` | `/sandbox` | Create sandbox row + container |
| `GET` | `/sandboxes` | List sandboxes (optional filters) |
| `GET` | `/sandbox/{id}` | Row + Docker inspect + runtimed status |
| `DELETE` | `/sandbox/{id}` | Destroy container; **keep** workspace |
| `POST` | `/sandbox/{id}/exec` | Non-interactive `docker exec` |
| `POST` | `/sandbox/{id}/keepalive` | Extend idle-reaper exemption |
| `POST` | `/wake/{id}` | Programmatic wake (JSON) |
| `POST` | `/sandbox/{id}/snapshots` | Manual workspace snapshot |
| `GET` | `/sandbox/{id}/snapshots` | List on-disk snapshots |
| `POST` | `/sandbox/{id}/restore` | Restore snapshot to live `.img` |
| `POST` | `/sandbox/{id}/claim` | Attach upstream external identity |
| `POST` | `/sandbox/{id}/purge` | Irreversible teardown + disk delete |
| `POST` | `/external-users/{external_user_id}/purge` | Purge all sandboxes for user |
| `POST` | `/external-projects/{external_project_id}/purge` | Purge all sandboxes for project |
| `GET` | `/healthz` | Liveness (`ok`) |
| `GET` | `/readyz` | Readiness (SQLite + `docker info`) |
| `GET` | `/metrics` | Prometheus scrape (loopback-only when auth on) |
| `GET` | `/llm.txt` | Public integrator contract (no token) |
## Shared response shapes
### Sandbox row (`sandboxResp`)
Returned by create, list, and embedded in get as `row`:
| Field | Type | Notes |
| --- | --- | --- |
| `id` | string | ULID |
| `status` | string | `creating`, `running`, `stopped`, `error`, … |
| `image` | string | Base image tag |
| `workspace_img`, `workspace_mnt` | string | Host paths |
| `container_id`, `cgroup_path` | string | When running |
| `memory_high` | string | Soft cgroup limit (default `4G` on create) |
| `ports` | int[] | Exposed preview ports |
| `last_active_at`, `stopped_at`, `keepalive_until` | int64 | Unix seconds |
| `container_ip` | string | Bridge IP when running |
| `external_user_id`, `external_project_id`, `external_workspace_id` | string | Phase 8 identity |
| `visibility` | string | `public` (default) or `private` |
### GET `/sandbox/{id}` envelope (`getResp`)
```json
{
"row": { },
"live_state": { },
"runtime": { }
}
```
- `live_state` — Docker inspect JSON when container `s-{id}` exists; omitted on inspect miss.
- `runtime` — `runtimed` `GET /status` over `<workspace>.mnt/.runtimed/sock` (3s timeout). Absent when stopped or runtimed unreachable.
## Sandbox lifecycle endpoints
:::endpoint POST /sandbox
Create a sandbox: SQLite row (`creating` → provision workspace → `docker run` → `running`), Traefik labels, optional env injection.
<ParamField body="ports" type="int[]">
Preview ports (1–65535). Drives `s-{id}-{port}.preview.{domain}` routers.
</ParamField>
<ParamField body="id" type="string">
Optional ULID; auto-generated if omitted. Must be valid ULID when set.
</ParamField>
<ParamField body="memory_high" type="string">
Cgroup soft limit; default `4G`.
</ParamField>
<ParamField body="visibility" type="string">
`public` or `private`.
</ParamField>
<ParamField body="external" type="object">
`user_id` (required for strict multi-tenant; defaults to `"local"` if empty), optional `project_id`, `workspace_id`. IDs ≤256 chars, no control chars or commas.
</ParamField>
<ParamField body="template" type="string">
Golden template name (`[a-z0-9-]`, ≤64). Requires `SANDBOXD_TEMPLATES_DIR` and `{name}.img` on disk.
</ParamField>
<ParamField body="env" type="object">
Container env map (e.g. provider API keys). Keys must not contain `=` or newlines.
</ParamField>
<ParamField body="git_remote_url" type="string">
HTTPS remote for auto-git-push on task finish.
</ParamField>
<ResponseField name="body" type="sandboxResp">
`201 Created` with full row including `container_id` when successful.
</ResponseField>
**Errors:** `400` invalid JSON/ports/ULID/visibility/env/template; `409` row exists or `workspace_owner_mismatch` on id reuse; `503` memory admission refused (`Retry-After: 30`, `mem_available_percent` in body).
:::
<RequestExample>
```bash
curl -s -X POST http://127.0.0.1:9090/sandbox \
-H 'Content-Type: application/json' \
-d '{"ports":[3000],"env":{"ANTHROPIC_API_KEY":"sk-ant-..."}}'
```
</RequestExample>
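Several of the `400` cases for `POST /sandbox` can be caught on the client before the request is sent. The following is a rough sketch of the documented field rules (ports, visibility, env keys, external ids); `validate_create_body` is a hypothetical helper, and the server remains the authority on what is accepted.

```python
from typing import Any, Dict, List


def validate_create_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a POST /sandbox body (empty if none)."""
    problems: List[str] = []
    for port in body.get("ports", []):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append(f"invalid port: {port!r}")
    if body.get("visibility", "public") not in ("public", "private"):
        problems.append("visibility must be public or private")
    for key in body.get("env", {}):
        if "=" in key or "\n" in key:
            problems.append(f"invalid env key: {key!r}")
    for field, value in body.get("external", {}).items():
        has_control = any(ord(ch) < 32 or ord(ch) == 127 for ch in value)
        if len(value) > 256 or "," in value or has_control:
            problems.append(f"invalid external.{field}")
    return problems
```

Template and ULID checks are left out here because they depend on host state (`SANDBOXD_TEMPLATES_DIR`) or on the server's ULID parser.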
:::endpoint GET /sandboxes
List all sandbox rows, or filter with query params `external_user_id` and/or `external_project_id`.
:::
:::endpoint GET /sandbox/{id}
Fetch one sandbox with optional `live_state` and `runtime`. `404` if no SQLite row.
:::
:::endpoint DELETE /sandbox/{id}
Stop/remove container `s-{id}`, release loopback mount, delete SQLite row. Workspace `.img` **remains** for id reuse. `204 No Content` on success.
:::
<Tip>
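Snapshot and restore endpoints return `409` while the row is `running`. To snapshot or restore such a sandbox, call `DELETE /sandbox/{id}` first: it removes the container and the row but keeps the workspace `.img` that the snapshot endpoints operate on.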
</Tip>
## Exec and keepalive
:::endpoint POST /sandbox/{id}/exec
Run a command inside the sandbox via `docker exec` (no TTY/stdin).
<ParamField body="cmd" type="string[]" required>
Argv passed to exec.
</ParamField>
<ParamField body="stream" type="boolean">
If true: `200` chunked `text/plain` with stdout, optional `---stderr---`, and `exit_code: N`. Default false returns JSON `execResp`.
</ParamField>
Bumps `last_active_at` on entry and exit; counts toward idle-reaper activity when `Inflight` is wired.
:::
<RequestExample>
```bash
curl -s -X POST http://127.0.0.1:9090/sandbox/$ID/exec \
-H 'Content-Type: application/json' \
-d '{"cmd":["bash","-lc","python3 -m http.server 3000"]}'
```
</RequestExample>
:::endpoint POST /sandbox/{id}/keepalive
Postpone idle stop until a wall-clock instant.
<ParamField body="until" type="int64" required>
Unix seconds; must be in the future. Capped at now + `KeepaliveMax` (default 24h).
</ParamField>
<ResponseExample>
```json
{"id":"01J...","keepalive_until":1717531200}
```
</ResponseExample>
:::
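Because `until` is an absolute Unix timestamp that must lie in the future and is capped at now + `KeepaliveMax`, a client usually computes it from a relative duration. A minimal sketch, assuming the default 24h cap; `keepalive_until` is a hypothetical helper name.

```python
import time
from typing import Optional

KEEPALIVE_MAX_SECONDS = 24 * 60 * 60  # documented default for KeepaliveMax


def keepalive_until(extend_seconds: int, now: Optional[int] = None,
                    max_seconds: int = KEEPALIVE_MAX_SECONDS) -> int:
    """Return the Unix-seconds value to send as `until`, capped like the server."""
    if extend_seconds <= 0:
        raise ValueError("until must be in the future")
    current = int(time.time()) if now is None else now
    return current + min(extend_seconds, max_seconds)
```

The server applies the same cap, so mirroring it locally only helps predict the `keepalive_until` value that comes back in the response.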
## Wake (JSON)
:::endpoint POST /wake/{id}
Start a stopped sandbox, or confirm that it is already running, without a preview `Host` header. Delegates to `wake.Handler.ServeJSON` (same core as preview wake, JSON response shape).
<ResponseField name="body" type="object">
`200`: `{"id":"...","status":"running","wake_duration_ms":N}`.
</ResponseField>
**Errors:** `404` `{"error":"not_found"}`; `503` with `error` and optional `mem_available_percent` (admission); `Retry-After: 30` on admission denial.
:::
<Info>
Browser preview traffic uses the catch-all wake path (HTML warming page), not `POST /wake/{id}`. See the preview-routing and wake-idle-reapers pages.
</Info>
## Per-sandbox snapshots (`_snapshots/`)
Distinct from tenant-scoped `POST /v1/snapshots` library templates. These endpoints operate on `_snapshots/<id>/` zstd archives; they do **not** require a DB row for list/take (except running-row guard).
:::endpoint POST /sandbox/{id}/snapshots
Takes a manual snapshot. Returns `202 Accepted` with `ts` and `size_bytes` (compressed); `409` if status is `running`; `404` if there is no workspace `.img`; `503` if the snapshot subsystem is not configured.
:::
:::endpoint GET /sandbox/{id}/snapshots
Array of `{ts, size_bytes, compressed_size_bytes, auto}` newest-first; `[]` if directory missing.
:::
:::endpoint POST /sandbox/{id}/restore
Body: `{"snapshot":"<YYYY-MM-DD-HHMMSS>"}` (timestamp from list). `200` with `size_bytes`, `restored_at`. `409` running; `404` unknown snapshot.
:::
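A restore request needs a `snapshot` timestamp taken from the list endpoint. The sketch below picks the newest manual entry and builds the restore body. It assumes the list's `ts` field uses the same `YYYY-MM-DD-HHMMSS` format as the restore body; both function names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

SNAPSHOT_TS_FORMAT = "%Y-%m-%d-%H%M%S"  # e.g. 2026-06-04-120000


def newest_manual_snapshot(snapshots: List[Dict[str, Any]]) -> Optional[str]:
    """Pick the newest non-auto `ts` from a GET /sandbox/{id}/snapshots array."""
    manual = [s for s in snapshots if not s.get("auto", False)]
    if not manual:
        return None
    newest = max(manual, key=lambda s: datetime.strptime(s["ts"], SNAPSHOT_TS_FORMAT))
    return newest["ts"]


def restore_body(ts: str) -> Dict[str, str]:
    """Build the POST /sandbox/{id}/restore body, rejecting a malformed ts."""
    datetime.strptime(ts, SNAPSHOT_TS_FORMAT)  # raises ValueError if malformed
    return {"snapshot": ts}
```

Parsing the timestamp instead of relying on list order keeps the helper correct even if the caller has re-sorted or filtered the array.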
## Claim and purge
:::endpoint POST /sandbox/{id}/claim
Move a sandbox (often legacy/back-filled) to a real upstream identity. Updates `sandbox` and durable `workspace_owner`.
<ParamField body="external_user_id" type="string" required>
New owner id (same validation as create `external`).
</ParamField>
<ResponseExample>
```json
{"id":"01J...","external_user_id":"user-42","claimed":true}
```
</ResponseExample>
:::
:::endpoint POST /sandbox/{id}/purge
Irreversible: stop/remove container, release loopback, delete workspace `.img`, `_snapshots/<id>/`, and SQLite `sandbox` + `workspace_owner` rows (per-id lock held throughout).
<ResponseExample>
```json
{"purged":true,"freed_bytes":12345678}
```
</ResponseExample>
:::
:::endpoint POST /external-users/{external_user_id}/purge
Purge every sandbox owned by that external user. Stops on first failure after auditing successful purges.
<ResponseField name="purged_count" type="int">
Number of sandboxes removed.
</ResponseField>
<ResponseField name="freed_bytes" type="int64">
Aggregated allocated disk freed.
</ResponseField>
:::
:::endpoint POST /external-projects/{external_project_id}/purge
Same as user purge, scoped by `external_project_id`.
:::
<Warning>
`DELETE /sandbox/{id}` preserves disk; `POST /sandbox/{id}/purge` deletes it. There is no undo for purge.
</Warning>
## Health, readiness, and metrics
:::endpoint GET /healthz
Always `200` with body `ok\n`. No dependency checks.
:::
:::endpoint GET /readyz
`200` + `ready\n` only if SQLite pings and `docker info` succeeds. Otherwise `503` JSON `{"error":"sqlite ping: ..."}` or `{"error":"docker info: ..."}`.
:::
:::endpoint GET /metrics
Prometheus exposition from `metrics.Registry` (API counters/histograms, sandbox gauges, reaper/wake metrics, docker timings). When auth middleware is active, external requests receive `404`; scrape from loopback on `SANDBOXED_API_BIND`.
:::
## GET /llm.txt — integrator contract
Public, tokenless API documentation for third-party integrators and coding agents.
| Property | Behavior |
| --- | --- |
| Path | `GET /llm.txt` |
| Auth | Listed in `exemptPaths`; no bearer required |
| Source file | Host path `SANDBOXD_LLM_TXT_PATH` (default `/etc/sandboxed/llm.txt`) |
| Cache | `Cache-Control: public, max-age=300` |
| Missing | `404` if env path empty or file absent |
| Reload | Read per request (no redeploy needed for edits) |
<Note>
The repository does not ship a default `llm.txt` file. Operators mount or copy the contract onto the host at the configured path. The v1 layer references sections of this contract (e.g. backpressure `Retry-After` on create `503`).
</Note>
<RequestExample>
```bash
curl -s http://127.0.0.1:9090/llm.txt
```
</RequestExample>
## Authentication quick reference
| Path class | Bearer required (auth enabled) |
| --- | --- |
| `/sandbox*`, `/wake/*`, purge, claim | Yes (external) |
| `/healthz`, `/readyz`, `/llm.txt` | No |
| `/metrics` | Loopback only |
| Direct `127.0.0.1:9090` curl | No (loopback operator path) |
Configure `SANDBOXD_API_TOKENS=name:secret` and `SANDBOXD_API_AUTH_DISABLED=false` for LAN or production exposure.
## Legacy vs v1
| Concern | Legacy | v1 |
| --- | --- | --- |
| Create | `POST /sandbox` | `POST /v1/sandboxes` |
| Stop (idle) | — | `POST /v1/sandboxes/{id}/stop` |
| Destroy (keep disk) | `DELETE /sandbox/{id}` | (none; v1 `DELETE` purges) |
| Purge disk | `POST /sandbox/{id}/purge` | `DELETE /v1/sandboxes/{id}` (delegates to the same purge handler) |
| Tasks / files | — | `/v1/sandboxes/{id}/tasks`, `/files` |
| Reusable templates | `template` on create | `POST /v1/snapshots` library |
| Errors | `{"error":"..."}` | `{code, message, retryable}` |
```mermaid
flowchart LR
subgraph legacy [Legacy /sandbox*]
C[POST /sandbox]
E[POST /exec]
D[DELETE /sandbox/id]
P[POST /purge]
end
subgraph v1 [Public /v1]
V1C[POST /v1/sandboxes]
T[POST /tasks]
S[POST /stop]
end
C --> Store[(SQLite)]
V1C --> C
T --> Runtimed[runtimed UDS]
```
## Related pages
<CardGroup>
<Card title="v1 API reference" href="/v1-api-reference">
Public `/v1/sandboxes` surface, tasks, files, and structured errors — preferred for new integrations.
</Card>
<Card title="API authentication" href="/api-authentication">
Service tokens, loopback exemption, SIGHUP reload, and preview-token paths.
</Card>
<Card title="Sandbox operations" href="/sandbox-operations">
Operational workflows: exec, keepalive, stop, delete vs purge, claim.
</Card>
<Card title="Observability" href="/observability">
Health semantics, Prometheus labels, and compose log probes.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Preview wake HTML, idle reaper, memory admission, and keepalive interaction.
</Card>
</CardGroup>
---
## 14. v1 API reference
> Public /v1/sandboxes and /v1/snapshots: request/response shapes, error envelope (code, message, retryable), files CRUD, export zip, task lifecycle states, and template spin-up.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/14-v1-api-reference.md
- Generated: 2026-06-04T22:44:38.304Z
### Source Files
- `control-plane/internal/api/v1.go`
- `control-plane/internal/api/v1_tasks.go`
- `control-plane/internal/api/v1_files.go`
- `control-plane/internal/api/v1_files_write.go`
- `control-plane/internal/api/v1_snapshots.go`
- `control-plane/migrations/0005_tasks.sql`
- `control-plane/migrations/0009_snapshots.sql`
---
The public `/v1` surface in `control-plane/internal/api/v1*.go` is a translation layer over the internal `POST /sandbox` machinery and the in-sandbox `runtimed` Unix socket. Integrators call `sandboxd` at `SANDBOXED_API_BIND` (default `http://127.0.0.1:9090`). When API auth is enabled, every `/v1/*` request must send `Authorization: Bearer <token>`, except loopback clients covered by the documented exemption.
## Error envelope
Failed responses use a single JSON shape. HTTP status codes carry semantics; the `error.code` string is stable for programmatic handling.
```json
{
"error": {
"code": "not_found",
"message": "no such sandbox",
"retryable": false
}
}
```
<ResponseField name="error.code" type="string">
Machine-readable error identifier. Common values: `invalid_request`, `not_found`, `conflict`, `sandbox_capacity`, `internal`, plus explicit codes `task_in_progress` and `sandbox_unavailable`.
</ResponseField>
<ResponseField name="error.message" type="string">
Human-readable detail, often forwarded from the internal handler or runtimed.
</ResponseField>
<ResponseField name="error.retryable" type="boolean">
`true` only for HTTP `502` and `503`. On `503` capacity refusal, the server also sets `Retry-After: 30`.
</ResponseField>
| HTTP | Typical `error.code` | When |
|------|----------------------|------|
| 400 | `invalid_request` | Bad JSON, validation, path traversal, file size caps |
| 404 | `not_found` | Missing sandbox, task, snapshot, or directory (cross-tenant snapshots return 404) |
| 409 | `conflict` or `task_in_progress` | Wrong sandbox status, active task, running snapshot source |
| 413 | `invalid_request` | PUT body over 25 MiB |
| 500 | `internal` | Store, Docker, or capture failures |
| 502 | `sandbox_unavailable` | Cannot reach runtimed (socket absent, task stream/cancel failures) |
| 503 | `sandbox_capacity` | Memory admission refused on create/wake |
<Note>
Internal `/sandbox*` errors use a flat `{"error":"..."}` string. The v1 layer reshapes those bodies via `relayV1Error`; callers should only parse the v1 envelope on `/v1/*`.
</Note>
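A client can turn the envelope into a typed value and decide on retries from `retryable` and `Retry-After`. This is a minimal sketch under stated assumptions: `V1Error` and `parse_v1_error` are hypothetical names, and the `"unknown"` fallback code is invented here for bodies that are not a v1 envelope.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class V1Error:
    status: int
    code: str
    message: str
    retryable: bool
    retry_after: Optional[int]  # seconds, from the Retry-After header if present


def parse_v1_error(status: int, body: bytes, headers: Mapping[str, str]) -> V1Error:
    """Parse a failed /v1 response into a V1Error."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # not JSON, or JSON that is not an object
    if not isinstance(err, dict):
        err = {}  # a flat legacy string is not a v1 envelope
    raw = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    retry_after = int(raw) if raw is not None and raw.isdigit() else None
    return V1Error(
        status=status,
        code=str(err.get("code", "unknown")),  # "unknown" is a local fallback
        message=str(err.get("message", "")),
        retryable=bool(err.get("retryable", False)),
        retry_after=retry_after,
    )
```

A caller might sleep for `retry_after` seconds (or a default backoff) when `retryable` is true and give up otherwise.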
## Endpoint inventory
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/v1/sandboxes` | Create (or return existing) project sandbox |
| GET | `/v1/sandboxes/{id}` | Get sandbox + live preview snapshot |
| POST | `/v1/sandboxes/{id}/stop` | Stop container (idle); idempotent if already stopped |
| DELETE | `/v1/sandboxes/{id}` | Full destroy (purge: container + workspace + row) |
| POST | `/v1/sandboxes/{id}/tasks` | Submit coding task (wake-on-submit) |
| GET | `/v1/sandboxes/{id}/tasks/{taskId}` | Canonical task result (SQLite-backed) |
| GET | `/v1/sandboxes/{id}/tasks/{taskId}/events` | SSE progress stream |
| POST | `/v1/sandboxes/{id}/tasks/{taskId}/cancel` | Cancel in-flight task |
| GET | `/v1/sandboxes/{id}/files` | List files under `workspace/app` |
| GET | `/v1/sandboxes/{id}/files/content` | Read one file (2 MiB cap) |
| PUT | `/v1/sandboxes/{id}/files` | Write opaque bytes into workspace mount |
| GET | `/v1/sandboxes/{id}/export` | Zip download of `workspace/app` |
| POST | `/v1/snapshots` | Capture stopped sandbox workspace image |
| GET | `/v1/snapshots` | List tenant snapshots |
| GET | `/v1/snapshots/{id}` | Get one snapshot |
| DELETE | `/v1/snapshots/{id}` | Delete snapshot image + row |
There is no `GET /v1/sandboxes` list route; discovery is by project idempotency on create or by retained sandbox id.
## Sandbox resource
The v1 sandbox object folds SQLite row state with a live preview probe from `runtimed` when the Unix socket answers (3s timeout).
<ResponseField name="id" type="string">ULID sandbox identifier; container name is `s-{id}`.</ResponseField>
<ResponseField name="status" type="string">Control-plane status: `creating`, `running`, `stopped`, or `error`.</ResponseField>
<ResponseField name="preview" type="object">Dev-server snapshot: `url`, `status` (`down` | `starting` | `ready` | `error`), optional `last_http_status`, `last_checked_at`, `build_error_message`.</ResponseField>
<ResponseField name="active_task_id" type="string">Set when runtimed reports an active task.</ResponseField>
<ResponseField name="template" type="string">Always `react-standard` in responses today (fixed v1 default label).</ResponseField>
<ResponseField name="git_remote_url" type="string">Assigned HTTPS push target for auto-git-push after tasks, if configured at create.</ResponseField>
<ResponseField name="created_at" type="string">RFC3339 UTC timestamp.</ResponseField>
<ResponseField name="updated_at" type="string">RFC3339 UTC timestamp.</ResponseField>
Preview URLs are synthesized as `https://s-{id}-3000.preview.{PREVIEW_DOMAIN}` (port 3000 is fixed on the v1 create path).
:::endpoint POST /v1/sandboxes
Create a durable sandbox for a project, or return the existing non-`error` sandbox for the same `project.id` (HTTP 200, idempotent).
<ParamField body="project.id" type="string" required>
External project identifier; drives idempotent reuse.
</ParamField>
<ParamField body="project.user_id" type="string" required>
External user identifier stored on the sandbox row.
</ParamField>
<ParamField body="visibility" type="string">
`public` (default) or `private`. Passed through to internal create.
</ParamField>
<ParamField body="template" type="string">
Golden template name (e.g. `react-standard`). Resolved to `{SANDBOXD_TEMPLATES_DIR}/{name}.img`. Default when omitted: `react-standard`. Mutually exclusive with `from_snapshot`.
</ParamField>
<ParamField body="from_snapshot" type="string">
Snapshot ULID owned by the API tenant. Source image must be `ready`. Workspace is cloned from the snapshot `.img` via internal `template_path`. Mutually exclusive with `template`.
</ParamField>
<ParamField body="git_remote_url" type="string">
HTTPS remote for post-task workspace push (not a secret; host holds credentials).
</ParamField>
<RequestExample>
```bash
curl -s -X POST http://127.0.0.1:9090/v1/sandboxes \
-H 'Content-Type: application/json' \
-d '{
"project": {"id": "proj_01", "user_id": "user_01"},
"template": "react-standard",
"visibility": "public"
}'
```
</RequestExample>
<ResponseExample>
```json
{
"id": "01JABCDEFGHJKMNPQRSTVWXYZ0",
"status": "running",
"preview": {
"url": "https://s-01JABCDEFGHJKMNPQRSTVWXYZ0-3000.preview.localhost",
"status": "starting"
},
"template": "react-standard",
"created_at": "2026-06-04T12:00:00Z",
"updated_at": "2026-06-04T12:00:05Z"
}
```
</ResponseExample>
Returns `201 Created` on first create, `200 OK` when reusing an existing project sandbox. Delegates to `POST /sandbox` with `ports: [3000]` and `external` tags. Capacity refusal surfaces as `503` / `sandbox_capacity` with `Retry-After: 30`.
:::
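The create call and its `201` versus `200` distinction can be wrapped in two small helpers. The sketch below only builds the request with the standard library's `urllib.request`; `build_create_request` and `was_created` are hypothetical names, and sending the request is left to the caller.

```python
import json
import urllib.request
from typing import Optional


def build_create_request(base_url: str, project_id: str, user_id: str,
                         token: Optional[str] = None,
                         template: Optional[str] = None,
                         from_snapshot: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /v1/sandboxes request without sending it."""
    if template and from_snapshot:
        raise ValueError("template and from_snapshot are mutually exclusive")
    body = {"project": {"id": project_id, "user_id": user_id}}
    if template:
        body["template"] = template
    if from_snapshot:
        body["from_snapshot"] = from_snapshot
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/sandboxes",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def was_created(status: int) -> bool:
    """201 means a new sandbox; 200 means the project's existing one was returned."""
    if status not in (200, 201):
        raise ValueError(f"unexpected success status: {status}")
    return status == 201
```

Passing the result to `urllib.request.urlopen` performs the call. Repeating it with the same `project.id` is safe because the endpoint is idempotent per project.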
:::endpoint GET /v1/sandboxes/{id}
Fetch one sandbox. Works regardless of container run state; preview fields reflect runtimed when reachable, else `preview.status` is `down` or `starting` (running sandbox, socket not up yet).
:::
:::endpoint POST /v1/sandboxes/{id}/stop
Stop the Docker container and mark the sandbox `stopped`. Idempotent when already stopped.
<Warning>
Returns `409` / `task_in_progress` if runtimed reports an active task. Cancel the task first.
</Warning>
Non-`running` sandboxes (except already `stopped`) return `409` / `conflict`.
:::
:::endpoint DELETE /v1/sandboxes/{id}
Full destroy: delegates to internal `POST /sandbox/{id}/purge` (container, workspace image, SQLite row). Success is `204 No Content`. This is not the soft internal `DELETE` that preserves workspace images for id reuse.
:::
## Template spin-up
v1 create always exposes port **3000** and seeds the workspace in one of three ways:
```text
POST /v1/sandboxes
│
├─ from_snapshot ──► authorize tenant snapshot ──► template_path = library/{id}.img
│
├─ template name ──► {SANDBOXD_TEMPLATES_DIR}/{name}.img (default: react-standard)
│
└─ (neither) ───────► internal empty provision + skeleton seed
```
<ParamField body="template" type="string">
Name must match `^[a-z0-9-]{1,64}$`. Host must have `SANDBOXD_TEMPLATES_DIR` set; unknown names fail before row creation.
</ParamField>
<ParamField body="from_snapshot" type="string">
Uses `ProvisionFromTemplate` with a raw `.img` under `LibraryRoot`. Snapshot must belong to the authenticated API token's tenant (`owner_token` = token name). Cross-tenant IDs return `404`.
</ParamField>
<Tip>
Fast cold start: prebuilt golden `.img` templates skip scaffold/install on the create hot path. Snapshots are independent ext4 copies—deleting a snapshot never affects sandboxes already cloned from it.
</Tip>
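The three seeding branches can be sketched as a single resolver that maps the request fields to a source image path. This is an illustration, not the control plane's code: `resolve_seed_image` is a hypothetical name, tenant authorization of the snapshot is omitted, and the "neither" branch follows the diagram above by returning `None` for the skeleton seed.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

TEMPLATE_NAME = re.compile(r"[a-z0-9-]{1,64}")


def resolve_seed_image(templates_dir: PurePosixPath, library_root: PurePosixPath,
                       template: Optional[str] = None,
                       from_snapshot: Optional[str] = None) -> Optional[PurePosixPath]:
    """Return the .img to clone from, or None for an empty skeleton-seeded workspace."""
    if template and from_snapshot:
        raise ValueError("template and from_snapshot are mutually exclusive")
    if from_snapshot:
        # Real code must first confirm the snapshot belongs to the caller's tenant.
        return library_root / f"{from_snapshot}.img"
    if template:
        # fullmatch rejects a trailing newline that `$` alone would tolerate.
        if not TEMPLATE_NAME.fullmatch(template):
            raise ValueError(f"invalid template name: {template!r}")
        return templates_dir / f"{template}.img"
    return None
```

Validating the name before joining paths is what keeps a value like `../x` from escaping the templates directory.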
## Tasks
Task submission wakes a **stopped** sandbox first (internal wake JSON), then calls `runtimed` `POST /tasks`. Only agent **`opencode`** is accepted on v1 (default when `agent` omitted). One active task per sandbox.
:::endpoint POST /v1/sandboxes/{id}/tasks
<ParamField body="prompt" type="string" required>Coding instruction for the agent.</ParamField>
<ParamField body="agent" type="string">Must be `opencode` if set.</ParamField>
<ResponseExample>
```json
{
"id": "01JTASKULIDEXAMPLE000000001",
"sandbox_id": "01JABCDEFGHJKMNPQRSTVWXYZ0",
"status": "running",
"agent": "opencode",
"events_url": "/v1/sandboxes/01JABCDEFGHJKMNPQRSTVWXYZ0/tasks/01JTASKULIDEXAMPLE000000001/events"
}
```
</ResponseExample>
HTTP **202 Accepted**. A background watcher persists the terminal `done` event into SQLite (`task` table, migration `0005_tasks.sql`) so results survive stop and destroy.
:::
:::endpoint GET /v1/sandboxes/{id}/tasks/{taskId}
While `status` is `running` or `result_json` is unset, returns a minimal `{"id","sandbox_id","status":"running"}`. When finished, returns the full `runtime.TaskResult` plus `sandbox_id` (promoted JSON fields).
:::
:::endpoint GET /v1/sandboxes/{id}/tasks/{taskId}/events
Server-Sent Events (`text/event-stream`). Resume with `Last-Event-ID` (next event is `id+1`) or query `?since=N`. Each event:
```text
id: 42
event: message
data: {"..."}
```
Event types from runtimed: `status`, `message`, `tool` (best-effort), `build`, and terminal `done` (carries `TaskResult` JSON). Stream ends after `done`.
:::
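The event framing above follows the usual Server-Sent Events line format, so a small line-based parser is enough to consume it. The sketch below is a generic SSE reader rather than anything specific to `sandboxd`; `parse_sse` and `resume_headers` are hypothetical names.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event with optional 'id' and 'event' keys plus 'data'."""
    event: Dict[str, str] = {}
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if event or data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment / heartbeat line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field in ("id", "event"):
            event[field] = value


def resume_headers(last_event_id: str) -> Dict[str, str]:
    """Headers for reconnecting after the last event that was fully received."""
    return {"Last-Event-ID": last_event_id}
```

A consumer would typically remember the `id` of each yielded event, stop reading once it sees `event == "done"`, and reconnect with `resume_headers` if the connection drops earlier.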
:::endpoint POST /v1/sandboxes/{id}/tasks/{taskId}/cancel
Proxies to runtimed cancel. Response: `{"id":"<taskId>","status":"cancelling"}`.
:::
### Task lifecycle states
SQLite `task.status` and `TaskResult.status` use the same vocabulary:
| Status | Meaning |
|--------|---------|
| `running` | Accepted; watcher or runtimed still in progress |
| `succeeded` | Terminal success (`done` event persisted) |
| `failed` | Terminal failure (agent, build, watcher, or reconcile) |
| `cancelled` | User-cancelled terminal state |
```mermaid
stateDiagram-v2
[*] --> running: POST /tasks (202)
running --> succeeded: done event (build/preview OK)
running --> failed: done or watcher failure
running --> cancelling: POST .../cancel
cancelling --> cancelled: runtimed terminal
cancelling --> failed: abnormal end
succeeded --> [*]
failed --> [*]
cancelled --> [*]
```
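The diagram can be encoded as a transition table, which is handy for client-side assertions when polling. This is a sketch that mirrors the diagram only; it includes the transient `cancelling` state shown there, and the names `TRANSITIONS`, `can_transition`, and `is_terminal` are hypothetical.

```python
from typing import Dict, Set

# Allowed moves, copied from the state diagram.
TRANSITIONS: Dict[str, Set[str]] = {
    "running": {"succeeded", "failed", "cancelling"},
    "cancelling": {"cancelled", "failed"},
    "succeeded": set(),
    "failed": set(),
    "cancelled": set(),
}

# A state is terminal when nothing leads out of it.
TERMINAL: Set[str] = {state for state, nxt in TRANSITIONS.items() if not nxt}


def can_transition(current: str, target: str) -> bool:
    """True if the diagram has an edge from current to target."""
    return target in TRANSITIONS.get(current, set())


def is_terminal(status: str) -> bool:
    """True once a task can no longer change state."""
    return status in TERMINAL
```

A polling loop can stop as soon as `is_terminal(status)` holds and then read the full `TaskResult`.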
`TaskResult` fields (canonical outcome—no event replay required):
<ResponseField name="status" type="string">Terminal status.</ResponseField>
<ResponseField name="failure_reason" type="string">e.g. `sandbox_unavailable`, `internal` on watcher failures.</ResponseField>
<ResponseField name="files_changed" type="string[]">Paths touched by the agent.</ResponseField>
<ResponseField name="build_ok" type="boolean">`pnpm build` outcome.</ResponseField>
<ResponseField name="build_error_message" type="string">Build stderr summary when `build_ok` is false.</ResponseField>
<ResponseField name="preview_status_after" type="string">`down` | `starting` | `ready` | `error` after post-task health.</ResponseField>
<ResponseField name="preview_error_message" type="string">Live dev-server error when build passed but preview fails.</ResponseField>
<ResponseField name="tokens" type="object">`input`, `output`, `reasoning`, cache fields, `total`, `cost`.</ResponseField>
<ResponseField name="duration_ms" type="number">Wall time.</ResponseField>
<Info>
`GET /v1/.../tasks/{id}` reads SQLite only—safe after `DELETE /v1/sandboxes/{id}`. The full event log lives under `.runtimed/` in the workspace and is **not** retained past destroy.
</Info>
On `sandboxd` restart, `ReconcileTasks` finalizes stuck `running` rows from runtimed `result.json`, re-attaches watchers, or marks `failed` / `sandbox_unavailable`.
## Files and export
File **reads** and **export** root at `workspace/app` on the host loopback mount. They work whether or not the container is running.
| Operation | Root | Limits / exclusions |
|-----------|------|---------------------|
| `GET .../files` | `workspace/app` | Query `path` (default `""`), `recursive=true` optional |
| `GET .../files/content` | `workspace/app` | Query `path`; 2 MiB max per file; `text/plain` body |
| `GET .../export` | `workspace/app` | `application/zip`; attachment `{id}.zip` |
| `PUT .../files` | **Workspace mount root** | Query `path` relative to mount; 25 MiB max |
Excluded from list/read/export: `node_modules`, `.git`, `dist`, `.vite`.
:::endpoint GET /v1/sandboxes/{id}/files
<ResponseExample>
```json
{
"path": "src",
"recursive": false,
"entries": [
{"path": "src/App.tsx", "type": "file", "size": 412},
{"path": "src/components", "type": "dir"}
]
}
```
</ResponseExample>
:::
:::endpoint PUT /v1/sandboxes/{id}/files?path={rel}
The request body is written as raw, opaque bytes. The write is atomic: a temp file is created in the target directory using `O_NOFOLLOW`, then renamed into place. The handler refuses `.runtimed/`, `lost+found/`, `..` segments, directory paths, and a symlink at the leaf, and it `chown`s the file to the workspace owner when known.
<ResponseExample>
```json
{"path": "AGENTS.md", "size": 1024}
```
</ResponseExample>
Use this to inject `AGENTS.md`, `CLAUDE.md`, `opencode.json`, or other agent config at the workspace root—not only under `app/`.
:::
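The path rules for `PUT .../files` lend themselves to a client-side pre-check. The following is a sketch of the documented refusals plus a few assumptions: rejecting absolute paths, empty segments, and `.` segments is a guess at sensible behavior, symlink checks can only happen on the server, and `check_put_target` is a hypothetical name.

```python
MAX_PUT_BYTES = 25 * 1024 * 1024  # documented 25 MiB cap
REFUSED_TOP_LEVEL = {".runtimed", "lost+found"}


def check_put_target(rel_path: str, size: int) -> None:
    """Raise ValueError if a PUT would be refused by the documented rules."""
    if size > MAX_PUT_BYTES:
        raise ValueError("body over 25 MiB")
    if not rel_path or rel_path.startswith("/") or rel_path.endswith("/"):
        raise ValueError("path must be a relative file path, not a directory")
    parts = rel_path.split("/")
    if any(part in ("", ".", "..") for part in parts):
        raise ValueError("path contains an empty, '.' or '..' segment")
    if parts[0] in REFUSED_TOP_LEVEL:
        raise ValueError(f"writes under {parts[0]}/ are refused")
```

For example, `check_put_target("AGENTS.md", 1024)` passes silently, while `check_put_target(".runtimed/sock", 1)` raises.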
:::endpoint GET /v1/sandboxes/{id}/export
Streams a zip of all files under `workspace/app` (same exclusions). No JSON wrapper.
:::
## Snapshots
Snapshots freeze a **stopped** sandbox workspace `.img` into `LibraryRoot/{id}.img`. Ownership is the API token name (`auth.Actor.Name`), not `external user_id`.
<ResponseField name="id" type="string">ULID.</ResponseField>
<ResponseField name="name" type="string">Caller-chosen label.</ResponseField>
<ResponseField name="status" type="string">`ready` or `error` (v1 create path sets `ready` synchronously).</ResponseField>
<ResponseField name="source_sandbox_id" type="string">Provenance.</ResponseField>
<ResponseField name="base_image" type="string">Captured sandbox image name (recorded, not pinned on spin-up).</ResponseField>
<ResponseField name="visibility" type="string">`private` in v1.</ResponseField>
<ResponseField name="size_bytes" type="number">Allocated on-disk bytes (sparse-aware).</ResponseField>
:::endpoint POST /v1/snapshots
<ParamField body="source_sandbox_id" type="string" required>Must exist; must **not** be `running`.</ParamField>
<ParamField body="name" type="string" required>Display name for the library entry.</ParamField>
Capture holds the source sandbox id-lock, copies with `cp --reflink=auto --sparse=always`, then inserts the row. Returns `201` with the snapshot object. Requires `LibraryRoot` configured; otherwise `503` / `internal`.
:::
:::endpoint GET /v1/snapshots
Returns `{"snapshots":[...]}` for the current tenant token only.
:::
:::endpoint GET /v1/snapshots/{id}
Tenant-scoped get; cross-tenant id returns `404`.
:::
:::endpoint DELETE /v1/snapshots/{id}
Removes image file and row; `204` on success. Cloned sandboxes are unaffected.
:::
To seed a new sandbox from a snapshot, pass its id as `from_snapshot` on `POST /v1/sandboxes`.
## Architecture seam
```mermaid
sequenceDiagram
participant Client
participant sandboxd as sandboxd /v1
participant Internal as POST /sandbox
participant RT as runtimed (Unix socket)
participant SQL as SQLite task/snapshot
Client->>sandboxd: POST /v1/sandboxes
sandboxd->>Internal: delegate create
Internal-->>sandboxd: 201 + id
sandboxd-->>Client: v1 Sandbox
Client->>sandboxd: POST /v1/.../tasks
alt sandbox stopped
sandboxd->>Internal: wake
end
sandboxd->>RT: StartTask
sandboxd->>SQL: CreateTask running
sandboxd-->>Client: 202 + events_url
loop watchTask goroutine
RT-->>sandboxd: SSE/NDJSON done
sandboxd->>SQL: FinishTask + result_json
end
```
<Info>
The machine-readable integrator contract is also published at `GET /llm.txt` when `SANDBOXD_LLM_TXT_PATH` is configured on the host.
</Info>
## Related pages
<CardGroup>
<Card title="API authentication" href="/api-authentication">
Bearer tokens, auth disable rollback, and loopback exemptions for `/v1/*`.
</Card>
<Card title="Run coding agents" href="/run-coding-agents">
Wake-on-submit, SSE streaming, and env injection at create.
</Card>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
SQLite status machine and destroy vs purge semantics behind v1 delete.
</Card>
<Card title="Control plane API (legacy)" href="/legacy-api-reference">
Internal `/sandbox*` routes this layer delegates to.
</Card>
<Card title="runtimed reference" href="/runtimed-reference">
In-sandbox supervisor protocol under `.runtimed/sock`.
</Card>
<Card title="Quickstart" href="/quickstart">
Shortest create → task → preview curl flow.
</Card>
</CardGroup>
---
## 15. Configuration reference
> Compose-backed env keys: preview domain/ports, SANDBOXED_DATA_DIR, API bind, auth tokens, idle/reaper/memory wake tuning, templates/library paths, and advanced cgroup toggles.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/15-configuration-reference.md
- Generated: 2026-06-04T22:45:17.995Z
### Source Files
- `.env.example`
- `docker-compose.yml`
- `control-plane/cmd/sandboxd/main.go`
- `control-plane/internal/auth/config.go`
- `README.md`
- `control-plane/README.md`
---
Runtime configuration for the sandboxed stack is a single host-level `.env` file (bootstrapped from `.env.example` by `install.sh`) that `docker compose` substitutes into `docker-compose.yml` and passes into the `sandboxd` container as process environment variables. Traefik reads compose-level keys (`HTTP_PORT`, log mounts). `sandboxd` reads the rest at startup; on `SIGHUP` it reloads only the auth settings.
```text
.env ──substitute──► docker-compose.yml
│
┌───────────────┴───────────────┐
▼ ▼
traefik sandboxd
(ports, log mount) (all SANDBOXD_* / SANDBOXED_* env)
```
<Note>
Changing `.env` requires `docker compose up -d` (or `restart sandboxd`) for most keys. Auth token rotation can use `SIGHUP` on `sandboxd` when `SANDBOXD_ENV_FILE` is configured — see [API authentication](/api-authentication).
</Note>
## Compose-backed keys (`.env.example`)
These keys are documented in `.env.example`, wired in `docker-compose.yml`, and are the knobs most operators touch.
### Preview routing
| Variable | Default | Scope | Effect |
|---|---|---|---|
| `PREVIEW_DOMAIN` | `localhost` | Traefik + sandboxd | Host suffix for preview URLs: `s-{id}-{port}.preview.{domain}` |
| `PREVIEW_ENTRYPOINT` | `web` | sandboxd | Traefik entrypoint name on per-sandbox Docker labels |
| `PREVIEW_TLS` | `false` | sandboxd | When `true`, emits `traefik.http.routers.*.tls=true` on sandbox labels |
| `HTTP_PORT` | `80` | Traefik (compose) | Host port mapped to Traefik `:80`; append `:{port}` to preview URLs when not 80 |
Browsers resolve `*.localhost` to `127.0.0.1`, so the default stack needs no DNS or certificates. For a public wildcard domain, set `PREVIEW_DOMAIN` to your apex, enable Traefik `websecure` + a cert resolver in `traefik/traefik.yml`, uncomment the `HTTPS_PORT` mapping in compose, and set `PREVIEW_ENTRYPOINT=websecure` and `PREVIEW_TLS=true`.
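How the preview keys combine into a browsable URL can be sketched as a small function. It is an illustration under stated assumptions: `preview_url` is a hypothetical helper, the scheme is derived from `PREVIEW_TLS`, and the port suffix is dropped when the host port equals the scheme default (80 for HTTP, 443 for HTTPS, the latter being an assumption).

```python
from typing import Optional


def preview_url(sandbox_id: str, port: int, domain: str = "localhost",
                tls: bool = False, host_port: Optional[int] = None) -> str:
    """Build s-{id}-{port}.preview.{domain}, adding :host_port when non-default."""
    scheme = "https" if tls else "http"
    default_port = 443 if tls else 80
    effective = default_port if host_port is None else host_port
    suffix = "" if effective == default_port else f":{effective}"
    return f"{scheme}://s-{sandbox_id}-{port}.preview.{domain}{suffix}"
```

With `HTTP_PORT=8080` and the default domain, the helper appends `:8080`, matching the rule that the port is added to preview URLs when it is not 80.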
### Images and Docker network
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXED_IMAGE` | `sandboxed-base:1.0.0` | Base image for per-sandbox containers (compose maps to `SANDBOXD_IMAGE` inside sandboxd) |
| `SANDBOXED_NETWORK` | `sandboxed_net` | Docker network name; must match Traefik's docker provider `network` in `traefik/traefik.yml` |
### Storage and API publish
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXED_DATA_DIR` | `/var/lib/sandboxed` | **Absolute** host path bind-mounted symmetrically into sandboxd; holds workspaces, SQLite, templates, library |
| `SANDBOXED_LOG_DIR` | `/var/lib/sandboxed/log` | Shared log directory (Traefik access log + tailer checkpoint) |
| `SANDBOXED_API_BIND` | `127.0.0.1:9090` | Host `host:port` published to sandboxd internal `:9000` |
Derived layout under `SANDBOXED_DATA_DIR` (created by sandboxd):
```text
{SANDBOXED_DATA_DIR}/
  workspaces/<ulid>/    # per-sandbox bind mount → /home/sandbox
  state/sandboxd.db     # SQLite (override path with SANDBOXD_DB)
  templates/            # golden templates (unless SANDBOXD_TEMPLATES_DIR set)
  library/              # snapshot spin-up images (unless SANDBOXD_LIBRARY_DIR set)
  _snapshots/           # manual snapshot storage root
```
<Warning>
`SANDBOXED_DATA_DIR` must stay an absolute path with the same value on host and inside the sandboxd container. The OSS build stores workspaces as plain directories with **no** hard per-workspace disk quota.
</Warning>
### API authentication (compose defaults)
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXD_API_AUTH_DISABLED` | `true` | When `true`/`1`/`yes`/`on`, external API calls skip bearer checks. Set `false` and supply tokens for production |
| `SANDBOXD_API_TOKENS` | *(empty)* | Comma-separated `name:secret` pairs; clients send `Authorization: Bearer <secret>` |
With auth enabled (`SANDBOXD_API_AUTH_DISABLED=false`) and an empty token list, every non-loopback API call returns 401 while loopback still works.
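
The token format above is simple enough to check from a client or a config linter. The following Python sketch shows one way to parse `SANDBOXD_API_TOKENS` and match a presented `Authorization` header against it. The function names are illustrative, and splitting each pair on the first `:` is an assumption; sandboxd's own parser may treat edge cases differently.

```python
import hmac
from typing import Dict, Optional


def parse_api_tokens(raw: str) -> Dict[str, str]:
    """Parse 'name:secret,name2:secret2' into {name: secret}.

    Assumption: each pair is split on its first ':'; empty entries are skipped.
    """
    tokens: Dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, secret = pair.partition(":")
        if not sep or not name or not secret:
            raise ValueError(f"malformed token entry: {name!r}")
        tokens[name] = secret
    return tokens


def match_bearer(header: Optional[str], tokens: Dict[str, str]) -> Optional[str]:
    """Return the name of the token matching the header, or None."""
    prefix = "Bearer "
    if not header or not header.startswith(prefix):
        return None
    presented = header[len(prefix):].encode()
    for name, secret in tokens.items():
        # Constant-time comparison avoids leaking secret prefixes via timing.
        if hmac.compare_digest(presented, secret.encode()):
            return name
    return None
```

With an empty token list `match_bearer` never matches, which lines up with the documented behavior that non-loopback calls are rejected when auth is enabled and no tokens are configured.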
### Resource policy (compose)
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXD_IDLE_THRESHOLD_SECONDS` | `2100` | Seconds without activity before idle reaper runs `docker stop` (workspace preserved) |
| `SANDBOXED_SET_MEMORY_HIGH` | `false` | When `true`, sandboxd writes cgroup v2 `memory.high` after start/wake/reconcile (needs host cgroup visibility) |
Per-sandbox limits are fixed in create logic and are not `.env` tunables: the hard limits are `--memory=10g` and `--memory-swap=10g`, and the soft `memory_high` value defaults to `4G` when the create request leaves it unset.
## How compose maps into sandboxd
`docker-compose.yml` passes a subset of `.env` into the `sandboxd` service `environment` block. Host-only keys (`HTTP_PORT`, `SANDBOXED_API_BIND`) affect compose port mappings only.
| `.env` key | Container env | Notes |
|---|---|---|
| `SANDBOXED_IMAGE` | `SANDBOXD_IMAGE` | Name differs: compose uses `SANDBOXED_*`, process reads `SANDBOXD_IMAGE` |
| `SANDBOXED_LOG_DIR` | `SANDBOXD_ACCESS_LOG` | Set to `{SANDBOXED_LOG_DIR}/traefik-access.log` |
| `SANDBOXED_USERNS` | `SANDBOXED_USERNS` | Not in `.env.example`; compose default `host` — see [Workspaces and isolation](/workspaces-persistence) |
Infrastructure containers (`traefik`, `sandboxd`) use `userns_mode: host` in compose so they work on daemons with or without `userns-remap`. Sandboxes inherit `SANDBOXED_USERNS` (default `host`) on `docker run`.
## Auth and private previews (sandboxd env)
Not all auth keys appear in `.env.example`; add them to the `sandboxd` `environment` section (or a mounted env file) when needed.
<ParamField body="SANDBOXD_PREVIEW_TOKEN_SECRETS" type="string">
Comma-separated `kid:secret` map for preview JWT signing (private sandboxes).
</ParamField>
<ParamField body="SANDBOXD_AUTH_REDIRECT_URL" type="string">
Template URL for `/preview-auth` redirects; supports `{sandbox_id}` and `{return}` placeholders.
</ParamField>
<ParamField body="SANDBOXD_FORWARD_AUTH_DENY_MODE" type="string" default="redirect">
How `/forward-auth` denies unauthenticated viewers (`redirect` or `meta-refresh`).
</ParamField>
<ParamField body="SANDBOXD_ENV_FILE" type="string" default="/etc/sandboxed/sandboxd.env">
Systemd-style `KEY=value` file re-read on `SIGHUP` for token rotation without restart.
</ParamField>
Private sandboxes attach Traefik `sandbox-preview-auth@file` (see `traefik/dynamic/auth.yml`), which calls `http://sandboxd:9000/forward-auth`.
## Idle, wake, and memory pressure
These variables are read in `cmd/sandboxd/main.go` at startup. Only `SANDBOXD_IDLE_THRESHOLD_SECONDS` is in the stock compose file; add others under `sandboxd.environment` to tune behavior.
### Idle reaper
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXD_IDLE_THRESHOLD_SECONDS` | `2100` | No activity → candidate for `docker stop` |
| `SANDBOXD_IDLE_REAP_INTERVAL_SECONDS` | `30` | Tick interval for idle scan |
| `SANDBOXD_WAKE_GRACE_SECONDS` | `60` | After last access-log activity, grace before idle stop (widens when connection poller is in fallback mode) |
Activity is driven primarily by the Traefik access-log tailer (`SANDBOXD_ACCESS_LOG`, default `{SANDBOXED_LOG_DIR}/traefik-access.log`) and optionally an open-connection poller.
### Host memory pressure reaper
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXD_PRESSURE_INTERVAL_SECONDS` | `10` | Tick interval; `<= 0` disables the loop |
| `SANDBOXD_MEM_HEADROOM_PCT` | `15` | Healthy floor — startup runs one synchronous tick if below |
| `SANDBOXD_MEM_REFUSE_WAKES_PCT` | `10` | Wake admission refuses new starts below this MemAvailable % |
| `SANDBOXD_MEM_EMERGENCY_PCT` | `5` | Emergency band — stops sandboxes by RSS / idle ranking |
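
To make the three thresholds concrete, the sketch below computes the MemAvailable percentage from `/proc/meminfo` text and maps it to a band. The band labels (`healthy`, `below_headroom`, `refuse_wakes`, `emergency`) and the strict less-than comparisons are assumptions for illustration; the reaper's actual logic lives in sandboxd and is not reproduced here.

```python
def mem_available_pct(meminfo_text: str) -> float:
    """Return MemAvailable as a percentage of MemTotal from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def pressure_band(pct: float,
                  headroom_pct: float = 15.0,
                  refuse_wakes_pct: float = 10.0,
                  emergency_pct: float = 5.0) -> str:
    """Classify a MemAvailable percentage using the documented defaults."""
    if pct < emergency_pct:
        return "emergency"
    if pct < refuse_wakes_pct:
        return "refuse_wakes"
    if pct < headroom_pct:
        return "below_headroom"
    return "healthy"
```

On a Linux host, `pressure_band(mem_available_pct(open("/proc/meminfo").read()))` gives a quick read on which band the machine is currently in.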
### Wake admission and keepalive
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXD_WAKE_COST_MB` | `800` | Estimated RAM cost per wake for admission math |
| `SANDBOXD_WAKE_TCP_READY_TIMEOUT_SECONDS` | `8` | Wait for preview port TCP after `docker start` |
| `SANDBOXD_KEEPALIVE_MAX_SECONDS` | `86400` | Upper bound for `POST .../keepalive` extensions |
Stopped sandboxes wake on preview hits via the Traefik file-provider catch-all (`traefik/dynamic/wake.yml`, priority 1) → `http://sandboxd:9000`.
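
The page does not spell out how `SANDBOXD_WAKE_COST_MB` and `SANDBOXD_MEM_REFUSE_WAKES_PCT` are combined. One plausible reading, shown here purely as an assumption, is that a wake is admitted only if subtracting the estimated cost still leaves MemAvailable at or above the refusal threshold. The function name and formula are hypothetical.

```python
def admit_wake(mem_available_mb: float,
               mem_total_mb: float,
               wake_cost_mb: float = 800,
               refuse_wakes_pct: float = 10.0) -> bool:
    """Hypothetical admission check: would the host stay above the
    refusal threshold after paying the estimated wake cost?"""
    projected_pct = (mem_available_mb - wake_cost_mb) / mem_total_mb * 100.0
    return projected_pct >= refuse_wakes_pct
```

Under this reading, raising `SANDBOXD_WAKE_COST_MB` makes admission more conservative without changing the pressure bands themselves.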
### Activity poller (optional)
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXD_POLLER_METRIC_RE` | *(unset)* | Regex matching a Traefik open-connection metric name; unset → fallback mode (access log only) |
| `SANDBOXD_POLLER_URL` | `http://127.0.0.1:8082/metrics` | Metrics scrape URL |
| `SANDBOXD_POLLER_INTERVAL_SECONDS` | `15` | Scrape interval |
| `SANDBOXD_POLLER_SERVICE_LABEL` | `service` | Label name on matched metric series |
## Templates and library paths
| Variable | Default | Effect |
|---|---|---|
| `SANDBOXD_TEMPLATES_DIR` | `{SANDBOXED_DATA_DIR}/templates` | Directory of golden template workspaces (`template` field on create) |
| `SANDBOXD_LIBRARY_DIR` | `{SANDBOXED_DATA_DIR}/library` | Allowed root for `template_path` snapshot spin-up |
Create accepts `template` (name under `SANDBOXD_TEMPLATES_DIR`) or `template_path` (must resolve under library or templates roots). If `SANDBOXD_TEMPLATES_DIR` is unset/empty, named templates return 503.
Snapshot-related defaults (manual API; auto-snapshotter disabled in OSS):
| Variable | Default |
|---|---|
| `SANDBOXD_SNAPSHOT_RETENTION_DAYS` | `7` |
| `SANDBOXD_SNAPSHOT_IDLE_HOURS` | `24` |
## Advanced cgroup and container toggles
<ParamField body="SANDBOXED_SET_MEMORY_HIGH" type="boolean" default="false">
After `docker run` / wake / reconcile, write `memory.high` to `/sys/fs/cgroup/.../memory.high` using the sandbox row's value (create default `4G`). Failures are logged but do not fail create — hard `--memory=10g` still applies.
</ParamField>
<ParamField body="SANDBOXED_USERNS" type="string" default="host">
Passed as `docker run --userns` for sandboxes and the workspace seed container. Set empty to use the daemon default (see ARCHITECTURE.md userns-remap note).
</ParamField>
Create-time API field `memory_high` (JSON) sets the per-sandbox soft throttle stored in SQLite and re-applied on wake when `SANDBOXED_SET_MEMORY_HIGH=true`.
## Other sandboxd environment variables
| Variable | Default | Purpose |
|---|---|---|
| `SANDBOXD_ADDR` | `0.0.0.0:9000` | Listen address inside the container (compose publishes via `SANDBOXED_API_BIND`) |
| `SANDBOXD_DB` | `{dataDir}/state/sandboxd.db` | SQLite DSN path |
| `SANDBOXD_MIGRATIONS` | `/usr/local/share/sandboxd/migrations` | SQL migration directory |
| `SANDBOXD_TAILER_OFFSET` | `{dataDir}/state/traefik-tail.offset` | Access-log tailer checkpoint |
| `SANDBOXD_DEBUG` | *(unset)* | Any non-empty value sets log level to debug |
| `SANDBOXD_LLM_TXT_PATH` | `/etc/sandboxed/llm.txt` | Path for `GET /llm.txt` integrator contract |
| `SANDBOXD_GIT_TOKEN_PATH` | `/etc/sandboxed/git/token` | Host credential file for HTTPS git clone |
Subsystems **disabled** in the OSS compose build (code retained; enabling requires extra host services):
| Variable or subsystem | Purpose |
|---|---|
| `SANDBOXED_NGINX_WATCH_PATHS` / `SANDBOXED_NGINX_CONTAINER` | Hot-reload a host nginx registry proxy |
| Egress manager | nftables/journald; always `nil` in portable build |
## Traefik static paths (not `.env`)
`traefik/traefik.yml` hardcodes the docker provider constraint `` Label(`sandboxed.managed`,`true`) `` and the default access log path `/var/lib/sandboxed/log/traefik-access.log`. Keep `SANDBOXED_LOG_DIR` aligned with that mount, or edit the static config to match your log dir.
File-provider routes in `traefik/dynamic/`:
| File | Role |
|---|---|
| `wake.yml` | Catch-all `HostRegexp` for stopped-sandbox wake (priority 1) |
| `api.yml` | Optional `api.preview.*` → sandboxd (loopback bind remains primary) |
| `auth.yml` | Forward-auth middleware for `visibility=private` |
## Example `.env` profiles
<Tabs>
<Tab title="Local (default)">
```bash
PREVIEW_DOMAIN=localhost
HTTP_PORT=80
SANDBOXED_DATA_DIR=/var/lib/sandboxed
SANDBOXED_API_BIND=127.0.0.1:9090
SANDBOXD_API_AUTH_DISABLED=true
SANDBOXD_IDLE_THRESHOLD_SECONDS=2100
```
</Tab>
<Tab title="Port 80 busy">
```bash
HTTP_PORT=8088
# Preview: http://s-<id>-<port>.preview.localhost:8088
```
</Tab>
<Tab title="LAN API + auth">
```bash
SANDBOXED_API_BIND=0.0.0.0:9090
SANDBOXD_API_AUTH_DISABLED=false
SANDBOXD_API_TOKENS=integrator:your-secret-here
```
</Tab>
<Tab title="Production TLS">
```bash
PREVIEW_DOMAIN=yourdomain.com
PREVIEW_ENTRYPOINT=websecure
PREVIEW_TLS=true
SANDBOXD_API_AUTH_DISABLED=false
SANDBOXD_API_TOKENS=prod:secret
# Also: traefik websecure entrypoint + cert resolver, HTTPS_PORT in compose
```
</Tab>
</Tabs>
## Apply and verify
<Steps>
<Step title="Edit configuration">
Copy `.env.example` to `.env` if missing, then set keys. For sandboxd-only variables, extend the `sandboxd.environment` list in `docker-compose.yml`.
</Step>
<Step title="Recreate the stack">
```bash
docker compose up -d
```
`install.sh` is idempotent and loads `.env` before creating `SANDBOXED_DATA_DIR` and building images.
</Step>
<Step title="Check health and binding">
```bash
curl -s http://127.0.0.1:9090/healthz # ok
curl -s http://127.0.0.1:9090/readyz # ready
```
Use your actual `SANDBOXED_API_BIND` host:port if changed.
</Step>
<Step title="Confirm preview URL shape">
After creating a sandbox on port 3000, open
`http://s-{id}-3000.preview.{PREVIEW_DOMAIN}`
and append `:{HTTP_PORT}` when `HTTP_PORT` ≠ 80.
</Step>
</Steps>
<Check>
After changing auth tokens in a mounted env file, send `SIGHUP` to the sandboxd process (or `docker compose restart sandboxd`) and watch logs for `reload: auth config reloaded`.
</Check>
## Related pages
<CardGroup>
<Card title="Installation" href="/installation">
Bootstrap `.env`, build images, and bring up compose with health checks.
</Card>
<Card title="Preview routing" href="/preview-routing">
Traefik labels, wake catch-all priority, and `PREVIEW_DOMAIN` host rules.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
How idle threshold, pressure bands, and wake admission interact at runtime.
</Card>
<Card title="API authentication" href="/api-authentication">
Bearer tokens, loopback exemptions, `SIGHUP` reload, and LAN exposure risks.
</Card>
<Card title="Production deployment" href="/production-deployment">
Wildcard DNS, `websecure`, TLS store, and hardening checklist.
</Card>
<Card title="Preview URL reference" href="/preview-url-reference">
Exact hostname pattern and `HTTP_PORT` suffix rules.
</Card>
</CardGroup>
---
## 16. Preview URL reference
> Hostname pattern s-{ulid}-{port}.preview.{PREVIEW_DOMAIN}, HTTP_PORT suffix rules, localhost vs production HTTPS, and Traefik router/service naming.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/16-preview-url-reference.md
- Generated: 2026-06-04T22:45:21.500Z
### Source Files
- `control-plane/internal/traefik/traefik.go`
- `traefik/dynamic/wake.yml`
- `.env.example`
- `control-plane/internal/api/v1.go`
- `README.md`
Each sandbox with exposed ports gets a deterministic preview hostname `s-{id}-{port}.preview.{PREVIEW_DOMAIN}`; Traefik registers one router and one service per port using the same name `s-{id}-{port}`, and traffic reaches the process listening on that port inside the container `s-{id}`.
## Hostname pattern
| Segment | Value | Notes |
|---|---|---|
| Prefix | `s-` | Fixed; matches container name `s-{id}` |
| Sandbox id | ULID | Auto-generated when `id` is omitted on create; custom `id` must pass ULID validation |
| Port | `1`–`65535` | Must be listed in `ports` at create; one hostname per exposed port |
| Preview segment | `.preview.` | Fixed literal between id-port and domain |
| Domain | `PREVIEW_DOMAIN` | Default `localhost`; set to your apex domain in production |
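
A custom `id` can be pre-checked on the client before calling create. The sketch below validates the canonical ULID text form (26 characters of Crockford base32, first character `0`-`7`). The helper name is illustrative, and sandboxd's own validation may be stricter or more lenient, for example about lowercase input.

```python
import re

# Crockford base32 excludes I, L, O and U; the first character is limited
# to 0-7 so the value fits in 128 bits.
_ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def is_canonical_ulid(value: str) -> bool:
    """Return True if value is an uppercase, canonical-form ULID string."""
    return _ULID_RE.fullmatch(value) is not None
```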
**Canonical hostname (no scheme, no Traefik host port):**
```text
s-{id}-{port}.preview.{PREVIEW_DOMAIN}
```
Example with defaults (`PREVIEW_DOMAIN=localhost`, port `3000`):
```text
s-01ARZ3NDEKTSV4RRFFQ69G5FAV-3000.preview.localhost
```
<Note>
The port in the hostname is the **application port inside the sandbox** (from `POST /sandbox` `ports`), not `HTTP_PORT`. `HTTP_PORT` is only how clients reach Traefik on the host.
</Note>
## Full URL shape
| Mode | Scheme | Host | Host port suffix |
|---|---|---|---|
| Local OSS default | `http` | `s-{id}-{port}.preview.localhost` | Append `:HTTP_PORT` when `HTTP_PORT` ≠ `80` |
| Production TLS | `https` | `s-{id}-{port}.preview.{yourdomain.com}` | Usually omitted (`443`) after enabling `websecure` |
**Local (default `.env`):**
```text
http://s-{id}-{port}.preview.localhost
http://s-{id}-{port}.preview.localhost:8088 # when HTTP_PORT=8088
```
**Production (README “Production / TLS”):**
```text
https://s-{id}-{port}.preview.yourdomain.com
```
Browsers resolve `*.localhost` to `127.0.0.1`, so local previews need no DNS or certificates. A public deployment requires wildcard DNS for `*.preview.{PREVIEW_DOMAIN}`, `PREVIEW_ENTRYPOINT=websecure`, `PREVIEW_TLS=true`, and a wildcard cert in Traefik’s default TLS store (no per-host ACME on each sandbox router).
## Configuration keys
<ParamField body="PREVIEW_DOMAIN" type="string" default="localhost">
Apex domain under which preview hostnames are created. sandboxd's wake and auth regexes validate this exact domain, while the file-provider wake catch-all uses a domain-agnostic `HostRegexp`, so `wake.yml` needs no edits when the domain changes.
</ParamField>
<ParamField body="HTTP_PORT" type="integer" default="80">
Host port published for Traefik’s `web` entrypoint (`${HTTP_PORT}:80` in compose). Omit `:port` in browser URLs when `80`; otherwise append `:${HTTP_PORT}` to the authority (install summary uses the same rule).
</ParamField>
<ParamField body="PREVIEW_ENTRYPOINT" type="string" default="web">
Traefik entrypoint name on per-sandbox Docker labels (`traefik.http.routers.{name}.entrypoints`). Use `websecure` with TLS deployments.
</ParamField>
<ParamField body="PREVIEW_TLS" type="boolean" default="false">
When `true`, sandboxd emits `traefik.http.routers.{name}.tls=true` on each preview router. Routers do not set a per-router cert resolver; Traefik serves `*.preview.{domain}` from the shared default TLS store.
</ParamField>
## Traefik router and service naming
For each entry in `ports` at create, `traefik.Labels` in `control-plane/internal/traefik/traefik.go` emits matching **router** and **service** names:
```text
Router name: s-{id}-{port}
Service name: s-{id}-{port} (same string)
Host rule: Host(`s-{id}-{port}.preview.{PREVIEW_DOMAIN}`)
Priority: 100
Backend port: traefik.http.services.s-{id}-{port}.loadbalancer.server.port={port}
```
Shared labels on every routed sandbox:
| Label | Purpose |
|---|---|
| `traefik.enable=true` | Expose container to Traefik (with `exposedByDefault: false`) |
| `sandboxed.managed=true` | Docker provider constraint — only routes stack-owned sandboxes |
**Per-port label set (HTTP, public):**
```text
traefik.http.routers.s-{id}-{port}.rule=Host(`s-{id}-{port}.preview.{domain}`)
traefik.http.routers.s-{id}-{port}.entrypoints={PREVIEW_ENTRYPOINT}
traefik.http.routers.s-{id}-{port}.priority=100
traefik.http.services.s-{id}-{port}.loadbalancer.server.port={port}
```
With `PREVIEW_TLS=true`, add `traefik.http.routers.s-{id}-{port}.tls=true`. With `visibility=private`, add `traefik.http.routers.s-{id}-{port}.middlewares=sandbox-preview-auth@file`.
<Warning>
If `ports` is empty or omitted, `Labels` returns `nil` — the sandbox has **no** preview routes until recreated with ports.
</Warning>
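
The documented label set can be reproduced outside sandboxd, for example to compare expected labels against `docker inspect` output. This Python sketch mirrors the output described above; it is not the Go `traefik.Labels` implementation, and the function name and parameters are illustrative.

```python
from typing import Dict, List, Optional


def preview_labels(sandbox_id: str,
                   ports: List[int],
                   domain: str = "localhost",
                   entrypoint: str = "web",
                   tls: bool = False,
                   private: bool = False) -> Optional[Dict[str, str]]:
    """Build the Traefik labels documented for a sandbox's preview ports."""
    if not ports:
        return None  # no ports means no preview routes
    labels = {"traefik.enable": "true", "sandboxed.managed": "true"}
    for port in ports:
        name = f"s-{sandbox_id}-{port}"  # router and service share this name
        router = f"traefik.http.routers.{name}"
        labels[f"{router}.rule"] = f"Host(`{name}.preview.{domain}`)"
        labels[f"{router}.entrypoints"] = entrypoint
        labels[f"{router}.priority"] = "100"
        labels[f"traefik.http.services.{name}.loadbalancer.server.port"] = str(port)
        if tls:
            labels[f"{router}.tls"] = "true"
        if private:
            labels[f"{router}.middlewares"] = "sandbox-preview-auth@file"
    return labels
```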
## Routing priority and wake catch-all
```mermaid
flowchart LR
subgraph client [Client]
B[Browser / curl]
end
subgraph edge [Traefik]
R100["Docker router s-id-port priority 100"]
R1["File router sandbox-wake priority 1"]
end
subgraph backends [Backends]
SB["Sandbox container :port"]
SD["sandboxd :9000 wake"]
end
B --> edge
R100 -->|running + labels visible| SB
R1 -->|stopped or no router| SD
SD -->|docker start + warming page| B
B -->|retry| R100
```
| Router | Source | Rule | Priority | Backend |
|---|---|---|---|---|
| `s-{id}-{port}` | Docker labels on `s-{id}` | `` Host(`s-{id}-{port}.preview.{domain}`) `` | `100` | Sandbox container port |
| `sandbox-wake` | `traefik/dynamic/wake.yml` | `` HostRegexp(`^s-[0-9A-Za-z]+-[0-9]+\\.preview\\..+$`) `` | `1` | `http://sandboxd:9000` |
A **running** sandbox’s label-backed router always wins. The catch-all fires when the container is **stopped** (or Traefik has not yet observed new labels right after wake); sandboxd starts the container and returns the warming HTML until the app listens.
The wake `HostRegexp` intentionally matches **any** preview domain suffix; sandboxd’s handler compiles a **domain-specific** regex anchored to `PREVIEW_DOMAIN` and normalizes the captured id to uppercase ULID for DB lookup (browsers lowercase `Host`).
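
The domain-specific parsing step can be sketched in Python as follows. This is an illustration of the described behavior (anchor to `PREVIEW_DOMAIN`, tolerate an optional `:digits` suffix on the host, uppercase the id), not a port of sandboxd's Go handler; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_preview_host(host: str, preview_domain: str) -> Optional[Tuple[str, int]]:
    """Extract (sandbox_id, port) from a preview Host header, or None."""
    pattern = (
        r"s-([0-9A-Za-z]+)-([0-9]+)\.preview\."
        + re.escape(preview_domain)   # anchor to the configured domain
        + r"(?::[0-9]+)?"             # optional Traefik listen port in Host
    )
    match = re.fullmatch(pattern, host, flags=re.IGNORECASE)
    if match is None:
        return None
    # Browsers lowercase Host, so normalize the id back to uppercase.
    return match.group(1).upper(), int(match.group(2))
```

Escaping the domain matters: without `re.escape`, a `.` in `PREVIEW_DOMAIN` would match any character and widen the accepted hosts.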
## Related hostnames
| Hostname | Role |
|---|---|
| `api.preview.{PREVIEW_DOMAIN}` | Optional Traefik route to sandboxd API (`traefik/dynamic/api.yml`); loopback API remains `SANDBOXED_API_BIND` |
| `.preview.{PREVIEW_DOMAIN}` | Cookie `Domain` for private preview sessions (`sandbox_preview`) |
Private sandboxes use forward-auth middleware `sandbox-preview-auth@file` → `http://sandboxd:9000/forward-auth` (`traefik/dynamic/auth.yml`).
## Constructing URLs
**From create response:** use sandbox `id` and each port in `ports`.
**Formula:**
```text
{scheme}://s-{id}-{port}.preview.{PREVIEW_DOMAIN}{hostPortSuffix}
```
Where:
- `scheme` = `http` when `PREVIEW_TLS=false`; `https` when `PREVIEW_TLS=true`
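- `hostPortSuffix` = for `http`, empty if `HTTP_PORT` is `80`, else `:${HTTP_PORT}`; for `https`, usually empty because the `websecure` entrypoint is normally published on `443`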
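
The formula translates directly into a small helper. This is a minimal sketch; the function name and the `host_port` parameter are assumptions, where `host_port` stands for `HTTP_PORT` on plain HTTP or the published HTTPS port on TLS deployments.

```python
from typing import Optional


def preview_url(sandbox_id: str,
                port: int,
                domain: str = "localhost",
                tls: bool = False,
                host_port: Optional[int] = None) -> str:
    """Build a preview URL from a sandbox id, app port, and env settings."""
    scheme = "https" if tls else "http"
    default_port = 443 if tls else 80
    # Omit the suffix when the host port is the scheme's default.
    suffix = "" if host_port in (None, default_port) else f":{host_port}"
    return f"{scheme}://s-{sandbox_id}-{port}.preview.{domain}{suffix}"
```

For a sandbox created with `ports: [3000, 3001]`, calling the helper once per port yields the two independent URLs.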
**v1 API `preview.url`:** `GET /v1/sandboxes/{id}` returns `preview.url` built as `https://s-{id}-3000.preview.{PREVIEW_DOMAIN}` — fixed port `3000` and **always `https`**, regardless of local HTTP settings. For local HTTP stacks, build the URL from env yourself; use the v1 field as a production-oriented hint when TLS and port 3000 match your template.
**Host parsing in sandboxd:** `parseSandboxIDFromHost` accepts an optional `:digits` suffix on the host (`s-{id}-{port}.preview.{domain}:8088`) for proxies that include the Traefik listen port in `Host`.
## Local verification without DNS
When Traefik listens on loopback, send the preview `Host` header explicitly:
<CodeGroup>
```bash title="Default HTTP_PORT 80"
curl -s -H "Host: s-{id}-3000.preview.localhost" \
http://127.0.0.1:80/
```
```bash title="HTTP_PORT 8088"
curl -s -H "Host: s-{id}-3000.preview.localhost" \
http://127.0.0.1:8088/
```
</CodeGroup>
Replace `{id}` with the sandbox ULID from create. First hit to a **stopped** sandbox returns the warming page; repeat after the container is running to reach the app.
## Multi-port sandboxes
`ports: [3000, 3001]` yields two independent URLs and two Traefik router/service pairs:
```text
s-{id}-3000.preview.{domain}
s-{id}-3001.preview.{domain}
```
Prometheus/activity code also recognizes Traefik internal names like `s-{id}-{port}@docker`.
## Quick reference
| Question | Answer |
|---|---|
| Container name? | `s-{id}` |
| Traefik router/service name? | `s-{id}-{port}` |
| Minimum create field for previews? | `"ports": [<port>]` |
| Local domain default? | `localhost` → `*.localhost` → 127.0.0.1 |
| When to add `:port` to URL? | When `HTTP_PORT` ≠ 80 |
| TLS on router? | `PREVIEW_TLS=true` + `PREVIEW_ENTRYPOINT=websecure` + wildcard cert |
| Stopped sandbox behavior? | Catch-all → sandboxd wake → warming page → retry hits priority-100 router |
## Related pages
<CardGroup>
<Card title="Preview routing" href="/preview-routing">
Traefik labels, entrypoints, TLS store, and the managed-container constraint in depth.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Catch-all wake path, idle stop, and warming-page behavior.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
All compose-backed env keys including preview and HTTP_PORT.
</Card>
<Card title="Production deployment" href="/production-deployment">
Wildcard DNS, websecure, cert resolver, and auth hardening.
</Card>
<Card title="Private previews" href="/private-previews">
forwardAuth, preview tokens, and cookie domain `.preview.{domain}`.
</Card>
<Card title="Quickstart" href="/quickstart">
End-to-end create → task → open preview URL.
</Card>
</CardGroup>
---
## 17. runtimed reference
> In-sandbox supervisor HTTP over Unix socket: GET /status, POST /tasks, GET /tasks/{id}/events (SSE), POST /tasks/{id}/cancel; workspace paths and sandboxd runtime.Client bridge.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/17-runtimed-reference.md
- Generated: 2026-06-04T22:45:31.829Z
### Source Files
- `control-plane/cmd/runtimed/server.go`
- `control-plane/cmd/runtimed/task.go`
- `control-plane/internal/runtime/client.go`
- `image/Dockerfile`
- `image/HOME_LAYOUT.md`
- `ARCHITECTURE.md`
`runtimed` is the main process inside every sandbox container (`tini` → `/usr/local/bin/runtimed`). It supervises the dev server, runs at most one coding task at a time, and exposes an HTTP/1.1 control API on a Unix domain socket under the durable workspace. The host control plane (`sandboxd`) reaches that socket through the workspace bind mount via `internal/runtime.Client`; integrators normally use the public `/v1/sandboxes/{id}/tasks` surface, which proxies to runtimed and reframes the event stream as SSE.
## Placement in the stack
```mermaid
flowchart TB
subgraph host["Host — sandboxd"]
API["/v1/sandboxes/{id}/tasks"]
RC["runtime.Client"]
DB[(SQLite task rows)]
Watcher[taskwatch goroutine]
end
subgraph ws["Workspace mount — <id>.mnt"]
SOCK[".runtimed/sock"]
TASKS[".runtimed/tasks/<taskId>/"]
end
subgraph container["Sandbox container — runtimed"]
RT["runtimed HTTP mux"]
DEV["dev server supervisor"]
AG["OpenCode agent"]
end
API --> RC
RC -->|"unix HTTP"| SOCK
SOCK --> RT
RT --> DEV
RT --> AG
Watcher --> RC
Watcher --> DB
RT --> TASKS
```
| Layer | Responsibility |
|---|---|
| `runtimed` | Dev-server lifecycle, preview probing, task execution, on-disk task artifacts |
| `runtime.Client` | Host-side dialer over `<workspaces>/<id>.mnt/.runtimed/sock` |
| `internal/api/v1_tasks.go` | Wake-on-submit, ULID allocation, SSE translation, SQLite persistence |
| `internal/api/taskwatch.go` | Background NDJSON consumer; writes canonical `TaskResult` to SQLite |
<Note>
`runtimed` is never exposed on the container network. Preview traffic goes through Traefik to the dev server port; control traffic stays on the workspace UDS.
</Note>
## Socket and workspace paths
| Path | Role |
|---|---|
| `/home/sandbox/.runtimed/sock` | Control socket inside the container (`runtime.DefaultSocketPath`) |
| `<SANDBOXED_DATA_DIR>/workspaces/<id>.mnt/.runtimed/sock` | Same inode on the host (loopback/bind mount) |
| `/home/sandbox/.runtimed/tasks/<taskId>/` | Per-task directory (`events.jsonl`, `result.json`, `agent.log`) |
| `/home/sandbox/workspace/app` | App working directory (`RUNTIMED_APP_DIR`); dev server and agent `chdir` here |
| `/home/sandbox/workspace` | User project root (see `image/HOME_LAYOUT.md`) |
The `.runtimed/` subtree is reserved: v1 file writes reject paths under `.runtimed/` so clients cannot corrupt the supervisor state.
### Environment variables
| Variable | Default | Effect |
|---|---|---|
| `RUNTIMED_APP_DIR` | `/home/sandbox/workspace/app` | Dev-server and agent working directory |
| `RUNTIMED_DIR` | `/home/sandbox/.runtimed` | Runtime state root |
| `RUNTIMED_SOCKET` | `<RUNTIMED_DIR>/sock` | UDS bind path |
| `RUNTIMED_DEV_CMD` | `pnpm dev` | Dev-server command (`bash -lc`) |
| `RUNTIMED_PREVIEW_PORT` | `3000` | HTTP probe target (not the control socket) |
| `RUNTIMED_PROBE_INTERVAL_SECONDS` | `3` | Preview health poll interval |
On boot, `runtimed` creates `RUNTIMED_DIR` and `RUNTIMED_APP_DIR` if missing, recovers interrupted tasks, starts dev-server supervision, and binds the socket (stale socket files are removed before listen).
## Control API (Unix HTTP)
All routes are served by Go 1.22+ `http.ServeMux` method patterns on the workspace socket. Clients use a synthetic host (`http://runtimed/...`) with `net.Dial("unix", socketPath)` — the same pattern as `runtime.Client`.
| Method | Path | Success | Purpose |
|---|---|---|---|
| `GET` | `/status` | `200` | Full supervisor snapshot (`runtime.Status`) |
| `POST` | `/tasks` | `202` | Start a coding task |
| `GET` | `/tasks/{id}/events` | `200` | Live or replayed event stream (NDJSON) |
| `POST` | `/tasks/{id}/cancel` | `200` | Cancel active task (idempotent) |
### `GET /status`
Returns JSON matching `runtime.Status`:
<ResponseField name="runtimed" type="object">
Supervisor identity: `version`, `booted_at`, `uptime_s`.
</ResponseField>
<ResponseField name="preview" type="object">
Reported dev-server state — never commanded. Fields include `status` (`down` | `starting` | `ready` | `error`), optional `pid`, `last_http_status`, `last_checked_at`, `build_error_message`, `restarts`.
</ResponseField>
<ResponseField name="active_task" type="object | null">
When a task is in flight: `id`, `status` (`running`), `phase`. Null when idle or after the task finishes.
</ResponseField>
A connection error from `runtime.Client.Status` (socket missing or refused) means the sandbox is stopped or runtimed has not finished booting.
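
For scripting against the socket without `curl`, the same "synthetic host over a Unix socket" pattern can be sketched with Python's standard library. The class and function names are illustrative and not part of sandboxed; the 5-second timeout mirrors the value `runtime.Client` is documented to use for status calls. A missing or refused socket is reported as `None`, matching the stopped-or-still-booting interpretation above.

```python
import http.client
import json
import socket
from typing import Optional


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 connection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "runtimed" is only a synthetic Host header value.
        super().__init__("runtimed", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.settimeout(self.timeout)
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def runtimed_status(socket_path: str) -> Optional[dict]:
    """Return the decoded /status JSON, or None if the socket is unreachable."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/status")
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"unexpected status {resp.status}")
        return json.loads(resp.read())
    except OSError:
        return None  # socket missing or refused
    finally:
        conn.close()
```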
### `POST /tasks`
<ParamField body="task_id" type="string" required>
Caller-supplied task identifier (sandboxd uses a ULID on the v1 path).
</ParamField>
<ParamField body="prompt" type="string" required>
Agent instruction text.
</ParamField>
<ParamField body="agent" type="string">
Agent adapter name. Empty or `"opencode"` selects OpenCode; other values return `400`.
</ParamField>
<ParamField body="env" type="object">
Key/value map injected into the agent process (e.g. provider API keys from sandbox create).
</ParamField>
<ParamField body="timeout_s" type="integer">
Task timeout in seconds; default **600** (10 minutes) when omitted or zero.
</ParamField>
<RequestExample>
```json
{
  "task_id": "01JABCDEFGHJKMNPQRSTVWXYZ0",
  "prompt": "Add a greeting component and wire it into the app",
  "agent": "opencode",
  "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
}
```
</RequestExample>
<ResponseExample>
```json
{ "task_id": "01JABCDEFGHJKMNPQRSTVWXYZ0", "status": "running" }
```
</ResponseExample>
| HTTP status | Meaning |
|---|---|
| `202 Accepted` | Task started |
| `409 Conflict` | Another task is active (`error`: `task_in_progress`, `active_task_id`) |
| `400 Bad Request` | Invalid JSON, missing fields, or unsupported agent |
Exactly **one** active task per sandbox is enforced in `startTask`.
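
The request rules and status codes above can be summarized as a small decision function. This is a sketch of the documented contract, not runtimed's `startTask`; the function names, the order in which checks run, and the `bad_request` / `unsupported_agent` error strings are assumptions. Only `task_in_progress` and `active_task_id` come from the documented `409` body.

```python
from typing import Optional, Tuple

DEFAULT_TIMEOUT_S = 600  # documented default when timeout_s is omitted or zero


def effective_timeout_s(body: dict) -> int:
    """Return the task timeout, falling back to the documented default."""
    return body.get("timeout_s") or DEFAULT_TIMEOUT_S


def check_start_task(body: dict, active_task_id: Optional[str] = None) -> Tuple[int, dict]:
    """Return (http_status, response_payload) for a POST /tasks body."""
    if not isinstance(body, dict) or not body.get("task_id") or not body.get("prompt"):
        return 400, {"error": "bad_request"}          # hypothetical error string
    if (body.get("agent") or "opencode") != "opencode":
        return 400, {"error": "unsupported_agent"}    # hypothetical error string
    if active_task_id is not None:
        return 409, {"error": "task_in_progress", "active_task_id": active_task_id}
    return 202, {"task_id": body["task_id"], "status": "running"}
```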
### `GET /tasks/{id}/events`
Streams **newline-delimited JSON** (`Content-Type: application/x-ndjson`), not SSE at this layer.
<ParamField query="since" type="integer">
Event index to resume from (default `0`). For live tasks, only values greater than `0` change the starting point; any other value is ignored when the query string is parsed.
</ParamField>
Each line is a `runtime.Event`:
| Field | Type | Notes |
|---|---|---|
| `id` | int | Monotonic index |
| `type` | string | `status`, `message`, `tool`, `build`, `done` |
| `ts` | RFC3339 time | Event timestamp |
| `data` | JSON | Type-specific payload |
The stream ends after the terminal `done` event. Past tasks replay from `.runtimed/tasks/<id>/events.jsonl`; unknown IDs return `404`.
`runtime.Client` uses a **no-timeout** HTTP client for this route; status/start/cancel use a **5s** timeout.
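
A consumer of this stream only needs to decode one JSON object per line and stop at `done`. The generator below is a minimal sketch that works on any iterable of lines (a file, a socket file object, or a list in tests); the function name is illustrative.

```python
import json
from typing import Iterable, Iterator, Union


def iter_events(lines: Iterable[Union[str, bytes]]) -> Iterator[dict]:
    """Yield decoded runtime events until the terminal `done` event."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        yield event
        if event["type"] == "done":
            return  # the stream ends after the terminal event
```

Tracking the last `id` seen lets a caller reconnect with `?since=` after a dropped connection instead of replaying from the start.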
### `POST /tasks/{id}/cancel`
Always returns `200` with `{"task_id":"<id>","status":"cancelling"}`. The call is idempotent and returns the same response even when the ID does not match the active task. Cancellation kills the agent process group; the task finalizes as `cancelled` (build and health checks are skipped on cancel).
## Task lifecycle
```mermaid
stateDiagram-v2
[*] --> queued: POST /tasks
queued --> checkpoint: runTask
checkpoint --> agent_running: git checkpoint
agent_running --> build_check: agent returns
build_check --> health_check: not cancelled
health_check --> done: probe preview
agent_running --> done: timeout / error / cancel
build_check --> done: failure
health_check --> done: finish
done --> [*]: EventDone + result.json
```
| Phase (`active_task.phase`) | Work |
|---|---|
| `starting` | Task accepted |
| `checkpoint` | Pre-task git commit in `RUNTIMED_APP_DIR` |
| `agent_running` | OpenCode (`opencode run --format json`) |
| `build_check` | `pnpm build` (skipped if cancelled) |
| `health_check` | Post-task dev-server remediation and entry-asset probe |
| `done` | Terminal; `active_task` cleared from `/status` |
Terminal outcomes in `runtime.TaskResult` (`status` on the `done` event and `result.json`):
| `status` | Typical `failure_reason` |
|---|---|
| `succeeded` | — |
| `failed` | `agent_timeout`, `agent_error`, `internal`, `sandbox_unavailable` |
| `cancelled` | `cancelled` |
Authoritative `files_changed` comes from `git diff` against the checkpoint, not from agent stream events. `build_ok` / `build_error_message` reflect the build check; `preview_status_after` and `preview_error_message` reflect live preview health after the task.
### On-disk artifacts
```text
.runtimed/
├── sock
├── dev-server.log
└── tasks/<taskId>/
    ├── events.jsonl   # append-only canonical log
    ├── result.json    # written at terminal done
    └── agent.log      # raw agent stdout
```
On `runtimed` boot, any directory with `events.jsonl` but no `result.json` is finalized as `failed` / `sandbox_unavailable` — interrupted tasks are **never resumed**.
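
The detection rule (an `events.jsonl` without a matching `result.json`) is easy to apply when inspecting a workspace by hand. This read-only sketch lists such task directories; it does not finalize anything, and the function name is illustrative.

```python
from pathlib import Path
from typing import List


def interrupted_tasks(runtimed_dir: str) -> List[str]:
    """List task ids that have events.jsonl but no result.json."""
    tasks_root = Path(runtimed_dir) / "tasks"
    if not tasks_root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in tasks_root.iterdir()
        if entry.is_dir()
        and (entry / "events.jsonl").exists()
        and not (entry / "result.json").exists()
    )
```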
## `runtime.Client` bridge (sandboxd)
`runtime.NewClient(socketPath)` dials the workspace socket and speaks the same HTTP routes:
| Client method | runtimed route | Notes |
|---|---|---|
| `Status(ctx)` | `GET /status` | Returns `*Status` or connection error |
| `StartTask(ctx, StartTaskRequest)` | `POST /tasks` | `ErrTaskInProgress` on `409` |
| `TaskEvents(ctx, taskID, since)` | `GET /tasks/{id}/events?since=N` | `io.ReadCloser` of NDJSON |
| `CancelTask(ctx, taskID)` | `POST /tasks/{id}/cancel` | Idempotent at runtimed |
Construction from a sandbox ID:
```go
_, mnt := loopback.Paths(id)
runtime.NewClient(filepath.Join(mnt, ".runtimed", "sock"))
```
`GET /sandbox/{id}` and v1 sandbox `get` merge `runtime.Status` into the response `runtime` block when the socket is reachable.
### Background watcher
After `POST /v1/sandboxes/{id}/tasks`, `sandboxd` starts `watchTask`, which opens `TaskEvents` from index `0`, decodes until `type: done`, and persists `TaskResult` to SQLite — independent of any client SSE connection. On `sandboxd` restart, `ReconcileTasks` prefers `result.json` on disk, re-attaches a watcher if the sandbox is still running, or marks `sandbox_unavailable`.
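
The restart-time preference order can be written as a three-way decision. The sketch below illustrates that order only; the action labels and function name are hypothetical, and the real `ReconcileTasks` works against SQLite rows rather than returning tuples.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def reconcile_task(task_dir: str, sandbox_running: bool) -> Tuple[str, Optional[dict]]:
    """Decide what to do with an unfinished task row after a restart."""
    result_path = Path(task_dir) / "result.json"
    if result_path.is_file():
        # A result on disk is preferred over anything else.
        return "persist_result", json.loads(result_path.read_text())
    if sandbox_running:
        return "reattach_watcher", None
    return "mark_failed", {"status": "failed", "failure_reason": "sandbox_unavailable"}
```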
## Public v1 mapping
Integrators should use the v1 API; it wraps the same protocol:
| v1 route | runtimed / store behavior |
|---|---|
| `POST /v1/sandboxes/{id}/tasks` | Wake stopped sandbox, allocate ULID, `StartTask`, SQLite row + watcher |
| `GET /v1/sandboxes/{id}/tasks/{taskId}` | SQLite canonical result (works after stop/destroy) |
| `GET /v1/sandboxes/{id}/tasks/{taskId}/events` | Proxies NDJSON → **SSE** (`text/event-stream`); resume via `Last-Event-ID` or `?since=` |
| `POST /v1/sandboxes/{id}/tasks/{taskId}/cancel` | `CancelTask` |
v1 submit accepts `prompt` and an optional `agent` (default `opencode`). `task_id` and `env` are not part of the public request body: `sandboxd` allocates the task ULID, and environment variables come from the `env` map injected into the container at sandbox create.
<Warning>
At the runtimed layer, events are NDJSON. Only the v1 `events` endpoint speaks SSE. Do not point SSE clients directly at the Unix socket.
</Warning>
## Direct socket debugging
From the host, with the workspace mounted at `$MNT`:
```bash
curl --unix-socket "$MNT/.runtimed/sock" http://runtimed/status
curl --unix-socket "$MNT/.runtimed/sock" -X POST http://runtimed/tasks \
-H 'Content-Type: application/json' \
-d '{"task_id":"debug-1","prompt":"echo hello > src/hello.txt","agent":"opencode"}'
curl -N --unix-socket "$MNT/.runtimed/sock" \
'http://runtimed/tasks/debug-1/events?since=0'
```
Expect `preview.status` to reach `ready` within a few seconds on a healthy template sandbox.
## Limits and deferred behavior
| Topic | Current behavior |
|---|---|
| Agents | OpenCode only (`selectAgent` in `task.go`) |
| Concurrency | One active task per sandbox |
| Event retention after destroy | Full NDJSON log lives in workspace; only **result** is retained in SQLite after destroy |
| `tool` / `file_change` events | Protocol constants exist; OpenCode adapter surfaces `message` events today |
| Dev server on `package.json` change | No automatic restart after dependency edits |
Shared types and protocol comments live in `control-plane/internal/runtime` (`protocol.go`, `tasks.go`, `client.go`). The in-tree operator guide is `control-plane/cmd/runtimed/README.md`.
## Related pages
<CardGroup>
<Card title="Run coding agents" href="/run-coding-agents">
Submit tasks via v1, env injection, and integrator-facing SSE streaming.
</Card>
<Card title="v1 API reference" href="/v1-api-reference">
Public task request/response shapes and error envelope.
</Card>
<Card title="Workspaces and persistence" href="/workspaces-persistence">
Bind mounts, skeleton seeding, and what survives stop/destroy.
</Card>
<Card title="Overview" href="/overview">
End-to-end create → task → preview path across sandboxd, runtimed, and Traefik.
</Card>
</CardGroup>
---
## 18. Health and metrics
> GET /healthz and /readyz semantics, Prometheus GET /metrics labels, audit/access logging paths, and docker compose logs for sandboxd and Traefik.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/18-health-and-metrics.md
- Generated: 2026-06-04T22:45:54.289Z
### Source Files
- `control-plane/internal/api/handlers.go`
- `control-plane/internal/metrics/metrics.go`
- `control-plane/internal/api/api.go`
- `control-plane/internal/audit/audit.go`
- `control-plane/cmd/sandboxd/main.go`
- `AGENTS.md`
---
title: "Health and metrics"
description: "GET /healthz and /readyz semantics, Prometheus GET /metrics labels, audit/access logging paths, and docker compose logs for sandboxd and Traefik."
---
`sandboxd` exposes three operator-facing HTTP probes on the control-plane mux — `GET /healthz`, `GET /readyz`, and `GET /metrics` — plus structured request logs on stderr and a shared Traefik JSON access log that drives idle detection. Audit rows land in SQLite; there is no audit read API.
## Probe endpoints
All three routes register on the API mux in `control-plane/internal/api/api.go`. Each handler (except the raw Prometheus handler) runs inside the `observe` middleware, which increments `sandboxd_api_requests_total` and records `sandboxd_api_request_duration_seconds` with the route pattern as the `endpoint` label.
| Route | Purpose | Auth on external path |
| --- | --- | --- |
| `GET /healthz` | Liveness — process is up | Exempt (no bearer token) |
| `GET /readyz` | Readiness — SQLite + Docker daemon | Exempt |
| `GET /metrics` | Prometheus scrape | **404** (loopback only) |
The auth middleware (`control-plane/internal/auth/middleware.go`) treats `/healthz` and `/readyz` as exempt on Traefik-routed traffic. `/metrics` is explicitly blocked on the external path; only direct loopback calls to `SANDBOXED_API_BIND` can scrape it.
### Liveness: `GET /healthz`
:::endpoint GET /healthz
Process liveness. Does not touch SQLite or Docker.
:::
<ResponseExample>
```http
HTTP/1.1 200 OK
ok
```
</ResponseExample>
The handler always returns **200** with the plain-text body `ok\n`. Use this for “is the HTTP server accepting connections?” — not for Docker socket or database health.
<RequestExample>
```bash
curl -s "http://127.0.0.1:9090/healthz"
```
</RequestExample>
### Readiness: `GET /readyz`
:::endpoint GET /readyz
Readiness. Fails unless SQLite answers a ping and `docker info` succeeds in this request.
:::
`handleReadyz` runs two synchronous checks on every call:
1. **SQLite** — `Store.DB().PingContext(r.Context())`. Failure → **503** JSON `{"error":"sqlite ping: ..."}`.
2. **Docker** — `Docker.Info`, which shells out to `docker info --format {{.ServerVersion}}`. Failure → **503** JSON `{"error":"docker info: ..."}`. Success also feeds `sandboxd_docker_command_duration_seconds{op="info"}`.
When both pass, the response is **200** with body `ready\n`.
<ResponseExample>
```http
HTTP/1.1 200 OK
ready
```
</ResponseExample>
<RequestExample>
```bash
curl -s "http://127.0.0.1:9090/readyz"
```
</RequestExample>
<Warning>
A **503** on `readyz` with `docker info: exit status 1` almost always means the control-plane container cannot reach the host Docker socket. Confirm `/var/run/docker.sock` is mounted and the daemon is running. See the troubleshooting page for the full checklist.
</Warning>
<Note>
Orchestrators should use **`/healthz` for liveness** and **`/readyz` for readiness**. Do not point a liveness probe at `readyz` — a transient Docker blip would restart the pod unnecessarily.
</Note>
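A probe client can tell the two readiness failures apart by the error prefix. The sketch below is a minimal Python illustration; `classify_readyz` is a hypothetical helper, and it relies only on the `sqlite ping:` and `docker info:` prefixes documented above.

```python
import json


def classify_readyz(status: int, body: str) -> tuple[str, str]:
    """Map a /readyz response to (state, failing_dependency)."""
    if status == 200:
        return ("ready", "")
    if status == 503:
        try:
            message = str(json.loads(body).get("error", ""))
        except (ValueError, AttributeError):
            message = ""  # body was not the documented JSON object
        if message.startswith("sqlite ping:"):
            return ("not_ready", "sqlite")
        if message.startswith("docker info:"):
            return ("not_ready", "docker")
        return ("not_ready", "unknown")
    return ("unexpected", "")
```

Because SQLite is checked first, a `docker` result implies the SQLite ping already passed in that request.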
## Prometheus: `GET /metrics`
The scrape handler uses a dedicated registry (`metrics.Registry`) and `promhttp.HandlerFor` — not the default global Prometheus registry.
### Scrape access
| Caller | Result |
| --- | --- |
| Host loopback (`curl http://127.0.0.1:9090/metrics`) | **200** — standard Prometheus text exposition |
| Traefik / LAN without loopback | **404** — middleware blocks external `/metrics` |
Bind is controlled by compose: host `${SANDBOXED_API_BIND:-127.0.0.1:9090}` → container `:9000` (`SANDBOXD_ADDR` default `0.0.0.0:9000` inside the container).
### Metric families and labels
Metrics are defined in `control-plane/internal/metrics/metrics.go`. API middleware buckets HTTP status into `1xx` … `5xx` for the `code` label on `sandboxd_api_requests_total`.
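The status bucketing amounts to integer division by 100. A minimal Python sketch follows; `code_bucket` is a hypothetical name, and raising on out-of-range values is a choice made here, not documented middleware behavior.

```python
def code_bucket(status: int) -> str:
    """Collapse an HTTP status code into its class label, e.g. 404 -> '4xx'."""
    if not 100 <= status <= 599:
        raise ValueError(f"not an HTTP status code: {status}")
    return f"{status // 100}xx"
```

Bucketing keeps the `code` label at five possible values, which bounds the number of time series per endpoint and method.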
#### Build and inventory
| Metric | Type | Labels | Notes |
| --- | --- | --- | --- |
| `sandboxd_build_info` | Gauge (=1) | `version`, `git_commit` | Set at startup from build metadata |
| `sandboxd_sandboxes_total` | Gauge | `status` | `creating`, `running`, `stopped`, `error` — refreshed on create/destroy and at boot |
#### API and Docker CLI
| Metric | Type | Labels |
| --- | --- | --- |
| `sandboxd_api_requests_total` | Counter | `endpoint`, `method`, `code` |
| `sandboxd_api_request_duration_seconds` | Histogram | `endpoint`, `method` |
| `sandboxd_docker_command_duration_seconds` | Histogram | `op` — e.g. `run`, `inspect`, `exec`, `rm`, `info` |
| `sandboxd_docker_command_errors_total` | Counter | `op` |
The `endpoint` label is the registered route pattern (e.g. `GET /readyz`, `POST /v1/sandboxes/{id}/tasks`), not the resolved path with IDs.
#### Reconciler
| Metric | Type | Labels |
| --- | --- | --- |
| `sandboxd_reconciler_runs_total` | Counter | — |
| `sandboxd_reconciler_last_duration_seconds` | Gauge | — |
| `sandboxd_reconciler_orphans_total` | Gauge | `kind` — e.g. `container`, `mount` |
#### Idle, pressure, wake, activity
| Metric | Type | Labels |
| --- | --- | --- |
| `sandboxd_idle_reaper_runs_total` | Counter | — |
| `sandboxd_idle_reaper_stops_total` | Counter | `reason` |
| `sandboxd_pressure_reaper_runs_total` | Counter | — |
| `sandboxd_pressure_reaper_stops_total` | Counter | `band` |
| `sandboxd_mem_available_percent` | Gauge | — |
| `sandboxd_mem_available_bytes` | Gauge | — |
| `sandboxd_wakes_total` | Counter | `outcome` — `success`, `admission_denied`, `start_failed`, `tcp_ready_timeout`, `not_found`, `error` |
| `sandboxd_wake_duration_seconds` | Histogram | — |
| `sandboxd_wakes_refused_active` | Gauge | None; value is 0 or 1 (pressure gate) |
| `sandboxd_inflight_exec_total` | Gauge | — |
| `sandboxd_access_log_lag_seconds` | Gauge | None; value is the lag between now and the newest parsed Traefik access line |
#### Snapshots, preview auth, git push
| Metric | Type | Labels |
| --- | --- | --- |
| `sandboxd_snapshots_taken_total` | Counter | `outcome` — `ok`, `error` |
| `sandboxd_snapshot_restores_total` | Counter | `outcome` |
| `sandboxd_snapshotter_runs_total` | Counter | — |
| `sandboxd_snapshot_last_duration_seconds` | Gauge | — |
| `sandboxd_snapshot_last_size_bytes` | Gauge | — |
| `sandboxd_forward_auth_duration_seconds` | Histogram | — |
| `sandboxd_preview_access_total` | Counter | `result` — `allowed`, `denied` |
| `sandboxd_git_push_total` | Counter | `outcome` — `ok`, `failed` |
| `sandboxd_nginx_reloads_total` | Counter | `outcome` |
#### Egress (non-OSS compose)
These collectors exist in code but the **portable docker-compose build sets `egressMgr = nil`** and does not start journald/nft pollers. In OSS, `sandboxd_egress_sources_active` stays **0** and egress counters are unused.
| Metric | Labels |
| --- | --- |
| `sandboxd_egress_connections_total` | `sandbox_id`, `dst_port_bucket` |
| `sandboxd_egress_drops_total` | `reason` |
| `sandboxd_egress_log_lag_seconds` | — |
| `sandboxd_git_hosts_refresh_runs_total` | `outcome` |
| `sandboxd_abuse_list_refresh_runs_total` | `outcome` |
<Info>
Optional Traefik connection polling (`SANDBOXD_POLLER_METRIC_RE`, `SANDBOXD_POLLER_URL`, default `http://127.0.0.1:8082/metrics`) is a separate scrape target for long-lived WebSocket activity. Without that env, sandboxd logs **fallback mode** and relies on the access-log tailer alone.
</Info>
## Log and audit paths
```text
SANDBOXED_DATA_DIR/              (default /var/lib/sandboxed)
├── state/
│   ├── sandboxd.db              SQLite: sandboxes + audit_log
│   └── traefik-tail.offset      Access-log tailer checkpoint
└── workspaces/                  Per-sandbox bind mounts

SANDBOXED_LOG_DIR/               (default /var/lib/sandboxed/log)
└── traefik-access.log           Traefik JSON access log (shared mount)
```
Both paths default under `/var/lib/sandboxed` and are bind-mounted **symmetrically** into `traefik` and `sandboxd` (`docker-compose.yml`, `.env.example`).
### Traefik access log (activity signal)
Traefik writes JSON access lines to `filePath: /var/lib/sandboxed/log/traefik-access.log` (`traefik/traefik.yml`). Override the directory with `SANDBOXED_LOG_DIR`; sandboxd reads the file via `SANDBOXD_ACCESS_LOG` (compose default: `${SANDBOXED_LOG_DIR}/traefik-access.log`).
A background **access-log tailer** (`control-plane/internal/activity/tailer.go`):
- Matches `RequestHost` against `s-{id}-{port}.preview.{PREVIEW_DOMAIN}`
- Calls `BumpLastActive` on matching sandbox rows (feeds the idle reaper)
- Persists read offset to `SANDBOXED_DATA_DIR/state/traefik-tail.offset` (`SANDBOXD_TAILER_OFFSET` override)
- Updates `sandboxd_access_log_lag_seconds` from each line’s `StartUTC`
Fields parsed per line: `RequestHost`, `RequestMethod`, `OriginStatus`, `StartUTC`, `RouterName`. Malformed lines are skipped.
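The host-matching step of the tailer can be sketched like this. It is an illustrative Python approximation, not the code in `tailer.go`; the helper names are hypothetical, and the id alphabet (alphanumeric, any case) and the optional `:port` suffix are assumptions.

```python
import json
import re
from typing import Optional


def preview_host_re(preview_domain: str) -> re.Pattern:
    """Build a matcher for s-{id}-{port}.preview.{domain}."""
    return re.compile(
        r"s-(?P<id>[0-9A-Za-z]+)-(?P<port>\d+)\.preview\."
        + re.escape(preview_domain)
        + r"(?::\d+)?"  # assumption: tolerate a trailing :port
    )


def match_access_line(line: str, host_re: re.Pattern) -> Optional[tuple[str, int]]:
    """Return (sandbox_id, port) for a preview hit, or None for anything else."""
    try:
        record = json.loads(line)
    except ValueError:
        return None  # malformed lines are skipped
    if not isinstance(record, dict):
        return None
    host = record.get("RequestHost")
    if not isinstance(host, str):
        return None
    m = host_re.fullmatch(host)
    if m is None:
        return None
    return (m.group("id"), int(m.group("port")))
```

A non-`None` result is the point at which the real tailer would call `BumpLastActive` for that sandbox.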
### sandboxd request logs
Every HTTP request passes through `logging.Middleware`, which emits **JSON slog** lines to **stderr** with `request_id`, `method`, `path`, `status`, `duration_ms`. In compose, collect them with `docker compose logs sandboxd`. Set `SANDBOXD_DEBUG` for debug-level verbosity.
Traefik process logs are separate JSON on stderr (`log.format: json` in `traefik/traefik.yml`) — use `docker compose logs traefik`.
### Audit log (SQLite)
Privileged actions append one row to `audit_log` in `sandboxd.db` (`control-plane/migrations/0004_external_identity.sql`). There is **no HTTP read API**; operators query with `sqlite3` on the host.
| Column | Meaning |
| --- | --- |
| `at` | Unix seconds |
| `actor_kind` | `service`, `operator`, `system`, `unknown` |
| `actor_name` | Token name or `loopback` |
| `actor_ip` | Client IP (respects `X-Forwarded-For` on external path) |
| `external_user_id` | Upstream user when relevant |
| `action` | Stable action string |
| `target` | Sandbox ID, external user/project id, etc. |
| `detail` | JSON blob |
Writes are **best-effort** (failures log a warning, never fail the API response). `preview.access_allowed` uses **sampled** writes (at most one row per minute per `sub|sandbox` key).
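The sampling rule (at most one row per minute per key) can be modeled with a last-write timestamp per key. This is a minimal Python sketch under that reading of the rule; `MinuteSampler` is a hypothetical name and the actual Go implementation may differ.

```python
class MinuteSampler:
    """Allow at most one write per key within any `interval`-second window."""

    def __init__(self, interval: float = 60.0) -> None:
        self.interval = interval
        self._last: dict[str, float] = {}

    def allow(self, key: str, now: float) -> bool:
        last = self._last.get(key)
        if last is not None and now - last < self.interval:
            return False  # a row for this key was written too recently
        self._last[key] = now
        return True
```

With a key such as `"user-1|sandbox-1"`, repeated preview hits inside the same minute produce a single audit row, while a different user or sandbox is sampled independently.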
Representative `action` values:
| Action | Trigger |
| --- | --- |
| `sandbox.create`, `sandbox.destroy`, `sandbox.exec`, `sandbox.stop` | Lifecycle / exec |
| `sandbox.wake` | Wake handler |
| `sandbox.purge`, `sandbox.claim` | Purge / claim |
| `sandbox.snapshot.create`, `sandbox.snapshot.restore` | Manual snapshots |
| `task.create`, `task.cancel` | v1 tasks API |
| `file.put` | v1 workspace file write |
| `snapshot.create`, `snapshot.delete` | v1 library snapshots |
| `preview.session_issued`, `preview.access_denied`, `preview.access_allowed` | Private preview auth |
| `external_user.purge`, `external_project.purge` | Bulk purge |
| `auth.token_invalid` | Failed bearer token |
`POST /sandbox/{id}/exec` audits **only `cmd[0]`** — never the full argv.
<RequestExample>
```bash
sqlite3 /var/lib/sandboxed/state/sandboxd.db \
"SELECT datetime(at,'unixepoch'), actor_kind, actor_name, action, target
FROM audit_log ORDER BY id DESC LIMIT 20;"
```
</RequestExample>
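The same query can be run from a script. The sketch below uses Python's standard `sqlite3` module; `recent_audit` is a hypothetical helper, and it assumes only the columns listed in the table above plus the `id` column used for ordering.

```python
import sqlite3


def recent_audit(conn: sqlite3.Connection, action_prefix: str = "", limit: int = 20) -> list[tuple]:
    """Newest-first audit rows, optionally filtered by an action prefix such as 'sandbox.'."""
    rows = conn.execute(
        "SELECT datetime(at, 'unixepoch'), actor_kind, actor_name, action, target "
        "FROM audit_log WHERE action LIKE ? ORDER BY id DESC LIMIT ?",
        (action_prefix + "%", limit),
    )
    return rows.fetchall()
```

Against a live database, opening the file read-only avoids interfering with `sandboxd`, for example `sqlite3.connect("file:/var/lib/sandboxed/state/sandboxd.db?mode=ro", uri=True)`.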
## Compose log operations
<Steps>
<Step title="Follow control-plane logs">
```bash
docker compose logs -f sandboxd
```
Look for `component` fields (`api`, `idle-reaper`, `pressure-reaper`, `access-log-tailer`, `reconcile`, `wake`) and `http` access lines with `request_id`.
</Step>
<Step title="Follow edge router logs">
```bash
docker compose logs -f traefik
```
Router/service churn, TLS, and provider errors appear here — not in sandboxd.
</Step>
<Step title="Verify probes after install">
```bash
curl -s "http://127.0.0.1:9090/healthz" # ok
curl -s "http://127.0.0.1:9090/readyz" # ready
curl -s "http://127.0.0.1:9090/metrics" | head
```
</Step>
</Steps>
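When `traefik/dynamic/api.yml` is present, `/healthz` and `/readyz` are also reachable through Traefik (for example `http://api.preview.<PREVIEW_DOMAIN>/healthz`) without a bearer token, even when API auth is enabled. `/metrics` stays loopback-only and returns **404** on this path.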
## Request flow (probes vs activity)
```mermaid
sequenceDiagram
participant Op as Operator / kubelet
participant SD as sandboxd
participant DB as SQLite
participant DK as Docker daemon
participant TF as Traefik
participant AL as traefik-access.log
Op->>SD: GET /healthz
SD-->>Op: 200 ok
Op->>SD: GET /readyz
SD->>DB: Ping
SD->>DK: docker info
SD-->>Op: 200 ready
TF->>AL: JSON access line
SD->>AL: tailer reads RequestHost
SD->>DB: BumpLastActive
```
## Related pages
<CardGroup>
<Card title="Installation" href="/installation">
Install verification uses `healthz` / `readyz` after `docker compose up`.
</Card>
<Card title="API authentication" href="/api-authentication">
Exempt probes, loopback operator path, and why `/metrics` is not exposed externally.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
`SANDBOXED_DATA_DIR`, `SANDBOXED_LOG_DIR`, `SANDBOXED_API_BIND`, and poller env keys.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
How access-log bumps and reaper metrics tie into stop-on-idle behavior.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
`readyz` / Docker socket failures and compose log probes.
</Card>
<Card title="Control plane API (legacy)" href="/legacy-api-reference">
Full route list including health and metrics endpoints.
</Card>
</CardGroup>
---
## 19. Build a todo app with an agent
> End-to-end recipe: create sandbox on port 3000, submit opencode task prompt, stream task events, verify preview URL, optional ANTHROPIC_API_KEY via env at create.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/19-build-a-todo-app-with-an-agent.md
- Generated: 2026-06-04T22:46:08.955Z
### Source Files
- `README.md`
- `AGENTS.md`
- `control-plane/internal/api/v1_tasks.go`
- `control-plane/internal/api/handlers.go`
- `image/Dockerfile`
---
title: "Build a todo app with an agent"
description: "End-to-end recipe: create sandbox on port 3000, submit opencode task prompt, stream task events, verify preview URL, optional ANTHROPIC_API_KEY via env at create."
---
This recipe uses `POST /v1/sandboxes/{id}/tasks` on `sandboxd` to submit an OpenCode prompt to `runtimed` inside a sandbox container, waking a stopped sandbox first if needed. Progress streams as Server-Sent Events from the task's `events` endpoint, and the built app is served at `http://s-{ulid}-3000.preview.{PREVIEW_DOMAIN}` once the in-sandbox Vite dev server on port 3000 is healthy.
## Prerequisites
| Requirement | Notes |
|---|---|
| Linux host with Docker Engine + Compose | `./install.sh` builds `sandboxed-base` and starts `sandboxd` + Traefik |
| Control plane reachable | Default `SANDBOXED_API_BIND=127.0.0.1:9090` — verify `curl -s http://127.0.0.1:9090/healthz` returns `ok` |
| Port 3000 exposed at create | `ports` in `POST /sandbox` registers Traefik routers for preview |
| Optional provider key | `env.ANTHROPIC_API_KEY` at create; visible to `runtimed` and agent subprocesses |
<Note>
API auth is off by default (`SANDBOXD_API_AUTH_DISABLED=true`). For non-loopback use, set `SANDBOXD_API_AUTH_DISABLED=false` and send `Authorization: Bearer <secret>` on every API call.
</Note>
## Request flow
```mermaid
sequenceDiagram
participant Client
participant sandboxd
participant SQLite
participant runtimed
participant OpenCode
participant Traefik
Client->>sandboxd: POST /sandbox ports=[3000] env?
sandboxd->>SQLite: persist sandbox row
sandboxd-->>Client: id, status
Client->>sandboxd: POST /v1/sandboxes/{id}/tasks
alt sandbox stopped
sandboxd->>sandboxd: wake via internal /wake/{id}
end
sandboxd->>runtimed: POST /tasks (UDS)
runtimed->>OpenCode: opencode run --format json
sandboxd->>SQLite: CreateTask running
sandboxd-->>Client: 202 task id, events_url
Client->>sandboxd: GET .../tasks/{taskId}/events (SSE)
sandboxd->>runtimed: stream events.jsonl
runtimed-->>sandboxd: NDJSON events
sandboxd-->>Client: status, message, tool, build, done (SSE)
runtimed->>runtimed: pnpm dev on :3000
Client->>Traefik: Host s-{id}-3000.preview.localhost
Traefik->>runtimed: forward :3000
```
Coding work happens under `/home/sandbox/workspace/app` (`RUNTIMED_APP_DIR`). `runtimed` is the container main process: it supervises `pnpm dev` (default port 3000) and runs at most one active task per sandbox.
## Full recipe
Set the API base URL once:
```bash
API=http://127.0.0.1:9090
```
<Steps>
<Step title="Create a sandbox on port 3000">
Expose port 3000 so Traefik registers `s-{id}-3000.preview.{domain}`.
<RequestExample>
```bash
curl -s -XPOST "$API/sandbox" \
-H 'content-type: application/json' \
-d '{"ports":[3000]}'
```
</RequestExample>
<ResponseExample>
```json
{"id":"01JXXXXXXXXXXXXXXXXXXXXXX","status":"running","ports":[3000],...}
```
</ResponseExample>
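To capture the sandbox ULID in a shell variable, run the create call in this form instead of the one above (running both creates two sandboxes):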
```bash
ID=$(curl -s -XPOST "$API/sandbox" -H 'content-type: application/json' \
-d '{"ports":[3000]}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
echo "sandbox=$ID"
```
</Step>
<Step title="Submit the OpenCode task">
Send a prompt that asks for a Vite todo app and to serve it on port 3000. The default agent is `opencode`; only `opencode` is accepted today.
:::endpoint POST /v1/sandboxes/{id}/tasks
Submit a headless coding task. Wakes a stopped sandbox before submit. Returns `202 Accepted`.
:::
<ParamField body="prompt" type="string" required>
Natural-language task for the agent (non-empty).
</ParamField>
<ParamField body="agent" type="string">
Agent adapter name. Defaults to `opencode`. Other values return `400 invalid_request`.
</ParamField>
<RequestExample>
```bash
TASK=$(curl -s -XPOST "$API/v1/sandboxes/$ID/tasks" \
-H 'content-type: application/json' \
-d '{
"prompt": "create a Vite app that shows a todo list and run it on port 3000",
"agent": "opencode"
}')
echo "$TASK"
TASK_ID=$(echo "$TASK" | sed -E 's/.*"id":"([^"]+)".*/\1/')
```
</RequestExample>
<ResponseExample>
```json
{
"id": "01JYYYYYYYYYYYYYYYYYYYYYY",
"sandbox_id": "01JXXXXXXXXXXXXXXXXXXXXXX",
"status": "running",
"agent": "opencode",
"events_url": "/v1/sandboxes/01JXXXXXXXXXXXXXXXXXXXXXX/tasks/01JYYYYYYYYYYYYYYYYYYYYYY/events"
}
```
</ResponseExample>
<Warning>
Only one task may run per sandbox. A concurrent submit returns `409` with `code: task_in_progress`.
</Warning>
</Step>
<Step title="Stream task events (SSE)">
Follow `events_url` with `curl -N` to receive Server-Sent Events. Resume with `Last-Event-ID` or `?since=<n>`.
<RequestExample>
```bash
curl -N "$API/v1/sandboxes/$ID/tasks/$TASK_ID/events"
```
</RequestExample>
| Event type | Role |
|---|---|
| `status` | Phase transitions inside `runtimed` |
| `message` | Agent text (best-effort, from OpenCode JSON stream) |
| `tool` | Tool invocations (`read`, `write`, `bash`, …); defined in the protocol, but the OpenCode adapter currently surfaces `message` events only |
| `build` | Post-task `pnpm build` check |
| `done` | Terminal; `data` carries canonical `TaskResult` |
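A client that consumes the stream needs to split it into events and remember the last `id` for resume. The following is a minimal Python sketch of a subset of the standard SSE wire format; `parse_sse` is a hypothetical helper, and whether `sandboxd` puts the event type in the `event:` field is an assumption.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {'id', 'event', 'data'} dicts from raw SSE lines."""
    event = {"id": None, "event": "message", "data": []}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event.
            if event["data"]:
                yield {"id": event["id"], "event": event["event"], "data": "\n".join(event["data"])}
            # The last seen id persists across events, as in the SSE format.
            event = {"id": event["id"], "event": "message", "data": []}
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keepalive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            event["data"].append(value)
        elif field in ("id", "event"):
            event[field] = value
```

After a disconnect, the `id` of the last yielded event is the value to send back as `Last-Event-ID`.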
Poll the durable result without keeping SSE open:
```bash
curl -s "$API/v1/sandboxes/$ID/tasks/$TASK_ID"
```
While running, the response is `{"id":"...","sandbox_id":"...","status":"running"}`. When finished, fields include `status` (`succeeded` | `failed` | `cancelled`), `build_ok`, `files_changed`, and token usage.
</Step>
<Step title="Open and verify the preview URL">
Preview hostname pattern (from Traefik labels):
```
http://s-{id}-3000.preview.{PREVIEW_DOMAIN}
```
Defaults: `PREVIEW_DOMAIN=localhost`, `HTTP_PORT=80`. Modern browsers resolve `*.localhost` to `127.0.0.1`.
| Setting | Local URL |
|---|---|
| Defaults | `http://s-$ID-3000.preview.localhost` |
| `HTTP_PORT=8088` | `http://s-$ID-3000.preview.localhost:8088` |
| Production + TLS | `https://s-$ID-3000.preview.yourdomain.com` |
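The three rows above follow one pattern, which a small helper can reproduce. This is an illustrative Python sketch; `preview_url` and its parameter names are hypothetical, and `edge_port` stands for a non-default `HTTP_PORT`.

```python
from typing import Optional


def preview_url(sandbox_id: str, port: int, domain: str = "localhost",
                edge_port: Optional[int] = None, tls: bool = False) -> str:
    """Build s-{id}-{port}.preview.{domain}, adding :edge_port only when non-default."""
    scheme = "https" if tls else "http"
    default_port = 443 if tls else 80
    host = f"s-{sandbox_id}-{port}.preview.{domain}"
    suffix = "" if edge_port in (None, default_port) else f":{edge_port}"
    return f"{scheme}://{host}{suffix}"
```

Note that `port` is the in-sandbox port encoded in the hostname, while `edge_port` is where Traefik listens on the host.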
Verify from the shell (Traefik on loopback):
```bash
curl -s -H "Host: s-$ID-3000.preview.localhost" \
"http://127.0.0.1:${HTTP_PORT:-80}/" | head
```
<Info>
A stopped sandbox shows the warming page ("Spinning up your app…") until wake completes. The dev server may still be starting after the task finishes — poll until HTTP 200 and expected HTML.
</Info>
Inspect in-sandbox preview health via the legacy get endpoint:
```bash
curl -s "$API/sandbox/$ID" | jq '.runtime.preview'
```
`preview.status` is `down`, `starting`, or `ready`.
</Step>
<Step title="Clean up (optional)">
```bash
# stop container, keep workspace
curl -s -XPOST "$API/v1/sandboxes/$ID/stop"
# destroy container + delete workspace
curl -s -XPOST "$API/sandbox/$ID/purge"
```
</Step>
</Steps>
## Inject `ANTHROPIC_API_KEY` at create
OpenCode and Claude Code ship in the base image (`image/Dockerfile`). Keys belong in the create payload so both the tasks API and any `exec` session see them:
<RequestExample>
```bash
ID=$(curl -s -XPOST "$API/sandbox" -H 'content-type: application/json' \
-d '{
"ports": [3000],
"env": {"ANTHROPIC_API_KEY": "sk-ant-..."}
}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
curl -s -XPOST "$API/v1/sandboxes/$ID/tasks" -H 'content-type: application/json' \
-d '{"prompt":"build a Vite todo app and run it on port 3000","agent":"opencode"}'
```
</RequestExample>
<ParamField body="env" type="object">
Map of environment variables passed to `docker run --env`. Keys must be non-empty and must not contain `=` or newlines. Values are visible inside the container to `runtimed` and spawned agents.
</ParamField>
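The key rules for `env` are easy to check client-side before calling create. A minimal Python sketch follows; `validate_env` is a hypothetical helper that mirrors only the rules stated above, and the server remains the authority.

```python
def validate_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every key passes the stated rules."""
    problems = []
    for key in env:
        if key == "":
            problems.append("empty key")
        elif "=" in key or "\n" in key:
            problems.append(f"invalid key: {key!r}")
    return problems
```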
<Tip>
Without a key, OpenCode can still run on its bundled free plan. Inject a key when you want your own provider account and quotas.
</Tip>
## Task outcome fields
When `GET /v1/sandboxes/{id}/tasks/{taskId}` returns a completed task, the embedded result follows `runtime.TaskResult`:
| Field | Meaning |
|---|---|
| `status` | `succeeded`, `failed`, or `cancelled` |
| `build_ok` | Whether post-task `pnpm build` passed |
| `files_changed` | Paths from git diff against pre-task checkpoint |
| `agent_message_final` | Last agent message text |
| `preview_status_after` | Preview health after the task |
| `failure_reason` / `error_message` | Set when `status` is `failed`; a `cancelled` task carries `failure_reason: cancelled` |
Results persist in SQLite even after `stop` or `delete` on the sandbox.
## Common errors
| HTTP | `error.code` | Cause |
|---|---|---|
| 400 | `invalid_request` | Missing `prompt`, bad JSON, or unsupported `agent` |
| 404 | `not_found` | Unknown sandbox or task |
| 409 | `conflict` | Sandbox not `running` after wake attempt |
| 409 | `task_in_progress` | Another task is active |
| 502 | `sandbox_unavailable` | `runtimed` socket unreachable |
| 503 | `sandbox_capacity` | Wake refused (host memory admission) |
v1 errors use `{"error":{"code","message","retryable"}}`; `retryable` is true for 502/503.
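A client can base its retry decision on the envelope rather than on the status code alone. The sketch below is a hedged Python illustration; `should_retry` is a hypothetical helper, and the fallback to 502/503 for non-envelope bodies is a choice made here.

```python
import json


def should_retry(status: int, body: str) -> bool:
    """Prefer the server's `retryable` flag; fall back to the documented 502/503 rule."""
    try:
        error = json.loads(body)["error"]
        return bool(error["retryable"])
    except (ValueError, KeyError, TypeError):
        # Body was not a v1 error envelope (for example a proxy error page).
        return status in (502, 503)
```

A `409 task_in_progress` is not retryable in this sense: the caller should wait for the active task to finish or cancel it.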
## Troubleshooting this recipe
| Symptom | Check |
|---|---|
| Warming page never finishes | `docker compose logs -f sandboxd`; sandbox may still be waking or nothing listens on 3000 yet |
| Task `failed` / `build_ok: false` | Stream events for `build` and `done`; inspect `build_error_message` |
| Preview 502 after `succeeded` | Read `preview_error_message` on the task result; the dev server can be up while the app assets are unhealthy |
| `id must be a ULID` on create | Omit custom `id` or supply a valid ULID |
| Preview needs `:8088` | Set `HTTP_PORT` in `.env` and include the port in browser and curl URLs |
## Related pages
<CardGroup>
<Card title="Quickstart" href="/quickstart">
Shorter copy-paste path for create → task → preview without the todo-specific prompt.
</Card>
<Card title="Run coding agents" href="/run-coding-agents">
Task API contract, wake-on-submit, SSE resume, and `runtimed` bridge details.
</Card>
<Card title="Preview URL reference" href="/preview-url-reference">
Hostname pattern, `HTTP_PORT` suffix rules, and localhost vs production TLS.
</Card>
<Card title="Installation" href="/installation">
First-time `./install.sh`, `.env` bootstrap, and `healthz` / `readyz` checks.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Control-plane logs, port conflicts, ULID validation, and warming-page stalls.
</Card>
</CardGroup>
---
## 20. Exec a dev server preview
> Recipe without tasks API: POST /sandbox/{id}/exec to start a server on an exposed port, wake stopped sandboxes via preview hit, and curl with Host header locally.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/20-exec-a-dev-server-preview.md
- Generated: 2026-06-04T22:46:28.927Z
### Source Files
- `README.md`
- `AGENTS.md`
- `control-plane/internal/api/handlers.go`
- `control-plane/internal/wake/handler.go`
- `traefik/dynamic/wake.yml`
---
title: "Exec a dev server preview"
description: "Recipe without tasks API: POST /sandbox/{id}/exec to start a server on an exposed port, wake stopped sandboxes via preview hit, and curl with Host header locally."
---
`POST /sandbox/{id}/exec` runs a non-interactive command inside container `s-{ulid}` via `docker exec`. While the sandbox is running, Traefik routes `s-{id}-{port}.preview.{PREVIEW_DOMAIN}` to that port. When it is stopped, the priority-1 wake catch-all in `traefik/dynamic/wake.yml` forwards the request to `sandboxd` on port 9000, which runs `docker start` and returns an HTML meta-refresh page until the live route takes over.
## When to use exec instead of tasks
| Path | API | Best for |
|------|-----|----------|
| **Exec + preview** (this page) | `POST /sandbox/{id}/exec` | You already have code, a one-off shell command, or a dev server you start manually — no `runtimed` agent loop |
| **Agent + preview** | `POST /v1/sandboxes/{id}/tasks` | Prompt-driven builds where OpenCode (the only agent accepted today) writes the app and runs the dev server |
Both paths share the same preview hostname and wake behavior once something listens on the exposed port.
## Prerequisites
- Stack installed and healthy (`curl http://127.0.0.1:9090/healthz` → `ok`, `curl http://127.0.0.1:9090/readyz` → `ready`).
- `API` base URL = `http://` plus the value of `SANDBOXED_API_BIND` (default `127.0.0.1:9090`, giving `http://127.0.0.1:9090`).
- If `SANDBOXD_API_AUTH_DISABLED=false`, add `-H "Authorization: Bearer <secret>"` on every API call.
## End-to-end recipe
<Steps>
<Step title="Create a sandbox with an exposed port">
Declare the port your dev server will bind to inside the container. `sandboxd` emits Traefik Docker labels (`traefik.http.routers.s-{id}-{port}.rule=Host(...)`, `priority=100`) only for ports listed at create time.
```bash
API=http://127.0.0.1:9090
ID=$(curl -s -XPOST "$API/sandbox" \
-H 'content-type: application/json' \
-d '{"ports":[3000]}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
echo "sandbox=$ID"
```
Omit `id` to auto-generate a ULID; a custom `id` must be a valid ULID (casing included), or create returns `400` with `id must be a ULID`.
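If you do supply your own `id`, a client-side check catches most mistakes before the request. The sketch below follows the public ULID format (26 characters of upper-case Crockford Base32); `is_ulid` is a hypothetical helper, and the server's exact validation may be stricter or looser.

```python
import re

# Crockford Base32 excludes I, L, O, U. The first character is at most 7
# because 26 five-bit characters would otherwise overflow 128 bits.
_ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def is_ulid(value: str) -> bool:
    """True when value is a canonical upper-case ULID string."""
    return _ULID_RE.fullmatch(value) is not None
```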
</Step>
<Step title="Start a dev server with exec">
:::endpoint POST /sandbox/{id}/exec
Run a command inside `s-{id}`. Non-interactive only: no PTY, no stdin (`docker exec` wrapper in `control-plane/internal/docker/docker.go`). Default response is JSON; set `"stream": true` for chunked plain text.
<ParamField body="cmd" type="string[]" required>
Argv passed to `docker exec` (e.g. `["bash","-lc","..."]`).
</ParamField>
<ParamField body="stream" type="boolean">
If true, returns `200` chunked `text/plain` with stdout, optional stderr block, and trailing `exit_code: N`.
</ParamField>
<ResponseField name="stdout" type="string">Captured stdout.</ResponseField>
<ResponseField name="stderr" type="string">Captured stderr.</ResponseField>
<ResponseField name="exit_code" type="number">Process exit code; non-zero exits still return `200` with the code set.</ResponseField>
:::
Exec is **synchronous**: the HTTP request blocks until the container command exits. Start long-running servers in the background (for example with `nohup` and a trailing `&`) so that exec returns and you can verify the preview in a separate request.
<RequestExample>
```bash
curl -s -XPOST "$API/sandbox/$ID/exec" \
-H 'content-type: application/json' \
-d '{"cmd":["bash","-lc","cd ~/workspace && echo hello > index.html && nohup python3 -m http.server 3000 >/tmp/http.log 2>&1 &"]}'
```
</RequestExample>
<ResponseExample>
```json
{"stdout":"","stderr":"","exit_code":0}
```
</ResponseExample>
<Note>
User project files live under `/home/sandbox/workspace` (`~/workspace` in the sandbox shell). Exec bumps `last_active_at` at start and end, which postpones the idle reaper (`SANDBOXD_IDLE_THRESHOLD_SECONDS`, default 2100s).
</Note>
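Because a non-zero exit still returns HTTP `200`, callers have to inspect `exit_code` themselves. The following Python sketch shows one way to do that for both response modes; the helper names are hypothetical, and for streamed output it relies only on the trailing `exit_code: N` line described above.

```python
import json
import re
from typing import Optional


class ExecFailed(Exception):
    """Raised when the command ran but exited non-zero."""


def check_exec_json(body: str) -> str:
    """Return stdout from a JSON exec response, raising on a non-zero exit code."""
    resp = json.loads(body)
    if resp["exit_code"] != 0:
        raise ExecFailed(f"exit {resp['exit_code']}: {resp['stderr'].strip()}")
    return resp["stdout"]


def stream_exit_code(text: str) -> Optional[int]:
    """Extract N from a trailing `exit_code: N` line of a streamed response."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    m = re.fullmatch(r"exit_code: (-?\d+)", lines[-1].strip())
    return int(m.group(1)) if m else None
```

For a backgrounded server, an exit code of `0` only means the shell launched the process, so the preview request in the next step is still the real health check.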
</Step>
<Step title="Hit the preview locally with a Host header">
Preview URL pattern:
```
http://s-<ID>-<PORT>.preview.<PREVIEW_DOMAIN>[:<HTTP_PORT>]
```
Defaults: `PREVIEW_DOMAIN=localhost`, `HTTP_PORT=80`. Modern browsers resolve `*.localhost` to `127.0.0.1`, so you can open the URL directly. For `curl` against `127.0.0.1` you must send the preview `Host` header so Traefik selects the sandbox router:
```bash
curl -s -H "Host: s-$ID-3000.preview.localhost" "http://127.0.0.1:${HTTP_PORT:-80}/"
# expect: hello (or your app body)
```
If you changed `HTTP_PORT` in `.env` (e.g. `8088`), include it in the URL: `http://127.0.0.1:8088/` with the same `Host` header.
</Step>
<Step title="Stop the sandbox and wake it with a preview request">
Stop frees RAM while keeping the workspace on disk:
```bash
curl -s -XPOST "$API/v1/sandboxes/$ID/stop"
```
The next preview request to a stopped sandbox hits the wake catch-all (router priority `1` in `traefik/dynamic/wake.yml`, service `http://sandboxd:9000`). `sandboxd` runs admission, `docker start s-{id}`, optionally probes TCP readiness (default 8s, `SANDBOXD_WAKE_TCP_READY_TIMEOUT_SECONDS`), marks the row `running`, and returns the **Spinning up your app…** HTML page with a 2-second meta-refresh. After refresh, Traefik’s dynamic per-port router (priority `100`) forwards to the container.
```bash
curl -s -H "Host: s-$ID-3000.preview.localhost" "http://127.0.0.1:${HTTP_PORT:-80}/"
```
<Warning>
If nothing is listening on port 3000 inside the container after wake, you keep getting the warming page or connection errors. Background servers started before the stop do not restart automatically, so re-run exec after wake to start the server again.
</Warning>
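Because the first request after a stop returns the warming page, scripted checks usually need to retry. The following is a minimal sketch, not project tooling: `retry_until` and `preview_serves_app` are hypothetical helpers, the check assumes `PREVIEW_DOMAIN=localhost`, and the `grep` pattern is a guess at how a meta-refresh page looks rather than something taken from the sandboxd source.

```bash
# retry_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or ATTEMPTS is exhausted.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# preview_serves_app ID PORT
# Hypothetical check: succeeds when the preview answers and the body does not
# look like a meta-refresh page (the pattern below is an assumption).
preview_serves_app() {
  local body
  body="$(curl -sf -H "Host: s-$1-$2.preview.localhost" "http://127.0.0.1:${HTTP_PORT:-80}/")" || return 1
  if printf '%s' "$body" | grep -qi 'http-equiv="refresh"'; then
    return 1
  fi
  return 0
}
```

A call such as `retry_until 10 2 preview_serves_app "$ID" 3000` polls every two seconds, matching the documented refresh interval, and gives up after ten attempts.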
</Step>
<Step title="Destroy when finished">
```bash
curl -s -XPOST "$API/sandbox/$ID/purge"
```
`purge` removes the container **and** deletes the workspace. Use `DELETE /sandbox/{id}` instead to destroy the container but keep files.
</Step>
</Steps>
## Request flow (stopped → running preview)
```mermaid
sequenceDiagram
participant Client
participant Traefik
participant Sandboxd as sandboxd :9000
participant Docker
participant Sandbox as s-{id}
Client->>Traefik: GET / Host: s-{id}-3000.preview.localhost
Note over Traefik: catch-all priority 1 (stopped)
Traefik->>Sandboxd: forward (passHostHeader)
Sandboxd->>Docker: docker start s-{id}
Sandboxd-->>Client: 200 HTML meta-refresh 2s
Client->>Traefik: GET / (after refresh)
Note over Traefik: dynamic router priority 100 (running)
Traefik->>Sandbox: :3000 dev server
Sandbox-->>Client: HTTP response body
```
While `status` is already `running`, the wake handler returns success immediately (no `docker start`). Concurrent preview wakes for the same id dedupe through an in-memory inflight map in `control-plane/internal/wake/handler.go`.
## Exec API reference (legacy route)
| Item | Value |
|------|--------|
| Method / path | `POST /sandbox/{id}/exec` |
| Container name | `s-{id}` |
| Content-Type | `application/json` |
| Success | `200` + `execResp` JSON (or chunked stream) |
| Errors | `400` invalid JSON / missing `cmd`; `500` `docker exec: ...` |
<AccordionGroup>
<Accordion title="Streaming exec">
```bash
curl -s -XPOST "$API/sandbox/$ID/exec" \
-H 'content-type: application/json' \
-d '{"cmd":["bash","-lc","echo hi"],"stream":true}'
```
Response is `text/plain` with body, optional `---stderr---` section, and `exit_code: N` line.
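When scripting against the streaming form, the exit status has to be recovered from the text body. A possible sketch follows; `stream_exit_code` is a hypothetical helper, and it assumes the status line has exactly the form `exit_code: N` on a line of its own.

```bash
# stream_exit_code: reads a streamed exec response on stdin and prints N from
# the last "exit_code: N" line. Returns non-zero when no such line is found.
# Assumption: the line has exactly the form "exit_code: N".
stream_exit_code() {
  awk '/^exit_code: -?[0-9]+$/ { code = $2; found = 1 }
       END { if (!found) exit 1; print code }'
}
```

Usage would look like `code="$(curl -s -XPOST "$API/sandbox/$ID/exec" -H 'content-type: application/json' -d '{"cmd":["bash","-lc","echo hi"],"stream":true}' | stream_exit_code)"`.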
</Accordion>
<Accordion title="Programmatic wake (no browser)">
`POST /wake/{id}` on the loopback API returns JSON `{ "id", "status": "running", "wake_duration_ms" }` when the request is sent directly to the API rather than through the preview Host shape. Preview hits remain the integrator-facing wake path for browsers and `curl` with `Host`.
</Accordion>
</AccordionGroup>
## Configuration that affects this recipe
| Variable | Default | Effect on exec preview |
|----------|---------|------------------------|
| `PREVIEW_DOMAIN` | `localhost` | Hostname segment after `.preview.` |
| `HTTP_PORT` | `80` | Host port Traefik publishes (`docker-compose.yml`) |
| `PREVIEW_ENTRYPOINT` | `web` | Traefik entrypoint on sandbox labels |
| `PREVIEW_TLS` | `false` | When `true`, use `https://` preview URLs and `websecure` |
| `SANDBOXED_API_BIND` | `127.0.0.1:9090` | Where `exec` and `stop` are called |
| `SANDBOXD_IDLE_THRESHOLD_SECONDS` | `2100` | Idle stop window; exec resets activity |
## Troubleshooting
| Symptom | Likely cause | What to check |
|---------|----------------|---------------|
| `docker exec: ...` from API | Container not running or wrong id | `GET /sandbox/{id}` → `status`; after a stop, wake via preview first, then re-run exec to restart the server |
| `curl` to `127.0.0.1` returns Traefik 404 | Missing `Host` header | Use `Host: s-{id}-{port}.preview.{domain}` matching create-time port |
| Warming page loops | Nothing listening on exposed port | Exec a server after wake; confirm port is in create `ports` |
| `id must be a ULID` | Invalid custom id | Omit `id` on create |
| Preview works in browser but not curl | Port suffix | Append `:$HTTP_PORT` when not `80` |
## Related pages
<CardGroup>
<Card title="Quickstart (tasks API)" href="/quickstart">
Agent-driven build on port 3000 with SSE task events — alternative to manual exec.
</Card>
<Card title="Preview routing" href="/preview-routing">
Traefik Host rules, priority 100 vs wake catch-all priority 1, and `sandboxed.managed` constraint.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Stop-on-idle, admission, TCP-ready probe, and warming-page behavior in depth.
</Card>
<Card title="Preview URL reference" href="/preview-url-reference">
Exact hostname pattern and `HTTP_PORT` suffix rules.
</Card>
<Card title="Legacy API reference" href="/legacy-api-reference">
Full `/sandbox*` contract including exec, keepalive, and purge.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
readyz, port 80 conflicts, ULID validation, and warming-page stalls.
</Card>
</CardGroup>
---
## 21. Troubleshooting
> readyz/docker socket failures, port 80 conflicts (HTTP_PORT), ULID validation, warming-page stalls, userns-remap seed errors, preview spin-up, and compose log probes.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/21-troubleshooting.md
- Generated: 2026-06-04T22:46:48.829Z
### Source Files
- `AGENTS.md`
- `README.md`
- `control-plane/internal/api/handlers.go`
- `control-plane/internal/wake/handler.go`
- `install.sh`
- `.env.example`
---
title: "Troubleshooting"
description: "readyz/docker socket failures, port 80 conflicts (HTTP_PORT), ULID validation, warming-page stalls, userns-remap seed errors, preview spin-up, and compose log probes."
---
sandboxd exposes `GET /healthz` and `GET /readyz` on `SANDBOXED_API_BIND` (default `127.0.0.1:9090`); Traefik publishes previews on `HTTP_PORT` (default `80`). Most install and runtime failures show up as a non-`ready` readiness probe, a compose bind error on port 80, a `400` with `id must be a ULID`, a stuck **Spinning up your app!** page, or `seed:` / permission errors during workspace provisioning.
## Quick checks
<Steps>
<Step title="Probe the control plane">
```bash
curl -s http://127.0.0.1:9090/healthz # expect: ok
curl -s http://127.0.0.1:9090/readyz # expect: ready
```
`healthz` only confirms the process is up. `readyz` additionally requires SQLite and a recent successful `docker info`.
</Step>
<Step title="Inspect the stack">
```bash
docker compose ps
docker compose logs -f sandboxd
docker compose logs -f traefik
docker ps --filter label=sandboxed.managed=true
```
</Step>
<Step title="Confirm Docker from the host">
```bash
docker info
```
`install.sh` falls back to `sudo docker` when the current user cannot reach the daemon.
</Step>
</Steps>
| Symptom | Likely cause | First move |
| --- | --- | --- |
| `readyz` → `503` with `docker info:` | Socket unreachable or daemon down | `docker info`; verify `/var/run/docker.sock` in compose |
| `docker compose up` fails on `:80` | Port 80 in use | Set `HTTP_PORT=8088` in `.env`, `docker compose up -d` |
| `id must be a ULID` | Non-ULID `id` on create | Omit `id` or pass a valid ULID |
| **Spinning up your app!** loops | Wake OK but nothing listens on the port, or slow start | `POST .../exec` or tasks API; check sandbox logs |
| `seed:` / permission on create | `userns-remap` without `SANDBOXED_USERNS=host` | Keep default `SANDBOXED_USERNS=host` |
| Preview 404 / not_found | Wrong id (Host casing is normalized internally, so it is not the cause) | `GET /sandbox/{id}`; use canonical preview host |
## Health and readiness
| Endpoint | HTTP | Body | Meaning |
| --- | --- | --- | --- |
| `GET /healthz` | 200 | `ok\n` | Process alive; no Docker or DB check |
| `GET /readyz` | 200 | `ready\n` | SQLite `Ping` OK and `docker info` succeeded |
| `GET /readyz` | 503 | JSON error | `sqlite ping: …` or `docker info: …` |
Both probes are auth-exempt when API auth is enabled.
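The table above maps directly onto a small triage function. This is a sketch under the assumption that the `503` body contains the `sqlite ping:` or `docker info:` prefix somewhere in its text; `classify_readyz` is a hypothetical name.

```bash
# classify_readyz HTTP_CODE BODY
# Hypothetical helper: turns a readyz result into a one-word diagnosis.
# Example call (two standard curl options, -o and -w):
#   code="$(curl -s -o /tmp/readyz.out -w '%{http_code}' http://127.0.0.1:9090/readyz)"
#   classify_readyz "$code" "$(cat /tmp/readyz.out)"
classify_readyz() {
  local code="$1" body="$2"
  case "$code:$body" in
    200:*)                 echo "ready" ;;
    503:*"docker info:"*)  echo "docker" ;;
    503:*"sqlite ping:"*)  echo "sqlite" ;;
    *)                     echo "unknown" ;;
  esac
}
```

A result of `docker` points at the socket checks in the next section, while `unknown` (for example curl's `000` when nothing answers) suggests the `sandboxd` container itself is down.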
<Warning>
Orchestrators that only hit `healthz` can mark the stack healthy while `readyz` is still failing — sandbox create, exec, and wake all need a working Docker socket.
</Warning>
### `readyz` and Docker socket failures
`sandboxd` shells out to the `docker` CLI, which reaches the host daemon over the mounted socket (`docker-compose.yml` bind-mounts `/var/run/docker.sock` into the `sandboxd` service). `handleReadyz` runs `Store.DB().PingContext` then `Docker.Info`, which executes `docker info --format {{.ServerVersion}}`.
<RequestExample>
```bash
curl -s -w "\nHTTP %{http_code}\n" http://127.0.0.1:9090/readyz
```
</RequestExample>
<ResponseExample>
```text
docker info: exit status 1
HTTP 503
```
</ResponseExample>
<Steps>
<Step title="Verify the daemon on the host">
```bash
docker info
# or, if install.sh selected sudo:
sudo docker info
```
</Step>
<Step title="Confirm the socket inside sandboxd">
```bash
docker compose exec sandboxd ls -l /var/run/docker.sock
docker compose exec sandboxd docker info --format '{{.ServerVersion}}'
```
</Step>
<Step title="Restart the control plane">
```bash
docker compose restart sandboxd
curl -s http://127.0.0.1:9090/readyz
```
</Step>
</Steps>
<Note>
`install.sh` probes `docker info` before compose and prints `using 'sudo docker'` when the user is not in the `docker` group. The in-container `sandboxd` process still uses the socket mount — fix host daemon access, not only your shell alias.
</Note>
## Port 80 conflicts (`HTTP_PORT`)
Traefik maps host port `${HTTP_PORT:-80}` to container port `80`. If another service owns 80, compose fails at bind time or previews hit the wrong process.
<ParamField body="HTTP_PORT" type="number" default="80">
Host port for Traefik HTTP. When not `80`, every preview URL needs the suffix, e.g. `http://s-<id>-3000.preview.localhost:8088`.
</ParamField>
<Steps>
<Step title="Pick a free port">
```bash
# example: use 8088 (if .env already sets HTTP_PORT, edit that line instead of appending)
echo 'HTTP_PORT=8088' >> .env
```
</Step>
<Step title="Recreate the stack">
```bash
docker compose up -d
```
</Step>
<Step title="Hit previews with the suffix">
```bash
ID=<your-ulid>
curl -s -H "Host: s-${ID}-3000.preview.localhost" http://127.0.0.1:8088/
```
</Step>
</Steps>
The installer summary prints the correct `PORTSUFFIX` when `HTTP_PORT != 80`.
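Appending to `.env` more than once leaves duplicate `HTTP_PORT` lines. A hedged sketch of an idempotent alternative follows; `set_env_var` is a hypothetical helper, not something `install.sh` provides, and it assumes the value contains no `|`, `&`, or backslash characters.

```bash
# set_env_var FILE KEY VALUE
# Hypothetical helper: replaces an existing KEY=... line or appends one,
# so repeated runs leave exactly one assignment in the file.
# Assumption: VALUE contains no "|", "&", or backslash (special to sed here).
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    tmp="$(mktemp)" || return 1
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back in place so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

`set_env_var .env HTTP_PORT 8088` followed by `docker compose up -d` has the same effect as the steps above without growing the file on each run.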
## ULID validation
`POST /sandbox` accepts an optional `id`. Empty `id` triggers server-side `newULID()`. A supplied `id` must parse via `ulid.Parse`; otherwise the handler returns `400` with message `id must be a ULID`.
| Input | Result |
| --- | --- |
| omit `id` | Auto-generated Crockford Base32 ULID |
| `"01ARZ3NDEKTSV4RRFFQ69G5FAV"` | Accepted if valid |
| `"demo01"`, UUIDs, arbitrary strings | `400` — `id must be a ULID` |
```bash
# recommended: omit id
curl -s -XPOST http://127.0.0.1:9090/sandbox \
-H 'content-type: application/json' \
-d '{"ports":[3000]}'
```
<Warning>
The post-install banner in `install.sh` still shows `"id":"demo01"` as an example — that value is **not** a ULID and will fail validation. Omit `id` or capture the generated id from the response.
</Warning>
Sandbox primary keys in SQLite are ULIDs; preview hostnames embed the same id (`s-{id}-{port}.preview.{domain}`).
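Clients that do supply an `id` can pre-check its shape before calling the API. The sketch below tests for the canonical ULID form (26 uppercase Crockford Base32 characters, first character `0` to `7`). It is an approximation: `is_ulid` is a hypothetical helper, and the server's `ulid.Parse` remains authoritative and may accept inputs this check rejects (for example lowercase).

```bash
# is_ulid STRING
# Hypothetical client-side pre-check for the canonical ULID shape.
# Crockford Base32 excludes the letters I, L, O, and U.
is_ulid() {
  [ "${#1}" -eq 26 ] || return 1
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-7][0-9A-HJKMNP-TV-Z]{25}$'
}
```

For instance, `is_ulid "$CUSTOM_ID" || unset CUSTOM_ID` falls back to a server-generated id when the value would be rejected.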
## Warming-page stalls
Stopped sandboxes have no Traefik priority-100 router. The file-provider catch-all in `traefik/dynamic/wake.yml` (priority `1`) forwards matching preview hosts to `http://sandboxd:9000`. The wake handler starts the container and returns an HTML page titled **Spinning up your app!** with a meta-refresh (default **2** seconds). After refresh, the sandbox’s Docker labels should register a priority-100 route and Traefik proxies to the dev server.
```mermaid
sequenceDiagram
participant Browser
participant Traefik
participant sandboxd
participant Docker
participant Sandbox as s-{id} container
Browser->>Traefik: GET preview Host (stopped)
Traefik->>sandboxd: catch-all priority 1
sandboxd->>Docker: docker start s-{id}
sandboxd->>Sandbox: TCP probe :port (up to 8s)
sandboxd-->>Browser: 200 Spinning up + meta-refresh
Note over Traefik,Sandbox: Labels publish priority-100 router
Browser->>Traefik: refresh
Traefik->>Sandbox: proxy to listening port
```
### When the page keeps appearing
| Cause | What to verify |
| --- | --- |
| Nothing listening on the exposed port | Process bound to the port you declared in `ports` (e.g. `3000`) |
| Slow boot (install, agent task) | Wait longer than one refresh; default TCP-ready wait is **8s** (`SANDBOXD_WAKE_TCP_READY_TIMEOUT_SECONDS`) — timeout is informational; wake still succeeds and serves the refresh page |
| Sandbox `error` or `creating` | `GET /sandbox/{id}` → `status`, `error_message` |
| Memory admission denied | **Almost ready…** page (503) with `Retry-After`; host memory pressure |
| Wrong preview port in URL | Port in hostname must match an exposed port |
```bash
# is the sandbox up?
curl -s http://127.0.0.1:9090/sandbox/$ID | jq .status
# start a minimal server on the declared port (backgrounded, because exec blocks until the command exits)
curl -s -XPOST http://127.0.0.1:9090/sandbox/$ID/exec \
-H 'content-type: application/json' \
-d '{"cmd":["bash","-lc","cd ~/workspace && nohup python3 -m http.server 3000 >/tmp/http.log 2>&1 &"]}'
```
### Machine-readable wake headers
White-label HTML hides internal reasons; correlate in logs or DevTools:
| Header | When |
| --- | --- |
| `X-Wake-Error` | Non-admission failure (`not_found`, `start_failed`, `creating`, …) |
| `X-Retry-After-Reason` | Admission / pressure denial |
| `Retry-After` | Seconds before retry (busy page, default 30) |
The Prometheus series `wakes_total{result="tcp_ready_timeout"}` increments when the TCP probe times out; the user still gets the refresh page.
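To read these headers from a script rather than DevTools, the response headers can be dumped with `curl -D -` and filtered. A minimal sketch, with `wake_header` as a hypothetical helper name; header names are matched case-insensitively and trailing carriage returns are stripped.

```bash
# wake_header NAME
# Hypothetical helper: reads raw HTTP response headers on stdin and prints the
# value of the first header called NAME (case-insensitive match).
wake_header() {
  awk -v want="$1" '
    BEGIN { want = tolower(want) }
    {
      sub(/\r$/, "")
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == want) {
        value = substr($0, i + 1)
        sub(/^[ \t]+/, "", value)
        print value
        exit
      }
    }'
}
```

For example, `curl -s -o /dev/null -D - -H "Host: s-$ID-3000.preview.localhost" "http://127.0.0.1:${HTTP_PORT:-80}/" | wake_header X-Wake-Error` prints the failure reason when one is set and nothing otherwise.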
## Userns-remap and seed errors
Workspace directories live under `SANDBOXED_DATA_DIR/workspaces/<id>`. On first create, `loopback.Provision` runs a one-shot seed container that copies `/opt/sandbox-skel` and `chown`s to `sandbox:sandbox`. With Docker `userns-remap`, seeding as root inside the default user namespace maps to a high host UID that cannot write a host-owned workspace dir.
Default: `SANDBOXED_USERNS=host` on infra containers (`traefik`, `sandboxd`) and on sandbox + seed `docker run` (`--userns host`). On a daemon **without** userns-remap this is a no-op; with remap it keeps `chown` deterministic.
| Failure shape | Fix |
| --- | --- |
| `seed: …` / permission denied on create | Ensure `.env` / compose passes `SANDBOXED_USERNS=host` (default) |
| Intentionally use daemon userns | Set `SANDBOXED_USERNS=` empty (see `ARCHITECTURE.md`) — only if you understand ownership on the data dir |
```bash
docker compose exec sandboxd printenv SANDBOXED_USERNS
# expect: host
```
After changing userns policy, recreate affected sandboxes or purge workspaces that were seeded with wrong ownership.
## Preview spin-up
End-to-end preview recovery:
<Steps>
<Step title="Confirm row state">
```bash
curl -s http://127.0.0.1:9090/sandbox/$ID
```
Expect `running` for an active server, or `stopped` before the first wake.
</Step>
<Step title="Wake via preview (local)">
```bash
# ${HTTP_PORT:-80} targets the published port; export HTTP_PORT if it is only set in .env
curl -sv -H "Host: s-${ID}-3000.preview.localhost" http://127.0.0.1:${HTTP_PORT:-80}/ 2>&1 | head -30
```
Browsers lowercase `Host`; the wake handler uppercases the captured id before DB lookup.
</Step>
<Step title="Programmatic wake">
```bash
curl -s -XPOST http://127.0.0.1:9090/wake/$ID \
-H 'Accept: application/json'
```
</Step>
<Step title="Ensure Traefik sees the sandbox network">
```bash
docker inspect s-$ID --format '{{json .NetworkSettings.Networks}}'
```
Sandboxes join `SANDBOXED_NETWORK` (default `sandboxed_net`).
</Step>
</Steps>
| Preview symptom | Check |
| --- | --- |
| `not_found` / error page | Id typo; sandbox purged; `GET /sandbox/{id}` |
| Redirect to sign-in | `visibility=private` without preview cookie — see private previews |
| Connection refused on `:80` | `HTTP_PORT` mismatch |
| Works with `curl -H Host` but not browser | DNS: `*.localhost` resolves to `127.0.0.1` on most browsers; production needs real wildcard DNS |
## Compose log probes
| Command | Use |
| --- | --- |
| `docker compose logs -f sandboxd` | Create, wake, reaper, `docker start` / `seed:` failures |
| `docker compose logs -f traefik` | Router registration, catch-all vs per-sandbox priority |
| `docker compose ps` | `traefik` / `sandboxd` restarts, port mapping |
| `docker logs s-<ID>` | In-sandbox dev server or `runtimed` |
| `tail -f ${SANDBOXED_LOG_DIR}/traefik-access.log` | Per-request routing (shared log dir mount) |
```bash
# follow wake-related lines
docker compose logs -f sandboxd 2>&1 | grep -iE 'wake|seed|docker start|admission'
# last 100 lines without follow
docker compose logs --tail=100 sandboxd
```
<Info>
The access log path defaults to `${SANDBOXED_LOG_DIR}/traefik-access.log` (see `SANDBOXD_ACCESS_LOG` in compose). `install.sh` chmods the log dir to `0777` so Traefik can write.
</Info>
## Related pages
<CardGroup>
<Card title="Installation" href="/installation">
Prerequisites, `install.sh`, `.env` bootstrap, and first `healthz` / `readyz` checks.
</Card>
<Card title="Observability" href="/observability">
Probe semantics, Prometheus metrics, and structured logging paths.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Stop-on-idle, wake admission, keepalive, and warming-page behavior in depth.
</Card>
<Card title="Preview routing" href="/preview-routing">
Traefik priorities, catch-all wake router, and `PREVIEW_DOMAIN` rules.
</Card>
<Card title="Workspaces and isolation" href="/workspaces-persistence">
Bind mounts, skeleton seeding, `SANDBOXED_USERNS`, and storage layout.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
`HTTP_PORT`, `SANDBOXED_DATA_DIR`, wake timeouts, and reaper tuning env keys.
</Card>
</CardGroup>
---
## 22. Control plane development
> Go 1.22+ build/test/vet in control-plane/, CGO sqlite note, compose --build loop, package map (docker, store, reaper, wake, api), and image build cache behavior.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/22-control-plane-development.md
- Generated: 2026-06-04T22:47:04.263Z
### Source Files
- `CONTRIBUTING.md`
- `control-plane/README.md`
- `control-plane/go.mod`
- `control-plane/Dockerfile`
- `control-plane/internal/api/api.go`
- `image/build.sh`
---
title: "Control plane development"
description: "Go 1.22+ build/test/vet in control-plane/, CGO sqlite note, compose --build loop, package map (docker, store, reaper, wake, api), and image build cache behavior."
---
The `control-plane/` tree is a Go 1.22 module (`github.com/sandboxed/control-plane`) that builds `sandboxd` (CGO + SQLite, shells out to the `docker` CLI against the host daemon) and `runtimed` (static, CGO-free, baked into the sandbox base image). Day-to-day work is `go build` / `go test` / `go vet` under `control-plane/`, or rebuilding the stack with `docker compose build` after `./install.sh` has produced the base image.
## Toolchain and layout
| Item | Value |
|------|--------|
| Go version | `1.22` (`control-plane/go.mod`) |
| Module path | `github.com/sandboxed/control-plane` |
| Daemon entrypoint | `control-plane/cmd/sandboxd` |
| In-sandbox supervisor | `control-plane/cmd/runtimed` |
| SQL migrations | `control-plane/migrations/*.sql` → `/usr/local/share/sandboxd/migrations/` in the image |
| Container image tag | `sandboxed-control-plane:1.0.0` (`docker-compose.yml`) |
:::files
sandboxed/
├── docker-compose.yml      # traefik + sandboxd (build context: control-plane/)
├── install.sh              # base image + compose build + up
├── image/
│   ├── build.sh            # sandboxed-base image (repo root context)
│   └── Dockerfile          # stage 1: CGO_ENABLED=0 runtimed; stage 2: runtime
└── control-plane/
    ├── cmd/sandboxd/       # daemon: store, reconcile, reapers, API
    ├── cmd/runtimed/       # in-sandbox HTTP supervisor
    ├── internal/           # docker, store, api, wake, reaper, …
    ├── migrations/
    ├── Dockerfile          # CGO sandboxd + docker-ce-cli runtime
    └── go.mod
:::
## Local Go workflow
From the repository root:
```bash
cd control-plane
go build ./...
go test ./...
go vet ./...
```
<Note>
`go build ./...` compiles every package including `cmd/sandboxd` and `cmd/runtimed`. A successful local build of `sandboxd` requires **CGO** and a C toolchain (see below). `runtimed` builds without CGO.
</Note>
### CGO and SQLite
`sandboxd` depends on `github.com/mattn/go-sqlite3`, which requires CGO. The production image build sets this explicitly:
```dockerfile
RUN CGO_ENABLED=1 go build -trimpath -o /sandboxd ./cmd/sandboxd
```
`runtimed` is built with `CGO_ENABLED=0` inside `image/Dockerfile` so the sandbox base image carries a static binary with no extra native SQLite dependency.
<Warning>
On macOS or other non-Linux hosts, `go test ./...` may fail to link `go-sqlite3` without CGO and a working C compiler. The supported integration path is the Linux Docker stack (`install.sh` / `docker compose build`), which uses `golang:1.22-bookworm` for the CGO build stage.
</Warning>
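Before running `go build ./...` locally, a quick preflight can confirm that CGO prerequisites are present. This is a sketch, not part of the repository: `missing_tools` and `cgo_preflight` are hypothetical helpers, and the only Go commands used are `go env CC` and `go env CGO_ENABLED`.

```bash
# missing_tools TOOL...
# Prints each TOOL that is not found on PATH, one per line.
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
  done
}

# cgo_preflight
# Hypothetical check: succeeds when go, the C compiler Go would invoke, and
# CGO_ENABLED=1 are all in place.
cgo_preflight() {
  local cc
  if [ -n "$(missing_tools go)" ]; then
    echo "missing: go" >&2
    return 1
  fi
  cc="$(go env CC)"
  if [ -n "$(missing_tools "$cc")" ]; then
    echo "missing C compiler: $cc" >&2
    return 1
  fi
  if [ "$(go env CGO_ENABLED)" != "1" ]; then
    echo "CGO_ENABLED is not 1" >&2
    return 1
  fi
}
```

Running `cgo_preflight && (cd control-plane && go build ./...)` surfaces a missing compiler up front instead of as a `go-sqlite3` link error.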
At runtime, `store.Open` opens SQLite with WAL, busy timeout, and foreign keys (`file:…?_journal=WAL&_busy_timeout=5000&_fk=1` in `cmd/sandboxd/main.go`). Migrations are applied at startup from `SANDBOXD_MIGRATIONS` (default `/usr/local/share/sandboxd/migrations/`). When running a locally built `sandboxd` binary, the daemon falls back to `../../migrations` relative to the executable if the default path is missing.
## Full-stack dev loop
The installer and the Compose file split the work across **two** image builds:
```mermaid
flowchart LR
subgraph host["Host"]
install["install.sh"]
compose["docker compose build && up -d"]
end
subgraph images["Images"]
base["sandboxed-base:tag\nimage/Dockerfile\n(runtimed + toolchains)"]
cp["sandboxed-control-plane:1.0.0\ncontrol-plane/Dockerfile\n(sandboxd CGO)"]
end
install --> base
install --> compose
compose --> cp
base --> sandboxd_run["sandboxd docker run\nSANDBOXD_IMAGE"]
cp --> sandboxd_svc["sandboxd service :9000"]
```
<Steps>
<Step title="Bootstrap env and data dir">
Run `./install.sh` once (idempotent). It copies `.env.example` → `.env` if needed, creates `SANDBOXED_DATA_DIR`, and detects `docker` / `docker compose` (with optional `sudo`).
</Step>
<Step title="Build the sandbox base image">
`image/build.sh` builds from the **repo root** (`-f image/Dockerfile`) and tags `SANDBOXED_IMAGE` (default `sandboxed-base:1.0.0`). This stage compiles `runtimed` and installs Node, pnpm, uv, bun, and agent CLIs — the slow step on first install.
</Step>
<Step title="Build and start the control plane">
`docker compose build` rebuilds `sandboxd` when `control-plane/` changes, then `docker compose up -d` starts Traefik and sandboxd. API: `SANDBOXED_API_BIND` (default `127.0.0.1:9090` → container `:9000`).
</Step>
<Step title="Iterate on Go changes">
After editing `control-plane/`, run `docker compose build sandboxd && docker compose up -d sandboxd` (or `docker compose up -d --build`) and tail logs with `docker compose logs -f sandboxd`.
</Step>
</Steps>
<RequestExample>
```bash
# Rebuild only the control-plane service after a code change
docker compose build sandboxd
docker compose up -d sandboxd
curl -s http://127.0.0.1:9090/healthz
curl -s http://127.0.0.1:9090/readyz
```
</RequestExample>
Per-sandbox containers are **not** Compose services; `sandboxd` launches them at runtime from `SANDBOXD_IMAGE` on `SANDBOXED_NETWORK`.
## Package map
Core packages named in the control-plane README, plus collaborators wired from `cmd/sandboxd/main.go` and `internal/api`:
| Package | Responsibility |
|---------|----------------|
| `cmd/sandboxd` | Env wiring, SQLite open, boot `reconcile.Once`, background reapers/tailers, HTTP server, signals |
| `cmd/runtimed` | In-sandbox supervisor: dev server, tasks, Unix-socket HTTP API |
| `internal/docker` | Typed wrapper over `docker` CLI (`os/exec`); no policy defaults inside the package |
| `internal/store` | SQLite source of truth; single-writer channel; numbered `migrations/*.sql` |
| `internal/loopback` | Per-sandbox workspace directories under `SANDBOXED_DATA_DIR/workspaces` |
| `internal/traefik` | Preview-route Docker label generation |
| `internal/reconcile` | Boot-time convergence: Docker state → SQLite (orphans logged, not deleted) |
| `internal/reaper` | Idle stop (`docker stop`) and host memory pressure stop |
| `internal/wake` | Preview catch-all and `POST /wake/{id}`; admission + warming HTML |
| `internal/api` | HTTP mux: legacy `/sandbox*`, `/v1/*`, health, metrics, forward-auth |
| `internal/auth` | Service-token and preview-token middleware (optional) |
| `internal/runtime` | Client bridge to `runtimed` over the in-container Unix socket |
| `internal/activity` | Traefik access-log tailer, optional connection poller, inflight exec tracking |
| `internal/snapshot` | Manual snapshot/template hooks (auto-snapshotter disabled in OSS compose) |
| `internal/metrics` | Prometheus registry used by `/metrics` |
```text
                ┌──────────── cmd/sandboxd ────────────┐
                │ store.Open → reconcile.Once          │
                │ idle.Run │ pressure.Run │ tailer     │
                └──────────────┬───────────────────────┘
                               │
     ┌─────────────────────────┼─────────────────────────┐
     ▼                         ▼                         ▼
internal/store          internal/docker         internal/api.Server
(SQLite truth)          (CLI shell-out)         ├─ handlers (CRUD, exec)
     ▲                         ▲                ├─ v1 (tasks, files, stop)
     │                         │                └─ wake via hostDispatch
internal/reaper         internal/loopback                │
internal/wake           internal/traefik                 ▼
internal/reconcile      internal/runtime ──────────► runtimed (in container)
```
`internal/api.Server.Handler()` registers legacy routes (`POST /sandbox`, `GET /sandboxes`, exec, keepalive, purge, snapshots) and v1 routes (`POST /v1/sandboxes`, tasks, files, export). `cmd/sandboxd` wraps the API mux with `auth.Wrap`, then `hostDispatch` so Traefik preview Host headers hit `wake.Handler` before the API.
## Image build cache behavior
Docker layer caching dominates iteration time. Neither Dockerfile uses BuildKit cache mounts; caching follows normal layer invalidation rules.
### Control plane (`control-plane/Dockerfile`)
| Layer order | Cache hit when |
|-------------|----------------|
| `COPY go.mod go.sum` + `go mod download` | Module versions unchanged |
| `COPY . .` + `go build sandboxd` | Any source file under `control-plane/` changes |
Rebuilding `sandboxd` after a one-line Go change reuses the module-download layer but recompiles the binary layer. The runtime stage (Debian slim + `docker-ce-cli` + copied `sandboxd` + `migrations/`) rebuilds when the builder output or migration files change.
### Sandbox base (`image/Dockerfile`)
| Stage | Cache hit when |
|-------|----------------|
| `runtimed-builder`: `go mod download` | `control-plane/go.mod` / `go.sum` unchanged |
| `runtimed-builder`: `go build runtimed` | Any `control-plane/` source change |
| Runtime: `apt-get` / Node / pnpm / uv / bun / npm globals | Earlier Dockerfile instructions unchanged |
| `COPY image/skel`, `COPY runtimed` | Skeleton or `runtimed` binary changed |
`image/build.sh` uses the **repository root** as context so the builder can `COPY control-plane/`. The first `install.sh` run pays the full apt and toolchain cost; subsequent runs skip unchanged layers (CONTRIBUTING: “cached after the first run”). Changing only `control-plane/` code does **not** touch the already-tagged base image; it stays as built until you rerun `image/build.sh`. You **must** rebuild `sandboxd` via Compose for API changes, and rebuild the base image if `cmd/runtimed` changed.
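The rebuild rules above can be turned into a small decision helper fed by a list of changed paths (for example from `git diff --name-only`). This is a sketch with a hypothetical function name, `rebuild_targets`. It deliberately ignores one case: a change to a shared `internal/` package that `runtimed` imports would also call for a base-image rebuild.

```bash
# rebuild_targets
# Hypothetical helper: reads changed file paths on stdin (one per line) and
# prints the rebuild command(s) the documented rules call for.
# Simplification: shared internal/ packages used by runtimed are not detected.
rebuild_targets() {
  local path sandboxd=0 base=0
  while IFS= read -r path; do
    case "$path" in
      control-plane/cmd/runtimed/*) base=1 ;;
      control-plane/*)              sandboxd=1 ;;
      image/*)                      base=1 ;;
    esac
  done
  if [ "$sandboxd" -eq 1 ]; then
    echo "docker compose build sandboxd"
  fi
  if [ "$base" -eq 1 ]; then
    echo "image/build.sh"
  fi
  return 0
}
```

For example, `git diff --name-only HEAD~1 | rebuild_targets` lists what to rebuild after the last commit.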
<Tip>
Tag bumps: `image/build.sh [version]` and `SANDBOXED_IMAGE` in `.env` control the base tag. Compose pins `sandboxed-control-plane:1.0.0` independently; bump the image name in `docker-compose.yml` when you need a clean control-plane tag.
</Tip>
## Runtime wiring (for debugging)
On startup, `sandboxd`:
1. Opens SQLite and applies migrations.
2. Runs `reconcile.Once` before accepting HTTP traffic.
3. Starts idle and pressure reapers, access-log tailer, and optional connection poller.
4. Listens on `SANDBOXD_ADDR` (default `0.0.0.0:9000` in the container).
Egress nftables, nginx registry-proxy watcher, and the hourly auto-snapshotter are **disabled** in the OSS compose build (`egressMgr == nil` in `main.go`); the reconciler and manual snapshot APIs still run with directory-backed workspaces.
<Check>
Verify a dev iteration: `curl -s http://127.0.0.1:9090/healthz` → `ok`, `readyz` → `ready`, and `docker compose logs sandboxd` shows `startup: reconcile complete` and `startup: listening`.
</Check>
## Design constraints (when changing code)
- Shell out to `docker` CLI in `internal/docker` unless there is a measured reason to adopt the SDK.
- Treat SQLite as the only durable lifecycle truth; reconciler converges Docker → DB.
- Keep new host dependencies out of the default compose path; optional features should be env-gated and default-off.
- Add or extend `_test.go` where behavior is non-trivial (`internal/traefik`, `internal/auth`, `internal/api`, `cmd/runtimed` already have tests).
## Related pages
<CardGroup>
<Card title="Contributing" href="/contributing">
Project layout, design constraints, and issue report fields.
</Card>
<Card title="Installation" href="/installation">
`./install.sh`, `.env`, base image build, compose up, health checks.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
Compose-backed env keys for preview, data dir, idle/reaper, and auth.
</Card>
<Card title="Observability" href="/observability">
`healthz` / `readyz`, Prometheus `/metrics`, compose logs.
</Card>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
SQLite status machine and reconcile-on-boot semantics.
</Card>
<Card title="runtimed reference" href="/runtimed-reference">
In-sandbox supervisor API bridged by `internal/runtime`.
</Card>
</CardGroup>
---
## 23. Uninstall and maintenance
> uninstall.sh flags (--images, --data, --all), managed-container cleanup, workspace retention defaults, docker compose ps/logs/restart sandboxd, and backup paths for SQLite and workspaces.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/23-uninstall-and-maintenance.md
- Generated: 2026-06-04T22:46:54.552Z
### Source Files
- `uninstall.sh`
- `install.sh`
- `AGENTS.md`
- `docker-compose.yml`
- `ARCHITECTURE.md`
---
title: "Uninstall and maintenance"
description: "uninstall.sh flags (--images, --data, --all), managed-container cleanup, workspace retention defaults, docker compose ps/logs/restart sandboxd, and backup paths for SQLite and workspaces."
---
`./uninstall.sh` tears down the Compose stack (`traefik`, `sandboxd`), removes every sandbox container labeled `sandboxed.managed=true`, and optionally drops built images or the entire `SANDBOXED_DATA_DIR` tree. By default it keeps workspaces, SQLite state, logs, and the git checkout; day-to-day maintenance uses `docker compose` from the repository root.
## Uninstall script
Run `./uninstall.sh` from the repository root (same directory as `docker-compose.yml`). The script loads `.env` when present, mirrors `install.sh` Docker/sudo detection, and never deletes the checkout itself.
| Invocation | Stack | Managed sandboxes | Network | Images | Data dir |
|---|---|---|---|---|---|
| `./uninstall.sh` | `compose down` | `docker rm -f` (label filter) | remove if lingering | kept | **kept** |
| `./uninstall.sh --images` | same | same | same | remove `SANDBOXED_IMAGE` + `sandboxed-control-plane:1.0.0` | kept |
| `./uninstall.sh --data` | same | same | same | kept | **deleted** (prompt) |
| `./uninstall.sh --all` | same | same | same | removed | **deleted** (prompt) |
| `./uninstall.sh --all --yes` | same | same | same | removed | deleted (no prompt) |
<ParamField body="--images" type="flag">
Removes `sandboxed-base:1.0.0` (or `SANDBOXED_IMAGE` from `.env`) and `sandboxed-control-plane:1.0.0`. Idempotent if images are already gone.
</ParamField>
<ParamField body="--data" type="flag">
Runs `rm -rf` on `SANDBOXED_DATA_DIR` (default `/var/lib/sandboxed`), including workspaces, SQLite, snapshots, templates, and logs under that tree. Requires typing `yes` unless combined with `--yes` / `-y`.
</ParamField>
<ParamField body="--all" type="flag">
Equivalent to `--images` plus `--data`.
</ParamField>
<ParamField body="--yes" type="flag">
Skips the destructive confirmation when `--data` or `--all` is set.
</ParamField>
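The flag combinations in the table can be modeled as a short function, which is handy for checking what a given invocation would do before running it. This is a model of the documented behavior, not the code of `uninstall.sh`; `uninstall_plan` is a hypothetical name.

```bash
# uninstall_plan [FLAGS...]
# Hypothetical model of the documented flag table: prints what happens to
# images and the data dir, and whether a confirmation prompt is expected.
uninstall_plan() {
  local images=kept data=kept prompt=no yes=0 arg
  for arg in "$@"; do
    case "$arg" in
      --images) images=removed ;;
      --data)   data=deleted ;;
      --all)    images=removed; data=deleted ;;
      --yes|-y) yes=1 ;;
      *) echo "unknown flag: $arg" >&2; return 2 ;;
    esac
  done
  # Only data deletion is described as prompting, and --yes / -y skips it.
  if [ "$data" = deleted ] && [ "$yes" -eq 0 ]; then
    prompt=yes
  fi
  printf 'images=%s data=%s prompt=%s\n' "$images" "$data" "$prompt"
}
```

`uninstall_plan --data` reports `prompt=yes`, a reminder that the real script will ask for a typed `yes` before deleting `SANDBOXED_DATA_DIR`.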
### Execution order
<Steps>
<Step title="Stop the Compose stack">
`docker compose down --remove-orphans` stops `traefik` and `sandboxd`. A non-running stack is reported and skipped.
</Step>
<Step title="Remove managed sandbox containers">
Containers are selected only with `--filter label=sandboxed.managed=true`, then force-removed. Other Docker workloads on the host are untouched.
</Step>
<Step title="Remove the Docker network">
`SANDBOXED_NETWORK` (default `sandboxed_net`) is removed if it still exists.
</Step>
<Step title="Optional image and data removal">
`--images` and/or `--data` run last. Without `--data`, the script prints that data was kept at `SANDBOXED_DATA_DIR`.
</Step>
</Steps>
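The order above can be sketched as a dry-run planner. This is an illustration, not the contents of `uninstall.sh`: `plan_uninstall` is a hypothetical helper that only prints commands, and the exact `docker` invocations the real script uses may differ.

```bash
# Hypothetical dry-run helper: prints the teardown commands in order, runs nothing.
plan_uninstall() {
  local images=0 data=0 arg
  for arg in "$@"; do
    case "$arg" in
      --images) images=1 ;;
      --data) data=1 ;;
      --all) images=1; data=1 ;;
    esac
  done
  # Defaults mirror the values documented on this page.
  local network="${SANDBOXED_NETWORK:-sandboxed_net}"
  local data_dir="${SANDBOXED_DATA_DIR:-/var/lib/sandboxed}"
  local base_image="${SANDBOXED_IMAGE:-sandboxed-base:1.0.0}"

  echo "docker compose down --remove-orphans"
  echo "docker ps -aq --filter label=sandboxed.managed=true | xargs -r docker rm -f"
  echo "docker network rm $network"
  # Optional removals always come last.
  if [ "$images" -eq 1 ]; then
    echo "docker rmi $base_image sandboxed-control-plane:1.0.0"
  fi
  if [ "$data" -eq 1 ]; then
    echo "rm -rf $data_dir"
  else
    echo "# data kept at $data_dir"
  fi
}
```

Calling `plan_uninstall --all` lists the five steps in order, which can be a useful review before running the real script with `--yes`.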
<Note>
To reinstall after any uninstall, re-run `./install.sh`: it rebuilds images if they were removed, recreates the data dir if needed, and brings the stack back. After a full `--all` uninstall, the script suggests deleting the repo folder manually; it does not do that for you.
</Note>
## What uninstall removes vs keeps
```text
Host
├── git checkout/ ← never touched by uninstall.sh
├── Docker: traefik, sandboxd ← compose down
├── Docker: s-<ulid> … ← label sandboxed.managed=true
├── network sandboxed_net ← removed if present
└── SANDBOXED_DATA_DIR/ ← KEPT unless --data / --all
├── workspaces/<id>/ ← per-sandbox bind mounts
├── state/sandboxd.db ← SQLite (WAL mode)
├── _snapshots/, templates/, library/
└── (logs may live here or under SANDBOXED_LOG_DIR)
```
Traefik’s Docker provider is constrained to `sandboxed.managed=true`, matching the uninstall filter and ensuring preview routes only target sandboxes this stack created.
<Warning>
`--data` / `--all` deletes **all** content under `SANDBOXED_DATA_DIR`, not individual sandboxes. To drop one workspace while the stack runs, use `POST /sandbox/{id}/purge` or the v1 equivalent (see the Manage sandboxes page).
</Warning>
## Workspace retention defaults
| Action | Container | Workspace on disk | SQLite rows |
|---|---|---|---|
| Default `./uninstall.sh` | removed | **retained** | **retained** |
| `DELETE /sandbox/{id}` (API) | destroyed | retained | retained (reconcile on next boot) |
| `POST /sandbox/{id}/purge` | destroyed | deleted | purged for that id |
| `./uninstall.sh --data` | N/A (stack gone) | entire data dir deleted | deleted with data dir |
Workspaces are plain directories at `SANDBOXED_DATA_DIR/workspaces/<id>/`, bind-mounted at `/home/sandbox`. They survive idle `docker stop`, host reboot, default uninstall, and container destroy without purge.
## Day-to-day maintenance
Run these from the repository root (where `docker-compose.yml` lives).
<CodeGroup>
```bash title="Stack status"
docker compose ps
```
```bash title="Control-plane logs"
docker compose logs -f sandboxd
```
```bash title="Restart sandboxd only"
docker compose restart sandboxd
```
```bash title="Running sandbox containers"
docker ps --filter label=sandboxed.managed=true
```
</CodeGroup>
| Task | Command / signal |
|---|---|
| Rebuild control plane after Go changes | `docker compose build sandboxd && docker compose up -d sandboxd` |
| Full stack restart | `docker compose restart` or `docker compose up -d` |
| Reload API auth tokens from env | `docker compose restart sandboxd` (SIGHUP re-reads env in-process when configured) |
| Apply `.env` port/domain changes | Edit `.env`, then `docker compose up -d` |
| Liveness / readiness | `curl -s http://127.0.0.1:9090/healthz` and `/readyz` (host bind from `SANDBOXED_API_BIND`) |
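
After a restart or rebuild, the two probes can be polled until the control plane answers. A minimal sketch, assuming `curl` is installed on the host and the API is on the default bind; `wait_ready` and its retry parameters are hypothetical and not part of the repository.

```bash
# Hypothetical helper: poll /healthz and /readyz until both succeed or attempts run out.
# Usage: wait_ready [base_url] [attempts] [delay_seconds]
wait_ready() {
  local base="${1:-http://127.0.0.1:9090}"
  local attempts="${2:-30}"
  local delay="${3:-1}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f turns HTTP errors (4xx/5xx) into a non-zero exit status.
    if curl -fsS --max-time 2 "$base/healthz" >/dev/null 2>&1 &&
      curl -fsS --max-time 2 "$base/readyz" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `docker compose restart sandboxd && wait_ready` returns non-zero if the probes do not pass within roughly 30 seconds.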
<Info>
`install.sh` is idempotent: an existing `.env` is left unchanged, the data directory is created if missing, the base and control-plane images are built, and the stack is started with `docker compose up -d`.
</Info>
## Data directory layout and backup
Defaults come from `.env` / `docker-compose.yml`. `SANDBOXED_DATA_DIR` must be an **absolute** path with a **symmetric** host:container bind mount so `sandboxd` paths match host `docker run -v` paths.
| Path | Contents | Backup approach |
|---|---|---|
| `SANDBOXED_DATA_DIR/workspaces/<id>/` | User project files, agent output | Copy the directory (rsync/tar). No separate “image file” in the OSS build. |
| `SANDBOXED_DATA_DIR/state/sandboxd.db` | Sandbox lifecycle, ports, tasks metadata (SQLite WAL) | Stop `sandboxd` (or the entire stack), then copy `sandboxd.db` together with any `sandboxd.db-wal` / `sandboxd.db-shm` sidecars; or use the `sqlite3` shell's `.backup` command. |
| `SANDBOXED_DATA_DIR/state/traefik-tail.offset` | Access-log tail checkpoint | Optional; safe to omit for disaster recovery |
| `SANDBOXED_DATA_DIR/_snapshots/<id>/` | Experimental snapshot tree | Copy if you rely on snapshots |
| `SANDBOXED_DATA_DIR/templates/`, `library/` | Template/snapshot library images | Copy if used |
| `SANDBOXED_LOG_DIR/traefik-access.log` | Traefik access log (default under data dir) | Copy for audit; not required to restore sandboxes |
Override the database file location with `SANDBOXD_DB` in the `sandboxd` service environment (default: `{SANDBOXED_DATA_DIR}/state/sandboxd.db`).
<Steps>
<Step title="Quiesce writes">
`docker compose stop sandboxd` (or `docker compose down` for a full pause).
</Step>
<Step title="Copy workspaces">
Archive each `workspaces/<id>/` you need, or the whole `workspaces/` tree.
</Step>
<Step title="Copy SQLite state">
Archive `state/sandboxd.db` and WAL sidecars together, or run a SQLite backup against the DSN path.
</Step>
<Step title="Start the stack">
`docker compose up -d`. On boot, the reconciler converges Docker containers to SQLite (SQLite is source of truth).
</Step>
</Steps>
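The two copy steps can be sketched as a small function. It assumes `sandboxd` is already stopped (the quiesce step) and that the database sits at the default `state/sandboxd.db` path rather than a `SANDBOXD_DB` override. The name `backup_sandboxed` and the archive name `workspaces.tar.gz` are hypothetical choices for illustration.

```bash
# Hypothetical helper: copy workspaces and SQLite state out of a quiesced data dir.
# Usage: backup_sandboxed <data_dir> <dest_dir>
backup_sandboxed() {
  local data_dir="$1" dest="$2" f
  mkdir -p "$dest" || return 1
  # Workspaces are plain directories, so one tarball of the tree is enough.
  tar -C "$data_dir" -czf "$dest/workspaces.tar.gz" workspaces || return 1
  # Copy the main DB file plus WAL/SHM sidecars when they exist.
  for f in sandboxd.db sandboxd.db-wal sandboxd.db-shm; do
    if [ -f "$data_dir/state/$f" ]; then
      cp -p "$data_dir/state/$f" "$dest/$f" || return 1
    fi
  done
}
```

A typical call might be `backup_sandboxed /var/lib/sandboxed /backups/sandboxed-$(date +%F)`, run between `docker compose stop sandboxd` and `docker compose up -d`.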
<Tip>
For a running system without stopping the stack, prefer per-sandbox workspace copies over copying `sandboxd.db` while `sandboxd` is writing. `ARCHITECTURE.md` describes workspace backup as copying the directory, and state backup as copying the DB file once writes are quiesced.
</Tip>
## Managed-container cleanup without full uninstall
To clear sandboxes but leave the control plane up:
```bash
docker ps -aq --filter label=sandboxed.managed=true | xargs -r docker rm -f
```
Use the API for selective lifecycle: `POST /v1/sandboxes/{id}/stop` frees RAM but keeps the workspace; `DELETE /sandbox/{id}` destroys the container and keeps files; `POST /sandbox/{id}/purge` removes both.
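As a quick reference, the three calls above can be wrapped in a lookup that returns the method and path for an action. This is a sketch: `sandbox_request` and the action names `stop`, `destroy`, and `purge` are hypothetical labels, and only the routes themselves come from this page.

```bash
# Hypothetical helper: map a lifecycle action to "METHOD PATH" as listed above.
sandbox_request() {
  local action="$1" id="$2"
  case "$action" in
    stop) echo "POST /v1/sandboxes/$id/stop" ;;      # frees RAM, keeps workspace
    destroy) echo "DELETE /sandbox/$id" ;;            # removes container, keeps files
    purge) echo "POST /sandbox/$id/purge" ;;          # removes container and files
    *)
      echo "unknown action: $action" >&2
      return 1
      ;;
  esac
}

# Possible use with curl (TOKEN is a placeholder for a configured service token):
#   set -- $(sandbox_request stop "$id")
#   curl -fsS -X "$1" -H "Authorization: Bearer $TOKEN" "http://127.0.0.1:9090$2"
```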
## Uninstall vs install responsibilities
| Component | `install.sh` | Default `uninstall.sh` | `uninstall.sh --all` |
|---|---|---|---|
| `.env` | create from example if missing | untouched | untouched |
| `SANDBOXED_DATA_DIR` | `mkdir -p` | preserved | removed |
| Base + control-plane images | build | preserved | removed (`--all` implies `--images`) |
| Compose services | `up -d` | `down` | `down` |
## Related pages
<CardGroup>
<Card title="Installation" href="/installation">
Prerequisites, `./install.sh`, `.env` bootstrap, and health checks before you need uninstall.
</Card>
<Card title="Workspaces and persistence" href="/workspaces-persistence">
Directory layout, bind mounts, isolation, and what survives stop/destroy.
</Card>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
SQLite status machine, destroy vs purge, and reconcile-on-boot after maintenance.
</Card>
<Card title="Observability" href="/observability">
`healthz`/`readyz`, metrics, and log paths including Traefik access logs.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
`SANDBOXED_DATA_DIR`, `SANDBOXED_LOG_DIR`, network, and image env keys.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Compose log probes, Docker socket failures, and port conflicts when restarting the stack.
</Card>
</CardGroup>
---
## 24. Contributing
> Project layout, design constraints (Docker-only core, sqlite truth, docker CLI shell-out), issue report fields, and extension boundaries for integrators.
- Page Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/pages/24-contributing.md
- Generated: 2026-06-04T22:47:33.406Z
### Source Files
- `CONTRIBUTING.md`
- `ARCHITECTURE.md`
- `control-plane/README.md`
- `image/README.md`
- `LICENSE`
---
title: "Contributing"
description: "Project layout, design constraints (Docker-only core, sqlite truth, docker CLI shell-out), issue report fields, and extension boundaries for integrators."
---
sandboxed is maintained as a small, self-hostable stack: a Go control plane (`sandboxd`) in `control-plane/`, a per-sandbox base image in `image/`, Traefik edge config in `traefik/`, and compose/install scripts at the repository root. Changes should preserve the headline constraint, **runs fully on Docker with one command**, and the three non-negotiables below. Check both before adding features or host dependencies.
## Project layout
The repository splits operational surfaces by responsibility. Sandboxes are **not** compose services; `sandboxd` launches them at runtime as sibling containers on `${SANDBOXED_NETWORK}`.
:::files
sandboxed/
├── docker-compose.yml # traefik + sandboxd stack
├── install.sh # idempotent installer
├── uninstall.sh # stack + managed-container cleanup
├── .env.example # all configuration keys and defaults
├── traefik/ # static + dynamic (wake, forward-auth)
├── image/ # sandboxed-base image, skel, HOME_LAYOUT contract
└── control-plane/ # sandboxd + runtimed (Go 1.22+)
├── cmd/sandboxd/ # daemon entrypoint, env wiring, background workers
├── cmd/runtimed/ # in-sandbox supervisor (baked into base image)
├── internal/ # docker, store, reconcile, reaper, wake, api, …
└── migrations/ # numbered SQLite schema files
:::
| Path | Owns |
|------|------|
| `docker-compose.yml` | Traefik edge, `sandboxd` build, host socket + data-dir mounts |
| `control-plane/` | Lifecycle API, SQLite store, reconciler, reapers, wake path |
| `image/` | Base image Dockerfile, `/opt/sandbox-skel`, registry proxy config |
| `traefik/` | Entrypoints, catch-all wake router, optional TLS / forward-auth |
| `SANDBOXED_DATA_DIR` (default `/var/lib/sandboxed`) | `workspaces/<id>/`, `state/sandboxd.db`, logs |
<Note>
The scripts under `image/scripts/dev/` are **debugging utilities only**. They do not update SQLite; containers created outside `sandboxd` are orphans (logged, not adopted). See [Control plane development](/control-plane-development) for the supported dev loop.
</Note>
## Design constraints
Contributions that fight these constraints are unlikely to merge unless they are optional, default-off, and clearly scoped.
### Docker-only core
New **host** dependencies beyond Docker Engine + Compose are a hard sell. The product promise is a single `./install.sh` (or `docker compose up -d --build`) with no Kubernetes, separate database server, or message queue. Optional features that need extra host tooling should ship behind env flags and stay off by default.
### SQLite is the source of truth
All durable sandbox lifecycle state lives in SQLite (WAL) under `SANDBOXED_DATA_DIR/state/sandboxd.db`. The boot-time reconciler in `internal/reconcile` **converges Docker to SQLite, never the other way**. Do not add state that exists only in Docker labels or container metadata without a matching row.
```mermaid
stateDiagram-v2
[*] --> creating
creating --> running
creating --> error
running --> stopped
stopped --> running : docker start / wake
running --> error
error --> [*]
stopped --> [*] : destroy / purge
```
| Rule | Implication |
|------|-------------|
| Every managed container `s-{ulid}` should have one `sandbox` row | Create/delete paths must update the store |
| Reconciler on boot | Rows drive `docker start`/`stop` and status repair; Docker drift is corrected toward the DB |
| Orphans (container/mount without row) | **Logged only** in v1—no auto-adoption; hand-debugged `docker run` is not destroyed |
| Migrations | Numbered SQL under `control-plane/migrations/`; applied by the migration runner at startup |
Workspace files persist on disk (`workspaces/<id>/` bind mounts), separate from SQLite but referenced by row paths in the schema.
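The edges of the state diagram above can be written down as a transition check, which is handy when reviewing a change that touches lifecycle status. This is an illustrative sketch only, not the store's implementation; `is_valid_transition` is hypothetical, and `gone` is an invented name for the diagram's terminal `[*]` node.

```bash
# Illustrative only: encodes the edges of the state diagram above.
# "gone" is a hypothetical name for the terminal [*] node.
# Usage: is_valid_transition <from> <to>   (exit status 0 = allowed)
is_valid_transition() {
  case "$1>$2" in
    "creating>running" | "creating>error") return 0 ;;
    "running>stopped" | "running>error") return 0 ;;
    "stopped>running" | "stopped>gone") return 0 ;;
    "error>gone") return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `is_valid_transition stopped running` succeeds (wake), while `is_valid_transition error running` fails because the diagram has no such edge.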
### Docker CLI shell-out
`internal/docker` is a thin typed wrapper over the **`docker` CLI** (`os/exec`), not the Docker SDK. Policy (hardened `RunSpec`, networks, labels) lives in callers; the package only encodes invocation. Reach for the SDK only with a measured bottleneck and strong review justification.
The `sandboxd` container image bundles `docker-ce-cli` and talks to the host daemon via the mounted `/var/run/docker.sock` (see `control-plane/Dockerfile`).
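To make the "wrapper only encodes invocation" idea concrete, here is a shell sketch of turning a spec into a `docker run` argument list. The real wrapper is Go code using `os/exec`; this is not a port of it. `sandbox_run_args` is hypothetical, and the limit values (`512m`, `256`) and the `--cap-drop ALL` policy are placeholders rather than `sandboxd` defaults.

```bash
# Hypothetical sketch: assemble a hardened `docker run` argv for one sandbox.
# The limit values and the cap policy are placeholders, not sandboxd defaults.
# Usage: sandbox_run_args <id> <image>   (prints one argument per line)
sandbox_run_args() {
  local id="$1" image="$2"
  local data_dir="${SANDBOXED_DATA_DIR:-/var/lib/sandboxed}"
  local network="${SANDBOXED_NETWORK:-sandboxed_net}"
  # Reuse the positional parameters as the argv being built.
  set -- run -d \
    --name "s-$id" \
    --label sandboxed.managed=true \
    --network "$network" \
    --read-only \
    --cap-drop ALL \
    --memory 512m \
    --pids-limit 256 \
    -v "$data_dir/workspaces/$id:/home/sandbox" \
    "$image"
  printf '%s\n' "$@"
}
```

Printing the arguments instead of executing them keeps policy review separate from invocation, which mirrors the split described above.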
<Warning>
The control plane is **root-equivalent on the host** through the Docker socket. Treat the host as a trust boundary; do not co-locate unrelated secrets. See [Production deployment](/production-deployment) for hardening when exposing the API beyond loopback.
</Warning>
### Other v1 trade-offs (conscious, not bugs)
| Area | v1 choice | Extension / hardening path |
|------|-----------|----------------------------|
| Workspace storage | Plain directory per sandbox | Disk quotas at fs/volume layer; multi-host sharding is out of core scope |
| Snapshots / templates | API present; **experimental** on directory storage | Contribute a directory-tar snapshot backend, or use plain workspace copies |
| Egress | Default-allow, no logging | Host firewall / proxy; not a control-plane default |
| `memory.high` cgroup throttle | Opt-in `SANDBOXED_SET_MEMORY_HIGH` | Needs host cgroup access from the control-plane container |
| External identity columns | Opaque passthrough (`external_*`); equality checks only | `sandboxd` does not interpret upstream IDs beyond storage |
Full rationale: [Overview](/overview) and the architecture doc’s “Design choices & current limitations” table.
## Development loop
<Steps>
<Step title="Control plane (Go 1.22+)">
From `control-plane/`:
```bash
go build ./...
go test ./...
go vet ./...
```
`sandboxd` requires **CGO** (`github.com/mattn/go-sqlite3`). Container builds set `CGO_ENABLED=1`. `runtimed` is CGO-free and is compiled into the sandbox base image, not the host.
</Step>
<Step title="Full stack">
Run `./install.sh` or `docker compose up -d --build`, then exercise the API on `SANDBOXED_API_BIND` (default `http://127.0.0.1:9090`). Verify:
```bash
curl -s http://127.0.0.1:9090/healthz
curl -s http://127.0.0.1:9090/readyz
```
The base image build is slow the first time; later runs cache it.
</Step>
<Step title="Base image / sandbox contract">
Image changes live under `image/` (Dockerfile, `skel/`, `etc/`). Layout and survival rules are defined in `image/HOME_LAYOUT.md`—update that contract before changing seeding or mount paths in control-plane code.
</Step>
</Steps>
Match surrounding style and comment density. Include tests for behavior changes where practical (`control-plane` has package-level `_test.go` coverage in `internal/api`, `internal/auth`, `internal/traefik`, `cmd/runtimed`, etc.).
## Reporting issues
Useful bug reports speed reproduction on a Docker-only host. Include:
| Field | What to attach |
|-------|----------------|
| Docker version | Output of `docker version` |
| OS / kernel | Host distribution and version |
| Control-plane logs | `docker compose logs sandboxd` (and Traefik if routing-related) |
| Triggering request | Exact HTTP method, path, and JSON body (redact secrets) |
| Sandbox id | ULID from create response, if applicable |
| `.env` deltas | Non-default `HTTP_PORT`, `PREVIEW_DOMAIN`, auth, data dir |
For preview or wake failures, also note the preview URL pattern and whether the sandbox was stopped (idle) or never started. See [Troubleshooting](/troubleshooting) for common `readyz`, port-80, ULID, and userns-remap failures.
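Several of the fields above can be gathered in one pass. A minimal sketch, assuming it is run from the repository root on the affected host; `collect_report` is a hypothetical helper, and its output should still be reviewed for secrets before posting.

```bash
# Hypothetical helper: gather the report fields above into one text block.
# Run from the repository root so `docker compose` finds docker-compose.yml.
collect_report() {
  echo "## OS / kernel"
  uname -sr
  echo "## Docker version"
  docker version 2>&1 || echo "(docker version failed)"
  echo "## Control-plane logs (last 200 lines)"
  docker compose logs --no-color --tail 200 sandboxd 2>&1 ||
    echo "(compose logs failed)"
}
```

Redirect the output to a file, for example `collect_report > report.txt`, then add the triggering request, sandbox id, and `.env` deltas by hand.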
## Extension boundaries for integrators
Integrate **through the HTTP API**, not by managing `docker run` lifecycle in parallel with `sandboxd`.
### Supported integration surfaces
| Surface | Use for |
|---------|---------|
| `POST /v1/sandboxes/{id}/tasks` | Headless coding agents (`runtimed`); SSE on `.../tasks/{taskId}/events` |
| `POST /sandbox`, `GET /sandboxes`, `GET /sandbox/{id}` | Create, list, inspect (legacy paths; v1 equivalents exist) |
| `PUT` / `GET /v1/sandboxes/{id}/files` | Workspace file CRUD without exec |
| `POST /sandbox/{id}/exec` | Non-interactive commands (no TTY/stdin) |
| `POST /sandbox/{id}/keepalive`, `POST /v1/sandboxes/{id}/stop` | Idle / manual stop semantics |
| `DELETE /sandbox/{id}` vs `POST /sandbox/{id}/purge` | Destroy container (keep workspace) vs destroy + delete workspace |
| Preview URLs | `http://s-{id}-{port}.preview.{PREVIEW_DOMAIN}` (see [Preview URL reference](/preview-url-reference)) |
| `GET /llm.txt` | **Public** machine-readable API contract (no bearer token); served from `SANDBOXD_LLM_TXT_PATH` (default `/etc/sandboxed/llm.txt`); 404 if unset |
<Info>
`GET /llm.txt` is exempt from service-token auth so agents and third-party tools can fetch the contract without credentials. Mount or deploy the file at `SANDBOXD_LLM_TXT_PATH`; edits are read per request (no redeploy required). When auth is enabled, all other external routes need `Authorization: Bearer <secret>` unless loopback-exempt—see [API authentication](/api-authentication).
</Info>
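Integrators often need to derive the preview URL from a sandbox id and port. The pattern in the table can be sketched as a one-line helper; `preview_url` is hypothetical, and the optional scheme argument is an assumption for deployments that put TLS in front of Traefik.

```bash
# Hypothetical helper: build a preview URL from the documented host pattern.
# Usage: preview_url <id> <port> <preview_domain> [scheme]
preview_url() {
  local id="$1" port="$2" domain="$3" scheme="${4:-http}"
  echo "${scheme}://s-${id}-${port}.preview.${domain}"
}
```

Here `<preview_domain>` is the value of `PREVIEW_DOMAIN`, and `<id>` is the sandbox id returned by the create call.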
### Do not rely on (v1)
- **Hand-started containers** (`docker run --name s-...`) without a SQLite row—they remain orphans; the reconciler will not adopt them.
- **`image/scripts/dev/`** for production lifecycle—they bypass the store.
- **State only in Docker** (labels, inspect output) without a store migration and reconciler path.
- **Interactive exec** (TTY/stdin)—the exec API is non-interactive by design.
- **Snapshots/templates** as production guarantees—they are experimental on directory-backed storage until a contributed backend lands.
### Safe extension patterns
- **Env injection at create** — pass provider keys in `POST /sandbox` body `env` so tasks and exec see them.
- **External identity** — store upstream user/app ids in `external_*` columns; `sandboxd` treats them as opaque passthrough for claim/purge hooks.
- **Private previews** — `visibility=private` + forward-auth (Traefik → `/forward-auth`); see [Private previews](/private-previews).
- **Custom base image** — set `SANDBOXD_IMAGE` / `SANDBOXED_IMAGE`; keep `HOME_LAYOUT.md` contract if you change skel paths.
- **Hardening at the edge** — TLS, wildcard DNS, API tokens, egress rules on the host—without forking core lifecycle logic.
Provider-neutral agent integration: any coding CLI that runs inside the sandbox (OpenCode, Claude Code, etc.) is invoked via tasks or exec; no hosted model vendor is required in core.
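For the env-injection pattern in the list above, a create body might be assembled like this. The field names `ports` and `env` come from this documentation, but the value shapes (an array of port numbers, a string-to-string map) are assumptions to confirm against the API reference. `create_body` is hypothetical and does no JSON escaping, so a tool such as `jq` is a better fit for untrusted values.

```bash
# Hypothetical helper: build a JSON body with one port and one env entry.
# The value shapes (array of ints, string map) are assumptions.
# No escaping is performed: key and value must not contain quotes or backslashes.
# Usage: create_body <port> <env_key> <env_value>
create_body() {
  local port="$1" key="$2" value="$3"
  printf '{"ports":[%s],"env":{"%s":"%s"}}\n' "$port" "$key" "$value"
}
```

The result can be passed to `curl` with `--data` and a `Content-Type: application/json` header when calling `POST /sandbox`.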
## Contribution guidelines
- **Keep the core lean** — prefer config toggles and host-level hardening over growing `sandboxd` for every tenant policy.
- **Preserve SQLite ↔ Docker direction** — new lifecycle features need store columns, migrations, and reconciler awareness.
- **Prefer CLI shell-out** in `internal/docker` unless profiling proves otherwise.
- **Document contract changes** in `image/HOME_LAYOUT.md` when touching workspace layout or seeding.
- **License** — contributions are accepted under the [MIT License](https://github.com/tastyeffectco/sandboxes/blob/main/LICENSE) (Copyright (c) 2026 sandboxed contributors).
Pull requests that tighten isolation or auth defaults, or that add snapshot backends, are especially welcome provided they respect the constraints above.
## Related pages
<CardGroup>
<Card title="Control plane development" href="/control-plane-development">
Go build/test/vet, CGO sqlite note, package map, and compose rebuild loop.
</Card>
<Card title="v1 API reference" href="/v1-api-reference">
Public `/v1/sandboxes` routes, error envelope, tasks, and files API.
</Card>
<Card title="Legacy API reference" href="/legacy-api-reference">
`/sandbox*` routes, healthz/readyz, metrics, and `GET /llm.txt`.
</Card>
<Card title="Architecture (overview)" href="/overview">
sandboxd, Traefik, runtimed, and create → task → preview path.
</Card>
</CardGroup>
---