# Run coding agents

> Submit tasks with `POST /v1/sandboxes/{id}/tasks` (`prompt`, plus `agent`, which defaults to `opencode`). Covers wake-on-submit, SSE on `/events`, env injection at create, and the runtimed socket contract.

- Repository: tastyeffectco/sandboxes
- GitHub: https://github.com/tastyeffectco/sandboxes
- Human docs: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0
- Complete Markdown: https://grok-wiki.com/public/docs/tastyeffectco-sandboxes-f551c1a2e9a0/llms-full.txt

## Source Files

- `control-plane/internal/api/v1_tasks.go`
- `control-plane/internal/api/taskwatch.go`
- `control-plane/cmd/runtimed/task.go`
- `control-plane/internal/runtime/client.go`
- `control-plane/migrations/0005_tasks.sql`
- `image/Dockerfile`

---


Headless coding agents run inside each sandbox through the v1 tasks API. `sandboxd` accepts `POST /v1/sandboxes/{id}/tasks`, wakes a stopped sandbox when needed, and proxies the request to the in-container `runtimed` over a Unix socket at `<workspace>.mnt/.runtimed/sock`. It streams progress as Server-Sent Events and persists the canonical result in SQLite, so `GET` still works after the sandbox is stopped or destroyed.

## How tasks fit the stack

Every sandbox container runs `runtimed` as its main process (under `tini`). `sandboxd` never talks to the agent directly; it uses `runtime.Client` to call `runtimed` HTTP routes over the workspace Unix domain socket. The public v1 layer in `control-plane/internal/api/v1_tasks.go` translates NDJSON from runtimed into SSE for integrators.

```mermaid
sequenceDiagram
  participant Client
  participant sandboxd as sandboxd (v1_tasks)
  participant SQLite
  participant runtimed as runtimed (UDS)
  participant Agent as opencode

  Client->>sandboxd: POST /v1/sandboxes/{id}/tasks
  alt sandbox stopped
    sandboxd->>sandboxd: POST /wake/{id}
  end
  sandboxd->>runtimed: POST /tasks (task_id, prompt, agent)
  runtimed->>Agent: opencode run --format json
  sandboxd->>SQLite: INSERT task (running)
  sandboxd-->>Client: 202 Accepted + events_url
  par SSE to client
    Client->>sandboxd: GET .../tasks/{taskId}/events
    sandboxd->>runtimed: GET /tasks/{id}/events?since=
    runtimed-->>sandboxd: NDJSON events
    sandboxd-->>Client: SSE (id, event, data)
  and Background watcher
    sandboxd->>runtimed: GET /tasks/{id}/events?since=0
    runtimed-->>sandboxd: terminal done event
    sandboxd->>SQLite: UPDATE result_json
  end
```

<Note>
Provider API keys are not passed on the task body. Inject them at sandbox **create** via `POST /sandbox` `env` so they land in the container environment; `opencode` inherits `os.Environ()` when the agent runs.
</Note>

## Prerequisites

- A running `sandboxd` stack (`GET /healthz` → `ok`, `GET /readyz` → `ready`).
- A sandbox with port **3000** exposed (default app template serves the Vite dev server there).
- For a custom model account, create the sandbox with `env` on `POST /sandbox` (see [Env injection at create](#env-injection-at-create)).

## Submit a task

<Steps>
<Step title="Create or resolve a sandbox">

Use `POST /sandbox` (legacy create) or an existing sandbox id. Expose the preview port your app will use (typically `3000`).

</Step>
<Step title="POST the task">

```bash
API=http://127.0.0.1:9090
ID=<sandbox-ulid>

curl -s -XPOST "$API/v1/sandboxes/$ID/tasks" \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "create a Vite todo app and run it on port 3000",
    "agent": "opencode"
  }'
```

Omit `agent` to use the default **`opencode`**. Only `opencode` is accepted today; any other value returns `400 invalid_request`.

</Step>
<Step title="Stream events">

Use the `events_url` from the response (or build `/v1/sandboxes/{id}/tasks/{taskId}/events`):

```bash
TASK_ID=<task-ulid>
curl -N "$API/v1/sandboxes/$ID/tasks/$TASK_ID/events"
```

</Step>
<Step title="Fetch the canonical result">

Poll until `status` is no longer `running`:

```bash
curl -s "$API/v1/sandboxes/$ID/tasks/$TASK_ID"
```

The full `TaskResult` (files changed, build check, tokens, preview state) is returned once the background watcher has persisted it.

</Step>
</Steps>
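
The submit step can also be scripted. The following is a minimal Python sketch using only the standard library. It assumes the same local base URL as the curl example; `build_submit_request`, `absolute_events_url`, and `submit_task` are hypothetical helper names for illustration, not part of the project.

```python
import json
import urllib.request
from urllib.parse import urljoin

API = "http://127.0.0.1:9090"  # assumption: same base URL as the curl example


def build_submit_request(api, sandbox_id, prompt, agent=None):
    """Build (but do not send) the POST /v1/sandboxes/{id}/tasks request."""
    body = {"prompt": prompt}
    if agent is not None:  # omit "agent" to get the default, opencode
        body["agent"] = agent
    return urllib.request.Request(
        f"{api}/v1/sandboxes/{sandbox_id}/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def absolute_events_url(api, accepted):
    """Resolve the relative events_url from the 202 body against the API base."""
    return urljoin(api, accepted["events_url"])


def submit_task(api, sandbox_id, prompt, agent=None):
    """Send the request and return the decoded 202 body (performs network I/O)."""
    req = build_submit_request(api, sandbox_id, prompt, agent)
    # Assumption: 30 s is enough to cover a wake-on-submit; tune for your stack.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made.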

### Wake-on-submit

If the sandbox row is **`stopped`**, `v1SubmitTask` calls the internal wake path (`POST /wake/{id}`) before submitting. Wake failures surface as v1 errors (for example `503 sandbox_capacity`). After wake, the sandbox must be **`running`**; other statuses yield `409 conflict`.

<Warning>
Unauthenticated callers cannot rely on wake-on-submit for **private** sandboxes, because the browser wake path requires a preview-token cookie. API callers authenticated as a service or operator skip private wake gating, so their task submits can still wake private sandboxes.
</Warning>

## Agent selection

| Field | Default | Constraint |
|-------|---------|------------|
| `prompt` | — | Required non-empty string |
| `agent` | `opencode` | Only `opencode` is implemented |

Inside the container, `runtimed` drives `opencode run --format json --dangerously-skip-permissions`, maps stdout NDJSON into canonical `message` and `tool` events, and enforces **one active task per sandbox** (`409 task_in_progress` if another task is running).

The base image also installs **Claude Code** (`claude`); use `POST /sandbox/{id}/exec` for ad hoc CLI runs. The tasks API does not yet expose a `claude` agent adapter.
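
A client can apply the same field rules before sending a request, which avoids a round trip for obviously invalid input. The sketch below is illustrative only: `normalize_task_body` is a hypothetical helper, and rejecting whitespace-only prompts is a client-side choice, not documented server behavior.

```python
SUPPORTED_AGENTS = {"opencode"}  # per the table above; other values are rejected
DEFAULT_AGENT = "opencode"


def normalize_task_body(prompt, agent=None):
    """Return the JSON body for a task submit, or raise ValueError."""
    # Client-side assumption: treat whitespace-only prompts as empty.
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    chosen = DEFAULT_AGENT if agent is None else agent
    if chosen not in SUPPORTED_AGENTS:
        raise ValueError(f"unsupported agent: {chosen!r}")
    return {"prompt": prompt, "agent": chosen}
```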

## Env injection at create

Task submit does not accept an `env` map. Credentials and provider configuration must be present in the **container environment** at create time:

<ParamField body="env" type="object">
Map of environment variables passed to `docker run --env`. Keys must be non-empty and must not contain `=` or newlines; values must not contain newlines. Visible to `runtimed` and any agent process it spawns.
</ParamField>

<RequestExample>

```bash
curl -s -XPOST "$API/sandbox" -H 'Content-Type: application/json' \
  -d '{"ports":[3000],"env":{"ANTHROPIC_API_KEY":"sk-ant-..."}}'
```

</RequestExample>

`POST /v1/sandboxes` (project-scoped create) does not currently expose `env`; use legacy `POST /sandbox` when you need key injection before calling the v1 tasks API.

`runtimed`’s `StartTaskRequest` supports optional per-task `env`, but `sandboxd` does not forward it on v1 submit—the container env from create is the supported path.
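
The `env` rules above are simple enough to check before calling `POST /sandbox`. This is a minimal sketch of such a check; `validate_env` and `build_create_body` are hypothetical names, and requiring string values is an assumption about the JSON shape rather than something the source states.

```python
def validate_env(env):
    """Return a list of problems; an empty list means the map passes the stated rules."""
    problems = []
    for key, value in env.items():
        if not isinstance(key, str) or key == "":
            problems.append("empty or non-string key")
            continue
        if "=" in key or "\n" in key:
            problems.append(f"key {key!r} contains '=' or a newline")
        # Assumption: values are strings, as they end up in the container environment.
        if not isinstance(value, str) or "\n" in value:
            problems.append(f"value for {key!r} must be a string without newlines")
    return problems


def build_create_body(ports, env):
    """Build the POST /sandbox body shown in the request example, after validation."""
    problems = validate_env(env)
    if problems:
        raise ValueError("; ".join(problems))
    return {"ports": list(ports), "env": dict(env)}
```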

## v1 task endpoints

| Method | Path | Purpose |
|--------|------|---------|
| `POST` | `/v1/sandboxes/{id}/tasks` | Submit task; wake if stopped |
| `GET` | `/v1/sandboxes/{id}/tasks/{taskId}` | Canonical result (SQLite) |
| `GET` | `/v1/sandboxes/{id}/tasks/{taskId}/events` | Live SSE stream |
| `POST` | `/v1/sandboxes/{id}/tasks/{taskId}/cancel` | Cancel in-flight task |

:::endpoint POST /v1/sandboxes/{id}/tasks
Submit a headless coding task. Returns **202 Accepted** with task metadata.

<ParamField path="id" type="string" required>
Sandbox ULID.
</ParamField>

<ParamField body="prompt" type="string" required>
Natural-language instruction for the agent, which works in `~/workspace/app`.
</ParamField>

<ParamField body="agent" type="string">
Coding agent id. Defaults to `opencode`. Only `opencode` is supported.
</ParamField>

<ResponseExample>

```json
{
  "id": "01JABCDEF...",
  "sandbox_id": "01JXYZ...",
  "status": "running",
  "agent": "opencode",
  "events_url": "/v1/sandboxes/01JXYZ.../tasks/01JABCDEF.../events"
}
```

</ResponseExample>

| HTTP | `error.code` | When |
|------|----------------|------|
| 400 | `invalid_request` | Unsupported `agent` value |
| 404 | `not_found` | Unknown sandbox |
| 409 | `conflict` | Sandbox not `running` after wake attempt |
| 409 | `task_in_progress` | Another task is active in runtimed |
| 502 | `sandbox_unavailable` | Cannot reach runtimed (socket down, etc.) |
| 503 | `sandbox_capacity` | Wake refused (memory admission) |

:::
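
Clients usually want to turn the submit error codes into a decision about whether to retry. The sketch below is one possible mapping. It assumes the error envelope is shaped like `{"error": {"code": "..."}}`, inferred from the `error.code` column, and the retryable flags are a suggested client policy rather than documented behavior.

```python
import json

# (HTTP status, error.code) -> (retryable, suggested client action)
SUBMIT_ERRORS = {
    (400, "invalid_request"): (False, "fix the request body"),
    (404, "not_found"): (False, "check the sandbox id"),
    (409, "conflict"): (False, "sandbox is not running after the wake attempt"),
    (409, "task_in_progress"): (True, "wait for or cancel the active task"),
    (502, "sandbox_unavailable"): (True, "runtimed is unreachable; retry later"),
    (503, "sandbox_capacity"): (True, "wake was refused; retry with backoff"),
}


def classify_submit_error(status, raw_body):
    """Map an error response to (code, retryable, hint)."""
    try:
        code = json.loads(raw_body)["error"]["code"]  # assumed envelope shape
    except (ValueError, KeyError, TypeError):
        code = None  # body was not JSON or did not match the assumed shape
    retryable, hint = SUBMIT_ERRORS.get((status, code), (False, "unrecognized error"))
    return code, retryable, hint
```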

:::endpoint GET /v1/sandboxes/{id}/tasks/{taskId}
Read the durable task outcome from SQLite. Works while the sandbox is running, after **stop**, and after **delete** (workspace gone; result retained).

While `status` is `running` or `result_json` is unset:

<ResponseExample>

```json
{
  "id": "01JABCDEF...",
  "sandbox_id": "01JXYZ...",
  "status": "running"
}
```

</ResponseExample>

When finished, the response promotes fields from `runtime.TaskResult` (`status`, `files_changed`, `build_ok`, `agent_message_final`, `tokens`, `failure_reason`, `preview_status_after`, etc.).

:::
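
Polling this endpoint can be wrapped in a small loop. The following is a sketch, not a prescribed client: `wait_for_result` and `http_fetch` are hypothetical helpers, and the 2-second interval and 15-minute ceiling are assumptions chosen for illustration.

```python
import json
import time
import urllib.request


def http_fetch(url):
    """GET the task and decode its JSON body (performs network I/O)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_result(fetch, interval_s=2.0, timeout_s=900.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the task leaves `running`, then return the last body."""
    deadline = clock() + timeout_s
    while True:
        task = fetch()
        if task.get("status") != "running":
            return task
        if clock() >= deadline:
            raise TimeoutError("task still running after timeout")
        sleep(interval_s)
```

A typical call would pass `lambda: http_fetch(task_url)` as `fetch`, where `task_url` is the full task URL; `sleep` and `clock` are injectable so the loop can be exercised without waiting.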

:::endpoint GET /v1/sandboxes/{id}/tasks/{taskId}/events
Server-Sent Events stream proxied from runtimed’s newline-delimited JSON event log.

**Resume:**

- `Last-Event-ID: <n>` — continue after event id `n` (`since = n + 1`).
- `?since=<n>` — start at event index `n` (query wins when both are set).

Each SSE record:

```text
id: <monotonic_index>
event: <type>
data: <json>

```

Event types from runtimed:

| `event` | Role |
|---------|------|
| `status` | Phase updates (`phase` in data) |
| `message` | Agent text (`role`: `agent`, `agent_error`, …) |
| `tool` | Tool progress (`name`, `status`, `path`) |
| `build` | Post-task `pnpm build` outcome |
| `done` | Terminal; `data` is the full `TaskResult` |

Requires a **running** sandbox and a reachable runtimed socket (`502 sandbox_unavailable` otherwise).

:::
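
The record format and resume rules above can be consumed with a small parser. This is a minimal sketch under two assumptions: every record carries a numeric `id`, and `data` is a single JSON document. `parse_sse`, `resume_since`, and `stream_events` are hypothetical names.

```python
import json
import urllib.request


def parse_sse(lines):
    """Yield (id, event, data) tuples from an iterable of decoded SSE lines."""
    event_id, event_type, data_lines = None, "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates one record
            if data_lines:
                yield event_id, event_type, json.loads("\n".join(data_lines))
            event_id, event_type, data_lines = None, "message", []
            continue
        if line.startswith(":"):  # SSE comment or keep-alive
            continue
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "id":
            event_id = int(value)  # assumption: ids are integer indexes
        elif field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


def resume_since(last_event_id):
    """Index to resume from: after event n, continue at n + 1."""
    return 0 if last_event_id is None else last_event_id + 1


def stream_events(url, last_event_id=None):
    """Open the events URL and yield parsed records (performs network I/O)."""
    headers = {"Accept": "text/event-stream"}
    if last_event_id is not None:
        headers["Last-Event-ID"] = str(last_event_id)
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        yield from parse_sse(line.decode("utf-8") for line in resp)
```

Tracking the last `id` seen and passing it back as `last_event_id` lets a client reconnect without replaying events it has already handled.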

:::endpoint POST /v1/sandboxes/{id}/tasks/{taskId}/cancel
Ask runtimed to cancel the task (kills the agent process group). Idempotent at the runtimed layer.

<ResponseExample>

```json
{
  "id": "01JABCDEF...",
  "status": "cancelling"
}
```

</ResponseExample>

A cancelled task finalizes with `status` `cancelled` and `failure_reason` `cancelled`. A task that exceeds the runtimed timeout (default **10 minutes**) finalizes as `failed` with `failure_reason` `agent_timeout`.

:::

## Task lifecycle inside runtimed

```mermaid
stateDiagram-v2
  [*] --> queued: POST /tasks
  queued --> checkpoint: runTask
  checkpoint --> agent_running: git checkpoint
  agent_running --> build_check: opencode finishes
  build_check --> health_check: pnpm build
  health_check --> done: preview probes
  done --> [*]: emit done + result.json

  agent_running --> cancelled: POST cancel
  agent_running --> failed: agent_error / timeout
  build_check --> failed: build failure path
```

Phases surfaced on `status` events include `starting`, `checkpoint`, `agent_running`, `build_check`, and `health_check`. The terminal `done` event carries the canonical `TaskResult`.

**Terminal `status` values:** `succeeded`, `failed`, `cancelled` (plus `running` while in flight).

**Common `failure_reason` values:** `agent_timeout`, `agent_error`, `cancelled`, `sandbox_unavailable`, `internal`.

Per-task artifacts under `.runtimed/tasks/<taskId>/`:

```text
.runtimed/tasks/<taskId>/
  events.jsonl    # append-only event log
  result.json     # canonical outcome (written at finish)
  agent.log       # agent stderr
```
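
While the workspace still exists, `events.jsonl` can be read directly for debugging. The sketch below replays an NDJSON log from a given index. It assumes that an event's index equals its zero-based line position in the file, which the source does not confirm; `replay_events` is a hypothetical helper.

```python
import json


def replay_events(lines, since=0):
    """Return (index, event) pairs from NDJSON lines, starting at index `since`."""
    events = []
    for index, line in enumerate(lines):
        if index < since or not line.strip():
            continue  # skip earlier events and stray blank lines
        events.append((index, json.loads(line)))
    return events
```

Because the function accepts any iterable of lines, an open file handle for `events.jsonl` can be passed in as-is.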

## Durability and the background watcher

On accept, `sandboxd` inserts a row into the `task` table (`running`, `result_json` NULL) and starts `watchTask`, which tails runtimed’s event stream for up to **15 minutes** (three connect retries). When it sees `event: done`, it marshals `TaskResult` into SQLite via `FinishTask`.

<Info>
Clients do not need to stay connected to SSE for the result to be saved. A disconnected integrator can still `GET` the task once the watcher finishes.
</Info>

On `sandboxd` restart, `ReconcileTasks` finalizes orphaned `running` rows from `result.json`, re-attaches a watcher if the sandbox is still up, or marks `failed` / `sandbox_unavailable`.

The **idle reaper** skips sandboxes with a running task row so agents are not stopped mid-run; reaping resumes after the task ends.

**Retention trade-off:** SQLite keeps the canonical **result** after destroy. The full **event log** lives in the workspace and is **not** retained past sandbox destroy.
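
The watcher behavior described in this section can be illustrated with a short sketch. This is not the Go implementation of `watchTask`; it is a Python approximation that assumes "three connect retries" means at most three connection attempts. `open_stream` and `persist` are hypothetical callables standing in for the runtimed event stream and `FinishTask`.

```python
import time


def watch_task(open_stream, persist, max_connects=3, deadline_s=15 * 60,
               clock=time.monotonic):
    """Tail events until `done`, then persist the result. Returns True if persisted."""
    give_up_at = clock() + deadline_s
    since = 0
    for _ in range(max_connects):
        try:
            for index, event_type, data in open_stream(since):
                since = index + 1  # resume after the last event seen
                if event_type == "done":
                    persist(data)  # the terminal event carries the full result
                    return True
                if clock() >= give_up_at:
                    return False
        except ConnectionError:
            continue  # reconnect and resume from `since`
    return False
```

Resuming from `since` after a dropped connection is what lets the result be saved even when the stream is interrupted part-way through.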

## runtimed socket contract

Transport is HTTP/1.1 over a Unix domain socket—no TCP port inside the sandbox.

| Property | Value |
|----------|--------|
| In-container path | `/home/sandbox/.runtimed/sock` (override `RUNTIMED_SOCKET`) |
| Host path | `<SANDBOXED_DATA_DIR>/workspaces/<id>.mnt/.runtimed/sock` |
| Client | `runtime.NewClient(socketPath)` — 5s timeout for control calls; unbounded for event streams |

| runtimed route | Method | Purpose |
|----------------|--------|---------|
| `/status` | `GET` | `runtime.Status` — preview + `active_task` |
| `/tasks` | `POST` | Start task (`task_id`, `prompt`, `agent`, optional `env`, `timeout_s`) |
| `/tasks/{id}/events` | `GET` | NDJSON stream (`?since=` index) |
| `/tasks/{id}/cancel` | `POST` | Cancel task |

`sandboxd` assigns `task_id` (ULID) on v1 submit and passes it to runtimed so ids align across SQLite, SSE, and on-disk task dirs.
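
For debugging from the host, the socket can be reached with any HTTP client that supports Unix domain sockets. The sketch below uses the Python standard library; it is not the project's `runtime.Client`. `UnixHTTPConnection`, `host_socket_path`, and `get_status` are hypothetical names, and the 5-second timeout mirrors the control-call timeout in the table.

```python
import http.client
import json
import os
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # The host name is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def host_socket_path(data_dir, sandbox_id):
    """Build the host-side socket path from the data dir and sandbox id."""
    return os.path.join(data_dir, "workspaces", f"{sandbox_id}.mnt", ".runtimed", "sock")


def get_status(socket_path):
    """Call GET /status on runtimed and return (http_status, decoded_body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/status")
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()
```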

## Operational constraints

- **One task at a time** per sandbox (enforced in runtimed and surfaced as `task_in_progress`).
- **`POST /v1/sandboxes/{id}/stop`** is rejected while runtimed reports an active task; cancel the task first.
- **Interrupted tasks** (sandbox stop, runtimed crash) are finalized as `failed` / `sandbox_unavailable`, not resumed.
- **Claude Code / Codex** adapters are not wired through the tasks API yet; only OpenCode runs headlessly via `/tasks`.

## Verify end-to-end

1. Create sandbox with `ports: [3000]` (and optional `env` for your provider).
2. `POST /v1/sandboxes/{id}/tasks` with a concrete app prompt.
3. `curl -N` the `events` URL until you see `event: done`.
4. `GET /v1/sandboxes/{id}/tasks/{taskId}` — expect `status: succeeded` and `build_ok: true` for a healthy template build.
5. Open `http://s-{id}-3000.preview.localhost` (add `:$HTTP_PORT` if not using port 80).
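
The last two steps can be checked in a script. This is a small sketch with hypothetical helpers `preview_url` and `check_result`; it only encodes the URL pattern and the expected fields listed above.

```python
def preview_url(sandbox_id, port=3000, http_port=80):
    """Build the preview URL, adding the HTTP port only when it is not 80."""
    host = f"s-{sandbox_id}-{port}.preview.localhost"
    suffix = "" if http_port == 80 else f":{http_port}"
    return f"http://{host}{suffix}"


def check_result(result):
    """Return a list of deviations from a healthy template build."""
    problems = []
    if result.get("status") != "succeeded":
        problems.append(f"status is {result.get('status')!r}, expected 'succeeded'")
    if result.get("build_ok") is not True:
        problems.append("build_ok is not true")
    return problems
```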

## Related pages

<CardGroup>
<Card title="Quickstart" href="/quickstart">
Create → task → SSE → preview URL in one flow.
</Card>
<Card title="Example: agent todo app" href="/example-agent-todo">
End-to-end recipe with env injection and preview check.
</Card>
<Card title="runtimed reference" href="/runtimed-reference">
In-sandbox supervisor routes, env defaults, and protocol types.
</Card>
<Card title="v1 API reference" href="/v1-api-reference">
Full v1 request/response shapes and error envelope.
</Card>
<Card title="Wake, idle, and pressure" href="/wake-idle-reapers">
Stop-on-idle, wake-on-preview, and task-aware reaping.
</Card>
<Card title="Sandbox lifecycle" href="/sandbox-lifecycle">
Status machine, destroy vs purge, and reconcile-on-boot.
</Card>
</CardGroup>
