# sandbox.py — Fork, Diff, and Promote

> How sandbox_apply forks the SQLite-backed Runtime via Runtime.fork(at_event=...) or falls back to a structural replay on an in-memory graph, applies changes, emits a synthetic smoke TestEvent so newly bound behaviors fire, diffs added_objects and added_relations, and conditionally promotes to the live graph when promote=True. Covers the real-fork vs. in-memory fallback distinction and the single comment in sandbox.py marking where a public projector entry point would live.

- Repository: yoheinakajima/activegraph-selfgraph
- GitHub: https://github.com/yoheinakajima/activegraph-selfgraph
- Human wiki: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393
- Complete Markdown: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393/llms-full.txt

## Source Files

- `selfgraph/sandbox.py`
- `selfgraph/cli.py`
- `tests/test_smoke.py`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [selfgraph/sandbox.py](selfgraph/sandbox.py)
- [selfgraph/cli.py](selfgraph/cli.py)
- [tests/test_smoke.py](tests/test_smoke.py)
- [selfgraph/guardrails.py](selfgraph/guardrails.py)
- [selfgraph/propose.py](selfgraph/propose.py)
</details>


`sandbox.py` is the safe-execution layer that sits between a validated `PatchProposal` and the live graph. Its single public function, `sandbox_apply`, forks the graph into an isolated copy, applies the proposal's changes there, emits a synthetic smoke event so any newly bound behaviors get a chance to fire, and produces a structural diff. Only if the caller explicitly passes `promote=True` are the same changes then written to the live graph and the proposal's status stamped `"applied"`. Until that flag is set, the live graph is never touched.

This separation matters because selfgraph's proposals are LLM-generated patches. The fork-then-diff contract ensures that even a malformed or unexpected proposal cannot alter live state without an explicit human promotion step. The file is also the only place in selfgraph that calls a private ActiveGraph API (`Graph._replay_event`). That call is confined to a single helper, with a comment marking where a public projector entry point would eventually live.

---

## Entry Point: `sandbox_apply`

```python
# selfgraph/sandbox.py:16-73 (signature only; the body is walked through below)
def sandbox_apply(
    graph: Graph,
    proposal_id: str,
    *,
    runtime: Optional[Runtime] = None,
    promote: bool = False,
) -> dict:
    ...
```

**Preconditions checked before any fork:**
1. The object at `proposal_id` must exist and have `type == "PatchProposal"`.
2. Its `data["status"]` must be `"validated"` — a `"draft"` or `"rejected"` proposal raises `ValueError` immediately.

Both guards fire before `_build_fork` is called, so no fork is constructed for an unvalidated proposal. The smoke test `test_promote_lifecycle_requires_validated_status` covers this path directly.

Sources: [selfgraph/sandbox.py:28-35]()
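To make the guard order concrete, here is a minimal sketch that uses a hypothetical `Obj` dataclass and a plain `dict` in place of the ActiveGraph graph. The helper name `check_sandbox_preconditions` is invented for illustration, and raising `ValueError` for a missing or non-`PatchProposal` object is an assumption; the page only confirms `ValueError` for the status check.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical stand-in for a graph object (only the fields used here)."""

    id: str
    type: str
    data: dict = field(default_factory=dict)


def check_sandbox_preconditions(objects: dict, proposal_id: str) -> Obj:
    """Run both guards before any fork would be built; return the proposal."""
    proposal = objects.get(proposal_id)
    # Guard 1: the ID must resolve to a PatchProposal (exception type assumed).
    if proposal is None or proposal.type != "PatchProposal":
        raise ValueError(f"{proposal_id!r} is not a PatchProposal")
    # Guard 2: drafts, rejected proposals, and any other status are refused.
    status = proposal.data.get("status")
    if status != "validated":
        raise ValueError(
            f"proposal {proposal_id!r} must be 'validated', got {status!r}"
        )
    return proposal
```

Because both checks run before anything else, a failing proposal never costs a fork.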

---

## Fork Construction: Real Fork vs. In-Memory Fallback

`_build_fork` makes exactly one decision: whether to use `Runtime.fork` (the SQLite-backed real fork) or fall back to a structural event replay into a fresh `Graph`.

```python
# selfgraph/sandbox.py:79-99
def _build_fork(graph: Graph, runtime: Optional[Runtime]):
    store = graph.store
    if runtime is not None and isinstance(store, SQLiteEventStore):
        try:
            last_event = graph.events[-1].id if graph.events else None
            if last_event:
                fork_rt = runtime.fork(at_event=last_event, label="selfgraph-sandbox")
                return fork_rt.graph, f"sqlite-fork@{last_event}"
        except Exception as e:
            print(f"[sandbox] real fork failed, falling back: {e}")

    # Fallback: structural copy by replaying events into a new Graph.
    fresh = Graph(ids=IDGen(), run_id=graph.run_id + "-sandbox")
    _replay_into(fresh, graph.events)
    return fresh, "in-memory-replay"
```

The two paths are summarized below:

| Path | Condition | Fork label | Isolation level |
|---|---|---|---|
| **Real fork** | `runtime` is provided AND `graph.store` is `SQLiteEventStore` AND `graph.events` is non-empty | `sqlite-fork@<event_id>` | True SQLite snapshot via `Runtime.fork(at_event=...)` |
| **In-memory replay** | Any other case (in-memory store, no runtime passed, empty event log, or real fork exception) | `in-memory-replay` | Fresh `Graph` with events projected via `_replay_into` |

The CLI's `cmd_propose` and `cmd_promote` both pass `runtime=rt` (the loaded `Runtime` instance), so in normal CLI use the real fork is taken whenever the graph is SQLite-backed, its event log is non-empty, and `Runtime.fork` does not raise.

Sources: [selfgraph/sandbox.py:79-99](), [selfgraph/cli.py:69-79](), [selfgraph/cli.py:98-101]()
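The branching in `_build_fork` can be restated as a pure function over plain flags, which makes the path table easy to check row by row. This is a sketch, not selfgraph code: `choose_fork_path` and its parameters are hypothetical stand-ins for the real `runtime`, `graph.store`, `graph.events`, and the outcome of calling `Runtime.fork`.

```python
from typing import Optional, Sequence, Tuple


def choose_fork_path(
    has_runtime: bool,
    store_is_sqlite: bool,
    event_ids: Sequence[str],
    fork_succeeds: bool = True,
) -> Tuple[str, Optional[str]]:
    """Return (fork_label, event_id_forked_at) following _build_fork's logic."""
    if has_runtime and store_is_sqlite:
        last_event = event_ids[-1] if event_ids else None
        # The real fork needs a last event AND a Runtime.fork call that succeeds.
        if last_event and fork_succeeds:
            return f"sqlite-fork@{last_event}", last_event
    # Every other case, including a fork that raised, replays in memory.
    return "in-memory-replay", None
```

For example, `choose_fork_path(True, True, [])` lands on `"in-memory-replay"`: a SQLite-backed graph with an empty event log has nothing to fork at.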

---

## The Private API Comment

The in-memory fallback calls `Graph._replay_event`, a private method with a leading underscore. The code explicitly documents why:

```python
# selfgraph/sandbox.py:102-108
def _replay_into(target: Graph, events) -> None:
    """Project ``events`` into ``target`` without firing listeners or
    persisting. Calls ``Graph._replay_event``, the documented replay
    entry point used by ``Runtime.load`` and ``Runtime.fork``. If
    ActiveGraph ships a public equivalent later, swap it in here."""
    for ev in events:
        target._replay_event(ev)  # noqa: SLF001 — see docstring
```

This is the **only** private-API call in selfgraph. The module docstring reinforces it: *"This is the only private-API call in selfgraph; isolate it here."* The comment marks exactly where a public projector entry point would be wired once ActiveGraph exposes one.

Sources: [selfgraph/sandbox.py:1-7](), [selfgraph/sandbox.py:92-99](), [selfgraph/sandbox.py:102-108]()
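The distinction the docstring draws (project events without firing listeners or persisting) is the usual replay pattern in event-sourced systems. The toy below illustrates it with a hypothetical `ToyGraph` class. It is not the ActiveGraph `Graph`, and its `_replay_event` only mimics the documented contract; whether the real method also rebuilds the event log is an assumption here.

```python
from typing import Callable, Dict, List


class ToyGraph:
    """Hypothetical event-sourced graph; not the ActiveGraph Graph class."""

    def __init__(self) -> None:
        self.objects: Dict[str, dict] = {}
        self.events: List[dict] = []
        self.listeners: List[Callable[[dict], None]] = []

    def _project(self, ev: dict) -> None:
        # Pure state update shared by live writes and replay.
        self.objects[ev["id"]] = dict(ev["data"])

    def add_object(self, obj_id: str, data: dict) -> None:
        # Live write: log the event, project it, then notify listeners.
        ev = {"id": obj_id, "data": data}
        self.events.append(ev)
        self._project(ev)
        for listener in self.listeners:
            listener(ev)

    def _replay_event(self, ev: dict) -> None:
        # Replay: rebuild the log and state, but notify nobody.
        self.events.append(ev)
        self._project(ev)


def replay_into(target: ToyGraph, events: List[dict]) -> None:
    """Same loop shape as selfgraph's _replay_into, on the toy graph."""
    for ev in events:
        target._replay_event(ev)
```

Keeping the projection step separate from listener dispatch is what lets a fallback fork reach the same state as the live graph without re-triggering behaviors that already ran.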

---

## Applying Changes in the Fork

`_apply_changes` iterates the proposal's `changes` list against the fork graph (or the live graph when promoting). It handles all allowed v1 change kinds:

| `kind` | What happens |
|---|---|
| `add_object`, `add_state_bucket`, `add_task`, `add_evaluation` | Calls `graph.add_object(type, data, actor=actor)` |
| `add_relation` | Resolves `from_name`/`to_name` through a local `name_index`, calls `graph.add_relation(...)` |
| `add_policy` | Calls `graph.add_object("Policy", ...)` |
| `bind_behavior` | Calls `graph.add_object("BehaviorBinding", {...})` |
| unknown kinds | **Silently skipped** — a comment notes guardrails should have caught them |

The `name_index` is built from all existing objects at the start of the apply pass, then updated as new objects with `name` fields are added, so relations referencing objects *introduced earlier in the same proposal* resolve correctly.

Sources: [selfgraph/sandbox.py:114-156]()
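A stripped-down sketch of the `name_index` bookkeeping follows. It assumes each change is a dict with `kind` and `data` (for object kinds) or `type`, `from_name`, and `to_name` (for relations). Those shapes, the generated IDs, and the `apply_changes` name are illustrative assumptions, and the `add_policy` and `bind_behavior` branches are omitted.

```python
from typing import Dict, List, Tuple

OBJECT_KINDS = {"add_object", "add_state_bucket", "add_task", "add_evaluation"}


def apply_changes(
    objects: Dict[str, dict],
    relations: List[Tuple[str, str, str]],
    changes: List[dict],
) -> None:
    """Toy name_index pass; change shapes and IDs are illustrative."""
    # 1. Index objects that already exist, by their "name" field.
    name_index = {d["name"]: oid for oid, d in objects.items() if "name" in d}
    for i, ch in enumerate(changes):
        kind = ch.get("kind")
        if kind in OBJECT_KINDS:
            oid = f"new-{i}"
            objects[oid] = dict(ch["data"])
            # 2. Register new names immediately so later changes can use them.
            if "name" in ch["data"]:
                name_index[ch["data"]["name"]] = oid
        elif kind == "add_relation":
            # 3. Resolve endpoints by name (KeyError here if a name is unknown).
            relations.append(
                (ch["type"], name_index[ch["from_name"]], name_index[ch["to_name"]])
            )
        # Any other kind falls through and is skipped in this sketch.
```

Step 2 is the important one: a relation whose `from_name` refers to an object added earlier in the same change list resolves, because that name was indexed as soon as the object was created.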

---

## Synthetic Smoke Event

After changes are applied to the fork, `sandbox_apply` emits a `TestEvent` object:

```python
# selfgraph/sandbox.py:43-49
fork_graph.add_object("TestEvent", {
    "goal": proposal.data.get("goal"),
    "kind": "smoke",
}, actor="sandbox")
```

The inline comment is explicit about what this does and does not do:

> *"Simple test event: emit a synthetic Task.update event so any newly bound behaviors get a chance to fire (in-memory only; we don't spin up a fresh Runtime for the fork in v1)."*

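The `TestEvent` is added only to the **fork graph**. Per the comment, v1 does not spin up a fresh Runtime for the fork, so any behavior that reacts to the event runs in-memory on the fork, and the event is not meant to persist or trigger external side effects. The object's `kind: "smoke"` field distinguishes it from real operational events. Note that although the comment describes a synthetic `Task.update` event, the code creates a `TestEvent` object.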

Sources: [selfgraph/sandbox.py:43-49]()
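The toy below sketches why creating an object on the fork gives newly bound behaviors a chance to run without touching the live graph. `ToyGraph`, its `behaviors` map, and binding a behavior to the `TestEvent` type are all hypothetical; this page does not show how ActiveGraph actually dispatches behaviors.

```python
from typing import Callable, Dict, List


class ToyGraph:
    """Hypothetical graph that runs behaviors bound to an object type."""

    def __init__(self) -> None:
        self.objects: List[dict] = []
        self.behaviors: Dict[str, List[Callable]] = {}

    def add_object(self, type_: str, data: dict, actor: str) -> dict:
        obj = {"type": type_, "data": data, "actor": actor}
        self.objects.append(obj)
        # Run every behavior bound to this type, on this graph only.
        for behavior in self.behaviors.get(type_, []):
            behavior(self, obj)
        return obj


def emit_smoke(fork: ToyGraph, goal: str) -> dict:
    # Same data shape as the smoke object sandbox_apply adds.
    return fork.add_object("TestEvent", {"goal": goal, "kind": "smoke"}, actor="sandbox")


live, fork = ToyGraph(), ToyGraph()
# Pretend a bind_behavior change registered this on the fork only.
fork.behaviors["TestEvent"] = [
    lambda g, obj: g.add_object("Note", {"seen": obj["data"]["goal"]}, actor="behavior")
]
emit_smoke(fork, "tighten docs")
print([o["type"] for o in fork.objects])  # ['TestEvent', 'Note']
print(live.objects)  # []
```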

---

## Structural Diff

`_diff` computes which objects and relations exist in the fork graph but not in the original, by comparing object/relation ID sets:

```python
# selfgraph/sandbox.py:162-174
def _diff(before: Graph, after: Graph) -> dict:
    before_ids = {o.id for o in before.all_objects()}
    before_rel_ids = {r.id for r in before.all_relations()}
    added_objects = [
        {"id": o.id, "type": o.type,
         "label": o.data.get("name") or o.data.get("goal") or ""}
        for o in after.all_objects() if o.id not in before_ids
    ]
    added_relations = [
        {"id": r.id, "type": r.type, "source": r.source, "target": r.target}
        for r in after.all_relations() if r.id not in before_rel_ids
    ]
    return {"added_objects": added_objects, "added_relations": added_relations}
```

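The diff is purely **additive**: the v1 change kinds only add objects and relations, so there are no removal or mutation keys. Because the smoke `TestEvent` is added to the fork before `_diff` runs, it also shows up in `added_objects` (labeled with its `goal`), as would any objects created by behaviors that fired. A one-line summary is printed to stdout: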

```
[sandbox] fork diff: +N objects, +M relations
```

Sources: [selfgraph/sandbox.py:162-174](), [selfgraph/sandbox.py:59-60]()
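Because `_diff` compares only ID sets, an in-place edit to an object that existed before the fork is invisible to it, and so is a removal. The function below, a hypothetical `diff_by_id` over plain dicts keyed by ID, demonstrates that property in isolation.

```python
from typing import Dict, List


def diff_by_id(before: Dict[str, dict], after: Dict[str, dict]) -> List[str]:
    """Return IDs present in `after` but not in `before`, like _diff does."""
    before_ids = set(before)
    return [oid for oid in after if oid not in before_ids]


before = {"a": {"name": "A"}, "b": {"name": "B"}}
after = {
    "b": {"name": "B (edited)"},  # mutated in place: not reported
    "c": {"goal": "smoke"},       # new ID: reported
}                                  # "a" was removed: not reported either
print(diff_by_id(before, after))  # ['c']
```

That is sufficient for v1, where every allowed change kind only adds; a future mutating or removing change kind would need a richer comparison.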

---

## Promotion: Applying to the Live Graph

When `promote=True`, `sandbox_apply` re-applies the same changes directly to the original `graph` (not the fork), then patches the proposal's status to `"applied"`:

```python
# selfgraph/sandbox.py:62-69
if promote:
    print(f"[sandbox] promoting proposal to main graph (user approved)")
    _apply_changes(graph, proposal.data["changes"], actor="promote")
    graph.patch_object(
        proposal_id, {"status": "applied"},
        actor="promote",
        rationale="Promoted from sandbox after user approval.",
    )
```

`sandbox_apply` itself does not check `report["ok"]` before promoting. Its docstring says *"If `promote=True` (and the report has no failures)"*, but in the current v1 implementation `ok` is set to `True` unconditionally (`"ok": True` at line 57), so that condition never blocks promotion. The real safety net is the mandatory `validate_proposal` call upstream: both `cmd_propose` and `cmd_promote` call it before reaching `sandbox_apply`, and `cmd_promote` re-validates even when the proposal was already stamped `"validated"` in an earlier session.

Sources: [selfgraph/sandbox.py:55-72](), [selfgraph/cli.py:83-101]()
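The upstream contract the CLI relies on can be sketched as a small gate: validate first, and only call the sandbox with `promote=True` when the validation report is clean. `validate` and `sandbox` are hypothetical callables standing in for `validate_proposal` and `sandbox_apply` already bound to a graph and runtime. Raising `RuntimeError` on failure is an illustrative choice; this page does not show how the CLI reports a failed validation.

```python
from typing import Callable


def validate_then_promote(
    validate: Callable[[str], dict],
    sandbox: Callable[..., dict],
    proposal_id: str,
) -> dict:
    """Only reach the promoting sandbox call if validation passed."""
    report = validate(proposal_id)
    if not report.get("ok"):
        # Stop before anything can touch the live graph.
        raise RuntimeError(f"validation failed: {report.get('violations')}")
    return sandbox(proposal_id, promote=True)
```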

---

## Return Value

`sandbox_apply` returns a single `dict` regardless of whether promotion happened:

```python
{
    "proposal_id": "<id>",
    "fork_label": "sqlite-fork@<event_id>",  # or "in-memory-replay"
    "applied_changes": ...,  # an int
    "diff": {
        "added_objects": [{"id": ..., "type": ..., "label": ...}, ...],
        "added_relations": [{"id": ..., "type": ..., "source": ..., "target": ...}, ...],
    },
    "ok": True,  # always True in v1
}
```

The CLI reads `sandbox["diff"]["added_objects"]` and `sandbox["diff"]["added_relations"]` to print a summary line, and reads `sandbox["fork_label"]` to report which fork was used at promotion time.

Sources: [selfgraph/sandbox.py:52-58](), [selfgraph/cli.py:74-79](), [selfgraph/cli.py:99-100]()
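For callers that want static type checking, the same shape can be written as `TypedDict` definitions. This is a hypothetical annotation layer, not something `sandbox.py` exports; the `str` types for `source` and `target`, and the `summarize` helper (whose wording is not the CLI's), are assumptions.

```python
from typing import List, TypedDict


class AddedObject(TypedDict):
    id: str
    type: str
    label: str  # name, else goal, else ""


class AddedRelation(TypedDict):
    id: str
    type: str
    source: str
    target: str


class SandboxDiff(TypedDict):
    added_objects: List[AddedObject]
    added_relations: List[AddedRelation]


class SandboxReport(TypedDict):
    proposal_id: str
    fork_label: str  # "sqlite-fork@<event_id>" or "in-memory-replay"
    applied_changes: int
    diff: SandboxDiff
    ok: bool


def summarize(report: SandboxReport) -> str:
    """Hypothetical one-line summary built from the report."""
    d = report["diff"]
    return (
        f"{report['fork_label']}: +{len(d['added_objects'])} objects, "
        f"+{len(d['added_relations'])} relations"
    )
```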

---

## Full Lifecycle Sequence

```mermaid
sequenceDiagram
    participant CLI as cli.py (cmd_propose / cmd_promote)
    participant GR as guardrails.py
    participant SB as sandbox.py
    participant Fork as Fork Graph
    participant Live as Live Graph

    CLI->>GR: validate_proposal(graph, pid)
    GR-->>CLI: report {ok, violations}

    CLI->>SB: sandbox_apply(graph, pid, runtime=rt, promote=False/True)
    SB->>SB: check proposal status == "validated"
    SB->>SB: _build_fork(graph, runtime)
    alt SQLite-backed runtime
        SB->>Fork: Runtime.fork(at_event=last_event)
        note right of Fork: label: sqlite-fork@<id>
    else In-memory / no runtime
        SB->>Fork: Graph._replay_event × N
        note right of Fork: label: in-memory-replay
    end
    SB->>Fork: _apply_changes(fork_graph, changes)
    SB->>Fork: add_object("TestEvent", {kind:"smoke"})
    SB->>SB: _diff(graph, fork_graph)
    alt promote=True
        SB->>Live: _apply_changes(graph, changes, actor="promote")
        SB->>Live: patch_object(pid, {status:"applied"})
    end
    SB-->>CLI: report {fork_label, applied_changes, diff, ok}
```

---

## CLI Integration

The two CLI commands that call `sandbox_apply` map onto its two modes, preview-only and promote:

**`python -m selfgraph propose <goal>`** (`cmd_propose`):
- Validates, then calls `sandbox_apply(..., promote=False)`.
- Prints the diff summary and tells the user what `promote` command to run.
- The live graph is never touched.

**`python -m selfgraph promote <proposal_id>`** (`cmd_promote`):
- Re-validates with `mutate_status=False` (does not overwrite `"validated"` status).
- Calls `sandbox_apply(..., promote=True)`.
- Both the fork diff and the live graph mutation happen in the same call.

Sources: [selfgraph/cli.py:67-101]()

---

## Test Coverage

`tests/test_smoke.py` covers the main behavioral contracts:

| Test | What it verifies |
|---|---|
| `test_proposal_accepted` | In-memory path produces a non-empty `diff["added_objects"]` |
| `test_sandbox_promote_changes_main_graph` | `promote=True` adds objects to the live graph and stamps status `"applied"` |
| `test_sandbox_sqlite_fork_isolates_main_graph` | Real fork label starts with `sqlite-fork@`; live graph object/relation counts are unchanged after `promote=False` |
| `test_promote_lifecycle_requires_validated_status` | `sandbox_apply` raises `ValueError` with `"validated"` in the message for a draft proposal |

The SQLite isolation test is the most important: it confirms that `promote=False` leaves `graph.all_objects()` and `graph.all_relations()` counts unchanged, and that the proposal remains `"validated"` rather than `"applied"`.

Sources: [tests/test_smoke.py:133-207]()
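The isolation property the SQLite test checks (counts unchanged, proposal still `"validated"`) can be expressed generically. In this sketch `copy.deepcopy` stands in for `Runtime.fork` or the in-memory replay, and the graph is a hypothetical dict with `objects` and `relations` lists; it is not how `tests/test_smoke.py` is written.

```python
import copy
from typing import Callable


def run_in_fork(live: dict, mutate: Callable[[dict], None]) -> dict:
    """Apply `mutate` to a deep copy of `live` and check `live` is untouched."""
    n_objects = len(live["objects"])
    n_relations = len(live["relations"])
    statuses = [o.get("status") for o in live["objects"]]

    fork = copy.deepcopy(live)  # stand-in for Runtime.fork / replay
    mutate(fork)

    # The same invariants the isolation test asserts, on the toy graph.
    assert len(live["objects"]) == n_objects
    assert len(live["relations"]) == n_relations
    assert [o.get("status") for o in live["objects"]] == statuses
    return fork
```

The status check matters as much as the counts: a fork that shared the proposal object with the live graph would leak the `"applied"` stamp even if no objects were added.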

---

## Summary

`sandbox.py` provides the controlled detonation chamber for LLM-generated graph patches. Its `sandbox_apply` function forks the graph (preferring `Runtime.fork` on a SQLite-backed runtime, falling back to `_replay_into` with a private but documented API), applies changes to the isolated copy, emits a smoke `TestEvent` so newly bound behaviors get a chance to fire in-memory, and diffs the result. The live graph is mutated only when `promote=True` is explicitly passed. The file deliberately concentrates the single private-API call (`Graph._replay_event`) in `_replay_into`, with a comment marking where a public projector entry point would slot in once ActiveGraph exposes one (see `selfgraph/sandbox.py:102-108`).
