# After 30 Minutes: What You Now Know and Where to Go Next

> A closing map of what a reader should understand after this wiki — the full build→ask→propose→validate→sandbox→promote flow, the safety boundaries (no code authoring, no shell, no external I/O), and the key limitations to keep in mind (fallback scaffold, keyword-only retrieval, no multi-step planning, no UI). Suggests concrete next experiments: run demo.py, inspect graph.db with sqlite3, try a goal that triggers the FALLBACK branch, or add a new regex to extract.py and re-run the harness to verify the sha changes.

- Repository: yoheinakajima/activegraph-selfgraph
- GitHub: https://github.com/yoheinakajima/activegraph-selfgraph
- Human wiki: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393
- Complete Markdown: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393/llms-full.txt

## Source Files

- `README.md`
- `selfgraph/cli.py`
- `harness/reproduce.sh`
- `REPRODUCE.md`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [README.md](README.md)
- [selfgraph/cli.py](selfgraph/cli.py)
- [selfgraph/propose.py](selfgraph/propose.py)
- [selfgraph/guardrails.py](selfgraph/guardrails.py)
- [selfgraph/sandbox.py](selfgraph/sandbox.py)
- [selfgraph/extract.py](selfgraph/extract.py)
- [harness/reproduce.sh](harness/reproduce.sh)
- [REPRODUCE.md](REPRODUCE.md)
</details>


This page is a closing map for anyone who has just worked through the selfgraph wiki. It consolidates the full pipeline into one mental model, names every safety boundary the agent enforces by construction, calls out the known limitations you need to hold in your head before extending anything, and gives you a handful of concrete next moves you can run from the terminal right now.

After reading this you should be able to explain the system to a colleague in under five minutes, predict whether a proposed patch will be accepted or rejected by the guardrails, know exactly which parts of the design are a deliberate sketch rather than a hardened guarantee, and have a clear path into hands-on experimentation.

---

## The Complete Pipeline in One View

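The pipeline is a single cycle with six phases. As the table below shows, four of them have their own CLI entry point, while validation and sandboxing run automatically inside `propose` and `promote`. Each phase maps to one or two modules, and none of them calls an LLM unless `ANTHROPIC_API_KEY` is set, in which case only `extract.py` is affected, and only additively.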

```text
┌─────────────────────────────────────────────────────────┐
│  Phase        Command / entry point       Module         │
├─────────────────────────────────────────────────────────┤
│  BUILD        python -m selfgraph build   ingest.py      │
│               (walk repo + introspect     extract.py     │
│                activegraph module)                       │
├─────────────────────────────────────────────────────────┤
│  ASK          python -m selfgraph ask     query.py       │
│               (keyword retrieval from                    │
│                graph nodes)                              │
├─────────────────────────────────────────────────────────┤
│  PROPOSE      python -m selfgraph propose propose.py     │
│               (compose PatchProposal                     │
│                from extracted nodes)                     │
├─────────────────────────────────────────────────────────┤
│  VALIDATE     (runs inside propose/       guardrails.py  │
│                promote automatically)                    │
├─────────────────────────────────────────────────────────┤
│  SANDBOX      (runs inside propose        sandbox.py     │
│                automatically; promote                    │
│                runs it again)                            │
├─────────────────────────────────────────────────────────┤
│  PROMOTE      python -m selfgraph promote cli.py         │
│               (re-validates + applies     sandbox.py     │
│                to live graph)                            │
└─────────────────────────────────────────────────────────┘
            state persists to .selfgraph/graph.db
```

### BUILD → EXTRACT

`cmd_build` calls `ingest_paths` (walks every `.py` and `.md` file in the repo) and `ingest_module_docs` (introspects the `activegraph` package as a synthetic `module://…` corpus), then immediately calls `extract_capabilities`. The result is a set of graph nodes — `Capability`, `API`, `Behavior`, `EventType`, `ObjectType`, `Constraint`, `AuthorityRule` — stored via the SQLite event log at `.selfgraph/graph.db`.

Sources: [selfgraph/cli.py:48-57]()

Extraction is deterministic by default. Two regex paths exist, and `SELFGRAPH_OBJECTTYPE_MATCH` controls which of them run. Setting it to `literal` runs only the original capitalized-identifier regex; setting it to `relaxed` (the default in normal use) also runs a constructor-call pattern that captures lowercase pack ObjectTypes like `"company"` or `"document"`. This is the A/B variable the research harness measures.

Sources: [selfgraph/extract.py:36-60]()
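
A minimal sketch of how such a mode switch might look. Both regexes and the `extract_object_types` helper are hypothetical stand-ins written for this page; only the `SELFGRAPH_OBJECTTYPE_MATCH` variable and its `literal` / `relaxed` values come from the source.

```python
import os
import re

# Hypothetical stand-ins for the two patterns; the real regexes live in extract.py.
CAPITALIZED_RE = re.compile(r'type\s*=\s*"([A-Z][A-Za-z0-9_]*)"')
CONSTRUCTOR_RE = re.compile(r'ObjectType\(\s*"([a-z][a-z0-9_]*)"')


def extract_object_types(text, mode=None):
    """Return sorted ObjectType names found in text under the given match mode."""
    if mode is None:
        mode = os.environ.get("SELFGRAPH_OBJECTTYPE_MATCH", "relaxed")
    names = set(CAPITALIZED_RE.findall(text))       # runs in both modes
    if mode == "relaxed":
        names |= set(CONSTRUCTOR_RE.findall(text))  # extra pass, relaxed only
    return sorted(names)


sample = 'graph.add(type="Task")\nObjectType("company")\nObjectType("document")'
print(extract_object_types(sample, mode="literal"))  # ['Task']
print(extract_object_types(sample, mode="relaxed"))  # ['Task', 'company', 'document']
```

The point of the sketch is that `relaxed` is a superset of `literal`, so switching modes can only add nodes, never remove them.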

### ASK

`answer_question` performs keyword-overlap retrieval over node data fields. There is no embedding model, no BM25, no semantic understanding — the score is literally the count of matching tokens between your question and each node's stored data. The answer cites node IDs so you can inspect the raw objects.
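
To make the scoring concrete, here is a rough sketch of token-overlap retrieval. The tokenizer, the node layout, and the `rank_nodes` name are assumptions made for illustration; the real `answer_question` may tokenize and count differently.

```python
import re


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_nodes(question, nodes):
    """Score each node by how many distinct question tokens appear in its data."""
    q_tokens = tokenize(question)
    scored = []
    for node_id, data in nodes.items():
        node_tokens = tokenize(" ".join(str(v) for v in data.values()))
        overlap = len(q_tokens & node_tokens)
        if overlap:
            scored.append((overlap, node_id))
    # Highest score first; ties broken by node id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical nodes, keyed by id.
nodes = {
    "n1": {"name": "track_updates", "doc": "Track project updates"},
    "n2": {"name": "notify_owner", "doc": "Send a notification"},
}
print(rank_nodes("How do I track project status?", nodes))  # [(2, 'n1')]
print(rank_nodes("How do I follow work items?", nodes))     # []
```

The second query means roughly the same thing as the first but shares no tokens with any node, so it retrieves nothing.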

### PROPOSE

`propose_patch_for` scans the graph for `Behavior`, `EventType`, and `ObjectType` nodes and then assembles a `PatchProposal` object with a `changes` list. The logic is:

1. Add an `ObjectType` state bucket derived from the goal text.
2. Add a `Task` object so work is trackable.
3. Try to bind existing `Behavior` nodes whose `on=` event types overlap goal keywords (keyword match, not semantic). If any match, emit `bind_behavior` changes.
4. **If no behavior matched** (the FALLBACK branch), emit the built-in atom/snapshot/`ROLLS_UP_INTO` scaffold instead. The rationale string is prefixed `[FALLBACK]`, the scaffold objects carry `"source": "selfgraph-fallback-scaffold"`, and the proposal itself carries `"used_fallback_scaffold": true`.
5. Add a scoped `Policy`.
6. Add `Evaluation` criteria.

The FALLBACK branch is not a rare degraded path you can count on avoiding: it fires whenever the keyword overlap is zero, which is the case for most novel goals. Recognizing `[FALLBACK]` in the output is important because it tells you the structure was templated, not discovered.

Sources: [selfgraph/propose.py:84-121]()
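
Steps 3 and 4 of the list above can be sketched as a single branch. The function name, the behavior dict layout, and the change field names other than `kind` are hypothetical; the `[FALLBACK]` prefix, the `source` value, and the `used_fallback_scaffold` flag are taken from the description above.

```python
import re


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compose_bindings(goal, behaviors):
    """Bind behaviors whose `on` event types share a keyword with the goal,
    otherwise emit the templated atom/snapshot scaffold."""
    goal_tokens = _tokens(goal)
    matched = [b["name"] for b in behaviors
               if goal_tokens & _tokens(" ".join(b["on"]))]
    if matched:
        return {
            "rationale": f"bound {len(matched)} existing behavior(s)",
            "used_fallback_scaffold": False,
            "changes": [{"kind": "bind_behavior", "behavior": n} for n in matched],
        }
    source = "selfgraph-fallback-scaffold"
    return {
        "rationale": "[FALLBACK] no behavior matched the goal keywords",
        "used_fallback_scaffold": True,
        "changes": [
            {"kind": "add_object", "type": "ObjectType", "name": "atom", "source": source},
            {"kind": "add_object", "type": "ObjectType", "name": "snapshot", "source": source},
            {"kind": "add_relation", "rel": "ROLLS_UP_INTO", "src": "atom", "dst": "snapshot"},
        ],
    }


behaviors = [{"name": "on_project_update", "on": ["project.updated"]}]
hit = compose_bindings("track project updates", behaviors)
miss = compose_bindings("synthesize quarterly earnings into a dashboard", behaviors)
print(hit["changes"][0]["kind"])  # bind_behavior
print(miss["rationale"][:10])     # [FALLBACK]
```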

### VALIDATE

`validate_proposal` in `guardrails.py` runs two checks:

| Check | Mechanism | Limitation |
|---|---|---|
| Banned-token scan | Substring search over the entire proposal payload | Easy to evade; also false-positives on docs that *mention* `subprocess` |
| Structural check | Each change must have a `kind` in `ALLOWED_KINDS`; `bind_behavior` names must exist in the graph's `Behavior` nodes; `add_object` for `AuthorityRule`/`Capability` requires `approved_by`; policies may not declare `can_approve` | Only fires on the kinds it knows about |

`ALLOWED_KINDS` is exactly: `add_object`, `add_relation`, `add_policy`, `add_state_bucket`, `add_task`, `add_evaluation`, `bind_behavior`. Any other `kind` value is rejected.

Sources: [selfgraph/guardrails.py:22-126]()
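
The two checks can be approximated in a few lines. This is a sketch under stated assumptions, not the code in `guardrails.py`: the `validate` signature, the proposal layout, the violation strings, and the `behavior` field name are invented here, and the banned-token tuple contains only the tokens named on this page.

```python
import json

ALLOWED_KINDS = {"add_object", "add_relation", "add_policy", "add_state_bucket",
                 "add_task", "add_evaluation", "bind_behavior"}
BANNED_TOKENS = ("subprocess", "os.system", "popen", "/bin/sh",
                 "urllib", "requests.", "socket.", "open(")
PROTECTED_TYPES = {"AuthorityRule", "Capability"}


def validate(proposal, known_behaviors):
    """Return a list of violation strings; an empty list means the proposal passes."""
    violations = []
    payload = json.dumps(proposal)  # substring scan over the whole payload
    violations += [f"banned token: {t}" for t in BANNED_TOKENS if t in payload]
    for change in proposal["changes"]:
        kind = change.get("kind")
        if kind not in ALLOWED_KINDS:
            violations.append(f"disallowed kind: {kind}")
        elif kind == "bind_behavior" and change.get("behavior") not in known_behaviors:
            violations.append(f"unknown behavior: {change.get('behavior')}")
        elif (kind == "add_object" and change.get("type") in PROTECTED_TYPES
              and not change.get("approved_by")):
            violations.append(f"protected type without approved_by: {change['type']}")
        elif kind == "add_policy" and "can_approve" in change:
            violations.append("policy may not declare can_approve")
    return violations


known = {"on_project_update"}
ok = {"changes": [{"kind": "add_task", "title": "Track updates"},
                  {"kind": "bind_behavior", "behavior": "on_project_update"}]}
bad = {"changes": [{"kind": "run_shell", "cmd": "ls"}]}
print(validate(ok, known))   # []
print(validate(bad, known))  # ['disallowed kind: run_shell']
```

Note that the structural check rejects `run_shell` even though its payload contains no banned token, which is why the closed kind list matters more than the substring scan.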

### SANDBOX

`sandbox_apply` forks the runtime before applying any changes. When a SQLite-backed runtime is present it calls `Runtime.fork(at_event=last_event)`, giving a real copy-on-write fork at the current event cursor. When the runtime is in-memory it falls back to replaying events into a fresh `Graph` via the internal `_replay_event` entry point.

After applying the proposal's changes in the fork, it emits a synthetic `TestEvent` object and computes a structural diff (added objects and added relations). If `promote=False` (the default during `propose`), the live graph is untouched.

Sources: [selfgraph/sandbox.py:16-73]()
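
The fork, apply, diff pattern can be illustrated with a plain dictionary standing in for the graph. Everything here is an assumption made for the sketch: `sandbox_apply_sketch`, the dict layout, and the change field names are hypothetical, `copy.deepcopy` replaces the real `Runtime.fork` / event replay, and the `TestEvent` emission is not modeled.

```python
import copy


def sandbox_apply_sketch(graph, changes, promote=False):
    """Apply changes to a copy of the graph and report what was added."""
    fork = copy.deepcopy(graph)  # stand-in for a real runtime fork
    for change in changes:
        if change["kind"] == "add_object":
            fork["objects"][change["id"]] = {"type": change["type"]}
        elif change["kind"] == "add_relation":
            fork["relations"].append((change["src"], change["rel"], change["dst"]))
    diff = {
        "added_objects": sorted(set(fork["objects"]) - set(graph["objects"])),
        "added_relations": fork["relations"][len(graph["relations"]):],
    }
    if promote:  # only now does the live graph change
        graph["objects"] = fork["objects"]
        graph["relations"] = fork["relations"]
    return diff


live = {"objects": {"t1": {"type": "Task"}}, "relations": []}
changes = [
    {"kind": "add_object", "id": "o1", "type": "ObjectType"},
    {"kind": "add_relation", "src": "t1", "rel": "TRACKS", "dst": "o1"},
]
dry_run = sandbox_apply_sketch(live, changes)
print(dry_run["added_objects"])  # ['o1']
print(sorted(live["objects"]))   # ['t1']
```

The dry run reports `o1` as added while the live graph still holds only `t1`, which is the property `propose` relies on.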

### PROMOTE

`cmd_promote` does **not** trust a cached `validated` status. It calls `validate_proposal(graph, pid, mutate_status=False)` fresh against the current persisted graph, then if that passes calls `sandbox_apply(..., promote=True)` which applies changes to the live graph with `actor="promote"` and stamps the proposal status to `"applied"`.

Sources: [selfgraph/cli.py:83-101]()

---

## PatchProposal Lifecycle

```text
             propose_patch_for()
                     │
                     ▼
                 [ draft ]
                     │
         validate_proposal()
           ┌─────────┴──────────┐
           ▼                    ▼
      [ validated ]         [ rejected ]
           │
    sandbox_apply(promote=False)
    (diff shown; live graph unchanged)
           │
      user runs `promote`
           │
    validate_proposal(mutate_status=False)
    ← re-check against current graph ─────►  [ rejected ]
           │
    sandbox_apply(promote=True)
           │
           ▼
       [ applied ]
```

The transitions are enforced by convention at two call sites, not by a state machine. `sandbox_apply` refuses to act on a proposal that does not carry `status == "validated"`, and `cmd_promote` re-runs validation before calling it.

Sources: [selfgraph/cli.py:83-101](), [selfgraph/sandbox.py:28-35]()
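
A small sketch of why the fresh re-check at promote time matters. The `review` and `promote` functions and the proposal dict are hypothetical simplifications; in particular, the sketch returns `"rejected"` as an outcome without asserting how the real code records a failed re-check on the stored proposal.

```python
def review(proposal, is_valid):
    """draft -> validated | rejected, as happens after a proposal is composed."""
    if proposal["status"] != "draft":
        raise ValueError("only drafts are reviewed")
    proposal["status"] = "validated" if is_valid(proposal) else "rejected"
    return proposal["status"]


def promote(proposal, is_valid):
    """validated -> applied, but only after a fresh validity check."""
    if proposal["status"] != "validated":
        raise ValueError("only validated proposals can be promoted")
    if not is_valid(proposal):
        return "rejected"  # outcome only; this sketch leaves the stored status alone
    proposal["status"] = "applied"
    return "applied"


known_behaviors = {"on_project_update"}


def check(proposal):
    return proposal["behavior"] in known_behaviors


p = {"status": "draft", "behavior": "on_project_update"}
print(review(p, check))   # validated
known_behaviors.clear()   # the graph changed between propose and promote
print(promote(p, check))  # rejected
```

A cached `validated` status would have let this stale proposal through; re-running the check against the current graph is what stops it.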

---

## Safety Boundaries: What the Agent Cannot Do

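These boundaries are enforced in code by `guardrails.py` (the closed change-kind list, the banned-token scan, and the protected-type checks), not left as soft guidelines. The rows that depend on banned tokens inherit the substring scan's weaknesses described under Known Limitations.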

| Boundary | Mechanism |
|---|---|
| **No code authoring** | There is no change kind that emits a Python function. `bind_behavior` only references a behavior name already discovered in the graph. |
| **No shell execution** | `subprocess`, `os.system`, `popen`, `/bin/sh` are in `_BANNED_TOKENS`. No change kind can invoke a subprocess. |
| **No network or file I/O** | `urllib`, `requests.`, `socket.`, `open(` are banned tokens. Patches write only to the SQLite event store. |
| **No authority escalation** | `AuthorityRule` and `Capability` are `_PROTECTED_TYPES`; adding them without `approved_by` is a guardrail violation. Policies may not declare `can_approve`. |
| **No unknown behaviors** | `bind_behavior` with a name not already in `graph.objects(type="Behavior")` is rejected at validation time. |

Sources: [selfgraph/guardrails.py:21-38](), [README.md:109-130]()

---

## Known Limitations to Keep in Mind

These are documented design choices in v1, not bugs:

**1. FALLBACK scaffold is a built-in template, not discovered structure.**
When `_pick_behavior_bindings` returns an empty list, `propose.py` emits the atom/snapshot/`ROLLS_UP_INTO` shape that ships with selfgraph. Only the trigger `EventType` is observed from the graph; everything else is defaulted. The `[FALLBACK]` prefix in the rationale and `"source": "selfgraph-fallback-scaffold"` on the ObjectType objects are your signals.

**2. Keyword-only retrieval in `ask`.**
`answer_question` scores nodes by token overlap. Synonyms, paraphrasing, and domain terminology that differs from the node's stored text are invisible to it.

**3. No multi-step planning.**
`propose_patch_for` generates one `PatchProposal` per call. It does not chain proposals, reason about dependencies between proposals, or build a campaign. Each goal is treated independently.

**4. Capability extraction is a sketch.**
The regex heuristics miss some `@behavior` decorators, can misclassify API surface, and do not trace dynamic dispatch. The graph is good enough to ground proposals; it is not an authoritative manifest.

**5. Substring banlist is demo-grade.**
`_BANNED_TOKENS` can be evaded (e.g., string concatenation) and false-positives on documentation that mentions banned tokens. A production-grade version would use AST analysis.

**6. No UI, no auth, no remote store.**
The only persistence layer is a local SQLite file at `.selfgraph/graph.db`. There is no web interface, no access control, and no multi-user story.

Sources: [README.md:158-188](), [selfgraph/guardrails.py:27-33]()
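
Limitation 5 is easy to reproduce in isolation. The snippet below is an illustrative sketch: the `banned_hits` helper is hypothetical and the tuple holds only the tokens named on this page, not necessarily the full `_BANNED_TOKENS` list.

```python
BANNED_TOKENS = ("subprocess", "os.system", "popen", "/bin/sh",
                 "urllib", "requests.", "socket.", "open(")


def banned_hits(payload):
    """Return every banned token that appears as a substring of the payload."""
    return [tok for tok in BANNED_TOKENS if tok in payload]


# False positive: harmless documentation that merely mentions the word.
doc_note = "This behavior is safe: it never imports subprocess."
# Evasion: the token is assembled at runtime, so the literal never appears.
evasive = 'getattr(__import__("sub" + "process"), "run")'

print(banned_hits(doc_note))  # ['subprocess']
print(banned_hits(evasive))   # []
```

The scan flags the harmless sentence and misses the evasive payload, which is the reason the closed change-kind list, not the banlist, carries the real safety argument.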

---

## Concrete Next Experiments

### 1. Run the scripted demo end-to-end

```bash
pip install -r requirements.txt
python demo.py
```

This runs `build → ask → propose` in sequence against `.selfgraph-demo/graph.db` and prints the grounding trace. Read the output carefully: if the goal triggers the FALLBACK branch you will see `[FALLBACK]` in the rationale and `source: selfgraph-fallback-scaffold` on the added ObjectType nodes.

### 2. Inspect the graph directly after build

```bash
python -m selfgraph build .
sqlite3 .selfgraph/graph.db "SELECT id, type, json(data) FROM objects LIMIT 20;"
```

Every object the agent can see when composing a proposal is in this table. Inspecting it makes the "graph-grounded" claim concrete: you can verify which `Behavior` nodes were extracted and why a particular proposal chose (or failed to choose) a `bind_behavior` change.
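
If you are unsure which tables the store actually contains, a schema-agnostic look from Python may be a safer first step. This sketch assumes nothing about selfgraph's schema; `table_row_counts` is a hypothetical helper that uses only the standard `sqlite3` module and opens the file read-only so that inspection cannot modify the store.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table, opening the file read-only."""
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # Table names come from sqlite_master itself, so quoting them is sufficient here.
        return {n: con.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0]
                for n in names}
    finally:
        con.close()
```

After a build, calling `table_row_counts(".selfgraph/graph.db")` lists whatever tables exist along with their sizes, which tells you what to query next.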

### 3. Deliberately trigger the FALLBACK branch

```bash
python -m selfgraph propose "synthesize quarterly earnings into a dashboard"
```

A goal with no keyword overlap with any extracted `Behavior` name or `EventType` will produce a `[FALLBACK]` proposal. Compare this to a goal that does match — for example `"track project updates"` — and observe the structural difference in the `changes` list.

### 4. Add a regex to extract.py and verify the sha changes

Open `selfgraph/extract.py` and add a new pattern under the deterministic extraction section. Then run the reproduce harness:

```bash
unset ANTHROPIC_API_KEY
bash harness/reproduce.sh
```

The harness will report `MISMATCH` for the affected result file because the new pattern changes which nodes get extracted, which changes the object IDs proposals cite, which changes the JSONL bytes. This is the intended behavior: the sha-match table in `REPRODUCE.md` is a reproducibility contract, not a test you are supposed to pass after modifying the extractor.

Sources: [harness/reproduce.sh:23-46](), [REPRODUCE.md:1-30]()
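
The chain from "different extracted nodes" to "different sha" can be demonstrated without the harness. This is a sketch only: the record shape, the object IDs, and the serialization choices (`sort_keys`, newline-terminated lines) are assumptions, not a description of how `harness/reproduce.sh` produces or hashes its result files.

```python
import hashlib
import json


def jsonl_sha256(records):
    """Serialize records as JSONL and return the SHA-256 hex digest of the bytes."""
    payload = "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical result rows: same goal, but a new regex shifted the cited object IDs.
baseline = [{"goal": "track project updates", "cites": ["obj-12", "obj-40"]}]
shifted = [{"goal": "track project updates", "cites": ["obj-13", "obj-41"]}]

print(jsonl_sha256(baseline) == jsonl_sha256(baseline))  # True
print(jsonl_sha256(baseline) == jsonl_sha256(shifted))   # False
```

Any change in cited IDs changes the bytes, and any change in the bytes changes the digest, so a `MISMATCH` after editing the extractor is expected rather than alarming.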

### 5. Explore the adversarial guardrail slice

```bash
SELFGRAPH_OBJECTTYPE_MATCH=relaxed python -m harness.run_adversarial
python -c "import sys, json; [print(r['caught'], r['goal'][:60]) for r in map(json.loads, sys.stdin)]" < harness/results/adversarial.jsonl
```

This runs 28 mechanical injection attempts — banned-token payloads, unknown-behavior bindings, protected-type additions, disallowed change kinds — and shows which guardrail rule fired for each. It is the fastest way to build intuition for where the validator is strict and where it relies on the substring banlist.

---

## Summary

The central insight to take away after 30 minutes in this repository is that selfgraph's safety guarantee is **structural**, not semantic. The agent cannot propose shell execution, network calls, or new Python functions because there are no change kinds for those operations: the allowed surface is a closed list that the guardrails enforce on every proposal before it reaches the sandbox or the live graph. The LLM (when present) is additive and confined to the `build` phase; the propose, validate, sandbox, and promote loop is deterministic by construction, which is why `harness/reproduce.sh` can assert byte-identical output across machines. The limitations (fallback scaffold, keyword retrieval, no multi-step planning, sketch-grade extraction) are all first-class design choices documented in the code and in the README, not implementation gaps waiting to be closed.

Sources: [README.md:109-130](), [REPRODUCE.md:52-68]()