# Setup, CLI Commands & State Persistence

> How to install (pip install -r requirements.txt, no API key required), the six CLI commands (build, ask, propose, promote, chat, demo), how state persists to .selfgraph/graph.db via Runtime.load/persist_to, and when to delete graph.db to force a cold rebuild.

- Repository: yoheinakajima/activegraph-selfgraph
- GitHub: https://github.com/yoheinakajima/activegraph-selfgraph
- Human wiki: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393
- Complete Markdown: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393/llms-full.txt

## Source Files

- `requirements.txt`
- `selfgraph/cli.py`
- `selfgraph/__main__.py`
- `REPRODUCE.md`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [requirements.txt](requirements.txt)
- [selfgraph/cli.py](selfgraph/cli.py)
- [selfgraph/__main__.py](selfgraph/__main__.py)
- [selfgraph/sandbox.py](selfgraph/sandbox.py)
- [selfgraph/ingest.py](selfgraph/ingest.py)
- [selfgraph/query.py](selfgraph/query.py)
- [REPRODUCE.md](REPRODUCE.md)
</details>


This page explains how to install and run `activegraph-selfgraph`, describes each of its six CLI commands, and shows how state is stored and reloaded between separate process invocations. It is meant as the entry point for a developer's first 30 minutes in the repo; no prior knowledge of `activegraph` is assumed.

The pipeline is **LLM-free by design**: the canonical measured results require no API key. An optional LLM augmentation pass exists in `selfgraph/extract.py`, but the reproducibility harness actively refuses to run while `ANTHROPIC_API_KEY` is set, unless overridden with `SELFGRAPH_HARNESS_ALLOW_LLM=1`. Every result file carries an `llm_augment_active: false` audit stamp.
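
The refusal rule described above can be sketched as a small guard. This is an illustration only, not the harness's actual code: the function name `llm_guard_ok` is hypothetical, and treating an empty `ANTHROPIC_API_KEY` as "not set" is an assumption. Only the two environment variable names come from the text.

```python
import os
from typing import Mapping


def llm_guard_ok(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when a run is allowed under the rule described above.

    Hypothetical helper: the real harness check may be structured differently.
    """
    # Assumption: an empty value counts as "not set".
    key_present = bool(env.get("ANTHROPIC_API_KEY"))
    # The override must be exactly "1", as stated in the text.
    override = env.get("SELFGRAPH_HARNESS_ALLOW_LLM") == "1"
    return (not key_present) or override
```

Passing the environment as a parameter keeps the check easy to exercise without mutating the real process environment.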

---

## Installation

### Requirements

```bash
# Python 3.11+ required
pip install -r requirements.txt
```

The only runtime dependency is `activegraph==1.0.5.post2`, pinned for paper reproducibility. The optional `anthropic` package is commented out in `requirements.txt`; do not install it unless you explicitly want the LLM augmentation variant (which produces different output hashes).

Sources: [requirements.txt:1-8]()

### Verifying the LLM-free invariant

You can confirm no LLM calls are wired into the core pipeline:

```bash
grep -nE 'anthropic|Anthropic|messages\.create|claude|llm_provider|LLMProvider' \
     selfgraph/propose.py selfgraph/guardrails.py selfgraph/sandbox.py \
     harness/*.py
# (no output expected)
```

Sources: [REPRODUCE.md:70-76]()

---

## Entry Point

`selfgraph/__main__.py` is a thin shim that delegates entirely to `selfgraph/cli.py`:

```python
# selfgraph/__main__.py
from selfgraph.cli import main
import sys

if __name__ == "__main__":
    sys.exit(main())
```

All commands are invoked as:

```bash
python -m selfgraph <command> [args...]
```

Sources: [selfgraph/__main__.py:1-5](), [selfgraph/cli.py:1-12]()

---

## The Six CLI Commands

The command registry in `selfgraph/cli.py` maps six names to handler functions:

```python
_COMMANDS = {
    "build":   cmd_build,
    "ask":     cmd_ask,
    "propose": cmd_propose,
    "promote": cmd_promote,
    "chat":    cmd_chat,
    "demo":    cmd_demo,
}
```

Sources: [selfgraph/cli.py:118-125]()
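
A name-to-handler registry like this is typically consumed by a small dispatcher. The sketch below shows one plausible shape; it is not the actual `main()` from `selfgraph/cli.py`, which is not reproduced on this page. The `cmd_hello` handler, the `dispatch` function, the usage message, and the exit code `2` for an unknown command are all assumptions made for illustration.

```python
import sys
from typing import Callable


def cmd_hello(args: list[str]) -> int:
    """Hypothetical stand-in handler; real handlers are cmd_build, cmd_ask, etc."""
    print("hello", *args)
    return 0


# Same shape as the registry above: command name -> handler(args) -> exit code.
_COMMANDS: dict[str, Callable[[list[str]], int]] = {"hello": cmd_hello}


def dispatch(argv: list[str]) -> int:
    """Look up argv[0] in the registry and pass the remaining args to the handler."""
    if not argv or argv[0] not in _COMMANDS:
        names = "|".join(_COMMANDS)
        print(f"usage: python -m selfgraph <{names}> [args...]", file=sys.stderr)
        return 2  # assumed exit code for a usage error
    return _COMMANDS[argv[0]](argv[1:])
```

Because each handler returns an integer, the result can be handed straight to `sys.exit`, which matches the `sys.exit(main())` call in the entry-point shim.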

### Command Summary Table

| Command | Requires existing graph? | Writes to graph.db? | Typical use |
|---------|--------------------------|---------------------|-------------|
| `build [repo_path]` | No (always cold-creates) | Yes | Initial or forced rebuild |
| `ask "question"` | Yes | No | Query the capability graph |
| `propose "goal"` | Yes | Yes (stores the proposal; its changes run in the sandbox only) | Draft + validate a patch |
| `promote <proposal_id>` | Yes | Yes (live graph) | Apply a validated proposal |
| `chat` | Yes | No | Interactive question REPL |
| `demo` | No | Yes | Scripted end-to-end demo |

### `build` — Ingest and Extract

```bash
python -m selfgraph build [repo_path]
```

Wipes any existing `graph.db`, ingests the text files under `repo_path` (defaults to `.`), ingests the `activegraph` module docs, runs the capability extractor, and prints a capability summary.

Under the hood `cmd_build` calls `_open(create=True)`, which unconditionally removes an existing database and creates a new `Graph` + `Runtime(graph, persist_to=_DB_PATH)` pair:

```python
def cmd_build(args: list[str]) -> int:
    repo = args[0] if args else "."
    graph, _rt = _open(create=True)
    ingest_paths(graph, [repo])
    ingest_module_docs(graph, "activegraph", max_submodules=40)
    extract_capabilities(graph)
    print(summarize_capabilities(graph))
    return 0
```

Sources: [selfgraph/cli.py:48-57]()

**What gets ingested:**
- Every text file (`.py`, `.md`, `.toml`, `.yaml`, `.json`, `.cfg`, `.ini`, `.rst`, `.txt`) under `repo_path`, excluding `.git`, `__pycache__`, `.venv`, `node_modules`, and files larger than 200 KB.
- Each file becomes a `File` object in the graph; files over 2000 characters are split into linked `Chunk` objects via `FILE_HAS_CHUNK` relations.
- The `activegraph` package is introspected live (up to 40 submodules), producing synthetic `module://name` file objects from docstrings and signatures.

Sources: [selfgraph/ingest.py:26-30](), [selfgraph/ingest.py:83-122](), [selfgraph/ingest.py:125-169]()
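
The filtering and chunking rules listed above can be approximated as follows. This is a minimal sketch, not the code in `selfgraph/ingest.py`: the names `should_ingest` and `split_chunks` are hypothetical, "200 KB" is assumed to mean 200 × 1024 bytes, and fixed-width slicing is an assumption (the real splitter may cut on different boundaries).

```python
from pathlib import Path

TEXT_EXTS = {".py", ".md", ".toml", ".yaml", ".json", ".cfg", ".ini", ".rst", ".txt"}
SKIP_DIRS = {".git", "__pycache__", ".venv", "node_modules"}
MAX_BYTES = 200 * 1024  # assumption: "200 KB" interpreted as 200 * 1024 bytes
CHUNK_CHARS = 2000


def should_ingest(path: Path, size: int) -> bool:
    """Apply the extension, excluded-directory, and size rules from the list above."""
    if path.suffix not in TEXT_EXTS:
        return False
    if any(part in SKIP_DIRS for part in path.parts):
        return False
    return size <= MAX_BYTES


def split_chunks(text: str, size: int = CHUNK_CHARS) -> list[str]:
    """Return no chunks for short files, else fixed-width slices (assumed strategy)."""
    if len(text) <= size:
        return []
    return [text[i:i + size] for i in range(0, len(text), size)]
```

Joining the returned chunks reproduces the original text, so in this sketch chunking loses no content.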

### `ask` — Query the Capability Graph

```bash
python -m selfgraph ask "what can you do?"
python -m selfgraph ask "how would you implement forking?"
python -m selfgraph ask "list constraints"
```

Loads the persisted graph and routes the question to one of several graph-readers in `selfgraph/query.py`. This is keyword-overlap retrieval, not LLM inference — every answer cites the node IDs it came from.

Routing logic:
- Questions starting with `"what can you do"` or containing `"capabilities"` → `summarize_capabilities()`
- Questions starting with `"how would you implement"` → `_explain_implementation()` (stem-matched graph walk)
- Questions starting with `"list "` or `"show "` → `_list_by_type()` (type filter over graph)
- Everything else → `_grep_graph()` (substring search across all object data)

Sources: [selfgraph/query.py:34-52]()
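
The routing rules above amount to a short chain of prefix and substring checks. The sketch below returns the name of the reader that would be chosen rather than calling it. The function `route` is hypothetical, and both the lowercasing/stripping of the question and the order in which the rules are tried are assumptions, not confirmed behavior of `selfgraph/query.py`.

```python
def route(question: str) -> str:
    """Return the name of the graph-reader the routing rules above would select."""
    q = question.strip().lower()  # assumption: matching is case-insensitive
    if q.startswith("what can you do") or "capabilities" in q:
        return "summarize_capabilities"
    if q.startswith("how would you implement"):
        return "_explain_implementation"
    if q.startswith(("list ", "show ")):
        return "_list_by_type"
    # Fallback: substring search across all object data.
    return "_grep_graph"
```

For example, `route("list constraints")` selects `_list_by_type`, while a bare keyword such as `route("forking")` falls through to `_grep_graph`.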

### `propose` — Draft and Validate a Patch

```bash
python -m selfgraph propose "track project updates"
```

Calls `propose_patch_for(graph, goal)` to generate a `PatchProposal` object, then immediately runs `validate_proposal` to check it against guardrails. If validation passes, `sandbox_apply(..., promote=False)` runs the proposal in an isolated fork and prints a diff summary. The proposal is not applied to the live graph.

```python
def cmd_propose(args: list[str]) -> int:
    goal = " ".join(args) or "track project updates"
    graph, rt = _open()
    pid = propose_patch_for(graph, goal)
    report = validate_proposal(graph, pid)
    if report["ok"]:
        sandbox = sandbox_apply(graph, pid, runtime=rt, promote=False)
        print(f"[propose] to promote: python -m selfgraph promote {pid}")
    return 0 if report["ok"] else 1
```

On success, the command prints the exact `promote` invocation, including the proposal ID, and exits with status 0. If validation fails, it exits with status 1.

Sources: [selfgraph/cli.py:67-80]()

### `promote` — Apply a Validated Proposal

```bash
python -m selfgraph promote <proposal_id>
```

Re-validates the proposal against the **current** persisted graph (the graph may have changed since `propose` ran), then calls `sandbox_apply(..., promote=True)` to write the changes to the live graph. The re-validation uses `mutate_status=False` to prevent overwriting the existing lifecycle status:

```python
report = validate_proposal(graph, pid, mutate_status=False)
if not report["ok"]:
    print(f"[promote] revalidation failed: {report['violations']}")
    return 1
sandbox_report = sandbox_apply(graph, pid, runtime=rt, promote=True)
```

A `PatchProposal` moves through the lifecycle `draft → validated → applied`; a proposal that fails validation is marked `rejected` instead of `validated`. The transition to `applied` is stamped in the event log under `actor="promote"`.

Sources: [selfgraph/cli.py:83-101](), [selfgraph/sandbox.py:63-69]()

### `chat` — Interactive REPL

```bash
python -m selfgraph chat
```

Opens an interactive prompt backed by `repl(graph)` from `selfgraph/query.py`. Every question typed is passed to `answer_question(graph, q)`. Type `quit`, `exit`, or `:q` to exit. No writes to the graph are made.

```text
selfgraph chat — try: 'what can you do?', 'how would you implement forking?', 'list constraints', 'quit'
> what can you do?
...
```

Sources: [selfgraph/cli.py:104-108](), [selfgraph/query.py:326-339]()

### `demo` — Scripted End-to-End Run

```bash
python -m selfgraph demo
```

Imports `demo` and calls `demo.run()`, which executes a canned ingest → ask → propose → promote sequence. This is equivalent to running `demo.py` directly and is useful for a quick smoke-test of the full pipeline.

Sources: [selfgraph/cli.py:110-115]()

---

## State Persistence: How `graph.db` Works

### Storage Location

All state is stored in `.selfgraph/graph.db` relative to the working directory. The constants in `cli.py` define this:

```python
_DB_DIR  = ".selfgraph"
_DB_PATH = f"{_DB_DIR}/graph.db"
_RUN_ID  = "selfgraph"
```

The directory is created automatically on first use. `graph.db` is a **SQLite event store** — the activegraph `SQLiteEventStore` appends one event record per `add_object`, `add_relation`, `patch_object`, etc.

Sources: [selfgraph/cli.py:30-32]()
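
To make the "event store" idea concrete, here is a minimal append-only log built on Python's standard `sqlite3` module. It is a sketch of the general technique only: the table layout, column names, and event payload shape are assumptions and are not the schema used by activegraph's `SQLiteEventStore`.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a toy event store with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " run_id TEXT NOT NULL, kind TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def append(conn: sqlite3.Connection, run_id: str, kind: str, payload: dict) -> None:
    """Append one event; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO events (run_id, kind, payload) VALUES (?, ?, ?)",
        (run_id, kind, json.dumps(payload)),
    )
    conn.commit()


def replay(conn: sqlite3.Connection, run_id: str) -> dict:
    """Rebuild the current object state by replaying events in insertion order."""
    objects: dict = {}
    rows = conn.execute(
        "SELECT kind, payload FROM events WHERE run_id = ? ORDER BY seq", (run_id,)
    )
    for kind, raw in rows:
        data = json.loads(raw)
        if kind in ("add_object", "patch_object"):
            objects.setdefault(data["id"], {}).update(data["data"])
    return objects
```

The current state is never stored directly in this model; it exists only as the result of replaying the log, which is why a second process can reconstruct it from the file alone.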

### The `_open()` Function

Every command goes through `_open()` to get a `(Graph, Runtime)` pair:

```python
def _open(create: bool = False) -> tuple[Graph, Runtime]:
    Path(_DB_DIR).mkdir(exist_ok=True)
    if create and Path(_DB_PATH).exists():
        os.remove(_DB_PATH)
    if create or not Path(_DB_PATH).exists():
        graph = Graph(ids=IDGen(), run_id=_RUN_ID)
        rt = Runtime(graph, persist_to=_DB_PATH)
        return graph, rt
    # Reuse existing log: load() rebuilds the graph from the event store.
    rt = Runtime.load(_DB_PATH, run_id=_RUN_ID)
    return rt.graph, rt
```

Two distinct paths:

| Path | Trigger | What happens |
|------|---------|--------------|
| **Cold create** | `create=True` or no `graph.db` exists | Deletes any existing file (only when `create=True`), makes a fresh `Graph`, and wraps it in `Runtime(graph, persist_to=_DB_PATH)`. Future mutations are appended to the new store. |
| **Warm load** | `graph.db` already exists, `create=False` | Calls `Runtime.load(_DB_PATH, run_id=_RUN_ID)` which replays the full event log into a fresh in-memory `Graph`. The resulting `rt.graph` reflects the complete current state. |

Sources: [selfgraph/cli.py:35-45]()
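
The branch logic of `_open()` can be isolated as a pure function, which makes the four input combinations easy to enumerate. The helper `open_mode` below is hypothetical and exists only to mirror the two `if` conditions shown in the excerpt above; it does not touch the filesystem.

```python
def open_mode(create: bool, db_exists: bool) -> tuple[str, bool]:
    """Return (path taken, whether an existing file is deleted first).

    Mirrors the conditions in the _open() excerpt; the name is an assumption.
    """
    deletes_file = create and db_exists  # first `if`: os.remove(_DB_PATH)
    if create or not db_exists:          # second `if`: fresh Graph + Runtime
        return "cold", deletes_file
    return "warm", False                 # otherwise: Runtime.load(...)


# Enumerate every combination of the two inputs.
for create in (True, False):
    for db_exists in (True, False):
        print(create, db_exists, open_mode(create, db_exists))
```

Only one of the four combinations, `create=False` with an existing file, reaches the warm-load path.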

### The Append-Only Event Log

The SQLite file is not a snapshot — it is an ordered event log. When `Runtime.load` opens it, it replays every event through `Graph._replay_event` in insertion order to reconstruct the current graph state. This is the same mechanism used by `Runtime.fork` (for sandboxed proposals) and the rollback precondition test in the harness.

The rollback test demonstrates this property explicitly: replaying all events for a run up to (but not including) the first `promote`-actor event reconstructs a snapshot **byte-identical** to the pre-promote state. All 72 relaxed-corpus proposals verified this on the reference machine.

Sources: [selfgraph/sandbox.py:102-108](), [REPRODUCE.md:189-205]()
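
The rollback property can be illustrated with a toy in-memory log. This sketch is not the harness test itself: the event shape (`id`, `actor`, `data`), the actor names other than `"promote"`, and the use of sorted-key JSON as the byte-level snapshot are all assumptions chosen to keep the example self-contained.

```python
import json


def replay(events: list[dict]) -> dict:
    """Fold events, in order, into a dict of object states."""
    state: dict = {}
    for ev in events:
        state.setdefault(ev["id"], {}).update(ev["data"])
    return state


def events_before_promote(events: list[dict]) -> list[dict]:
    """Return the prefix of the log up to, but not including, the first promote event."""
    for i, ev in enumerate(events):
        if ev.get("actor") == "promote":
            return events[:i]
    return list(events)


def snapshot_bytes(state: dict) -> bytes:
    """Deterministic serialization so two snapshots can be compared byte for byte."""
    return json.dumps(state, sort_keys=True).encode()


# Hypothetical log: two pre-promote events, then a promote event.
pre_log = [
    {"id": "p1", "actor": "propose", "data": {"status": "draft"}},
    {"id": "p1", "actor": "validate", "data": {"status": "validated"}},
]
pre_snapshot = snapshot_bytes(replay(pre_log))
full_log = pre_log + [{"id": "p1", "actor": "promote", "data": {"status": "applied"}}]
rolled_back = snapshot_bytes(replay(events_before_promote(full_log)))
```

In this toy model `rolled_back` equals `pre_snapshot` because replay is a deterministic function of the event prefix, which is the same reasoning behind the byte-identical claim above.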

### The Sandbox Fork

When `propose` or `promote` runs, `sandbox_apply` creates an isolated copy of the graph using `Runtime.fork`:

```text
graph.db (live, run_id="selfgraph")
    └─ Runtime.fork(at_event=last_event, label="selfgraph-sandbox")
           └─ fork_graph  (separate run_id, same SQLite file)
```

The fork shares the SQLite file but operates under a distinct `run_id`, so sandbox changes neither contaminate the main pipeline nor show up in the live graph unless `promote=True` is passed. If the runtime is not SQLite-backed, `sandbox_apply` falls back to replaying all events into a fresh in-memory `Graph`.

Sources: [selfgraph/sandbox.py:79-99]()
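
The isolation provided by a distinct `run_id` in a shared store can be modeled in a few lines. The `EventLog` class below is a hypothetical stand-in, not activegraph's `Runtime.fork`: in particular, copying the parent's events under the new `run_id` is an assumption made for simplicity, and the real implementation may reference the parent log instead.

```python
from copy import deepcopy


class EventLog:
    """Toy shared store: every event is tagged with the run_id that wrote it."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str, dict]] = []  # (run_id, object_id, data)

    def append(self, run_id: str, object_id: str, data: dict) -> None:
        self.events.append((run_id, object_id, dict(data)))

    def fork(self, source_run: str, new_run: str) -> None:
        """Start new_run with a copy of source_run's history (assumed strategy)."""
        for run_id, object_id, data in list(self.events):
            if run_id == source_run:
                self.events.append((new_run, object_id, deepcopy(data)))

    def state(self, run_id: str) -> dict:
        """Replay only the events that belong to run_id."""
        out: dict = {}
        for rid, object_id, data in self.events:
            if rid == run_id:
                out.setdefault(object_id, {}).update(data)
        return out


log = EventLog()
log.append("selfgraph", "cap1", {"name": "ask"})
log.fork("selfgraph", "selfgraph-sandbox")
log.append("selfgraph-sandbox", "cap2", {"name": "new"})  # sandbox-only change
```

After these calls, `log.state("selfgraph")` still contains only `cap1`, while the sandbox run sees both objects.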

---

## Lifecycle of a PatchProposal

```text
propose "goal"
  └─ propose_patch_for()  → PatchProposal (status: "draft")
  └─ validate_proposal()  → status: "validated" (or "rejected")
  └─ sandbox_apply(promote=False) → fork diff preview

promote <proposal_id>
  └─ validate_proposal(mutate_status=False) → re-check against current graph
  └─ sandbox_apply(promote=True) → apply to live graph
       └─ patch_object(status: "applied", actor="promote")
```

Sources: [selfgraph/cli.py:67-101](), [REPRODUCE.md:259-267]()
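
The lifecycle above behaves like a small state machine. The following sketch encodes the transitions shown in the diagram; the `ALLOWED` table and `transition` function are hypothetical, and treating `rejected` and `applied` as terminal states is an assumption that the source does not confirm.

```python
# Allowed status transitions, read off the lifecycle diagram above.
ALLOWED: dict[str, set[str]] = {
    "draft": {"validated", "rejected"},
    "validated": {"applied"},
    "rejected": set(),  # assumption: terminal
    "applied": set(),   # assumption: terminal
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


status = "draft"
status = transition(status, "validated")  # what `propose` does on success
status = transition(status, "applied")    # what `promote` does
```

Under this model a `draft` proposal cannot jump straight to `applied`, which corresponds to `promote` re-validating before it writes to the live graph.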

---

## When to Delete `graph.db` (Forcing a Cold Rebuild)

A cold rebuild, whether you delete `.selfgraph/graph.db` yourself or let `build` do it, is appropriate whenever you need to:

1. **Start fresh**: for example, after large changes to the repo that should be re-ingested from scratch.
2. **Run `build` against a different target directory**: `build` always calls `_open(create=True)` and removes the existing file automatically.
3. **Reproduce canonical harness results**: `harness/reproduce.sh` wipes persisted state before running the pipeline cold to ensure stable output hashes.
4. **Clear stale proposals**: proposals are stored in the graph, so if you want to discard all previous `PatchProposal` objects, deleting and rebuilding is the simplest path (there is no `reset` command).

Running `python -m selfgraph build` is itself a safe delete-and-rebuild: the `_open(create=True)` path removes the file before starting. You do not need to delete it manually before a `build`.

Sources: [selfgraph/cli.py:37-38](), [REPRODUCE.md:10-14]()
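
If you do want to remove the store by hand, for example from a script, a few lines of standard-library Python are enough. The helper name `delete_graph_db` is hypothetical; only the `.selfgraph/graph.db` path comes from the constants shown earlier.

```python
from pathlib import Path


def delete_graph_db(workdir: str = ".") -> bool:
    """Delete .selfgraph/graph.db under workdir; return True if a file was removed."""
    db = Path(workdir) / ".selfgraph" / "graph.db"
    if db.exists():
        db.unlink()
        return True
    return False
```

After the file is gone, the next command that opens the graph takes the cold-create path and starts from an empty graph, so run `python -m selfgraph build` before `ask`, `propose`, or `promote`.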

---

## Data Flow Summary

```text
Working directory
  └─ .selfgraph/
       └─ graph.db   ← SQLite event store (append-only)

python -m selfgraph build .
  ┌──────────────────────────────────────────────┐
  │  ingest_paths(graph, ["."])                   │  → File + Chunk objects
  │  ingest_module_docs(graph, "activegraph")     │  → module:// File objects
  │  extract_capabilities(graph)                  │  → Capability, API, Behavior…
  └──────────────────────────────────────────────┘
        ↓ all mutations persisted via SQLiteEventStore

python -m selfgraph ask "…"
  └─ Runtime.load(graph.db)   ← replays event log into Graph
  └─ answer_question(graph, q) ← keyword lookup, no LLM

python -m selfgraph propose "…"
  └─ Runtime.load(graph.db)
  └─ propose_patch_for() + validate_proposal()
  └─ sandbox_apply(promote=False) ← fork, diff, no live write

python -m selfgraph promote <pid>
  └─ Runtime.load(graph.db)
  └─ sandbox_apply(promote=True) ← writes to live graph → event log
```

Sources: [selfgraph/cli.py:35-101](), [selfgraph/ingest.py:83-122]()

---

Every command operates on the same `.selfgraph/graph.db` file, so `build`, `ask`, `propose`, and `promote` can safely run as separate processes, one after another: the event-store replay in `Runtime.load` gives each process the complete state left by whichever process last wrote to the store. `build` needs extra care because it wipes and recreates the file, so it should never run concurrently with any other command.

Sources: [selfgraph/cli.py:11-45]()