# propose.py & query.py — Graph-Grounded Proposals and Answers

> How propose_patch_for composes a PatchProposal from extracted Behaviors, EventTypes, and ObjectTypes already in the graph (and the [FALLBACK] scaffold path when no matching Behavior is found), and how answer_question uses keyword-overlap retrieval over node data — not semantic search — to answer questions. Covers the node and relation types emitted by a proposal (PatchProposal, Evaluation, Policy, BehaviorBinding, Task) and the GROUNDED_IN / PATCH_PROPOSES relations.

- Repository: yoheinakajima/activegraph-selfgraph
- GitHub: https://github.com/yoheinakajima/activegraph-selfgraph
- Human wiki: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393
- Complete Markdown: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393/llms-full.txt

## Source Files

- `selfgraph/propose.py`
- `selfgraph/query.py`
- `selfgraph/cli.py`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [selfgraph/propose.py](selfgraph/propose.py)
- [selfgraph/query.py](selfgraph/query.py)
- [selfgraph/cli.py](selfgraph/cli.py)
- [selfgraph/guardrails.py](selfgraph/guardrails.py)
- [selfgraph/extract.py](selfgraph/extract.py)
</details>


`propose.py` and `query.py` are the two output surfaces of the selfgraph agent. `propose.py` turns a free-text user goal into a structured `PatchProposal` object wired into the capability graph — it composes changes only from primitives already extracted (Behaviors, EventTypes, ObjectTypes) and falls back to a labelled scaffold when none match. `query.py` answers questions about the graph using keyword-overlap retrieval over node data, explicitly avoiding LLM re-prompting; every answer cites the node IDs it came from.

Together they form the read-then-act loop: `query.py` lets a user inspect what the graph knows, and `propose.py` generates a proposal that modifies that same graph using only graph-native operations — no file writes, no shell, no arbitrary code.

---

## propose.py — Building a PatchProposal

### Entry point: `propose_patch_for`

`propose_patch_for(graph, goal, *, proposed_by="selfgraph")` is the single public function. It always returns the string ID of the newly created `PatchProposal` object.

Sources: [selfgraph/propose.py:31-215]()

The function follows a fixed five-step composition sequence:

```text
Step 1  add_object      ObjectType   ← state bucket named from goal keywords
Step 2  add_task        Task         ← goal encoded as a graph object
Step 3  bind_behavior   ...          ← re-use existing Behaviors (happy path)
        OR [FALLBACK]   add_object + add_relation ← atom/snapshot scaffold
Step 4  add_policy      Policy       ← scope + allowed creates
Step 5  add_evaluation  Evaluation × 4  ← testable success criteria
```

### Step 1 — State bucket (ObjectType)

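`_bucket_name(goal)` turns the goal string into a safe ObjectType identifier. It takes the first three alphanumeric words of the goal (after replacing `/` with a space), drops any of those three that are two characters or shorter, capitalizes the rest, and concatenates them. If nothing survives the filter, it returns `"GoalBucket"`.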

```python
# selfgraph/propose.py:230-234
def _bucket_name(goal: str) -> str:
    words = [w for w in goal.replace("/", " ").split() if w.isalnum()]
    keep = [w.capitalize() for w in words[:3] if len(w) > 2]
    return "".join(keep) or "GoalBucket"
```

The result (e.g., `"TrackProjectUpdates"` for goal `"track project updates"`) becomes the `scope` key for the Policy added in Step 4.

### Step 2 — Task object

A `Task` with `goal`, `bucket`, and `status: "pending"` is always added. This ensures the user's intent is itself a first-class graph node that downstream Behaviors can subscribe to.

Sources: [selfgraph/propose.py:56-67]()

### Step 3 — Behavior binding vs. fallback scaffold

`_pick_behavior_bindings(extracted, goal)` scans every `Behavior` in the graph and, for each event type in the behavior's `on=` list, checks whether any lowercased goal token (a whitespace-separated word longer than three characters) occurs as a substring of the behavior's `name` or of that event type. It returns up to three `(behavior_name, event_type)` pairs.

```python
# selfgraph/propose.py:265-278
def _pick_behavior_bindings(extracted: dict, goal: str) -> list[tuple[str, str]]:
    goal_tokens = {t.lower() for t in goal.split() if len(t) > 3}
    out: list[tuple[str, str]] = []
    for b in extracted["behaviors"]:
        name = b.data.get("name", "")
        on_list = b.data.get("on") or []
        for ev in on_list:
            if any(tok in (name + " " + ev).lower() for tok in goal_tokens):
                out.append((name, ev))
                if len(out) >= 3:
                    return out
    return out
```

**Happy path:** each matching pair becomes a `bind_behavior` change, pointing to an already-extracted Behavior. The guardrail in `validate_proposal` will later verify the named Behavior exists in the graph.
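
The matching rule is easiest to see on a small input. The sketch below copies `_pick_behavior_bindings` from the excerpt above and runs it against stand-in Behavior objects. The `SimpleNamespace` stubs, behavior names, and event names are assumptions made for this example; only the `.data` attribute is needed by the function.

```python
from types import SimpleNamespace


def _pick_behavior_bindings(extracted: dict, goal: str) -> list[tuple[str, str]]:
    # Copied from the excerpt above.
    goal_tokens = {t.lower() for t in goal.split() if len(t) > 3}
    out: list[tuple[str, str]] = []
    for b in extracted["behaviors"]:
        name = b.data.get("name", "")
        on_list = b.data.get("on") or []
        for ev in on_list:
            if any(tok in (name + " " + ev).lower() for tok in goal_tokens):
                out.append((name, ev))
                if len(out) >= 3:
                    return out
    return out


# Hypothetical extracted Behaviors (names and events are invented).
extracted = {"behaviors": [
    SimpleNamespace(data={"name": "project_tracker", "on": ["project.updated"]}),
    SimpleNamespace(data={"name": "mailer", "on": ["email.received"]}),
    SimpleNamespace(data={"name": "idle", "on": None}),  # no events, never matches
]}

bound = _pick_behavior_bindings(extracted, "track project updates")
unbound = _pick_behavior_bindings(extracted, "summarize invoices")
print(bound)    # happy path: one binding
print(unbound)  # empty list, which triggers the fallback scaffold
```

Because the check is a substring test, the goal token `track` matches inside `project_tracker`; no word-boundary or stemming logic is involved.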

**Fallback path (no matching Behavior):** when `bound` is empty, the proposer switches to the built-in atom/snapshot scaffold. This is explicitly labelled in both the `rationale` string and the `used_fallback_scaffold: true` flag on the proposal object. The scaffold path:

1. Calls `_dominant_event_type(extracted, default="object.created")` to pick the most-referenced EventType from ingested Behaviors as the trigger. The trigger is observed from the graph; `"object.created"` is passed only as the `default` argument.
2. Adds two new ObjectTypes: `{Bucket}Atom` (individual records) and `{Bucket}Snapshot` (aggregated view).
3. Adds a `ROLLS_UP_INTO` relation from atom to snapshot.
4. Calls `_related_object_types(extracted, goal)` (keyword overlap scoring) to find up to two existing ObjectTypes to wire via `GROUNDED_IN` — so even the fallback structure is anchored to real extracted nodes when possible.

Sources: [selfgraph/propose.py:84-137]()

```python
# selfgraph/propose.py:92-103  (fallback label)
trigger = _dominant_event_type(extracted, default="object.created")
atom_type = f"{bucket}Atom"
snapshot_type = f"{bucket}Snapshot"
rationale_lines.append(
    f"[FALLBACK] No discovered Behavior matched the goal. "
    f"Falling back to the built-in atom/snapshot scaffold "
    ...
)
```
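
The body of `_dominant_event_type` is not shown on this page. As a rough illustration of what "most-referenced EventType" could mean, the sketch below counts event names across the `on` lists of the extracted Behaviors. The function name `dominant_event_type_sketch`, the counting strategy, and the sample data are all assumptions, not the repository's implementation.

```python
from collections import Counter
from types import SimpleNamespace


def dominant_event_type_sketch(extracted: dict, default: str = "object.created") -> str:
    """Hypothetical: return the event named most often in Behaviors' `on` lists."""
    counts = Counter(
        ev
        for b in extracted.get("behaviors", [])
        for ev in (b.data.get("on") or [])
    )
    if not counts:
        return default  # nothing observed, so fall back to the default
    # most_common(1) yields [(event, count)]; ties keep first-seen order.
    return counts.most_common(1)[0][0]


# Invented sample data for the example.
extracted = {"behaviors": [
    SimpleNamespace(data={"name": "a", "on": ["task.created", "object.created"]}),
    SimpleNamespace(data={"name": "b", "on": ["task.created"]}),
]}
print(dominant_event_type_sketch(extracted))
print(dominant_event_type_sketch({"behaviors": []}))
```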

### Step 4 — Scoped Policy

The `add_policy` change derives its `can_create` list from the set of ObjectType names introduced by `add_object`, `add_state_bucket`, and `add_task` changes in the same proposal — not from a hardcoded whitelist.

```python
# selfgraph/propose.py:141-145
creatable = sorted({
    c.get("type") or c.get("data", {}).get("name")
    for c in changes
    if c.get("kind") in ("add_object", "add_state_bucket", "add_task")
} - {None})
```

The policy always marks `AuthorityRule` as requiring approval, consistent with the `no-authority-mutation` rule seeded by `extract.py`.

Sources: [selfgraph/propose.py:139-160]()
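
The set expression can be run on its own to see which changes contribute to `can_create`. The expression below is taken from the excerpt above; the sample `changes` list is hypothetical, and the exact keys carried by real change dicts (beyond `kind`, `type`, and `data`) are an assumption.

```python
# Hypothetical change list; real proposals may carry additional keys.
changes = [
    {"kind": "add_state_bucket", "data": {"name": "LogDaily"}},
    {"kind": "add_task", "type": "Task", "data": {"goal": "log my daily runs"}},
    {"kind": "add_object", "type": "LogDailyAtom", "data": {}},
    {"kind": "bind_behavior", "behavior": "project_tracker"},  # not a create kind
    {"kind": "add_object", "data": {}},  # no type or name: yields None
]

# Expression from the excerpt: prefer `type`, else data["name"], drop None.
creatable = sorted({
    c.get("type") or c.get("data", {}).get("name")
    for c in changes
    if c.get("kind") in ("add_object", "add_state_bucket", "add_task")
} - {None})

print(creatable)
```

Subtracting `{None}` keeps malformed create changes from putting a `None` entry into the policy, and sorting makes the resulting list deterministic.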

### Step 5 — Evaluation criteria

Four testable success criteria are added as `Evaluation` objects:

| Criterion | What it checks |
|---|---|
| `A {bucket} object exists after apply` | State bucket was materialized |
| `At least one Task with goal='{goal}' exists` | Goal was encoded in graph |
| `No PatchProposal with status='rejected' was produced` | Apply was clean |
| `AuthorityRule objects are unchanged` | No authority mutation occurred |

Sources: [selfgraph/propose.py:163-174]()
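
As a minimal sketch, the four criteria in the table can be generated from the bucket name and goal like this. The helper name `evaluation_criteria` and the `{"kind": "add_evaluation", "data": {"criterion": ...}}` change shape are assumptions for illustration; the page does not show the real structure.

```python
def evaluation_criteria(bucket: str, goal: str) -> list[str]:
    """Build the four criterion strings listed in the table above."""
    return [
        f"A {bucket} object exists after apply",
        f"At least one Task with goal='{goal}' exists",
        "No PatchProposal with status='rejected' was produced",
        "AuthorityRule objects are unchanged",
    ]


# Hypothetical change shape for an add_evaluation entry.
changes = [
    {"kind": "add_evaluation", "data": {"criterion": text}}
    for text in evaluation_criteria("LogDaily", "log my daily runs")
]
for change in changes:
    print(change["data"]["criterion"])
```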

### Graph materialization and edge wiring

After building the `changes` list, `propose_patch_for` calls `graph.add_object("PatchProposal", {...})` to write the proposal as a first-class Object. Two sets of edges are then added:

| Relation | From | To | Meaning |
|---|---|---|---|
| `PATCH_PROPOSES` | PatchProposal | Capability | Capabilities this proposal exercises |
| `PATCH_MODIFIES` | PatchProposal | ObjectType | Extracted types the proposal grounds itself in (via GROUNDED_IN changes) |

Sources: [selfgraph/propose.py:198-215]()

```python
# selfgraph/propose.py:200-213
for cap in graph.objects(type="Capability"):
    if cap.data.get("name") in {"propose-patch", "extract-capability"}:
        graph.add_relation(proposal.id, cap.id, "PATCH_PROPOSES", actor=proposed_by)
grounded_targets = {
    c.get("to_name") for c in changes
    if c.get("kind") == "add_relation"
    and c.get("rel_type") == "GROUNDED_IN"
    and c.get("to_type") == "ObjectType"
}
for ot in graph.objects(type="ObjectType"):
    if ot.data.get("name") in grounded_targets:
        graph.add_relation(proposal.id, ot.id, "PATCH_MODIFIES", actor=proposed_by)
```
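
To show which changes end up as `PATCH_MODIFIES` edges, the sketch below evaluates the `grounded_targets` filter from the excerpt above against a hand-written change list. The change entries, node IDs, and the tuple representation of edges are invented for the example; in the real code the edges are written with `graph.add_relation`.

```python
from types import SimpleNamespace

# Hypothetical changes from a fallback-path proposal.
changes = [
    {"kind": "add_relation", "rel_type": "ROLLS_UP_INTO",
     "to_type": "ObjectType", "to_name": "LogDailySnapshot"},
    {"kind": "add_relation", "rel_type": "GROUNDED_IN",
     "to_type": "ObjectType", "to_name": "Task"},
    {"kind": "add_relation", "rel_type": "GROUNDED_IN",
     "to_type": "Behavior", "to_name": "mailer"},  # wrong to_type, ignored
    {"kind": "add_object", "type": "LogDailyAtom"},
]

# Filter from the excerpt: only GROUNDED_IN relations that target an ObjectType.
grounded_targets = {
    c.get("to_name") for c in changes
    if c.get("kind") == "add_relation"
    and c.get("rel_type") == "GROUNDED_IN"
    and c.get("to_type") == "ObjectType"
}

# Stand-ins for graph.objects(type="ObjectType"); IDs are made up.
object_types = [
    SimpleNamespace(id="ot-1", data={"name": "Policy"}),
    SimpleNamespace(id="ot-2", data={"name": "Task"}),
]
edges = [
    ("pp-1", ot.id, "PATCH_MODIFIES")
    for ot in object_types
    if ot.data.get("name") in grounded_targets
]
print(grounded_targets, edges)
```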

### Node and relation types emitted by a proposal

```text
Objects added to the graph:
  PatchProposal   — the proposal itself; data.changes holds the full change list
  Task            — encodes the user goal as a graph node
  ObjectType(s)   — state bucket, and atom/snapshot types in fallback path
  Evaluation(s)   — testable success criteria (4 per proposal)
  Policy          — scoped permission boundary for the new bucket

Relations added:
  PATCH_PROPOSES   PatchProposal → Capability
  PATCH_MODIFIES   PatchProposal → ObjectType
  GROUNDED_IN      {Bucket}Atom  → existing ObjectType   (fallback path only)
  ROLLS_UP_INTO    {Bucket}Atom  → {Bucket}Snapshot       (fallback path only)
```

### `used_fallback_scaffold` flag

The boolean `used_fallback_scaffold` on the proposal's `data` dict is the machine-readable signal downstream readers use to surface when the structure came from the default scaffold rather than from discovered Behaviors. `trace_grounding` in `query.py` reads this flag and reports it in the citation header.

Sources: [selfgraph/propose.py:180-192](), [selfgraph/query.py:145-149]()

---

## query.py — Keyword-Overlap Retrieval

### Design philosophy

`query.py`'s module docstring states the intent precisely: this is keyword-overlap retrieval over the capability graph — not semantic understanding. Every answer cites node IDs; if nothing matches, the function says so rather than inventing an answer.

Sources: [selfgraph/query.py:1-9]()

### `answer_question` — routing logic

`answer_question(graph, question)` is the public dispatcher. It pattern-matches the lowercased question against five branches:

```python
# selfgraph/query.py:34-52
def answer_question(graph: Graph, question: str) -> str:
    q = question.strip().lower()
    if q.startswith("what can you do") or "capabilities" in q:
        return summarize_capabilities(graph)
    if q.startswith("how would you implement") or q.startswith("how would you "):
        topic = question.split(maxsplit=4)[-1] ...
        return _explain_implementation(graph, topic)
    if q.startswith("can you configure yourself") or "configure yourself" in q:
        return "(hardcoded guidance string)"
    if q.startswith("list ") or q.startswith("show "):
        return _list_by_type(graph, question)
    return _grep_graph(graph, question)
```

| Branch | Trigger | Implementation |
|---|---|---|
| `summarize_capabilities` | starts with "what can you do", or contains "capabilities" | Sorted list of Capability nodes with API edge count |
| `_explain_implementation` | starts with "how would you " (including "how would you implement") | Top-3 Capabilities, top-5 APIs/Behaviors by keyword hits |
| configure-yourself | contains "configure yourself" | Hardcoded guidance string pointing to `propose_patch_for` |
| `_list_by_type` | "list X", "show X" | Enumerate nodes by type name, up to 25 |
| `_grep_graph` | fallback | Substring search across all node data |
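
Branch order matters because the checks run top to bottom and the first match wins. The sketch below mirrors only the conditions from the excerpt above and returns the name of the branch that would handle a question. The `route` helper is hypothetical and exists only for this illustration.

```python
def route(question: str) -> str:
    """Return the branch name answer_question would take (conditions only)."""
    q = question.strip().lower()
    if q.startswith("what can you do") or "capabilities" in q:
        return "summarize_capabilities"
    if q.startswith("how would you "):  # also covers "how would you implement"
        return "_explain_implementation"
    if "configure yourself" in q:  # also covers the startswith check
        return "configure-yourself"
    if q.startswith("list ") or q.startswith("show "):
        return "_list_by_type"
    return "_grep_graph"


for question in [
    "List capabilities",             # first branch wins over "list "
    "How would you track tasks?",
    "Can you configure yourself?",
    "show behaviors",
    "guardrails",
]:
    print(f"{question!r} -> {route(question)}")
```

One consequence worth noticing: a question such as "List capabilities" is handled by `summarize_capabilities`, not `_list_by_type`, because the substring check for "capabilities" comes first.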

### `_explain_implementation` — the keyword-overlap engine

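The core retrieval function converts the topic into tokens, applies crude stemming, then scores each node by counting stem hits against the full concatenation of its data values. The stemming chains `rstrip("s")`, `rstrip("ing")`, and `rstrip("ed")`. Because `str.rstrip` removes any trailing characters drawn from the given set rather than a literal suffix, some tokens are trimmed further than a suffix strip would trim them (for example, `running` becomes `ru`). The original tokens are kept alongside the stems.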

```python
# selfgraph/query.py:55-76
def _explain_implementation(graph: Graph, topic: str) -> str:
    tokens = [t for t in cleaned.lower().split() if len(t) > 2]
    stems = {t.rstrip("s").rstrip("ing").rstrip("ed") for t in tokens} | set(tokens)

    def hits(o):
        text = " ".join(str(v) for v in o.data.values()).lower()
        return sum(1 for s in stems if s in text)

    relevant_caps = sorted(graph.objects(type="Capability"), key=hits, reverse=True)[:3]
    relevant_apis = sorted(graph.objects(type="API"), key=hits, reverse=True)[:5]
    relevant_behaviors = sorted(graph.objects(type="Behavior"), key=hits, reverse=True)[:5]
    constraints = [c for c in graph.objects(type="Constraint") if hits(c)]
```

Results are shown only when `hits > 0`; node IDs are always included in the output so the answer is graph-cited. When no node overlaps the topic, the function explicitly says so rather than fabricating an answer.

Sources: [selfgraph/query.py:55-106]()
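
The effect of the stemming expression is worth checking by hand. The sketch below applies the same `rstrip` chain and the same scoring rule as the excerpt above. The helper names `stems_for` and `hits_for`, the sample node data, and the choice to tokenize the raw topic directly (the excerpt's `cleaned` variable is not shown) are assumptions.

```python
def stems_for(topic: str) -> set[str]:
    # Same expressions as the excerpt, applied to the raw topic string.
    tokens = [t for t in topic.lower().split() if len(t) > 2]
    return {t.rstrip("s").rstrip("ing").rstrip("ed") for t in tokens} | set(tokens)


def hits_for(data: dict, stems: set[str]) -> int:
    # Count how many stems occur anywhere in the node's concatenated values.
    text = " ".join(str(v) for v in data.values()).lower()
    return sum(1 for s in stems if s in text)


stems = stems_for("running tasks")
print(sorted(stems))

# Invented node data: unrelated to "running", yet it scores a hit via "ru".
node = {"name": "AuthorityRule", "doc": "guards structure"}
print(hits_for(node, stems))
```

Short stems produced by the character-set stripping can match unrelated text, so scores from this retrieval are best read as loose keyword overlap. For comparison, `str.removesuffix` (Python 3.9+) removes a literal suffix only.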

### `_list_by_type` — node type enumeration

`_list_by_type` checks which of a fixed candidate-type list (`Capability`, `API`, `Behavior`, `ObjectType`, `RelationType`, `Example`, `Constraint`, `AuthorityRule`, `PatchProposal`, `Evaluation`, `File`, `EventType`) appear in the question, then returns up to 25 objects per requested type.

Sources: [selfgraph/query.py:109-129]()
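
A minimal sketch of this enumeration follows, assuming the type names are matched as case-insensitive substrings of the question. The matching rule, the helper names, and the `objects_by_type` dict standing in for the graph are assumptions; only the candidate list and the limit of 25 come from the description above.

```python
CANDIDATE_TYPES = [
    "Capability", "API", "Behavior", "ObjectType", "RelationType", "Example",
    "Constraint", "AuthorityRule", "PatchProposal", "Evaluation", "File",
    "EventType",
]


def requested_types(question: str) -> list[str]:
    # Assumed rule: a type is requested if its lowercased name occurs in the question.
    q = question.lower()
    return [t for t in CANDIDATE_TYPES if t.lower() in q]


def list_by_type_sketch(objects_by_type: dict, question: str, limit: int = 25) -> dict:
    # Return at most `limit` entries for each requested type.
    return {t: objects_by_type.get(t, [])[:limit] for t in requested_types(question)}


# Invented graph contents: 30 API names and 2 Behavior names.
objects_by_type = {
    "API": [f"api-{i}" for i in range(30)],
    "Behavior": ["project_tracker", "mailer"],
}
result = list_by_type_sketch(objects_by_type, "list behaviors and apis")
print({t: len(v) for t, v in result.items()})
```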

### `_grep_graph` — substring fallback

For unrecognised questions, `_grep_graph` does a case-insensitive substring search over the concatenated data values of every object in the graph, returning up to 20 matches. This is the last-resort path; it is not semantic search.

Sources: [selfgraph/query.py:304-320]()
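
The fallback can be approximated in a few lines. This sketch assumes the whole trimmed question is used as a single search string; whether the real `_grep_graph` tokenizes the question first is not shown on this page. The function name, the stub objects, and their IDs are hypothetical.

```python
from types import SimpleNamespace


def grep_graph_sketch(objects, question: str, limit: int = 20) -> list[str]:
    """Case-insensitive substring search over each object's concatenated data."""
    needle = question.strip().lower()
    matches: list[str] = []
    for o in objects:
        text = " ".join(str(v) for v in o.data.values()).lower()
        if needle in text:
            matches.append(o.id)
            if len(matches) >= limit:
                break  # cap the result list
    return matches


# Invented nodes for the example.
objects = [
    SimpleNamespace(id="n1", data={"name": "Guardrails", "doc": "Validates proposals"}),
    SimpleNamespace(id="n2", data={"name": "Mailer"}),
]
print(grep_graph_sketch(objects, "  GUARD "))
print(grep_graph_sketch(objects, "zzz"))
```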

### `trace_grounding` — citation walker

`trace_grounding(graph, proposal_id)` is a separate reader that walks `PATCH_PROPOSES`, `PATCH_MODIFIES`, and per-change `GROUNDED_IN` edges from a proposal and renders a full citation chain. Each extracted node is traced back to its `source_file_path` (set by `extract.py`'s `_scan_chunk`). Scaffold objects carry `source: selfgraph-fallback-scaffold` and are labelled `"[scaffold: built-in fallback shape, not extracted]"` rather than a real file path.

The per-change classification is delegated to `classify_change`, which returns one of four categories:

| Category | When |
|---|---|
| `grounded-in-extracted` | GROUNDED_IN targets an existing ObjectType, or bind_behavior targets a known Behavior |
| `built-in-scaffold` | `add_object` with `source=selfgraph-fallback-scaffold` |
| `self-authored` | `add_task`, `add_evaluation`, `add_policy`, `add_state_bucket` |
| `domain-new` | Any other `add_object` or `add_relation` introducing new state |

Sources: [selfgraph/query.py:132-301]()
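
The category table translates into a small decision function. The sketch below is one way to express it, not the repository's `classify_change`. The `behavior` key on `bind_behavior` changes, the location of `source` inside `data`, and the use of plain sets for known names are all assumptions.

```python
SELF_AUTHORED = {"add_task", "add_evaluation", "add_policy", "add_state_bucket"}
SCAFFOLD_SOURCE = "selfgraph-fallback-scaffold"


def classify_change_sketch(change: dict, known_object_types: set, known_behaviors: set) -> str:
    kind = change.get("kind")
    # Grounded: binds a known Behavior, or GROUNDED_IN points at an existing ObjectType.
    if kind == "bind_behavior" and change.get("behavior") in known_behaviors:
        return "grounded-in-extracted"
    if (kind == "add_relation" and change.get("rel_type") == "GROUNDED_IN"
            and change.get("to_name") in known_object_types):
        return "grounded-in-extracted"
    # Scaffold: objects stamped with the fallback source marker.
    if kind == "add_object" and change.get("data", {}).get("source") == SCAFFOLD_SOURCE:
        return "built-in-scaffold"
    if kind in SELF_AUTHORED:
        return "self-authored"
    # Everything else is treated as new domain state in this sketch.
    return "domain-new"


known_ot, known_b = {"Task"}, {"project_tracker"}
samples = [
    {"kind": "bind_behavior", "behavior": "project_tracker"},
    {"kind": "add_object", "type": "LogDailyAtom", "data": {"source": SCAFFOLD_SOURCE}},
    {"kind": "add_policy"},
    {"kind": "add_relation", "rel_type": "ROLLS_UP_INTO", "to_name": "LogDailySnapshot"},
]
labels = [classify_change_sketch(c, known_ot, known_b) for c in samples]
print(labels)
```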

---

## Full data-flow diagram

```mermaid
sequenceDiagram
    participant CLI as cli.py cmd_propose
    participant P as propose.py
    participant G as Graph (activegraph)
    participant GR as guardrails.py
    participant Q as query.py trace_grounding

    CLI->>P: propose_patch_for(graph, goal)
    P->>G: graph.objects(type="Behavior/EventType/ObjectType")
    note over P: _scan_self → extracted dict

    alt Behaviors match goal keywords
        P->>P: _pick_behavior_bindings → bind_behavior changes
    else No match
        P->>P: _dominant_event_type → trigger
        P->>P: _related_object_types → GROUNDED_IN targets
        P->>P: atom/snapshot scaffold changes
    end

    P->>G: graph.add_object("PatchProposal", {...})
    P->>G: graph.add_relation(proposal, capability, "PATCH_PROPOSES")
    P->>G: graph.add_relation(proposal, objecttype, "PATCH_MODIFIES")
    P-->>CLI: proposal_id

    CLI->>GR: validate_proposal(graph, proposal_id)
    GR-->>CLI: report {ok, violations}

    CLI->>Q: trace_grounding(graph, proposal_id)
    Q->>G: relations(source=proposal_id) PATCH_PROPOSES / PATCH_MODIFIES
    Q->>Q: classify_change per change entry
    Q-->>CLI: citation chain text
```

---

## Guardrails interaction

After `propose_patch_for` returns, `cli.py cmd_propose` immediately calls `validate_proposal`. The guardrail checks every change in `proposal.data["changes"]`:

- `kind` must be in `ALLOWED_KINDS` (`add_object`, `add_relation`, `add_policy`, `add_state_bucket`, `add_task`, `add_evaluation`, `bind_behavior`)
- No `subprocess`, `eval`, `exec`, or network tokens in any string value
- `add_object` targeting `AuthorityRule` or `Capability` requires `approved_by`
- `bind_behavior` names must exist in the graph — unknown behavior names are rejected

A `bind_behavior` referencing a Behavior that was never extracted will therefore fail validation, which means the happy-path binding route is only valid when the named Behavior is actually in the graph. This is intentional: it prevents the proposer from inventing capability names.

Sources: [selfgraph/guardrails.py:21-126]()
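
The four checks above can be sketched as a single pass over the change list. This is an illustration under stated assumptions, not the code in `guardrails.py`: the banned-token list here is a partial guess (the real `_BANNED_TOKENS` also covers network tokens), word-boundary matching is an assumption chosen so that `add_evaluation` does not trip the `eval` token, and the `behavior` key is hypothetical. The `{ok, violations}` report shape follows the data-flow diagram on this page.

```python
import re

ALLOWED_KINDS = {
    "add_object", "add_relation", "add_policy", "add_state_bucket",
    "add_task", "add_evaluation", "bind_behavior",
}
# Assumed subset of banned tokens, matched on word boundaries.
BANNED = re.compile(r"\b(subprocess|eval|exec)\b")
PROTECTED_TYPES = {"AuthorityRule", "Capability"}


def iter_strings(value):
    """Yield every string nested inside dicts, lists, and tuples."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, (list, tuple)):
        for v in value:
            yield from iter_strings(v)


def validate_changes_sketch(changes: list, known_behaviors: set) -> dict:
    violations = []
    for i, c in enumerate(changes):
        kind = c.get("kind")
        if kind not in ALLOWED_KINDS:
            violations.append(f"change {i}: kind {kind!r} is not allowed")
        for s in iter_strings(c):
            if BANNED.search(s):
                violations.append(f"change {i}: banned token in {s!r}")
        if kind == "add_object" and c.get("type") in PROTECTED_TYPES and not c.get("approved_by"):
            violations.append(f"change {i}: {c.get('type')} requires approved_by")
        if kind == "bind_behavior" and c.get("behavior") not in known_behaviors:
            violations.append(f"change {i}: unknown behavior {c.get('behavior')!r}")
    return {"ok": not violations, "violations": violations}


good = [
    {"kind": "add_task", "data": {"goal": "log runs"}},
    {"kind": "add_evaluation", "data": {"criterion": "AuthorityRule objects are unchanged"}},
    {"kind": "bind_behavior", "behavior": "project_tracker"},
]
bad = [
    {"kind": "run_shell", "data": {"cmd": "subprocess.run(['ls'])"}},
    {"kind": "add_object", "type": "AuthorityRule", "data": {}},
    {"kind": "bind_behavior", "behavior": "invented"},
]
print(validate_changes_sketch(good, {"project_tracker"}))
print(validate_changes_sketch(bad, {"project_tracker"}))
```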

---

## CLI commands

| Command | Function | What it does |
|---|---|---|
| `python -m selfgraph ask "<question>"` | `cmd_ask` → `answer_question` | Queries the graph |
| `python -m selfgraph propose "<goal>"` | `cmd_propose` → `propose_patch_for` | Generates, validates, and sandbox-applies a proposal |
| `python -m selfgraph chat` | `cmd_chat` → `repl` | Interactive REPL loop over `answer_question` |

Sources: [selfgraph/cli.py:60-80](), [selfgraph/cli.py:104-107]()

---

## What to read first

1. **`selfgraph/propose.py:31-215`** — the single `propose_patch_for` function is self-contained and its comment blocks explain every design decision inline.
2. **`selfgraph/query.py:34-52`** — `answer_question` is four readable `if` branches; understanding the routing takes two minutes.
3. **`selfgraph/query.py:220-261`** — `classify_change` defines the four change categories that `trace_grounding` renders; knowing these categories makes grounding traces immediately readable.
4. **`selfgraph/guardrails.py:21-32`** — `ALLOWED_KINDS` and `_BANNED_TOKENS` define the entire v1 safety surface in 15 lines.

The key invariant tying these modules together: every node emitted by `propose.py` either cites an extracted graph node (and carries a `source_file_path` traceable to an ingested file) or carries `source: selfgraph-fallback-scaffold`, and `query.py`'s `trace_grounding` makes that distinction explicit for every change in the proposal.

Sources: [selfgraph/query.py:188-199]()
