# guardrails.py — Validation Rules and PatchProposal Lifecycle

> The allowed v1 change kinds (add_object, add_relation, add_policy, add_state_bucket, add_task, add_evaluation, bind_behavior), the substring banlist (_BANNED_TOKENS), the _PROTECTED_TYPES list blocking AuthorityRule/Capability mutation, and the draft → validated → applied (or rejected) state machine enforced at two call sites. Explains why cmd_promote re-runs validate_proposal with mutate_status=False before applying so a stale validated marker cannot bypass the check.

- Repository: yoheinakajima/activegraph-selfgraph
- GitHub: https://github.com/yoheinakajima/activegraph-selfgraph
- Human wiki: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393
- Complete Markdown: https://grok-wiki.com/public/wiki/yoheinakajima-activegraph-selfgraph-41747ef30393/llms-full.txt

## Source Files

- `selfgraph/guardrails.py`
- `selfgraph/cli.py`
- `tests/test_smoke.py`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [selfgraph/guardrails.py](selfgraph/guardrails.py)
- [selfgraph/cli.py](selfgraph/cli.py)
- [selfgraph/propose.py](selfgraph/propose.py)
- [selfgraph/sandbox.py](selfgraph/sandbox.py)
- [tests/test_smoke.py](tests/test_smoke.py)
</details>

`selfgraph/guardrails.py` is the security boundary between the LLM-driven proposer and the live ActiveGraph event store. Every `PatchProposal` must pass through it before any graph mutation is allowed. It defines which changes the system can make at all (the v1 kind allowlist), which strings are unconditionally forbidden (the banned-token scan), and which object types are off-limits without explicit human approval (the protected-types list). It also drives the `draft → validated → applied` (or `rejected`) state machine. Combined with a re-check at promote time, that state machine keeps a stale `validated` marker from standing in for a fresh validation.

Understanding guardrails is essential before touching `propose.py`, `sandbox.py`, or the CLI, because every other subsystem defers to this module's verdict before writing anything to the persistent SQLite event store.

---

## Allowed v1 Change Kinds

The validator maintains a closed allowlist of the only `kind` values a change dict may carry. Any value outside this set is immediately rejected with a `disallowed-kind` violation.

```python
# selfgraph/guardrails.py:21-24
ALLOWED_KINDS = {
    "add_object", "add_relation", "add_policy", "add_state_bucket",
    "add_task", "add_evaluation", "bind_behavior",
}
```

| Kind | Effect when applied | Notes |
|---|---|---|
| `add_object` | Creates a new typed node | Blocked for `AuthorityRule`/`Capability` without approval |
| `add_relation` | Creates a directed edge between two existing nodes | Resolved by `(type, name)` index in `sandbox.py` |
| `add_policy` | Creates a `Policy` node scoped to an object type | `can_approve` key is also blocked (permission escalation) |
| `add_state_bucket` | Convenience alias for an `ObjectType`-class object | Treated as `add_object` internally in `sandbox_apply` |
| `add_task` | Creates a `Task` node with lifecycle fields | Same path as `add_object` in `_apply_changes` |
| `add_evaluation` | Creates an `Evaluation` node for acceptance criteria | Used by `propose_patch_for` to record success criteria |
| `bind_behavior` | Creates a `BehaviorBinding` linking an existing behavior to a scoped event | Behavior name must already exist in the graph |
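
A minimal sketch of how a closed allowlist like this can be enforced per change. `check_kinds` is a hypothetical helper, not a function in `guardrails.py`, and the detail string is illustrative; only `ALLOWED_KINDS` and the `disallowed-kind` rule name come from the source. The sketch assumes every change is a dict.

```python
ALLOWED_KINDS = {
    "add_object", "add_relation", "add_policy", "add_state_bucket",
    "add_task", "add_evaluation", "bind_behavior",
}


def check_kinds(changes):
    """Return (rule, index, detail) tuples for changes outside the allowlist.

    Hypothetical helper for illustration; assumes each change is a dict.
    """
    violations = []
    for i, change in enumerate(changes):
        kind = change.get("kind")
        if kind not in ALLOWED_KINDS:
            violations.append(
                ("disallowed-kind", i, f"kind {kind!r} is not allowed in v1")
            )
    return violations


# Illustrative changes: the second one uses a kind that v1 does not define.
changes = [
    {"kind": "add_task", "type": "Task", "data": {"name": "triage"}},
    {"kind": "delete_object", "id": "obj-1"},
]
print(check_kinds(changes))
```

Because the set is closed, destructive or mutating kinds such as the made-up `delete_object` above are rejected by default rather than needing their own deny rule.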

Sources: [selfgraph/guardrails.py:21-24](), [selfgraph/sandbox.py:114-155]()

---

## The Banned-Token Scan (`_BANNED_TOKENS`)

Before per-change checks run, the validator performs a full-payload substring scan. The entire proposal data blob—including nested dicts and lists—is walked recursively, and any occurrence of any token in `_BANNED_TOKENS` fires a `banned-token` violation with a dotted path identifying where the match was found.

```python
# selfgraph/guardrails.py:27-32
_BANNED_TOKENS = (
    "subprocess", "os.system", "__import__", "exec(", "eval(",
    "shutil.rmtree", "open(", "urllib", "requests.", "socket.",
    "rm -rf", "curl ", "wget ", "/bin/sh", "/bin/bash", "popen",
    "compile(", "globals()", "setattr",
)
```

The recursive walker (`_scan_banned`) handles `str`, `dict`, and `list`/`tuple` values, building a dotted path string like `.changes[2].data.recipe` so violation reports name exactly where the hit occurred.

```python
# selfgraph/guardrails.py:129-140
def _scan_banned(payload, _path: str = "") -> Iterable[str]:
    if isinstance(payload, str):
        low = payload.lower()
        for tok in _BANNED_TOKENS:
            if tok in low:
                yield f"{_path}: {tok}"
    elif isinstance(payload, dict):
        for k, v in payload.items():
            yield from _scan_banned(v, f"{_path}.{k}")
    elif isinstance(payload, (list, tuple)):
        for i, v in enumerate(payload):
            yield from _scan_banned(v, f"{_path}[{i}]")
```

The module docstring explicitly flags this as "demo-grade substring matching"—it is not a sandbox or AST parser and can be bypassed with encoding tricks. The README carries the same caveat. The design intent is to catch straightforward injection attempts in a prototype, not to provide production-hardened sandboxing.

The test suite exercises the rejection path by injecting `"subprocess.Popen(['rm', '-rf', '/'])"` into a change's `data.recipe` field and asserting `banned-token` appears in `report["violations"]`.

Sources: [selfgraph/guardrails.py:27-32](), [selfgraph/guardrails.py:129-140](), [tests/test_smoke.py:53-68]()

---

## Protected Types (`_PROTECTED_TYPES`)

```python
# selfgraph/guardrails.py:37
_PROTECTED_TYPES = {"AuthorityRule", "Capability"}
```

`AuthorityRule` and `Capability` are the nodes that define what the agent is allowed to do. A proposal that adds one of these without an explicit human approval string fires a `protected-type` violation. The check applies only to `add_object` changes. Mutating an existing `AuthorityRule` or `Capability` is blocked implicitly, because `ALLOWED_KINDS` contains no mutation kind.

```python
# selfgraph/guardrails.py:101-107
if kind == "add_object":
    t = change.get("type")
    if t in _PROTECTED_TYPES and not approved_by:
        report["violations"].append(
            ("protected-type", i,
             f"cannot add {t} without explicit approval")
        )
```

The `approved_by` argument to `validate_proposal` is the human-approval bypass. It defaults to `None`; passing a non-empty string (e.g., a human reviewer's identifier) lifts the block. Neither `cmd_propose` nor `cmd_promote` in `cli.py` currently passes `approved_by`, so both call sites treat every protected-type addition as a violation.

The smoke test `test_proposal_rejected_for_protected_type_add` injects both an `AuthorityRule` and a `Capability` change and asserts at least two `protected-type` violations appear in the report.
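
The effect of `approved_by` can be sketched in isolation. `check_protected` is a hypothetical wrapper around the condition shown above, and the reviewer identifier and change payloads are made up; the truthiness test (`not approved_by`) is taken from the excerpt.

```python
_PROTECTED_TYPES = {"AuthorityRule", "Capability"}


def check_protected(changes, approved_by=None):
    """Hypothetical helper: flag protected-type additions lacking approval."""
    violations = []
    for i, change in enumerate(changes):
        if change.get("kind") == "add_object":
            t = change.get("type")
            # Same condition as the excerpt: any falsy approved_by blocks.
            if t in _PROTECTED_TYPES and not approved_by:
                violations.append(
                    ("protected-type", i,
                     f"cannot add {t} without explicit approval")
                )
    return violations


changes = [
    {"kind": "add_object", "type": "AuthorityRule", "data": {"name": "r1"}},
    {"kind": "add_object", "type": "Capability", "data": {"name": "c1"}},
    {"kind": "add_object", "type": "Note", "data": {"name": "n1"}},
]

print(len(check_protected(changes)))                       # unapproved
print(len(check_protected(changes, approved_by="")))       # empty string
print(len(check_protected(changes, approved_by="alice")))  # approved
```

Because the check uses truthiness, an empty string behaves like `None` and does not lift the block. Approval in this sketch is also all-or-nothing: one non-empty `approved_by` clears every protected-type change in the proposal.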

Sources: [selfgraph/guardrails.py:37](), [selfgraph/guardrails.py:101-107](), [tests/test_smoke.py:89-109]()

---

## Additional Per-Kind Rules

Beyond kind allowlisting and protected types, two more per-change rules fire:

### Permission Escalation (`add_policy` with `can_approve`)

An `add_policy` change whose `policy` dict declares a `can_approve` key is blocked with a `permission-escalation` violation. This prevents a proposal from granting itself the ability to approve its own future patches.

```python
# selfgraph/guardrails.py:108-113
if kind == "add_policy":
    policy = change.get("policy", {})
    if "can_approve" in policy:
        report["violations"].append(
            ("permission-escalation", i,
             "policies may not declare can_approve")
        )
```
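
One consequence of the `in` test is that the rule keys on presence, not value. The sketch below, with a hypothetical `check_policies` wrapper and made-up policy dicts, shows that even `can_approve: False` is rejected.

```python
def check_policies(changes):
    """Hypothetical helper wrapping the permission-escalation check."""
    violations = []
    for i, change in enumerate(changes):
        if change.get("kind") == "add_policy":
            policy = change.get("policy", {})
            # Key presence alone triggers the rule; the value is never read.
            if "can_approve" in policy:
                violations.append(
                    ("permission-escalation", i,
                     "policies may not declare can_approve")
                )
    return violations


changes = [
    {"kind": "add_policy", "policy": {"scope": "Task", "max_open": 5}},
    {"kind": "add_policy", "policy": {"scope": "Task", "can_approve": False}},
    {"kind": "add_policy"},  # no policy dict at all
]
print(check_policies(changes))
```

Only the second change is flagged. Treating the key as forbidden outright avoids having to decide which values count as an escalation.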

### Unknown Behavior (`bind_behavior`)

A `bind_behavior` change must name a `Behavior` node that already exists in the graph. The validator queries the live graph for all `Behavior` objects and compares against the set of their `name` fields.

```python
# selfgraph/guardrails.py:115-123
if kind == "bind_behavior":
    beh_name = change.get("behavior")
    known = {b.data.get("name") for b in graph.objects(type="Behavior")}
    if beh_name not in known:
        report["violations"].append(
            ("unknown-behavior", i,
             f"behavior {beh_name!r} not in capability graph; "
             f"v1 only binds existing behaviors")
        )
```

This enforces the proposer's documented design principle: "Bind existing behaviors instead of inventing new ones." (`selfgraph/propose.py:69`)
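
The lookup can be exercised without a real event store. In the sketch below, `StubGraph` and `StubObject` are hypothetical stand-ins that expose only the `objects(type=...)` call and `.data` attribute used in the excerpt, and the behavior names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class StubObject:
    """Hypothetical stand-in for a graph object; only .data is used."""
    data: dict = field(default_factory=dict)


class StubGraph:
    """Hypothetical stand-in exposing the objects(type=...) call."""

    def __init__(self, behavior_names):
        self._behaviors = [StubObject({"name": n}) for n in behavior_names]

    def objects(self, type=None):
        return list(self._behaviors) if type == "Behavior" else []


def check_bindings(graph, changes):
    # Build the known-name set once instead of once per change.
    known = {b.data.get("name") for b in graph.objects(type="Behavior")}
    violations = []
    for i, change in enumerate(changes):
        if change.get("kind") != "bind_behavior":
            continue
        beh_name = change.get("behavior")
        if beh_name not in known:
            violations.append(
                ("unknown-behavior", i,
                 f"behavior {beh_name!r} not in capability graph; "
                 f"v1 only binds existing behaviors")
            )
    return violations


graph = StubGraph(["summarize_updates"])
changes = [
    {"kind": "bind_behavior", "behavior": "summarize_updates"},
    {"kind": "bind_behavior", "behavior": "exfiltrate_secrets"},
    {"kind": "bind_behavior"},  # missing name resolves to None
]
print(check_bindings(graph, changes))
```

A change that omits `behavior` is flagged as well, since `None` is not a known name. This is also the one check shown here whose result depends on live graph contents, which is why its outcome can differ between propose time and promote time.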

Sources: [selfgraph/guardrails.py:108-123](), [selfgraph/propose.py:69]()

---

## Violation Report Shape

`validate_proposal` returns a dict with the following structure, regardless of `mutate_status`:

```python
{
    "checked": int,           # number of changes examined
    "violations": [           # list of 3-tuples
        (rule_name, change_index, detail_string),
        ...
    ],
    "ok": bool,               # True iff violations is empty
}
```

`change_index` is `-1` for the banned-token scan (which operates on the whole payload, not a specific change), and `0`-based for per-change checks. The `rule_name` string is one of: `banned-token`, `malformed-change`, `disallowed-kind`, `protected-type`, `permission-escalation`, `unknown-behavior`.
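
A caller that wants a readable summary can group the 3-tuples by rule. The helper below is a hypothetical consumer of the report shape described above, not part of the repository, and the sample report is hand-written.

```python
from collections import defaultdict


def summarize(report):
    """Group violations by rule name, labelling whole-payload hits."""
    by_rule = defaultdict(list)
    for rule, index, detail in report["violations"]:
        # -1 marks the whole-payload banned-token scan.
        where = "payload" if index == -1 else f"change[{index}]"
        by_rule[rule].append(f"{where}: {detail}")
    return dict(by_rule)


# Hand-written sample following the documented shape.
report = {
    "checked": 2,
    "violations": [
        ("banned-token", -1, ".changes[1].data.recipe: subprocess"),
        ("protected-type", 0,
         "cannot add Capability without explicit approval"),
    ],
    "ok": False,
}

for rule, lines in summarize(report).items():
    print(rule, lines)
```

Special-casing `-1` keeps payload-level hits from being mistaken for a change position; a `banned-token` detail string already carries its own dotted path.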

Sources: [selfgraph/guardrails.py:81-126]()

---

## The PatchProposal State Machine

A `PatchProposal` is a first-class `Object` in the ActiveGraph event store. Its `status` field drives the lifecycle enforced across three modules.

```mermaid
stateDiagram-v2
    [*] --> draft : propose_patch_for() creates proposal
    draft --> validated : validate_proposal(mutate_status=True) → ok
    draft --> rejected : validate_proposal(mutate_status=True) → violations
    validated --> applied : sandbox_apply(promote=True)
    validated --> validated : validate_proposal(mutate_status=False) re-check (cmd_promote)
    rejected --> [*]
    applied --> [*]
```

### State Transitions in Detail

| From | To | Trigger | Location |
|---|---|---|---|
| (new) | `draft` | `propose_patch_for()` creates the object | `selfgraph/propose.py:180-193` |
| `draft` | `validated` | `validate_proposal()` with `mutate_status=True`, no violations | `selfgraph/guardrails.py:61-74` |
| `draft` | `rejected` | `validate_proposal()` with `mutate_status=True`, violations found | `selfgraph/guardrails.py:61-74` |
| `validated` | `applied` | `sandbox_apply(promote=True)` after passing re-check | `selfgraph/sandbox.py:63-69` |
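
The table can be read as a small transition map. The sketch below is one way to encode it; `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, and the source enforces these rules through the individual checks in `guardrails.py` and `sandbox.py` rather than through a central table like this.

```python
# Hypothetical encoding of the lifecycle table; not present in the source.
ALLOWED_TRANSITIONS = {
    "draft": {"validated", "rejected"},
    "validated": {"applied"},
    "rejected": set(),   # shown as terminal in the diagram
    "applied": set(),    # shown as terminal in the diagram
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the lifecycle table permits current -> new."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("draft", "validated"))
print(can_transition("draft", "applied"))
print(can_transition("rejected", "validated"))
```

The absence of a `draft → applied` edge is the property the later sections rely on: nothing reaches `applied` without first being stamped `validated`.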

Sources: [selfgraph/propose.py:180-193](), [selfgraph/guardrails.py:61-74](), [selfgraph/sandbox.py:63-69]()

---

## Two Call Sites: `cmd_propose` and `cmd_promote`

### `cmd_propose` — First validation (with status mutation)

```python
# selfgraph/cli.py:70-80
def cmd_propose(args: list[str]) -> int:
    goal = " ".join(args) or "track project updates"
    graph, rt = _open()
    pid = propose_patch_for(graph, goal)
    report = validate_proposal(graph, pid)          # mutate_status=True (default)
    ...
    if report["ok"]:
        sandbox = sandbox_apply(graph, pid, runtime=rt, promote=False)
```

At propose time, `validate_proposal` is called with the default `mutate_status=True`. A passing proposal is stamped `validated`; a failing one is stamped `rejected`. The sandbox is then run with `promote=False`—changes are applied in an isolated fork so the caller can preview the diff without touching the live graph.

### `cmd_promote` — Re-validation without status mutation

```python
# selfgraph/cli.py:83-101
def cmd_promote(args: list[str]) -> int:
    ...
    # Re-validate against the current persisted state — the graph may
    # have changed between propose and promote (new ingestions, other
    # patches), so a stale 'validated' marker is not enough.
    # mutate_status=False so a re-check doesn't overwrite the existing
    # lifecycle status on the proposal.
    report = validate_proposal(graph, pid, mutate_status=False)
    if not report["ok"]:
        print(f"[promote] revalidation failed: {report['violations']}")
        return 1
    sandbox_report = sandbox_apply(graph, pid, runtime=rt, promote=True)
```

The `mutate_status=False` parameter is the key design decision here. Between `propose` and `promote`:

1. The graph may have changed (new ingestions, other applied proposals), which can change the outcome of graph-dependent checks such as the `bind_behavior` lookup against existing `Behavior` names.
2. The proposal's own `validated` stamp could be arbitrarily old.

Re-running the full validator before promote ensures the proposal is still clean against the **current** graph state. Because `mutate_status=False` makes the re-check a pure read, it never rewrites the proposal's lifecycle status. A passing re-check leaves `validated` in place for `sandbox_apply` to advance to `applied`, and a failing one reports its violations and returns without stamping anything. It also cannot overwrite an `applied` status if `sandbox_apply` is called concurrently.

The smoke test `test_validate_proposal_mutate_status_false` directly asserts this invariant:

```python
# tests/test_smoke.py:148-159
assert g.get_object(pid).data["status"] == "draft"
report = validate_proposal(g, pid, mutate_status=False)
assert report["ok"]
assert g.get_object(pid).data["status"] == "draft"   # unchanged
```
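
The stale-marker scenario can be reproduced with a toy model. Everything below is a hypothetical simplification: `ToyGraph` and `validate` stand in for the real store and `validate_proposal`, only the `bind_behavior` rule is modelled, and removing a behavior between the two calls is an assumed way for the graph to drift.

```python
class ToyGraph:
    """Hypothetical in-memory stand-in for the ActiveGraph store."""

    def __init__(self):
        self.behaviors = {"summarize_updates"}
        self.proposals = {}


def validate(graph, pid, mutate_status=True):
    """Toy validator: models only the unknown-behavior rule."""
    proposal = graph.proposals[pid]
    violations = [
        ("unknown-behavior", i, f"behavior {c['behavior']!r} not in graph")
        for i, c in enumerate(proposal["changes"])
        if c["kind"] == "bind_behavior"
        and c["behavior"] not in graph.behaviors
    ]
    report = {
        "checked": len(proposal["changes"]),
        "violations": violations,
        "ok": not violations,
    }
    if mutate_status:
        proposal["status"] = "validated" if report["ok"] else "rejected"
    return report


g = ToyGraph()
g.proposals["p1"] = {
    "status": "draft",
    "changes": [{"kind": "bind_behavior", "behavior": "summarize_updates"}],
}

first = validate(g, "p1")                   # propose time: stamps validated
g.behaviors.discard("summarize_updates")    # graph drifts before promote
recheck = validate(g, "p1", mutate_status=False)

print(first["ok"], recheck["ok"], g.proposals["p1"]["status"])
```

After the drift the proposal still carries `validated`, yet the re-check fails. A promote path that trusted the stored status alone would apply it; one that re-validates first, as `cmd_promote` does, stops here.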

Sources: [selfgraph/cli.py:83-101](), [tests/test_smoke.py:148-159]()

---

## `sandbox_apply` Enforces `validated` Status

The `sandbox.py` module adds a second guard: it refuses to fork-and-apply any proposal that is not already in `validated` status, regardless of what the caller passes.

```python
# selfgraph/sandbox.py:31-35
if proposal.data.get("status") != "validated":
    raise ValueError(
        f"proposal {proposal_id} has status "
        f"{proposal.data.get('status')!r}; expected 'validated'"
    )
```

This means even if `cmd_promote` skipped the re-check (e.g., called `sandbox_apply` directly), the sandbox would still refuse to apply a `draft` or `rejected` proposal. The smoke test `test_promote_lifecycle_requires_validated_status` validates this by skipping `validate_proposal` entirely and asserting a `ValueError` containing `"validated"` is raised.
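
A caller that invokes the sandbox directly would need to handle that `ValueError`. The sketch below isolates the guard in a hypothetical `require_validated` function and wraps it in a made-up `try_apply` caller; neither name exists in the repository.

```python
def require_validated(proposal_id: str, status: str) -> None:
    """Hypothetical extraction of the status guard shown above."""
    if status != "validated":
        raise ValueError(
            f"proposal {proposal_id} has status "
            f"{status!r}; expected 'validated'"
        )


def try_apply(proposal_id: str, status: str) -> str:
    """Made-up caller: report a refusal instead of propagating the error."""
    try:
        require_validated(proposal_id, status)
    except ValueError as exc:
        return f"refused: {exc}"
    return "guard passed"


for status in ("draft", "rejected", "validated"):
    print(status, "->", try_apply("p1", status))
```

The guard and the promote-time re-check cover different failures: the guard catches proposals that were never validated, while the re-check catches proposals validated against a graph that has since changed.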

Sources: [selfgraph/sandbox.py:31-35](), [tests/test_smoke.py:195-207]()

---

## Complete Validation Flow

```text
cmd_propose / cmd_promote
        │
        ▼
validate_proposal(graph, pid, mutate_status=True|False)
        │
        ├─ fetch PatchProposal object from graph
        ├─ _check_proposal_data()
        │       ├─ _scan_banned(data)          ← whole-payload substring scan
        │       ├─ per-change kind allowlist   ← ALLOWED_KINDS
        │       ├─ protected-type check        ← _PROTECTED_TYPES + approved_by
        │       ├─ permission-escalation check ← add_policy + can_approve
        │       └─ unknown-behavior check      ← bind_behavior vs live graph
        │
        ├─ if mutate_status=True:
        │       graph.patch_object(pid, {status: "validated"|"rejected"})
        │
        └─ return report {ok, checked, violations}
                │
                ▼ (if ok)
        sandbox_apply(promote=False)  ← preview only
                │
                ▼ (cmd_promote, after re-check passes)
        sandbox_apply(promote=True)   ← writes to live graph + status="applied"
```

Sources: [selfgraph/guardrails.py:45-78](), [selfgraph/cli.py:67-101](), [selfgraph/sandbox.py:16-73]()

---

## Summary

`guardrails.py` enforces a closed surface for graph mutation: seven allowed change kinds, a substring banlist against injection payloads, a two-type list protecting the agent's authority substrate, and two extra per-kind rules (no `can_approve` in policies, no phantom behavior bindings). The lifecycle—`draft → validated → applied` or `rejected`—is owned jointly by `guardrails.py` (which stamps `validated`/`rejected`) and `sandbox.py` (which stamps `applied`). The critical safety property is that `cmd_promote` calls `validate_proposal(mutate_status=False)` before `sandbox_apply`, so a stale `validated` marker from a prior run cannot bypass a fresh check against the current graph state. This is tested explicitly in `test_validate_proposal_mutate_status_false` and `test_promote_lifecycle_requires_validated_status`.

Sources: [selfgraph/guardrails.py:45-78](), [selfgraph/cli.py:83-101](), [tests/test_smoke.py:148-207]()
