# Invariants, Failure Modes & Safe-Change Rules

> A synthesis of the core invariants that hold across all subsystems, the failure modes that break them, and how to change the codebase safely. Covers: TurnState as the only cross-process contract; memory pack rebuild triggers (the three-event rule); prompt-cache stability conditions; state machine history append-only guarantee; PGlite cross-process lock; transient-error retry scope; and which files are safe to change in isolation versus which touch multiple invariants.

- Repository: dzhng/duet-agent
- GitHub: https://github.com/dzhng/duet-agent
- Human wiki: https://grok-wiki.com/public/wiki/dzhng-duet-agent-82dbe2572d3a
- Complete Markdown: https://grok-wiki.com/public/wiki/dzhng-duet-agent-82dbe2572d3a/llms-full.txt

## Source Files

- `src/turn-runner/transient-error.ts`
- `src/turn-runner/turn-runner.ts`
- `src/guardrails/firewall.ts`
- `src/guardrails/semantic.ts`
- `src/memory/session.ts`
- `src/memory/migrations.ts`
- `src/memory/pglite.ts`
- `src/memory/context-pack.ts`
- `src/types/protocol.ts`
- `src/types/state-machine.ts`
- `evals/state-machine-real-session-carry-forward.eval.ts`
- `evals/source-of-truth-first.eval.ts`
- `test/transient-error.test.ts`

---

This page synthesizes the core contracts that must hold across the duet-agent subsystems, explains which failure modes break each contract, and gives concrete rules for making changes safely. Understanding these invariants is the difference between a clean feature addition and a subtle regression that surfaces only under production load or across concurrent processes.

Six interacting areas carry load-bearing invariants: the `TurnState` cross-process contract, the three-event rule for memory pack rebuilds, prompt-cache stability conditions, the state-machine history append-only guarantee, the PGlite cross-process open lock, and the transient-error retry scope. Each one constrains what you can change in adjacent code. Sections 7 and 8 cover two further contracts (forward-only schema migrations and fail-fast guardrail composition), and Section 9 maps source files to the invariants they touch.

---

## 1. TurnState — The Only Cross-Process Contract

### What it is

`TurnState` is the serializable snapshot that survives process boundaries. Every terminal event (`complete`, `ask`, `interrupted`, `sleep`) carries a `TurnState` back to the caller. The caller — CLI, daemon, HTTP server, or parent process — persists it and passes it back into `TurnRunner.start({ state })` on resume.

```typescript
// src/types/protocol.ts:169-208
export interface TurnState {
  status: TurnStateStatus;
  mode: TurnMode;
  options?: TurnOptions;
  agent: AgentSession;
  stateMachine?: StateMachineSession;
  todos?: TurnTodo[];
  followUpQueue?: TurnFollowUpQueueEntry[];
  queuedCommands?: TurnCommand[];
}
```

### Invariants

| Field | Invariant |
|---|---|
| `status` | Always reflects the terminal lifecycle: `running`, `waiting_for_human`, `sleeping`, `interrupted`, `completed`, `failed`, `cancelled`. |
| `mode` | Frozen at session start. An `"auto"` session stays `"auto"` so it can keep creating definitions across turns; an explicit-definition session stays constrained to its declared state set. |
| `agent` | Always present. Owns the conversation transcript regardless of whether a state machine is active. |
| `stateMachine` | Present iff the session is in state-machine mode. Contains the full append-only history. |

### Failure modes

- **Dropping fields on serialize/deserialize**: Any field omitted from persistence breaks resume. `followUpQueue` and `queuedCommands` are easy to miss — they carry pending prompts that must replay on resume. Missing them silently drops user input.
- **Mutating `mode` post-start**: Changing `mode` between turns of the same session causes the routing guard to make wrong classifications. `"auto"` sessions that get converted to `"agent"` stop creating definitions; explicit-definition sessions that flip to `"auto"` ignore the caller's intent.
- **Losing `stateMachine.history`**: The history array is the only durable audit log for state transitions. Truncating it breaks replay evals and causes carry-forward failures (see Section 4).

### Safe-change rules

- Add optional fields freely — the runner ignores unknown fields on resume.
- Never remove or rename a field without a migration path that rewrites persisted `TurnState` blobs.
- Never mutate `mode` after `start()` has been called for a session.
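A pre-persist check can enforce these rules mechanically. The sketch below is hypothetical: `checkResume` and `TurnStateLike` are illustrative names, the field types are loosened to `unknown`/`string` rather than the real protocol types, and a JSON round trip stands in for whatever persistence the caller uses.

```typescript
// Loosened stand-in for TurnState: only the fields this check inspects.
interface TurnStateLike {
  status: string;
  mode: string;
  agent: unknown;
  stateMachine?: { history: unknown[] };
  followUpQueue?: unknown[];
  queuedCommands?: unknown[];
}

// Hypothetical helper: compares a reloaded state with the original and with
// the mode frozen at session start. Returns human-readable problems.
export function checkResume(original: TurnStateLike, reloaded: TurnStateLike, sessionMode: string): string[] {
  const problems: string[] = [];
  if (reloaded.mode !== sessionMode) {
    problems.push(`mode changed from ${sessionMode} to ${reloaded.mode}`);
  }
  if (reloaded.agent === undefined) problems.push("agent missing");
  // Pending prompts must survive, or user input is silently dropped on resume.
  for (const key of ["followUpQueue", "queuedCommands"] as const) {
    const before = original[key]?.length ?? 0;
    const after = reloaded[key]?.length ?? 0;
    if (before !== after) problems.push(`${key}: ${before} entries before, ${after} after`);
  }
  const historyBefore = original.stateMachine?.history.length ?? 0;
  const historyAfter = reloaded.stateMachine?.history.length ?? 0;
  if (historyBefore !== historyAfter) {
    problems.push(`history: ${historyBefore} entries before, ${historyAfter} after`);
  }
  return problems;
}

// Usage: a JSON round trip stands in for the caller's persistence layer.
const state: TurnStateLike = {
  status: "waiting_for_human",
  mode: "auto",
  agent: { transcript: [] },
  followUpQueue: [{ prompt: "also check the logs" }],
};
const reloaded = JSON.parse(JSON.stringify(state)) as TurnStateLike;
console.log(checkResume(state, reloaded, "auto").length); // 0
```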

Sources: [src/types/protocol.ts:169-208](src/types/protocol.ts)

---

## 2. Memory Pack Rebuild — The Three-Event Rule

### What it is

The memory pack is a frozen prefix rendered above the agent's message tail. It contains global cross-session observations/reflections and local session-scoped ones. Rebuilding it is an expensive DB query; doing it too often breaks prompt-cache stability (Section 3). `context-pack.ts` enforces exactly three rebuild triggers.

```typescript
// src/memory/context-pack.ts:8-23
// Three events trigger a refresh and exactly three:
//   1. `loadStoredMemory()` finishes — initial seed.
//   2. The reflector replaces observations — condensed view changed.
//   3. The wire-shaping eviction horizon advances — prompt cache is
//      already invalidating, so piggyback the refresh for free.
//
// Any other path (observer appending a row mid-turn, recall_memory
// tool returning rows) deliberately does NOT refresh...
```

### The three events in code

| Event | Where | Why |
|---|---|---|
| `loadStoredMemory()` completes | `turn-runner.ts:1714-1718` | Initial seed so the first dispatched turn sees a frozen prefix. |
| Reflector writes reflections | `turn-runner.ts:1559-1566` | Reflector condensed the observation set; pack must pick up the new view. |
| Wire-shaping eviction horizon advances | `turn-runner.ts:1641-1646` | Cache is already invalidating; piggyback the refresh instead of paying two misses. |

### Invariants

- The observer writing a row mid-turn does **not** trigger a rebuild. The prefix stays stable for that turn's LLM calls.
- The `recall_memory` tool returning rows does **not** trigger a rebuild. Tool results do not alter the frozen prefix.
- Among memory writes, only reflector output (condensed summaries, not raw observations) earns a rebuild; the other two triggers are the initial load and an eviction-horizon advance.
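The rule is small enough to express as a single predicate. In this sketch the event names and `shouldRefreshPack` are hypothetical; the real decisions live at the `turn-runner.ts` call sites listed above. Listing the observer and `recall_memory` paths as explicit `false` cases, instead of hiding them in a default branch, makes a fourth trigger show up as a visible diff in review.

```typescript
// Hypothetical event names for the paths discussed above.
type MemoryEvent =
  | { kind: "stored_memory_loaded" }
  | { kind: "reflector_finished"; reflections: string[] }
  | { kind: "eviction_horizon_advanced" }
  | { kind: "observer_appended"; rowId: number }
  | { kind: "recall_memory_returned"; rowCount: number };

// True only for the three events allowed to rebuild the frozen memory pack.
export function shouldRefreshPack(event: MemoryEvent): boolean {
  switch (event.kind) {
    case "stored_memory_loaded":
      return true; // 1. initial seed
    case "reflector_finished":
      return event.reflections.length > 0; // 2. condensed view actually changed
    case "eviction_horizon_advanced":
      return true; // 3. cache already invalidating; piggyback the refresh
    case "observer_appended":
    case "recall_memory_returned":
      return false; // mid-turn rows and tool results never touch the prefix
    default: {
      const unreachable: never = event; // compile error if a new kind is unhandled
      return unreachable;
    }
  }
}
```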

### Failure modes

- **Adding a fourth rebuild trigger**: Any observer write path that calls `refreshMemoryContextPack()` will bust the prompt cache on every tool result, turning a cache hit into a cache miss per observation — catastrophic for long agentic turns.
- **Suppressing the reflector trigger**: If the refresh call guarded by `result.reflections.length > 0` is removed, or the check stops matching the reflector's output, the pack goes stale after a reflection run. The agent then works from an outdated condensed view for the rest of the session.
- **Rebuilding in a tight loop**: `rebuildMemoryContextPack` calls `session.withDb(...)`, which holds the cross-process open lock and re-arms the idle-close timer when it returns. Rebuilding too often keeps the lock held almost continuously, starving a concurrent CLI.

### Safe-change rules

- If you add a new memory writer (e.g., a new tagging step), verify it does **not** call `refreshMemoryContextPack`. It should only write rows; the reflector cycle will pick them up in the next condensation.
- When changing the reflector output shape, update the check at `turn-runner.ts:1564` to match the new reflections field, or the trigger will silently stop firing.

Sources: [src/memory/context-pack.ts:1-54](src/memory/context-pack.ts), [src/turn-runner/turn-runner.ts:1559-1649](src/turn-runner/turn-runner.ts)

---

## 3. Prompt-Cache Stability Conditions

### What it is

The provider's prompt cache survives between turns when the rendered system prompt prefix (including the frozen memory pack) is byte-identical between calls. A cache miss wastes tokens and increases latency. The codebase has three explicit stability guarantees that protect the cache.

### Stability guarantees

```text
Stable between turns (cache hits expected):
  [system prompt + AGENTS.md]
  [frozen global memory pack]
  [frozen local memory pack]
  ──────────────────────────── (frozen prefix)
  [message history tail]
  [latest user turn]

Unstable (invalidates cache):
  • Memory pack rebuild (three-event rule)
  • Wire-shaping advances eviction horizon
  • Model change mid-session (TurnOptions.model)
```

1. **Memory pack is frozen per turn**: Observer appends and `recall_memory` tool calls do not alter the frozen pack (see Section 2). The prefix bytes stay identical across all LLM calls in a single turn.
2. **Wire-shaping uses a sticky horizon**: `applyEvictionHorizon` applies the same eviction decision across all calls in a turn, so the provider sees the same message tail. The runner calls `createInitialHorizon` once at session start and advances the horizon only when wire bytes would exceed the cap (`turn-runner.ts:1631-1635`).
3. **Memory transform re-injects synthetic wrappers transiently**: The `createObservationalContextTransform` re-injects memory wrappers on each request using the frozen pack, not new DB queries. They do not count toward `TurnState.messages` bytes (`protocol.ts:545-548`).
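A minimal sketch of the sticky-horizon idea, assuming the horizon can be modeled as an index into the message list. `StickyHorizon`, `applyHorizon` and `maybeAdvance` are hypothetical stand-ins, not the signatures of `createInitialHorizon` or `applyEvictionHorizon`. The key property is that the tail only changes when `maybeAdvance` returns a new horizon; every call in between reuses the same one.

```typescript
interface Message {
  role: "user" | "assistant" | "tool";
  text: string;
}

// Hypothetical horizon: messages before startIndex are evicted from the wire.
interface StickyHorizon {
  startIndex: number;
}

const byteLength = (m: Message): number => new TextEncoder().encode(m.text).length;

// Every call in a turn applies the same horizon, so the provider sees the same tail.
export function applyHorizon(messages: Message[], horizon: StickyHorizon): Message[] {
  return messages.slice(horizon.startIndex);
}

// Advance only when the tail would exceed the byte cap, and only as far as needed.
// Returns the same object when nothing changes, so "unchanged" is an identity check.
export function maybeAdvance(messages: Message[], horizon: StickyHorizon, capBytes: number): StickyHorizon {
  let start = horizon.startIndex;
  let total = messages.slice(start).reduce((sum, m) => sum + byteLength(m), 0);
  if (total <= capBytes) return horizon;
  // Always keep at least the latest message on the wire.
  while (total > capBytes && start < messages.length - 1) {
    total -= byteLength(messages[start]);
    start++;
  }
  return { startIndex: start };
}
```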

### Failure modes

- **Calling `refreshMemoryContextPack` from an observer**: Covered in Section 2, but from the cache perspective: any extra rebuild mid-turn breaks the byte-stability of the prefix and guarantees a cache miss for every subsequent call in that turn.
- **Changing the memory rendering template without bumping version**: If the template that produces the frozen prefix changes (whitespace, tag names, ordering), every existing cached prefix becomes stale. New renders produce different bytes and the provider cold-starts every user's session.
- **Not applying the sticky horizon**: If the eviction horizon is recalculated per-call instead of per-turn, the provider sees a different message tail on each call within the same turn — guaranteed cache misses.
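One way to catch accidental template drift is to fingerprint the rendered prefix for a fixed fixture. Everything here is an assumption for illustration: the tag layout, `MEMORY_TEMPLATE_VERSION`, and both helpers are hypothetical and do not reproduce what `observational.ts` actually renders. `createHash` is Node's standard `node:crypto` API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical version tag: bump it whenever the rendering template changes.
export const MEMORY_TEMPLATE_VERSION = 1;

// Hypothetical renderer standing in for the real frozen-pack template.
export function renderFrozenPrefix(system: string, globalPack: string[], localPack: string[]): string {
  return [
    system,
    `<memory version="${MEMORY_TEMPLATE_VERSION}" scope="global">`,
    ...globalPack,
    "</memory>",
    `<memory version="${MEMORY_TEMPLATE_VERSION}" scope="local">`,
    ...localPack,
    "</memory>",
  ].join("\n");
}

// SHA-256 of the exact prefix bytes: two calls can only share a cache entry
// if their fingerprints match.
export function prefixFingerprint(prefix: string): string {
  return createHash("sha256").update(prefix, "utf8").digest("hex");
}
```

A snapshot test can store the fingerprint next to `MEMORY_TEMPLATE_VERSION` and fail whenever the fingerprint changes but the version does not, which turns a silent cache-wide cold start into a deliberate, reviewed change.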

### Safe-change rules

- When editing the rendering logic in `observational.ts`, compare cache hit rates before and after the change.
- Never reset `wireGuardHorizon` between calls within the same turn. It is `private` on `TurnRunner` and should remain reset only at `start()`.
- The `contextWindowUsage` breakdown in `TurnState` is heuristic and read-only for display; changing its calculation does not affect cache.

Sources: [src/memory/context-pack.ts:8-23](src/memory/context-pack.ts), [src/types/protocol.ts:544-548](src/types/protocol.ts), [src/turn-runner/turn-runner.ts:1631-1648](src/turn-runner/turn-runner.ts)

---

## 4. State-Machine History — Append-Only Guarantee

### What it is

`StateMachineSession.history` is the audit log of every state transition. It is append-only by design: events are pushed as `state_started`, `state_completed`, `state_failed`, `state_interrupted`, and `state_machine_completed`. Nothing ever removes or edits a row.

```typescript
// src/types/state-machine.ts:194
/** Append-only audit log used for debugging, replay, and persistence. */
history: StateMachineSessionEvent[];
```

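The eval in `evals/state-machine-real-session-carry-forward.eval.ts` depends on this guarantee: it looks up the `state_completed` entry for `fix_and_recover` in the terminal history, which only works if completed entries are never dropped or rewritten: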

```typescript
// evals/state-machine-real-session-carry-forward.eval.ts:188-192
const history = terminal.state.stateMachine?.history ?? [];
const fixOutput = history.find(
  (event) => event.type === "state_completed" && event.state === "fix_and_recover",
);
expect(fixOutput).toBeTruthy();
```
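The append-only property can be enforced structurally rather than by convention. This is a sketch with a simplified event union; `AppendOnlyHistory` is hypothetical, and the real `StateMachineSessionEvent` variants carry more fields than shown.

```typescript
// Simplified stand-in for StateMachineSessionEvent.
type HistoryEvent =
  | { type: "state_started"; state: string }
  | { type: "state_completed"; state: string; output: string }
  | { type: "state_failed"; state: string; error: string }
  | { type: "state_interrupted"; state: string }
  | { type: "state_machine_completed" };

// Hypothetical wrapper: the only write operation is append.
export class AppendOnlyHistory {
  private readonly events: HistoryEvent[] = [];

  append(event: HistoryEvent): void {
    // Store a frozen copy so no later code can rewrite a completed entry.
    this.events.push(Object.freeze({ ...event }));
  }

  // Readers get a frozen copy of the array; pushing to it throws a TypeError
  // and the stored log is never affected.
  snapshot(): readonly HistoryEvent[] {
    return Object.freeze([...this.events]);
  }
}
```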

### The carry-forward invariant

When the parent agent transitions to a new state, it **must** inline findings from previous states into `override.prompt` or `input`. A fresh sub-agent for the new state cannot see prior states' output — it only sees its own rendered prompt. The eval reproduces a real production failure (`session c_cGfNEIotLU`) where the third sub-agent asked three clarifying questions because the parent passed `"Fix the root cause ... from the corrupted DBs found earlier"` with no concrete antecedent.

The transition must carry:
- Specific repo file paths (e.g. `src/memory/pglite.ts`)
- Root cause facts (e.g. the process-level open race, concurrent migration paths)
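A sketch of what the parent could render into `override.prompt`, assuming it has the completed states' outputs and a list of concrete facts at hand. `buildTransitionPrompt`, `CompletedState` and the sample data are hypothetical; the point is that the new sub-agent receives file paths and root-cause facts instead of a reference like "found earlier".

```typescript
// Hypothetical record of a finished state; `output` stands in for wherever
// the real event stores the sub-agent's result.
interface CompletedState {
  state: string;
  output: string;
}

// Builds the next state's override prompt with findings inlined, so the fresh
// sub-agent never sees an unresolved reference such as "found earlier".
export function buildTransitionPrompt(
  task: string,
  completed: CompletedState[],
  facts: { files: string[]; rootCauses: string[] },
): string {
  const lines = [task, "", "Findings from previous states:"];
  for (const c of completed) lines.push(`- [${c.state}] ${c.output}`);
  if (facts.files.length > 0) lines.push("", "Relevant files:", ...facts.files.map((f) => `- ${f}`));
  if (facts.rootCauses.length > 0) lines.push("", "Root causes:", ...facts.rootCauses.map((r) => `- ${r}`));
  return lines.join("\n");
}

// Usage with illustrative data shaped like the production incident above.
const prompt = buildTransitionPrompt(
  "Fix the root cause of the corrupted memory databases.",
  [{ state: "investigate", output: "Two processes opened the same PGlite data directory concurrently." }],
  {
    files: ["src/memory/pglite.ts"],
    rootCauses: ["process-level open race", "concurrent migration paths"],
  },
);
console.log(prompt);
```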

### Failure modes

- **Truncating history on serialize**: Dropping old entries to save payload bytes breaks replay evals and can let the one-time terminal acknowledgment turn (gated by the `terminalAcknowledged` flag) fire again for a stale terminal.
- **Not carrying findings forward**: The sub-agent for the new state asks clarifying questions the parent already answered, degrading quality and burning tokens. The eval gates this at `evals/state-machine-real-session-carry-forward.eval.ts:174-181`.
- **Mutating a completed entry**: Any code that rewrites a `state_completed` event destroys the audit trail and makes replay evals non-deterministic.
- **Re-acknowledging a terminal**: `terminalAcknowledged` is a one-time flag per session. If it is reset or not persisted, the runner runs the terminal acknowledgment turn again, which creates a duplicate conversation turn.

### Safe-change rules

- Add event types to `StateMachineSessionEvent` with new discriminants — never change existing discriminants.
- If history needs trimming for payload size, trim from the **oldest** end, never from the middle. Better: expose a compact `progress` summary alongside history rather than trimming history itself (the `progress` field at `state-machine.ts:192` is already designed for this).
- When adding a new state kind, ensure the runner appends a `state_started` before execution and a `state_completed` / `state_failed` after — not doing so leaves gaps that confuse carry-forward logic.
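The last rule can be checked with a lint over the history array. This sketch assumes states run one at a time and that each event carries a `state` name; `findUnclosedStates` is hypothetical. A state that is still running when the session pauses is also reported, so filter the final entry if that case is legitimate.

```typescript
type HistoryEntry = { type: string; state?: string };

const CLOSING_TYPES = new Set(["state_completed", "state_failed", "state_interrupted"]);

// Returns the names of states that were started but never closed by a
// completed/failed/interrupted entry before the next state started.
export function findUnclosedStates(history: HistoryEntry[]): string[] {
  const unclosed: string[] = [];
  let open: string | undefined;
  for (const entry of history) {
    if (entry.type === "state_started") {
      if (open !== undefined) unclosed.push(open); // previous state left a gap
      open = entry.state;
    } else if (CLOSING_TYPES.has(entry.type) && entry.state === open) {
      open = undefined;
    }
  }
  if (open !== undefined) unclosed.push(open);
  return unclosed;
}
```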

Sources: [src/types/state-machine.ts:168-218](src/types/state-machine.ts), [evals/state-machine-real-session-carry-forward.eval.ts:153-198](evals/state-machine-real-session-carry-forward.eval.ts)

---

## 5. PGlite Cross-Process Open Lock

### What it is

PGlite is a single-writer embedded database. If two processes call `PGlite.create` on the same data directory simultaneously, the directory's WAL can be corrupted. The cross-process open lock (`.duet-open.lock` inside the data directory) ensures that only one process holds an open handle at a time.

```typescript
// src/memory/pglite.ts:28-34
// Filename used for the cross-process open-lock written into each
// managed data directory. The first line is the holder's pid; other
// processes only proceed past O_EXCL if that pid is no longer alive.
const OPEN_LOCK_FILE = ".duet-open.lock";
```

### The lock lifecycle

```text
tryAcquireOpenLock(dataDir)
  ├─ O_EXCL create succeeds → hold lock (write pid to file)
  ├─ File exists, holder PID alive → return { holderPid } (retry later)
  └─ File exists, PID dead (stale) → unlink + re-create atomically

MemorySession.withDb():
  refs++
  → ensureOpen() → pollAcquireOpenLock() → openPGliteHoldingLock()
  → fn(db)
  refs--
  → if refs == 0: scheduleIdleClose() [2s default]

idle timer fires:
  → db.close() → releaseOpenLock(lockPath)
```
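The first two branches of `tryAcquireOpenLock` map directly onto Node's `fs` API: opening with the `"wx"` flag is `O_CREAT | O_EXCL`, and `process.kill(pid, 0)` tests whether a pid is alive without sending a signal. The sketch below is a hypothetical re-implementation, not the code in `pglite.ts`. It reports a stale lock instead of reclaiming it, because a safe unlink-and-recreate must stop two peers that saw the same dead pid from both winning, and that detail is not shown on this page.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, unlinkSync, writeFileSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

export type LockResult =
  | { kind: "acquired" }
  | { kind: "held"; holderPid: number | null } // null: holder has not written its pid yet
  | { kind: "stale"; holderPid: number }; // holder process no longer exists

function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only checks that the process exists
    return true;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM"; // exists, owned by another user
  }
}

export function tryAcquireLock(lockPath: string, isRetry = false): LockResult {
  try {
    const fd = openSync(lockPath, "wx"); // O_CREAT | O_EXCL: throws EEXIST if present
    try {
      writeSync(fd, `${process.pid}\n`); // first line is the holder's pid
    } finally {
      closeSync(fd);
    }
    return { kind: "acquired" };
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
  }
  let raw: string;
  try {
    raw = readFileSync(lockPath, "utf8");
  } catch {
    // Released between our open and read: try the exclusive create once more.
    return isRetry ? { kind: "held", holderPid: null } : tryAcquireLock(lockPath, true);
  }
  const pid = Number.parseInt(raw.split("\n")[0], 10);
  if (Number.isNaN(pid)) return { kind: "held", holderPid: null };
  return isAlive(pid) ? { kind: "held", holderPid: pid } : { kind: "stale", holderPid: pid };
}

export function releaseLock(lockPath: string): void {
  try {
    unlinkSync(lockPath);
  } catch {
    // already removed
  }
}

// Usage in a throwaway directory.
const lockPath = join(mkdtempSync(join(tmpdir(), "duet-lock-")), ".duet-open.lock");
console.log(tryAcquireLock(lockPath).kind); // acquired
console.log(tryAcquireLock(lockPath).kind); // held (our own pid is alive)
releaseLock(lockPath);
writeFileSync(lockPath, "999999999\n"); // above any Linux/macOS pid limit, so never alive
console.log(tryAcquireLock(lockPath).kind); // stale
releaseLock(lockPath);
```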

### Invariants

- The lock file must exist for the entire duration a `PGlite` handle is open. Releasing it before `db.close()` allows a second process to open, creating two concurrent writers.
- The idle-close window (default 2 seconds) keeps the lock held for short write bursts (observer + reflector + embedding upserts in the same turn) without permanently blocking a peer CLI.
- The lock must be released on process exit. An exit handler registered by `installExitCleanup()` does this via `process.on("exit", ...)`.
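A sketch of the refcount plus idle-close pattern that `withDb` follows, with the database abstracted away. `RefCountedHandle` is hypothetical; it assumes `open` acquires (or waits for) the lock before opening and `close` releases the lock only after the handle is closed, which is the ordering the first invariant requires.

```typescript
export class RefCountedHandle<H> {
  private handle: Promise<H> | null = null;
  private refs = 0;
  private idleTimer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly open: () => Promise<H>, // assumed: acquires the lock, then opens
    private readonly close: (handle: H) => Promise<void>, // assumed: closes, then releases the lock
    private readonly idleMs = 2_000,
  ) {}

  async withHandle<R>(fn: (handle: H) => Promise<R>): Promise<R> {
    this.refs++;
    if (this.idleTimer !== null) {
      clearTimeout(this.idleTimer); // a new user cancels the pending idle close
      this.idleTimer = null;
    }
    try {
      if (this.handle === null) {
        const opening = this.open();
        this.handle = opening;
        // Forget a failed open so the next caller retries instead of reusing the rejection.
        opening.catch(() => {
          if (this.handle === opening) this.handle = null;
        });
      }
      return await fn(await this.handle);
    } finally {
      this.refs--;
      if (this.refs === 0) {
        this.idleTimer = setTimeout(() => void this.closeIfIdle(), this.idleMs);
      }
    }
  }

  // Closes only when nobody is using the handle; never runs with refs > 0.
  async closeIfIdle(): Promise<void> {
    this.idleTimer = null;
    if (this.refs > 0 || this.handle === null) return;
    const pending = this.handle;
    this.handle = null;
    await this.close(await pending);
  }
}
```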

### Failure modes

- **Releasing the lock before closing `db`**: This lets a peer process open the directory while this process still holds a live handle, producing two concurrent writers. `closeNow()` in `MemorySession` clears `this.lockPath` before calling `db.close()`; if `db.close()` throws (PGlite occasionally does on aborted ops), the lock is still released because `releaseOpenLock` runs in the `finally` block of the wrapping open path.
- **Skipping the idle-close refcount check**: If `closeNow` is called while `refs > 0`, a concurrent `withDb` call will run against a closed handle. The refcount drain in `runDispose` guards against this.
- **Quarantine without lock release**: During backup restoration, the lock must be released before the data directory is renamed aside (the lock file lives inside it). `openPGliteHoldingLock` calls `releaseOpenLock(lockPath)` before `quarantineDataDirectory` (`pglite.ts:300-302`).
- **ENOENT on `pglite.data` during upgrade**: A `duet upgrade` running concurrently with an active CLI rewrites `node_modules` mid-flight. The `isExternalAssetError` guard prevents quarantining a healthy `memory.db` when the failure is an ENOENT on a PGlite runtime asset outside the data directory.

### Safe-change rules

- Never call `PGlite.create` directly. Always go through `openPGlite`, `openPGliteWaitingForLock`, or `MemorySession.withDb`.
- The backup rotation (`MAX_BACKUPS = 5`, `BACKUP_DEDUPE_WINDOW_MS = 5 min`) is fine to tune, but the snapshot must only happen **after** a successful open + init, never before (a corrupted directory snapshotted would rotate out the last good copy).
- Adding a new memory operation? Use `session.withDb(...)` — it handles refcounting and the idle-close timer correctly.
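The rotation rules can be planned as a pure function, which keeps the "only after a successful open + init" ordering in the caller's hands. The two constants are quoted from above; the dedupe semantics (skip if the newest backup is younger than the window) and keeping `MAX_BACKUPS` copies including the new one are assumptions about how they combine, and `planBackup` is hypothetical.

```typescript
const MAX_BACKUPS = 5;
const BACKUP_DEDUPE_WINDOW_MS = 5 * 60 * 1000;

export type BackupPlan =
  | { action: "skip" } // a recent enough backup already exists
  | { action: "snapshot"; evict: number[] }; // timestamps of backups to delete

// Call only after a successful open + init, so a corrupted directory is
// never snapshotted over the last good copy.
export function planBackup(existing: number[], now: number): BackupPlan {
  const sorted = [...existing].sort((a, b) => a - b); // oldest first
  const newest = sorted[sorted.length - 1];
  if (newest !== undefined && now - newest < BACKUP_DEDUPE_WINDOW_MS) {
    return { action: "skip" };
  }
  // Keep MAX_BACKUPS - 1 existing copies so the new snapshot makes MAX_BACKUPS.
  const excess = Math.max(0, sorted.length - (MAX_BACKUPS - 1));
  return { action: "snapshot", evict: sorted.slice(0, excess) };
}
```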

Sources: [src/memory/pglite.ts:28-34, 154-184, 251-338](src/memory/pglite.ts), [src/memory/session.ts:61-201](src/memory/session.ts)

---

## 6. Transient-Error Retry Scope

### What it is

`TurnRunner` drives the `Agent` directly (not through `AgentSession`) to own turn semantics. This means `AgentSession`'s built-in retry path does not apply. The transient-error module in `src/turn-runner/transient-error.ts` mirrors `AgentSession`'s detection regex exactly to ensure consistent behavior.

```typescript
// src/turn-runner/transient-error.ts:61-62
const TRANSIENT_PATTERN =
  /overloaded|provider.?returned.?error|rate.?limit|...|stream ended|.../i;
```

### Retry policy

```typescript
// src/turn-runner/transient-error.ts:112-116
export const DEFAULT_TRANSIENT_RETRY_POLICY: TransientRetryPolicy = {
  maxAttempts: 3,    // total prompt+continue attempts (initial + 2 retries)
  baseDelayMs: 2_000,
  maxDelayMs: 30_000,
};
```
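This page does not show how the delay between attempts is computed. Assuming the common exponential schedule (double from `baseDelayMs`, cap at `maxDelayMs`), the default policy yields the delays below; `retryDelayMs` is a hypothetical helper, and `TransientRetryPolicy` is redeclared locally with only the three fields shown above.

```typescript
interface TransientRetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

const DEFAULT_POLICY: TransientRetryPolicy = { maxAttempts: 3, baseDelayMs: 2_000, maxDelayMs: 30_000 };

// Assumed schedule: double from baseDelayMs for each retry, capped at maxDelayMs.
// retryNumber is 1 for the first retry (attempt 2 overall).
export function retryDelayMs(policy: TransientRetryPolicy, retryNumber: number): number {
  return Math.min(policy.baseDelayMs * 2 ** (retryNumber - 1), policy.maxDelayMs);
}

console.log([1, 2, 3, 4, 5].map((n) => retryDelayMs(DEFAULT_POLICY, n)).join(", "));
// 2000, 4000, 8000, 16000, 30000
```

Under this assumption the default `maxAttempts: 3` only ever waits 2 s and then 4 s; the 30 s cap matters only for larger custom policies.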

### The progress-reset invariant

The retry counter resets when the agent makes forward progress between failures. "Forward progress" means at least one non-error assistant message was appended between two failures. This is tested explicitly:

```typescript
// test/transient-error.test.ts:169-215
// Both retry log lines should read "attempt 2/3": the first
// failure burned the implicit attempt 1, and the second failure
// resets to attempt 1 because the agent emitted an intermediate
// success between them.
expect(retryAttemptLabels(events)).toEqual(["2/3", "2/3"]);
```

Without a reset, a session that fails, recovers, and fails again would charge both failures to the same `maxAttempts` budget, so with the default of 3 the third transient blip would end the turn even though the agent recovered from the first two.
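A minimal counter that reproduces the labels asserted in the test. `RetryBudget` is hypothetical; it assumes attempt 1 is the initial prompt and that the caller signals "progress" when a non-error assistant message is appended.

```typescript
export class RetryBudget {
  private attempt = 1; // attempt 1 is the initial prompt

  constructor(private readonly maxAttempts: number) {}

  // Returns the label for the retry about to run, or null when the budget is spent.
  onTransientFailure(): string | null {
    if (this.attempt >= this.maxAttempts) return null;
    this.attempt++;
    return `${this.attempt}/${this.maxAttempts}`;
  }

  // A non-error assistant message was appended: the next failure is a new event.
  onProgress(): void {
    this.attempt = 1;
  }
}

const withReset = new RetryBudget(3);
console.log(withReset.onTransientFailure()); // 2/3
withReset.onProgress();
console.log(withReset.onTransientFailure()); // 2/3

const withoutReset = new RetryBudget(3);
console.log(withoutReset.onTransientFailure(), withoutReset.onTransientFailure(), withoutReset.onTransientFailure());
// 2/3 3/3 null
```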

### What is NOT retried

| Pattern | Reason |
|---|---|
| `400 Bad Request` | Retrying with the same payload will not change the outcome |
| `401 Unauthorized` | Auth failure is permanent |
| `403 Forbidden` | Same |
| `404 Not Found` | Wrong endpoint; retrying doesn't help |
| Context overflow (`prompt is too long`) | Handled separately by `tryRecoverFromContextOverflow`, not this module |
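A sketch of the classification order, with the context-overflow path and exclusions checked before the transient pattern. The regexes are illustrative only: the real `TRANSIENT_PATTERN` is elided above, `NON_RETRYABLE_PATTERN`'s text is not shown on this page, and whether the real module checks exclusions first is an assumption. `classifyError` is hypothetical.

```typescript
// Illustrative patterns, not the module's actual regexes.
const TRANSIENT = /overloaded|provider.?returned.?error|rate.?limit|stream ended|\b429\b|\b5\d\d\b/i;
const NON_RETRYABLE = /\b4(?!29)\d\d\b/; // any 4xx except 429
const CONTEXT_OVERFLOW = /prompt is too long/i;

export type ErrorClass = "retry" | "no_retry" | "context_overflow";

export function classifyError(message: string): ErrorClass {
  if (CONTEXT_OVERFLOW.test(message)) return "context_overflow"; // compaction path, not retry
  if (NON_RETRYABLE.test(message)) return "no_retry"; // same payload would fail again
  return TRANSIENT.test(message) ? "retry" : "no_retry";
}
```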

### Failure modes

- **Making another 4xx retryable**: `NON_RETRYABLE_PATTERN` explicitly blocks 4xx codes other than 429. Treating another code as transient (e.g., 422) wastes retries on payloads that will always fail.
- **Diverging the regex from `AgentSession._isRetryableError`**: The comment at `transient-error.ts:29` explains that the regex deliberately mirrors `pi-coding-agent`'s detection. Drift means some pi-ecosystem callers retry a class of error while duet-agent does not, producing inconsistent behavior.
- **Suppressing the progress reset**: If the counter is not reset on intermediate success, long agentic tasks that suffer one transient blip per state will burn through `maxAttempts` globally — causing the runner to give up during a valid multi-state session.
- **Retrying context overflow**: Context overflow must be handled by compaction, not by `agent.continue()` — the same oversized payload will fail identically.

### Safe-change rules

- When adding new provider error shapes to retry, add them to `TRANSIENT_PATTERN` and add a test row to `test/transient-error.test.ts` under the `retries:` table.
- When excluding new error shapes, add them to `NON_RETRYABLE_PATTERN` and add a `does not retry:` row.
- Keep `DEFAULT_TRANSIENT_RETRY_POLICY.maxAttempts` and `baseDelayMs` in sync with `AgentSession`'s values. The test at `test/transient-error.test.ts:106` explicitly asserts this synchronization.

Sources: [src/turn-runner/transient-error.ts:1-127](src/turn-runner/transient-error.ts), [test/transient-error.test.ts:1-256](test/transient-error.test.ts)

---

## 7. Schema Migrations — Forward-Only Guarantee

### What it is

`src/memory/migrations.ts` applies schema changes to PGlite in monotonically increasing version order. Each migration runs inside a single transaction; failure rolls back to the prior version. The `schema_version` table records every applied migration. The invariant is: **we never go backwards**.

```typescript
// src/memory/migrations.ts:22-23
// Forward-only by design. A bad migration is fixed by writing the
// next migration, not by reverting.
```
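A forward-only runner needs only three rules: versions are contiguous, applied versions are skipped, and each migration commits together with its `schema_version` row. The sketch uses minimal hypothetical `Db`/`Tx` interfaces rather than PGlite's API, and the `version` column name is an assumption.

```typescript
// Minimal stand-ins (assumption): the real code talks to PGlite directly.
interface Tx {
  exec(sql: string): Promise<void>;
}
interface Db {
  currentVersion(): Promise<number>; // highest applied version, 0 if none
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>; // rolls back if fn throws
}
interface Migration {
  version: number;
  up(tx: Tx): Promise<void>;
}

// Versions must be 1..N with no gaps; applied versions are skipped; each
// migration and its version row commit (or roll back) together.
export async function migrate(db: Db, migrations: Migration[]): Promise<number[]> {
  migrations.forEach((m, i) => {
    if (m.version !== i + 1) throw new Error(`migration at index ${i} has version ${m.version}, expected ${i + 1}`);
  });
  const current = await db.currentVersion();
  const applied: number[] = [];
  for (const m of migrations) {
    if (m.version <= current) continue;
    await db.transaction(async (tx) => {
      await m.up(tx);
      await tx.exec(`INSERT INTO schema_version (version) VALUES (${m.version})`);
    });
    applied.push(m.version);
  }
  return applied;
}
```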

### Why drop-and-recreate for embedding tables

Migrations 6 and 7 both drop `observation_embeddings` entirely rather than attempting data-preserving rebuilds. The comment in migration 7 explains the root cause:

> PGlite's TOAST storage of `vector(3072)` columns hit `missing chunk number 0 for toast value ...` on production databases that had been heavily UPSERTed by the worker.

Backfill is cheap (embedding worker repopulates within minutes); TOAST corruption is not recoverable. This is a codebase-specific invariant: observation rows are the only data that genuinely needs in-place transforms; embedding tables can always be rebuilt.

### Failure modes

- **Reverting a migration**: Setting `LATEST_SCHEMA_VERSION` to an older value does not undo applied migrations — it just stops newer ones from running. Callers that need to downgrade must write a new forward migration that undoes the schema change.
- **Non-idempotent migrations**: Migration 1 uses `CREATE TABLE IF NOT EXISTS` specifically to allow both fresh installs and pre-migration databases to converge on the same baseline. Removing `IF NOT EXISTS` breaks upgrades.
- **Running migrations outside a transaction**: The `db.transaction(...)` wrapper means a DDL failure on step 3 of a multi-step migration rolls back steps 1 and 2. Removing this wrapper leaves the schema in a half-applied state.

### Safe-change rules

- New migrations always append at the end with `version: LATEST + 1`.
- Embedding table changes: prefer drop-and-recreate (the backfill worker handles repopulation) over data-preserving rebuilds that read `vector(3072)` columns.
- Observation table changes: use in-place transforms with `ALTER TABLE` + backfill DML, as in migrations 2 and 4.

Sources: [src/memory/migrations.ts:1-439](src/memory/migrations.ts)

---

## 8. Guardrail Firewall — Fail-Fast Composition

### What it is

`createFirewall` in `src/guardrails/firewall.ts` composes multiple guardrails into one. It evaluates them in order and short-circuits on the first block: a blocked result is returned immediately, not accumulated.

```typescript
// src/guardrails/firewall.ts:14-21
for (const g of guardrails) {
  const result = await g.evaluate(context);
  if (!result.allowed) {
    return {
      allowed: false,
      reason: `[${g.name}] ${result.reason}`,
      suggestion: result.suggestion,
    };
  }
  // ...remainder of the loop and the final allowed result omitted
}
```

The `SemanticGuardrail` uses an LLM call per evaluation — it is expensive. Ordering pattern-based guardrails before semantic ones is the intended usage pattern.

### Invariants

- Guardrail order is semantically significant: a cheaper pattern guardrail first catches obvious violations without paying the LLM cost.
- Warnings (non-blocking reasons from passing guardrails) are accumulated and returned in the final `allowed: true` result. They are not silently dropped.
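A self-contained sketch of the same fail-fast loop, extended with warning accumulation. The result shapes mirror the excerpt above, but `warning` is a hypothetical field name and `evaluateInOrder` is not the real `createFirewall`.

```typescript
interface GuardrailResult {
  allowed: boolean;
  reason?: string;
  suggestion?: string;
  warning?: string; // hypothetical: non-blocking note from a passing guardrail
}
interface Guardrail<C> {
  name: string;
  evaluate(context: C): Promise<GuardrailResult>;
}
type FirewallResult =
  | { allowed: false; reason: string; suggestion?: string }
  | { allowed: true; warnings: string[] };

export async function evaluateInOrder<C>(guardrails: Guardrail<C>[], context: C): Promise<FirewallResult> {
  const warnings: string[] = [];
  for (const g of guardrails) {
    const result = await g.evaluate(context);
    if (!result.allowed) {
      // Fail fast: later (typically more expensive) guardrails never run.
      return { allowed: false, reason: `[${g.name}] ${result.reason ?? "blocked"}`, suggestion: result.suggestion };
    }
    if (result.warning) warnings.push(`[${g.name}] ${result.warning}`);
  }
  return { allowed: true, warnings };
}
```

Passing `[pattern, semantic]` in that order means a command the pattern rule blocks never reaches the LLM-backed guardrail.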

### Failure modes

- **Placing `SemanticGuardrail` first**: Every action pays an LLM call even when a trivial pattern rule would have blocked it.
- **Swallowing the `reason` field on block**: The reason and suggestion are the user-visible explanation. Stripping them makes blocks opaque.

### Safe-change rules

- Compose firewall in order: regex/pattern guardrails first, semantic guardrail last.
- The `SemanticGuardrail` model is injected at construction — keep it BYOK/BYOC compatible, do not hardcode a provider model string in the guardrail definition.

Sources: [src/guardrails/firewall.ts:1-34](src/guardrails/firewall.ts), [src/guardrails/semantic.ts:1-54](src/guardrails/semantic.ts)

---

## 9. Which Files Are Safe to Change in Isolation

### Low blast radius (safe to change alone)

| File | Why |
|---|---|
| `src/guardrails/firewall.ts` | Composition only; no state owned |
| `src/guardrails/semantic.ts` | LLM call wrapper; only the policy string and model are observable |
| `src/turn-runner/transient-error.ts` | Self-contained; only affects retry behavior, not state transitions |
| `src/memory/migrations.ts` | Add-only at the end; each migration is transactional |
| `src/types/state-machine.ts` (additive changes only) | Adding new optional fields or new event discriminants is backward-compatible |

### High blast radius (touch multiple invariants)

| File | Why |
|---|---|
| `src/memory/context-pack.ts` | Directly governs the three-event rule. Any new call site breaks prompt-cache stability. |
| `src/memory/pglite.ts` | Lock lifecycle, quarantine, backup rotation. Changes here affect every concurrent-CLI scenario. |
| `src/memory/session.ts` | Refcount logic. Getting refcounting wrong causes use-after-close or lock starvation. |
| `src/turn-runner/turn-runner.ts` | Orchestrates all six invariants. Every section of this file is entangled with at least one other invariant. |
| `src/types/protocol.ts` | Renaming or removing `TurnState` fields breaks cross-process serialization for all callers (CLI, HTTP, daemon). |
| `src/memory/observational.ts` | Wire-shaping transform. Changes to the rendering template invalidate existing prompt caches. |

---

## Summary

The six load-bearing invariants (`TurnState` as the only cross-process contract, the three-event rule for memory pack rebuilds, prompt-cache stability through frozen prefixes, append-only state-machine history, PGlite's per-directory cross-process lock, and transient-error retry scope mirroring `AgentSession`) form an interconnected fabric. The most dangerous change pattern is adding a fourth memory rebuild trigger, because it simultaneously breaks prompt-cache stability, increases DB lock contention, and can starve concurrent CLI sessions of the PGlite open lock. The eval at `evals/state-machine-real-session-carry-forward.eval.ts:153-181` and the unit test at `test/transient-error.test.ts:169-215` are the two regression gates most directly protecting these invariants at CI time.

Sources: [src/memory/context-pack.ts:8-23](src/memory/context-pack.ts), [src/types/protocol.ts:169-208](src/types/protocol.ts)