# Observational Memory — How Transcripts Become Durable Rows

> Memory and compaction are the same primitive. After each turn an observer model reads the transcript and appends Observation rows to PGlite; when rows grow beyond a threshold a reflector condenses them. Embeddings run in a background worker (embedding-worker.ts) so foreground turns never block. This page covers the observe → reflect → embed pipeline, trigger conditions, failure isolation, and the image-to-text path that keeps screenshots recallable.

- Repository: dzhng/duet-agent
- GitHub: https://github.com/dzhng/duet-agent
- Human wiki: https://grok-wiki.com/public/wiki/dzhng-duet-agent-82dbe2572d3a
- Complete Markdown: https://grok-wiki.com/public/wiki/dzhng-duet-agent-82dbe2572d3a/llms-full.txt

## Source Files

- `src/memory/observational.ts`
- `src/memory/observational-prompts.ts`
- `src/memory/observation-groups.ts`
- `src/memory/embedding-worker.ts`
- `src/memory/embedding.ts`
- `src/memory/storage.ts`
- `evals/memory-reflect.eval.ts`
- `test/memory-reflect-planner.test.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [src/memory/observational.ts](src/memory/observational.ts)
- [src/memory/observational-prompts.ts](src/memory/observational-prompts.ts)
- [src/memory/observation-groups.ts](src/memory/observation-groups.ts)
- [src/memory/embedding-worker.ts](src/memory/embedding-worker.ts)
- [src/memory/embedding.ts](src/memory/embedding.ts)
- [src/memory/storage.ts](src/memory/storage.ts)
- [src/memory/context-pack.ts](src/memory/context-pack.ts)
- [src/memory/loader.ts](src/memory/loader.ts)
- [src/memory/migrations.ts](src/memory/migrations.ts)
- [evals/memory-reflect.eval.ts](evals/memory-reflect.eval.ts)
- [test/memory-reflect-planner.test.ts](test/memory-reflect-planner.test.ts)
</details>

# Observational Memory — How Transcripts Become Durable Rows

Duet's observational memory system converts raw conversation transcripts into structured `Observation` rows stored in a per-user PGlite (embedded Postgres) database. After each turn, an *observer* model reads the unobserved message tail and appends new rows; when the local session's accumulated rows exceed a token threshold, a *reflector* condenses them. Embeddings are generated in a background worker so that the foreground turn is never blocked waiting for a vector computation. A separate global `duet memory reflect` command cross-session prunes the entire pool into atomic reflection rows.

This page covers the full pipeline: how transcripts are serialized for the observer, how observation groups anchor progress markers, how reflection compaction works at session and global scope, and how the embedding backfill worker runs without interfering with foreground turns. Understanding this pipeline is essential for predicting memory latency, debugging blank or stale recall, and reasoning about what gets preserved across long-running or resumed sessions.

---

## The Observe → Reflect → Embed Pipeline

```text
Turn N completes
     │
     ▼
updateObservationalMemory()
     │
     ├─ agentMessagesToRaw()           serialise AgentMessage → RawMemoryMessage
     │                                 (text, images pass through; tool-calls flattened)
     │
     ├─ getUnobservedMessageTail()     skip messages already covered by an
     │                                 <observation-group> range marker
     │
     ├─ [unobserved tail > 0 ?]
     │     └─ observe()               LLM call → ObserverResult
     │           └─ appendObservation() → PGlite  (wrapped in <observation-group>)
     │
     ├─ [session obs tokens ≥ trigger ?]
     │     └─ reflectObservations()   LLM call → ReflectorResult
     │           └─ replaceSessionObservations() → PGlite
     │
     └─ return { observations, reflections }

Background (EmbeddingBackfillWorker, every 10 s):
     SELECT rows WHERE embedding IS NULL
     → embed()  (POST /api/v1/embed)
     → INSERT INTO observation_embeddings
```

Sources: [src/memory/observational.ts:609-696](), [src/memory/storage.ts:145-303]()

---

## Budget Arithmetic

All numeric token budgets flow from one caller-supplied value: `effectiveContext` — the actor model's effective context window size, already clamped to the provider's hard limit. A single `deriveMemoryBudgets()` call derives every threshold at a fixed ratio.

| Budget name | Ratio of `effectiveContext` | Default at 200k |
|---|---|---|
| `messageTokens` (raw tail trigger) | 0.60 | 120,000 |
| `observationTokens` (reflection trigger) | 0.325 | 65,000 |
| `globalContextTokenBudget` | 0.075 | 15,000 |
| `bufferActivation` (target after compaction) | 0.5 × trigger | 60,000 / 32,500 |

`bufferActivation` is always half the trigger. After a compaction event the retained content shrinks to half the trigger size, giving the session room to grow again before retriggering. Two fixed, context-independent caps guard the observer call itself:

| Cap | Value | Purpose |
|---|---|---|
| `maxTranscriptTokens` | 35,000 | Observer call never sees more than this from the unobserved tail |
| `maxObservationLogTokens` | 8,000 | Hard upper bound on the observation text the observer may produce |
| `previousObserverTokens` | 4,000 | Prior observations shown to the observer for dedupe context |

Sources: [src/memory/observational.ts:73-134](), [src/memory/observational.ts:170-195]()

---

## Message Serialization for the Observer

`agentMessageToRaw()` projects each `AgentMessage` into a `RawMemoryMessage` — a compact, observer-readable form. The key decisions:

- **Text and image blocks** pass through verbatim. Image blocks appear inline in the observer's multimodal prompt so screenshots are directly inspectable.
- **Tool-call blocks** become compact `[toolCall name(args)]` snippets, with arguments truncated to 1,500 characters.
- **Tool-result blocks** are prefixed with `[toolResult toolName (error?)]` and truncated to 1,500 characters. This keeps the observer aware of what was called without drowning the batch budget in raw JSON payloads.
- **`thinking` blocks are dropped entirely.** The observer records decisions, not the assistant's reasoning steps.
- **Synthetic memory-reminder messages** injected by `createObservationalContextTransform` are stripped before the observer ever sees them via `stripObservationalContextMessages()`.

```ts
// src/memory/observational.ts:1652-1697
function serializeMessageForObserver(message: AgentMessage): ObserverMessagePreview {
  // ... text/image pass through; toolCall flattened; thinking dropped
}
```

Each message gets a stable, deterministic `id` (`msg_assistant_<responseId>`, `msg_tool_<toolCallId>`, or a hash-based fallback). These ids are the backbone of the progress-marker system.

Sources: [src/memory/observational.ts:1563-1812]()

---

## Observation Groups — Progress Markers

Every observation row's content is wrapped in an `<observation-group>` XML element carrying four key attributes:

```xml
<observation-group id="a3f7c2b1d8e4" range="msg_user_1716000000_abc:msg_assistant_resp_xyz" kind="observation" cwd="/home/user/projects/duet">
  🟡 (14:33) Fixed null-check in auth.ts:45 …
</observation-group>
```

The `range` attribute is a `firstMessageId:lastMessageId` span. `getUnobservedMessageTail()` parses all observation groups from all local session observations, finds the highest message index covered by any range, and returns only messages beyond that index as the "unobserved tail." This is what prevents the observer from re-observing turns it already summarized.

```ts
// src/memory/observational.ts:1528-1561
export function getUnobservedMessageTail(
  messages: RawMemoryMessage[],
  observations: Observation[],
): RawMemoryMessage[] {
  const lastObservedIndex = getLastObservedMessageIndex(messages, observations);
  // ...
}
```

The `cwd` attribute persists the working directory onto each row so the reflector and any downstream reader can identify which project the row belongs to — essential when memory is read back weeks later or across multiple repositories.

Sources: [src/memory/observation-groups.ts:74-118](), [src/memory/observational.ts:1528-1561]()

---

## The Observer Model Call

`observe()` builds a structured-output prompt containing:

1. A system prompt instructing the observer to extract decision traces (trigger → investigation → decision → rationale), priority-tagged observations (`🔴`/`🟡`/`🟢`/`✅`), and temporal anchors.
2. Prior local observations (for dedupe) and cross-session global pack rows (with explicit `[memory id: mem_xxx]` markers so the observer can attribute `usedObservationIds`).
3. The serialized message history, trimmed to `maxTranscriptTokens` (35k) from the newest end.

The observer returns a structured `ObserverResult` with:

- `hasMemory: boolean` — gating flag. When false, no row is written and the message range is left unobserved.
- `observations: string` — the extracted observation log text.
- `usedObservationIds: string[]` — prior memory ids whose content actually informed this turn's response. These trigger a `bumpLastUsed()` update to refresh the `lastUsedAt` freshness signal for those rows.
- `currentTask`, `suggestedContinuation`, `threadTitle` — continuity metadata.

If the observation text exceeds `maxObservationLogTokens`, a retry is attempted with an explicit token count. A final hard trim via `trimObservationTextToTokenBudget()` enforces the hard cap.

Sources: [src/memory/observational.ts:1325-1399](), [src/memory/observational-prompts.ts:248-343]()

### Image-to-Text Path

Images pass through `serializeMessageForObserver` as inline `ImageContent` blocks in the observer's multimodal `content` array. The observer prompt's final instruction explicitly asks the model to "inspect [images] directly and summarize relevant visual details, user-visible text, UI state, diagrams, errors, or other facts needed for future continuity." This is the only path by which screenshots become recallable: the observer converts visual state into prose that lands in a durable `Observation` row.

Sources: [src/memory/observational-prompts.ts:329-343](), [src/memory/observational.ts:1666-1731]()

---

## In-Session Reflection

After each observer call, `updateObservationalMemory()` re-reads the session's observation token count. If it has reached `settings.reflection.observationTokens` (default ~65k at 200k context), `reflectObservations()` fires.

The reflector receives all current session observations, rendered via `renderObservationGroupsForReflection()` into a `## Group <id>` / `_range: <range>_` format. It returns an array of atomic `ReflectorReflection` rows — each a self-contained narrative of 150–600 tokens covering trigger → investigation → decision → rationale.

After reflection:

1. The array is joined into a single blob via `joinReflectorRows()`.
2. Token budget is enforced (retry + hard trim).
3. `reconcileObservationGroupsFromReflection()` re-wraps the output in `<observation-group>` elements using provenance derived from which source groups' content lines appear in the reflected sections.
4. `replaceSessionObservations()` atomically deletes this session's old rows and inserts the single new reflection row (kind=`"reflection"`).

```ts
// src/memory/observational.ts:875-931
async function reflectObservations(args): Promise<Observation[] | undefined> {
  // ...
  await replaceSessionObservations(session, sessionId, [reflected]);
  return [reflected];
}
```

This is a session-scoped operation. Other sessions' rows in the database are never touched.

Sources: [src/memory/observational.ts:875-932](), [src/memory/observation-groups.ts:219-254]()

---

## Global Cross-Session Reflection

`reflectAllObservations()` (exposed as `duet memory reflect`) condenses the entire global pool. It differs from in-session reflection in important ways:

### Eligibility Rules (enforced by `planReflectionBatches`)

| Row type | Rule |
|---|---|
| Global reflection row (`kind="reflection"`, `sessionId="__global_reflection__"`) | **Always preserved.** Re-reflecting condensed text degrades specificity. |
| Local reflection row (`kind="reflection"` with real `sessionId`) | **Eligible.** `duet memory reflect` breaks these into atomic global rows. |
| Raw observation row younger than 3 days | **Preserved.** Resume-info-loss risk too high. |
| Raw observation row older than 3 days | **Eligible.** |

Eligible rows are sorted chronologically and packed into batches up to `batchTokens` (default = reflection trigger). The 3-day minimum age is documented with explicit tradeoff reasoning: the specifics of a session still matter for resume continuity within 3 days; beyond that, the higher-level shape the reflector captures is what survives in human memory too.

Sources: [src/memory/observational.ts:940-1001](), [src/memory/observational.ts:1113-1160](), [test/memory-reflect-planner.test.ts:48-80]()

### Batch Processing

Each eligible batch runs through `reflectBatch()`:

1. A `generateStructuredOutput` call produces an array of atomic rows.
2. Each row is sanitized (lines capped at 10,000 chars) and trimmed to a per-row share of `targetTokens`.
3. A combined budget cap drops trailing rows if cumulative tokens would exceed the caller's `targetTokens`.
4. Each surviving row becomes its own `Observation` with `sessionId = "__global_reflection__"` and tags including `"global-prune"`.

After all batches complete, `replaceAllObservations()` atomically swaps the entire store: eligible rows disappear and new global reflection rows appear in a single transaction. This prevents peer CLIs from seeing a half-pruned pool.

Sources: [src/memory/observational.ts:1261-1315](), [src/memory/storage.ts:241-258]()

---

## The Embedding Backfill Worker

`EmbeddingBackfillWorker` runs as a persistent background loop started by `loadStoredMemory()`. Its contract: never block a foreground turn.

### Tick Shape

```
while not aborted:
  withDb(session):                    # acquires cross-process lock
    loop:
      SELECT observations WHERE no embedding AND id NOT IN cooldown
        ORDER BY priority DESC, created_at DESC
        LIMIT 50
      if empty: break
      embed(batch.map(r => r.content))  # POST /api/v1/embed
      INSERT INTO observation_embeddings (ON CONFLICT DO UPDATE)
  sleep(10s)
```

The worker exits `withDb` after draining the current batch so the idle-close timer releases the cross-process lock. A peer duet CLI can acquire it between drain ticks.

### Failure Isolation

- Per-embedding errors log and back off for 60 seconds before the next tick.
- `EmbeddingUnavailableError` (missing `DUET_API_KEY`, 4xx from the endpoint) propagates as a typed error; callers degrade to keyword-only retrieval rather than crashing.
- 5xx responses retry with exponential backoff up to 3 attempts.
- A per-id cooldown (default 5 minutes) prevents unbounded hot loops when the reflector's delete-and-reinsert cycle keeps wiping a row's embedding between drain ticks.
- The log file rotates at 1 MB (`<path>.1` keeps one prior rotation) to prevent unbounded growth.

Sources: [src/memory/embedding-worker.ts:83-273](), [src/memory/embedding.ts:55-176]()

### No FK, No Orphan Risk

Migration 7 dropped the foreign key from `observation_embeddings` to `observations`. An embedding row can survive after its parent observation is deleted (e.g., by a reflection replace). This is intentional: the orphan is harmless — recall queries JOIN back to `observations` and filter it out — and avoids the cascade-delete race that would otherwise force a re-embed on every reflection cycle.

Sources: [src/memory/embedding-worker.ts:216-224]()

---

## Context Pack Rendering and Cache Stability

The frozen context pack — the memory prefix rendered above the message tail — is rebuilt by `rebuildMemoryContextPack()` at exactly three moments:

1. `loadStoredMemory()` finishes — initial seed before turn 1.
2. The reflector replaces observations — condensed view changed.
3. The wire-shaping eviction horizon advances mid-turn — prompt cache is already invalidating, so the refresh piggybacks on a cache miss the model is already paying.

All other paths (observer appending a row, `recall_memory` tool returning rows) deliberately do NOT refresh. This keeps the prefix content-deterministic between compaction events so the provider's prompt cache survives turn over turn.

The pack has two layers, rendered in order:

- `<global_observations>` — highest-ranked cross-session rows, excluded from the current session, greedy-fitted to `globalContextTokenBudget` (default ~15k at 200k context). Ranked by `ln(priority) + ln(kindBias) + lastUsedAt / halfLife` in SQL; reflections rank higher via `reflectionBias` (default 1.3×).
- `<local_observations>` — current session's rows in chronological order, unranked. Represents the session's own compaction summary.

Sources: [src/memory/context-pack.ts:1-54](), [src/memory/loader.ts:1-78](), [src/memory/observational.ts:557-607]()

---

## Storage Schema Invariants

The `observations` table is the single source of truth. Key columns:

| Column | Purpose |
|---|---|
| `id` | `mem_<nanoid(12)>` — stable identifier used by range markers and `usedObservationIds` |
| `session_id` | Session owner. `NULL` for pre-session rows; `__global_reflection__` for global prune reflections |
| `kind` | `"observation"` or `"reflection"` |
| `priority` | `"high"` / `"medium"` / `"low"` — inferred from emoji in content (`🔴`→high, `🟡`→medium) |
| `last_used_at` | Bumped by `bumpLastUsed()` when `usedObservationIds` cites the row |
| `content` | The observation text, wrapped in `<observation-group>` |
| `tags_json` | `["observational-memory"]` base; reflections add `"reflection"` and optionally `"global-prune"` |

`observation_embeddings` stores the 3072-dimension vectors with the model tag echoed from the server. An `ON CONFLICT DO UPDATE` upsert means a re-embed after a model swap safely updates the stale row.

Sources: [src/memory/storage.ts:305-355](), [src/memory/migrations.ts:44-60]()

---

## Failure Isolation Summary

| Failure | Behavior |
|---|---|
| Observer LLM call fails | `activateObservations` propagates the error; `updateObservationalMemory` throws; turn logs but user reply is not blocked by default (runner wraps the update in a background task) |
| Reflector produces over-budget output | Retry with explicit token count, then hard `trimObservationTextToTokenBudget` |
| `withDb` lock contention (peer CLI holds the lock) | Returns `undefined`; `appendObservation`, `bumpLastUsed`, `replaceSessionObservations` all no-op silently. A single warning is surfaced via `onWarn` the first time this happens. |
| Corrupted PGlite directory | `quarantineDataDirectory` renames the directory aside and starts fresh; `onRecover` is called with the backup path |
| `DUET_API_KEY` missing | `EmbeddingUnavailableError` thrown; backfill worker backs off; recall degrades to keyword-only |
| Embedding endpoint 5xx | Exponential backoff, up to 3 retries; then worker sleeps 60s |
| Global reflect produces empty batch | Eligible rows are left unprocessed; store is not written; next `duet memory reflect` run retries them |

The design principle throughout is that memory bookkeeping failures must never crash or visibly stall a foreground turn. The observation pipeline runs after the actor turn completes; the embedding worker is fully decoupled; and all storage calls treat lock contention as a silent no-op rather than an error.

Sources: [src/memory/storage.ts:144-160](), [src/memory/embedding-worker.ts:124-160](), [src/memory/storage.ts:54-129]()