# Recall & the Frozen Context Pack — What Survives Into Every Prompt

> Every turn is prefixed with a frozen two-layer memory pack: global cross-session observations ranked by a score with a recency half-life, and the local session's compaction summary. The pack rebuilds only on three specific events (initial load, reflector replacement, wire-shaping eviction) so the provider's prompt cache survives from turn to turn. The `recall_memory` tool uses hybrid RRF retrieval (pgvector cosine + tsvector keyword) to surface anything that missed the pack. This page explains the pack structure, the rebuild triggers, the cache-stability invariant, and RRF.

- Repository: dzhng/duet-agent
- GitHub: https://github.com/dzhng/duet-agent
- Human wiki: https://grok-wiki.com/public/wiki/dzhng-duet-agent-82dbe2572d3a
- Complete Markdown: https://grok-wiki.com/public/wiki/dzhng-duet-agent-82dbe2572d3a/llms-full.txt

## Source Files

- `src/memory/context-pack.ts`
- `src/memory/recall.ts`
- `src/memory/loader.ts`
- `src/memory/store.ts`
- `src/memory/pglite.ts`
- `evals/recall-memory-cross-session.eval.ts`
- `evals/recall-memory-implicit-triggers.eval.ts`
- `test/memory-recall.test.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [src/memory/context-pack.ts](src/memory/context-pack.ts)
- [src/memory/store.ts](src/memory/store.ts)
- [src/memory/loader.ts](src/memory/loader.ts)
- [src/memory/recall.ts](src/memory/recall.ts)
- [src/memory/session.ts](src/memory/session.ts)
- [src/memory/pglite.ts](src/memory/pglite.ts)
- [evals/recall-memory-cross-session.eval.ts](evals/recall-memory-cross-session.eval.ts)
- [evals/recall-memory-implicit-triggers.eval.ts](evals/recall-memory-implicit-triggers.eval.ts)
- [test/memory-recall.test.ts](test/memory-recall.test.ts)
</details>


Every agent turn is prefixed with a two-layer, frozen memory pack that is inserted between the system prompt and the live message tail. The pack is built from the durable PGlite observation store and held byte-identical between rebuilds, so the provider's prompt cache keeps hitting and the unchanged prefix is not re-encoded on every turn. When the model needs something that didn't make it into the pack (older sessions, long-tail observations, exact code symbols), it calls the `recall_memory` tool, which runs a hybrid Reciprocal Rank Fusion (RRF) search combining pgvector cosine similarity and PostgreSQL full-text (`tsvector`) ranking.

This page explains how the pack is structured, when it is rebuilt, why the rebuild triggers are deliberately narrow, and how RRF fusion fills the gap between the frozen pack and the full durable store.

---

## The Two-Layer Pack

The frozen pack is the single piece of memory state kept in-process by the runner. Everything else — raw observations, rankings, vectors — lives in PGlite and is read on demand.

```
┌─────────────────────────────────────┐
│          System prompt              │
├─────────────────────────────────────┤
│  GLOBAL layer  (cross-session,      │
│                ranked by score,     │
│                budget-fitted)       │
├─────────────────────────────────────┤
│  LOCAL layer   (current session,    │
│                chronological,       │
│                full fidelity)       │
├─────────────────────────────────────┤
│  Live message tail  (grows/turn)    │
└─────────────────────────────────────┘
```

The two layers are intentionally separated so the model can tell what comes from this conversation versus what comes from accumulated cross-session knowledge.

### Global Layer

The global layer contains the highest-signal observations from every session except the caller's current one. Rows are ranked by a composite score computed entirely in SQL:

```
rank = ln(priority_weight) + ln(kind_bias) + last_used_at / recency_half_life_ms
```

`priority_weight` maps `high→3`, `medium→2`, `low→1`. `kind_bias` applies a configurable `reflectionBias` multiplier to `kind = 'reflection'` rows, so compacted reflections outrank raw observations. The `last_used_at / h` term, where `h` is `recency_half_life_ms`, comes from taking the log of the exponential decay `0.5^((now - last_used_at) / h)`: up to a constant scale factor, that log is `(last_used_at - now) / h`, and the `now / h` part is identical for every candidate, so it cancels in the ordering. What remains is a pure function of stored columns that the `idx_obs_kind_priority_lastused` index can cover directly.
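
To make the trade-off between priority and recency concrete, here is a minimal TypeScript sketch that evaluates the formula exactly as written above. The real ranking runs in SQL; the `RankInput` shape, the `"observation"` kind label, the bias of 2, and the one-week half-life are assumptions chosen only for illustration.

```typescript
// Hypothetical row shape: only the fields the formula needs.
type Priority = "high" | "medium" | "low";

interface RankInput {
  priority: Priority;
  kind: "observation" | "reflection"; // "observation" is an assumed label
  lastUsedAt: number; // epoch milliseconds
}

const PRIORITY_WEIGHT: Record<Priority, number> = { high: 3, medium: 2, low: 1 };

// Mirrors the formula as written: ln(weight) + ln(bias) + lastUsedAt / h.
function rankScore(row: RankInput, reflectionBias: number, halfLifeMs: number): number {
  const kindBias = row.kind === "reflection" ? reflectionBias : 1;
  return (
    Math.log(PRIORITY_WEIGHT[row.priority]) +
    Math.log(kindBias) +
    row.lastUsedAt / halfLifeMs
  );
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000; // assumed half-life, illustration only
const t0 = 1700000000000;

// A high-priority row versus low-priority rows used one and two half-lives later.
const oldHigh = rankScore({ priority: "high", kind: "observation", lastUsedAt: t0 }, 2, WEEK_MS);
const newerLow = rankScore({ priority: "low", kind: "observation", lastUsedAt: t0 + WEEK_MS }, 2, WEEK_MS);
const newestLow = rankScore({ priority: "low", kind: "observation", lastUsedAt: t0 + 2 * WEEK_MS }, 2, WEEK_MS);
```

Under these assumptions, one half-life of recency adds 1 to the score while high priority adds `ln 3` (about 1.10), so `oldHigh` still outranks `newerLow`, but `newestLow` overtakes it.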

After Postgres returns the ranked candidates, JavaScript performs a greedy token-budget fit. It walks the ranked list, skipping rows that would overflow the budget but continuing past them — a smaller row later in the ranking may still fit, and abandoning the budget early just because the top-ranked row is a long reflection would waste prompt space.
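
The skip-and-continue behaviour described above could look roughly like the following sketch. It is not the loader's actual code: `RankedRow`, its precomputed `tokens` field, and `fitToBudget` are hypothetical names, and the token counts are made up.

```typescript
// Hypothetical shape: rows arrive already ranked, best first.
interface RankedRow {
  id: string;
  tokens: number; // assumed to be precomputed by some tokenizer or estimator
}

function fitToBudget(ranked: RankedRow[], budget: number): RankedRow[] {
  const picked: RankedRow[] = [];
  let used = 0;
  for (const row of ranked) {
    // Skip a row that would overflow, but keep walking the list:
    // a smaller row further down may still fit.
    if (used + row.tokens > budget) continue;
    picked.push(row);
    used += row.tokens;
  }
  return picked;
}

const fitted = fitToBudget(
  [
    { id: "long-reflection", tokens: 900 },
    { id: "short-a", tokens: 200 },
    { id: "short-b", tokens: 250 },
    { id: "medium", tokens: 400 },
  ],
  500,
);
const fittedIds = fitted.map((row) => row.id);
```

With a budget of 500, the top-ranked 900-token row is skipped, the two short rows are taken (450 tokens), and the 400-token row is skipped because it would bring the total to 850.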

Sources: [src/memory/loader.ts:79-138](src/memory/loader.ts)

### Local Layer

The local layer contains every observation the current session wrote, ordered chronologically and loaded in full regardless of size. It represents the session's own compaction summary — the observations and reflections that replaced earlier transcript. Bounding the local layer is the observer/reflector pipeline's job; their thresholds keep the set from unbounded growth, and a reflection condenses it back down when it gets too large.

Legacy rows with a `NULL` `session_id` (written before session-id tracking was introduced) are kept in the global pool and excluded from the current session's local pack. The loader uses `IS DISTINCT FROM` rather than `<>` because `NULL <> sessionId` evaluates to `NULL`, which a `WHERE` clause treats as not true, so a plain `<>` would silently drop those rows from every query.
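
The difference between the two operators can be modelled outside the database. The sketch below emulates SQL's three-valued comparison in TypeScript; `sqlNotEqual` and `isDistinctFrom` are illustrative helpers, not functions from the repository.

```typescript
// null models SQL's UNKNOWN result.
type SqlBool = boolean | null;

// SQL `a <> b`: any comparison involving NULL yields NULL.
function sqlNotEqual(a: string | null, b: string | null): SqlBool {
  if (a === null || b === null) return null;
  return a !== b;
}

// SQL `a IS DISTINCT FROM b`: NULL is treated as an ordinary comparable value.
function isDistinctFrom(a: string | null, b: string | null): boolean {
  return a !== b;
}

const sessionIds: Array<string | null> = ["s1", "s2", null];
const currentSession = "s1";

// A WHERE clause keeps a row only when the predicate is exactly TRUE.
const viaNotEqual = sessionIds.filter((id) => sqlNotEqual(id, currentSession) === true);
const viaDistinct = sessionIds.filter((id) => isDistinctFrom(id, currentSession));
```

Here `viaNotEqual` keeps only `"s2"`, while `viaDistinct` keeps `"s2"` and the legacy `null` row.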

Sources: [src/memory/loader.ts:153-166](src/memory/loader.ts)

---

## The MemoryContextCache: In-Process State

`MemoryContextCache` is the runner's sole in-process memory state. It wraps a single `ContextPack` struct:

```typescript
// src/memory/store.ts
export interface ContextPack {
  /** Cross-session ranked memory; rendered above the local section. */
  global: Observation[];
  /** Current session's chronological compaction summary; rendered below global. */
  local: Observation[];
}
```

The cache exposes only `setContextPack()` and `getContextPack()`. The rendered prefix stays byte-identical between rebuild events because the transform reads `getContextPack()` on every dispatch without recomputing it. Only `rebuildMemoryContextPack()` may call `setContextPack()`.
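
As a rough illustration of why the prefix stays byte-identical, consider the sketch below. The class name `MemoryContextCacheSketch`, the minimal `Observation` shape, and `renderPrefix` are assumptions for this example; the real renderer and observation type are not shown on this page.

```typescript
// Hypothetical minimal shapes for illustration.
interface Observation {
  id: string;
  text: string;
}

interface ContextPack {
  global: Observation[];
  local: Observation[];
}

class MemoryContextCacheSketch {
  private pack: ContextPack = { global: [], local: [] };

  setContextPack(pack: ContextPack): void {
    this.pack = pack;
  }

  getContextPack(): ContextPack {
    return this.pack;
  }
}

// Hypothetical renderer: a pure function of the cached pack.
function renderPrefix(pack: ContextPack): string {
  const section = (title: string, rows: Observation[]): string =>
    [`## ${title}`, ...rows.map((row) => `- ${row.text}`)].join("\n");
  return [section("Global memory", pack.global), section("Session memory", pack.local)].join("\n\n");
}

const cache = new MemoryContextCacheSketch();
cache.setContextPack({
  global: [{ id: "g1", text: "User prefers pnpm" }],
  local: [{ id: "l1", text: "Refactoring the loader" }],
});

const turn1Prefix = renderPrefix(cache.getContextPack());

// An observer write lands in the durable store (modelled as an array), not in the cache.
const durableStore: Observation[] = [];
durableStore.push({ id: "l2", text: "New mid-turn observation" });

const turn2Prefix = renderPrefix(cache.getContextPack());
```

Because nothing called `setContextPack()` between the two renders, `turn1Prefix` and `turn2Prefix` are the same string even though the durable store has grown.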

Sources: [src/memory/store.ts:1-51](src/memory/store.ts)

---

## Three Rebuild Triggers — And Nothing Else

The pack is rebuilt on exactly three events. The comment in `context-pack.ts` names them explicitly:

> 1. `loadStoredMemory()` finishes — initial seed.
> 2. The reflector replaces observations — condensed view changed.
> 3. The wire-shaping eviction horizon advances — prompt cache is already invalidating, so piggyback the refresh for free.

Any other path — the observer appending a row mid-turn, a `recall_memory` tool call returning rows — deliberately does **not** refresh the pack. The prefix stays stable so the provider's prompt cache survives.

Failure during rebuild is non-fatal: a missing database, a planner glitch, or a corrupted index leaves the previous pack in place. The runner logs and continues; the user's turn is never blocked behind memory bookkeeping.
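
One way to get this "keep the previous pack on failure" behaviour is to assign the cache only after the load has succeeded. The sketch below assumes that structure; `PackCache`, `rebuildSafely`, and the `load` and `log` callbacks are hypothetical stand-ins for the real runner wiring.

```typescript
interface Pack {
  global: string[];
  local: string[];
}

class PackCache {
  private pack: Pack = { global: [], local: [] };
  setContextPack(pack: Pack): void {
    this.pack = pack;
  }
  getContextPack(): Pack {
    return this.pack;
  }
}

// The cache is only touched after load() resolves, so a failed load
// leaves the previous pack in place.
async function rebuildSafely(
  cache: PackCache,
  load: () => Promise<Pack>,
  log: (message: string) => void,
): Promise<boolean> {
  try {
    cache.setContextPack(await load());
    return true;
  } catch (err) {
    log(`memory pack rebuild failed; keeping previous pack: ${String(err)}`);
    return false;
  }
}

async function demoRebuild(): Promise<{ ok: boolean; pack: Pack; logs: string[] }> {
  const cache = new PackCache();
  const logs: string[] = [];
  const log = (message: string): void => {
    logs.push(message);
  };
  // First rebuild succeeds and seeds the pack.
  await rebuildSafely(cache, async () => ({ global: ["g1"], local: ["l1"] }), log);
  // Second rebuild fails; the seeded pack must survive.
  const ok = await rebuildSafely(
    cache,
    async () => {
      throw new Error("database unavailable");
    },
    log,
  );
  return { ok, pack: cache.getContextPack(), logs };
}
```

After `demoRebuild()` resolves, the second rebuild has reported failure, one message has been logged, and the cache still holds the pack from the first rebuild.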

Sources: [src/memory/context-pack.ts:6-24](src/memory/context-pack.ts)

### Why This Invariant Matters

Provider prompt caches (Anthropic, OpenAI) match on the exact prompt prefix. If the prefix changes on every turn, every turn pays the full re-encoding cost. Because the pack is rebuilt only on the three triggers above, which fire orders of magnitude less often than turns, the system pays at most one cache invalidation per rebuild, not one per turn.

The consequence is that new observations written by the observer mid-session are **not visible in the pack until the next rebuild trigger**. This is intentional. The model can still retrieve them immediately via `recall_memory`.

---

## Pack Rebuild Implementation

`rebuildMemoryContextPack` holds the open PGlite handle across both layer queries using one `withDb` call so the cross-process lock is acquired just once:

```typescript
// src/memory/context-pack.ts
await options.session.withDb(async (db) => {
  const [globalPack, localPack] = await Promise.all([
    loadGlobalPack(db, { ... }),
    options.sessionId !== undefined
      ? loadLocalPack(db, { sessionId: options.sessionId })
      : Promise.resolve([]),
  ]);
  options.cache.setContextPack({ global: globalPack, local: localPack });
});
```

The local pack is skipped when the runner has no session id (one-shot tools, tests). The global pack always runs. Its `excludeSessionId` parameter is optional, and leaving it `undefined` is a meaningful value: no session is excluded, so rows from all sessions are candidates (used by unrestricted recall).

Sources: [src/memory/context-pack.ts:36-53](src/memory/context-pack.ts)

---

## The `recall_memory` Tool: Hybrid RRF Retrieval

When the model calls `recall_memory`, it reaches into the full durable store for anything that missed the frozen pack. The retrieval runs two search paths in parallel inside a single `withDb` call, then fuses the ranked lists via Reciprocal Rank Fusion.

### Search Paths

| Path | Mechanism | Strength | Fallback behavior |
|---|---|---|---|
| Keyword (tsvector) | `websearch_to_tsquery` + `ts_rank` + GIN index | Exact tokens, proper nouns, code symbols | Always runs if query is non-empty |
| Vector (pgvector) | `embedding <=> $1::vector` cosine distance + HNSW index | Fuzzy paraphrases, semantic similarity | Skipped if no `embed` function provided, or if the embed call throws |

Each path fetches up to `PER_PATH_TOP_K = 30` candidates. The paths degrade independently: if the embedding endpoint is unavailable, the vector path is dropped and the function returns keyword-only results. If both paths fail to return anything, the function returns an empty list rather than throwing.
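
The independent degradation described above might be structured like the following sketch. It is an assumption about shape, not the code in `recall.ts`: `SearchPaths`, `Embed`, and `collectRankedLists` are hypothetical names, and the search functions are injected so the example runs without a database.

```typescript
type Hit = { id: string; rank: number };
type Embed = (text: string) => Promise<number[]>;

interface SearchPaths {
  keyword: (query: string) => Promise<Hit[]>;
  vector: (embedding: number[]) => Promise<Hit[]>;
}

async function collectRankedLists(
  query: string,
  paths: SearchPaths,
  embed?: Embed,
): Promise<Hit[][]> {
  if (query.trim() === "") return [];

  // The vector path is optional: no embed function, or a failing embed call,
  // simply drops it and leaves the keyword path untouched.
  let embedding: number[] | undefined;
  if (embed) {
    try {
      embedding = await embed(query);
    } catch {
      embedding = undefined;
    }
  }

  const [keywordHits, vectorHits] = await Promise.all([
    paths.keyword(query),
    embedding ? paths.vector(embedding) : Promise.resolve<Hit[]>([]),
  ]);
  return [keywordHits, vectorHits].filter((list) => list.length > 0);
}

const stubPaths: SearchPaths = {
  keyword: async () => [{ id: "obs-1", rank: 1 }],
  vector: async () => [{ id: "obs-2", rank: 1 }],
};

const failingEmbed: Embed = async () => {
  throw new Error("embedding endpoint unavailable");
};

// Only the keyword list survives when the embed call throws.
const degraded = collectRankedLists("deploy checklist", stubPaths, failingEmbed);
```

The resulting lists are what a fusion step would consume; with zero lists the caller can return an empty result instead of throwing.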

Sources: [src/memory/recall.ts:28-111](src/memory/recall.ts)

### Reciprocal Rank Fusion

RRF is a score-free rank aggregation method. Each candidate receives a contribution from each ranked list it appears in:

```
score(id) += 1 / (RRF_K + rank)
```

`RRF_K = 60` matches the value recommended in the original Cormack et al. paper and used by systems like gbrain and Zep. Smaller `k` weights the top-of-list more aggressively; 60 trades some top-1 sharpness for stability across lists with different score scales.

Candidates that rank highly in both lists accumulate contributions from both, rising above candidates that only appear in one. Ties resolve by first-seen insertion order, which mirrors gbrain's tiebreak convention.

```typescript
// src/memory/recall.ts
export function reciprocalRankFusion(rankedLists: ScoredHit[][]): string[] {
  const scores = new Map<string, { score: number; firstSeen: number }>();
  let order = 0;
  for (const list of rankedLists) {
    for (const hit of list) {
      const contribution = 1 / (RRF_K + hit.rank);
      const existing = scores.get(hit.id);
      if (existing) {
        existing.score += contribution;
      } else {
        scores.set(hit.id, { score: contribution, firstSeen: order++ });
      }
    }
  }
  // sort descending by score, then by first-seen for ties
  ...
}
```
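
For a worked example, the sketch below is a self-contained variant of the function above with a final sort written out. The sort is one plausible reading of the comment in the excerpt, not the repository's code, and it assumes `rank` is 1-based; `fuseSketch` is a hypothetical name.

```typescript
interface ScoredHit {
  id: string;
  rank: number; // assumed 1-based position within its list
}

const RRF_K = 60;

function fuseSketch(rankedLists: ScoredHit[][]): string[] {
  const scores = new Map<string, { score: number; firstSeen: number }>();
  let order = 0;
  for (const list of rankedLists) {
    for (const hit of list) {
      const contribution = 1 / (RRF_K + hit.rank);
      const existing = scores.get(hit.id);
      if (existing) {
        existing.score += contribution;
      } else {
        scores.set(hit.id, { score: contribution, firstSeen: order++ });
      }
    }
  }
  // Higher fused score first; equal scores keep first-seen order.
  return Array.from(scores.entries())
    .sort(([, a], [, b]) => b.score - a.score || a.firstSeen - b.firstSeen)
    .map(([id]) => id);
}

const keywordList: ScoredHit[] = [
  { id: "a", rank: 1 },
  { id: "b", rank: 2 },
  { id: "c", rank: 3 },
];
const vectorList: ScoredHit[] = [
  { id: "b", rank: 1 },
  { id: "d", rank: 2 },
  { id: "a", rank: 3 },
];

const fused = fuseSketch([keywordList, vectorList]);
```

Here `b` scores 1/62 + 1/61, `a` scores 1/61 + 1/63, `d` scores 1/62, and `c` scores 1/63, so the fused order is `b`, `a`, `d`, `c`: the two candidates present in both lists rise above the single-list ones.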

After fusion, IDs are passed to `hydrate()`, which fetches full `Observation` rows from the database in a single `WHERE id = ANY($1::text[])` query and reorders them to match the fused ranking.
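
The reordering step is needed because an `= ANY(...)` query makes no promise about row order. A minimal sketch of that step, with a hypothetical `Row` shape and helper name, could be:

```typescript
interface Row {
  id: string;
  text: string;
}

// Put fetched rows back into the fused ranking order.
function reorderToRanking(fusedIds: string[], rows: Row[]): Row[] {
  const byId = new Map<string, Row>();
  for (const row of rows) byId.set(row.id, row);

  const ordered: Row[] = [];
  for (const id of fusedIds) {
    const row = byId.get(id);
    // An id with no matching row (for example, deleted between fusion and fetch) is dropped.
    if (row) ordered.push(row);
  }
  return ordered;
}

const hydrated = reorderToRanking(
  ["b", "a", "missing", "c"],
  [
    { id: "a", text: "first" },
    { id: "c", text: "third" },
    { id: "b", text: "second" },
  ],
);
const hydratedIds = hydrated.map((row) => row.id);
```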

Sources: [src/memory/recall.ts:229-269](src/memory/recall.ts)

### Scope Filtering

The `recallMemory` function accepts a `scope` parameter controlling which sessions are searched:

| Scope | SQL behavior |
|---|---|
| `"all"` | No session filter applied |
| `"session"` | `session_id = $N` |
| `"global"` | `session_id IS DISTINCT FROM $N` (includes legacy `NULL` rows) |

The `IS DISTINCT FROM` form is required for the global scope so that pre-session-id legacy rows with `NULL` session ids remain in the pool.
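
The table above can be read as a small mapping from scope to SQL fragment. The helper below is a hypothetical sketch of such a mapping, not the repository's query builder; in particular, emitting `TRUE` for the `"all"` scope is an assumption about how "no filter" might be expressed.

```typescript
type Scope = "all" | "session" | "global";

// Returns a WHERE fragment; paramIndex is the positional placeholder
// that will carry the session id for the scopes that need one.
function scopeClause(scope: Scope, paramIndex: number): string {
  switch (scope) {
    case "all":
      return "TRUE"; // assumed stand-in for "no session filter"
    case "session":
      return `session_id = $${paramIndex}`;
    case "global":
      return `session_id IS DISTINCT FROM $${paramIndex}`;
  }
}

const sessionFilter = scopeClause("session", 2);
const globalFilter = scopeClause("global", 2);
const allFilter = scopeClause("all", 2);
```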

Sources: [src/memory/recall.ts:193-216](src/memory/recall.ts)

---

## PGlite Handle Management: MemorySession

`MemorySession` manages the single PGlite handle per data directory with refcounted opens and an idle-close timer. The idle-close default is 2 seconds, keeping short write bursts (observer + reflector + embedding upserts in one turn) on a single open handle without permanently holding the cross-process lock against a second CLI process.

```
withDb call 1 ─┐
withDb call 2 ─┤─→ one PGlite.create ─→ fn1, fn2 run concurrently
               └─┐
                 └─ idle timer fires after 2s with no in-flight ops
                    → db.close() + lock released
```
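
The refcount-plus-idle-timer pattern in the diagram can be sketched as follows. This is a simplified, synchronous model under stated assumptions: `RefCountedHandle` is a hypothetical class, the scheduler is injected so the example can simulate the timer firing, and the real `MemorySession` is asynchronous and also handles the cross-process lock.

```typescript
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

class RefCountedHandle<T extends object> {
  private handle: T | undefined = undefined;
  private refs = 0;
  private cancelIdle: Cancel | undefined = undefined;
  private readonly open: () => T;
  private readonly close: (handle: T) => void;
  private readonly schedule: Schedule;
  private readonly idleMs: number;

  constructor(open: () => T, close: (handle: T) => void, schedule: Schedule, idleMs = 2000) {
    this.open = open;
    this.close = close;
    this.schedule = schedule;
    this.idleMs = idleMs;
  }

  acquire(): T {
    // New work cancels a pending idle close and reuses the open handle.
    if (this.cancelIdle) {
      this.cancelIdle();
      this.cancelIdle = undefined;
    }
    if (this.handle === undefined) this.handle = this.open();
    this.refs++;
    return this.handle;
  }

  release(): void {
    this.refs--;
    if (this.refs > 0) return;
    // Last user gone: arm the idle timer instead of closing immediately.
    this.cancelIdle = this.schedule(() => {
      this.cancelIdle = undefined;
      if (this.refs === 0 && this.handle !== undefined) {
        this.close(this.handle);
        this.handle = undefined;
      }
    }, this.idleMs);
  }
}

// Fake scheduler: collects callbacks so the "timer" can be fired by hand.
const timers: Array<() => void> = [];
const fakeSchedule: Schedule = (fn) => {
  timers.push(fn);
  return () => {
    const index = timers.indexOf(fn);
    if (index >= 0) timers.splice(index, 1);
  };
};

const counts = { opens: 0, closes: 0 };
const session = new RefCountedHandle(
  () => {
    counts.opens++;
    return { name: "db" };
  },
  () => {
    counts.closes++;
  },
  fakeSchedule,
);

session.acquire();
session.acquire(); // second user shares the single open handle
session.release();
session.release(); // idle timer armed
session.acquire(); // cancels the timer, still no second open
session.release(); // idle timer re-armed
timers.forEach((fire) => fire()); // simulate the idle timeout elapsing
```

In this run the handle is opened once and closed once, even though three acquisitions happened, which is the burst-coalescing effect the 2-second default aims for.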

The cross-process open-lock (`~/.duet/memory.db/.duet-open.lock`) ensures two `duet` CLI processes cannot both call `PGlite.create` on the same fresh data directory and corrupt each other's migrations. The lock stores the holder's PID; stale locks from crashed processes are detected via `process.kill(pid, 0)` and taken over atomically.

Sources: [src/memory/session.ts:61-115](src/memory/session.ts), [src/memory/pglite.ts:638-684](src/memory/pglite.ts)

---

## Eval Coverage

Two eval files validate the `recall_memory` trigger behavior end-to-end against the real Anthropic API inside Docker:

**`recall-memory-cross-session.eval.ts`** — explicit trigger scenarios: past-tense markers ("yesterday", "previous session", "already done X"). Asserts that at least one `recall_memory` tool call fires on cross-session questions and exactly zero fire on a self-contained arithmetic prompt.

**`recall-memory-implicit-triggers.eval.ts`** — implicit trigger scenarios: un-anchored named referents (a pet name, a colleague, a release ID, a codenamed project artifact) without past-tense markers. These are harder: the model must infer that "Doughy" refers to a sourdough starter seeded in the durable store rather than hedging or answering generically. The eval documents that `opus-4.7` (the production default) is expected to fail all five implicit positives while `sonnet-4.6` handles them; the current prompt layer is the Pareto-best prose found after iterating against both.

Sources: [evals/recall-memory-cross-session.eval.ts:19-30](evals/recall-memory-cross-session.eval.ts), [evals/recall-memory-implicit-triggers.eval.ts:62-115](evals/recall-memory-implicit-triggers.eval.ts)

---

## Summary

The frozen context pack is the key mechanism that keeps prompt costs predictable as durable memory accumulates across sessions. By rebuilding only on three precisely defined events (initial load, reflector replacement, eviction-horizon advance) and holding the pack byte-identical between them, the system pays at most one prompt-cache invalidation per rebuild. The global layer's SQL ranking with recency half-life decay, followed by greedy budget fitting, ensures the highest-signal cross-session observations fill the available token budget; the local layer preserves the current session's compaction summary in full. Everything that misses the pack (long-tail observations, exact code symbols, older sessions) is recoverable through `recall_memory`'s hybrid RRF retrieval, which degrades gracefully to keyword-only when the embedding endpoint is unavailable. The invariant that only three events may lead to a `setContextPack()` call is what makes this system coherent; violating it by refreshing on observer writes or recall tool calls would silently invalidate the prompt cache on turns that should have hit it.

Sources: [src/memory/store.ts:6-16](src/memory/store.ts)
