# agentmemory Mental Model Wiki
> agentmemory is a persistent, self-managing memory server for AI coding agents — it hooks into any MCP client or agent runtime and automatically stores, searches, decays, and consolidates memories across sessions, requiring zero external databases.
## Context Links
- [Agent index](https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/llms.txt)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc)
- [GitHub repository](https://github.com/rohitg00/agentmemory)
## Repository Metadata
- Repository: rohitg00/agentmemory
- Generated: 2026-05-21T07:09:45.903Z
- Updated: 2026-05-21T19:26:20.598Z
- Runtime: Claude Code
- Format: Mental Model
- Pages: 6
## Page Index
- 01. [The Mental Model — How agentmemory Thinks](https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/01-the-mental-model-how-agentmemory-thinks.md) - The simplest useful model of the system: one long-lived Node process (the iii worker) owns all state, every agent interaction flows through REST or MCP, and memories are first-class objects with confidence scores, TTLs, and graph relationships — not raw text logs.
- 02. [Memory Lifecycle — Write, Age, Decay, Forget](https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/02-memory-lifecycle-write-age-decay-forget.md) - How a memory is born (mem::remember), enriched with concepts and graph edges, ages through hot/warm/cold retention tiers via exponential decay, and is eventually evicted or crystallized into long-term patterns — all driven by access frequency, TTL, and consolidation pipelines.
- 03. [State Layer & Hybrid Search](https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/03-state-layer-hybrid-search.md) - All persistent data lives in the iii-engine KV store (StateKV wrapping state::get/set/list triggers); reads go through HybridSearch which fuses BM25 (weight 0.4), vector cosine (weight 0.6), and graph traversal (weight 0.3) via Reciprocal Rank Fusion, with an optional reranker pass on top.
- 04. [LLM & Embedding Providers — BYOK Design](https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/04-llm-embedding-providers-byok-design.md) - agentmemory is fully provider-neutral: LLM calls (summarize, compress, graph-extract) route through a resilient fallback chain across Anthropic, OpenAI, OpenRouter, MiniMax, or a noop stub; embeddings (text and image/CLIP) are independently switchable across OpenAI, Cohere, Gemini, Voyage, or a local Xenova model — the system degrades to BM25-only if no embedding key is set.
- 05. [Hooks & MCP — How Agents Connect](https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/05-hooks-mcp-how-agents-connect.md) - Agents connect via two surfaces: (1) MCP server exposing 53 tools over stdio or HTTP transport, and (2) shell hooks (prompt-submit, post-tool-use, session-start/end, pre-compact, stop) that fire as thin HTTP POSTs to the local REST API at :3111 — the hooks are agent-installed scripts that bridge the agent runtime event stream into the memory server without requiring code changes inside the agent.
- 06. [Invariants, Failure Modes & Safe-Change Rules](https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/06-invariants-failure-modes-safe-change-rules.md) - The core invariants: state lives exclusively in iii-engine (no local SQLite or Postgres required); the worker suppresses unhandledRejection to survive iii SDK 30s timeouts under write bursts; BM25 index is always present so search never fully fails; circuit-breakers isolate provider outages; and the sdk-guard hook prevents recursive hook invocations. Safe-change rules: embedding dimension changes require index migration; adding a new function requires both registerXxx in index.ts and a tools-registry entry; provider fallback order is config-driven, not hardcoded.
## Source File Index
- `iii-config.yaml`
- `README.md`
- `src/cli/connect/claude-code.ts`
- `src/config.ts`
- `src/functions/access-tracker.ts`
- `src/functions/auto-forget.ts`
- `src/functions/consolidation-pipeline.ts`
- `src/functions/crystallize.ts`
- `src/functions/evict.ts`
- `src/functions/graph-retrieval.ts`
- `src/functions/graph.ts`
- `src/functions/migrate-vector-index.ts`
- `src/functions/remember.ts`
- `src/functions/retention.ts`
- `src/health/monitor.ts`
- `src/health/thresholds.ts`
- `src/hooks/post-tool-use.ts`
- `src/hooks/pre-compact.ts`
- `src/hooks/prompt-submit.ts`
- `src/hooks/sdk-guard.ts`
- `src/hooks/session-end.ts`
- `src/hooks/session-start.ts`
- `src/index.ts`
- `src/mcp/server.ts`
- `src/mcp/tools-registry.ts`
- `src/providers/circuit-breaker.ts`
- `src/providers/embedding/index.ts`
- `src/providers/embedding/local.ts`
- `src/providers/fallback-chain.ts`
- `src/providers/index.ts`
- `src/providers/noop.ts`
- `src/providers/resilient.ts`
- `src/state/hybrid-search.ts`
- `src/state/index-persistence.ts`
- `src/state/kv.ts`
- `src/state/reranker.ts`
- `src/state/schema.ts`
- `src/state/search-index.ts`
- `src/state/vector-index.ts`
- `src/triggers/api.ts`
- `src/types.ts`
---
## 01. The Mental Model — How agentmemory Thinks
> The simplest useful model of the system: one long-lived Node process (the iii worker) owns all state, every agent interaction flows through REST or MCP, and memories are first-class objects with confidence scores, TTLs, and graph relationships — not raw text logs.
- Page Markdown: https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/01-the-mental-model-how-agentmemory-thinks.md
- Generated: 2026-05-21T07:09:24.152Z
### Source Files
- `src/index.ts`
- `src/types.ts`
- `src/config.ts`
- `src/state/schema.ts`
- `README.md`
- `iii-config.yaml`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/index.ts](src/index.ts)
- [src/types.ts](src/types.ts)
- [src/config.ts](src/config.ts)
- [src/state/schema.ts](src/state/schema.ts)
- [src/state/hybrid-search.ts](src/state/hybrid-search.ts)
- [src/functions/observe.ts](src/functions/observe.ts)
- [src/functions/remember.ts](src/functions/remember.ts)
- [src/triggers/api.ts](src/triggers/api.ts)
- [src/mcp/server.ts](src/mcp/server.ts)
- [iii-config.yaml](iii-config.yaml)
</details>
# The Mental Model — How agentmemory Thinks
agentmemory is a long-lived Node.js memory service that captures every event in an agent's working session, promotes the most durable observations into typed Memory objects, and makes all of it retrievable through a triple-stream hybrid search (BM25 + vector + graph). After reading this page you should be able to predict which data path any request takes, why memory objects behave differently from raw observations, and where the system degrades gracefully versus where it refuses to start.
The key architectural invariant: **one process owns all mutable state**. There is no distributed write path. Every agent hook, every REST call, and every MCP tool call ultimately routes through a single registered iii worker, which serializes writes through a keyed mutex and commits to a file-backed KV store. This design makes state consistent at the cost of being single-host only, and it is what makes the decay timers, dedup maps, and in-memory search indexes safe to reason about without distributed locking.
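To make the "serializes writes through a keyed mutex" idea concrete, here is a minimal sketch of one way such a mutex can be built by chaining promises per key. `KeyedMutex` and `runExclusive` are hypothetical names chosen for illustration; the project's actual mutex may be implemented differently.

```ts
// Hypothetical sketch: serialize async work per key by chaining promises.
class KeyedMutex {
  // For each key, the promise that resolves when the last queued task finishes.
  private tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const gate = new Promise<void>((resolve) => {
      release = resolve;
    });
    // Later callers for the same key wait on this tail.
    const tail = previous.then(() => gate);
    this.tails.set(key, tail);

    await previous; // wait for every earlier task on this key
    try {
      return await task();
    } finally {
      release(); // let the next queued task start, even if this one threw
      // Drop the entry once nobody has queued behind us.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

Tasks that share a key run strictly one after another, while tasks on different keys proceed independently, which is why a single-process design can avoid distributed locking.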
---
## 1. The iii Worker as the Spine
agentmemory is structured as an **iii (triple-i) worker**, registered at startup against the local iii engine via WebSocket:
```ts
// src/index.ts:166-186
const sdk = registerWorker(config.engineUrl, {
  workerName: "agentmemory",
  invocationTimeoutMs: 180000,
  ...
});
```
The iii engine (configured in `iii-config.yaml`) is itself a local multi-service runner that provides:
| iii service | Purpose |
|-------------|---------|
| `iii-http` | REST API on port 3111 |
| `iii-state` | File-backed KV store (`./data/state_store.db`) |
| `iii-queue` | Async function dispatch queue |
| `iii-pubsub` | Local pub/sub |
| `iii-stream` | WebSocket streaming on port 3112 |
| `iii-cron` | Scheduled trigger runner |
| `iii-observability` | Metrics and trace collection |
| `iii-exec` | Hot-reload watcher (`src/**/*.ts` → `dist/index.mjs`) |
Sources: [iii-config.yaml:1-53]()
The worker registers every capability as a named function (e.g., `mem::observe`, `mem::remember`, `mem::smart-search`) and two trigger surfaces: REST endpoints and MCP endpoints. Both surfaces call the same underlying functions — the transport layer is a thin wrapper.
Sources: [src/index.ts:204-342](), [src/triggers/api.ts:1-28](), [src/mcp/server.ts:42-58]()
---
## 2. The Two Tiers of Memory Objects
agentmemory maintains a strict two-tier hierarchy. Confusing the tiers is the most common source of wrong mental models about what the system stores.
```text
┌────────────────────────────────────────────────────────┐
│ Tier 1: Observations (ephemeral captures)              │
│   RawObservation → CompressedObservation               │
│   Keyed by session: mem:obs:<sessionId>                │
│   Lifecycle: captured → (optionally) LLM-compressed    │
│                       → BM25-indexed → vector-embedded │
└──────────────────────────┬─────────────────────────────┘
                           │ promote via mem::remember
                           │ or consolidation pipeline
┌──────────────────────────▼─────────────────────────────┐
│ Tier 2: Memories (durable, versioned facts)            │
│   Memory (typed, strength-scored, expirable)           │
│   Keyed globally: mem:memories                         │
│   Lifecycle: created → reinforced/superseded → evicted │
└────────────────────────────────────────────────────────┘
```
### 2.1 Observations
Every hook event (tool use, prompt submit, session start/stop) arrives as a `RawObservation`. Unless `AGENTMEMORY_AUTO_COMPRESS=true`, it is immediately converted via **synthetic compression** (zero-LLM, rule-based) into a `CompressedObservation` with:
- `type`: one of `file_read`, `file_write`, `command_run`, `search`, `decision`, `discovery`, `error`, `conversation`, etc.
- `facts`: string array of extracted key claims
- `narrative`: short prose summary
- `concepts`: normalized entity list
- `importance`: numeric weight (0–10)
- `confidence`: optional float
Sources: [src/types.ts:29-79]()
Observations are keyed per session (`mem:obs:<sessionId>`) and are subject to a per-session cap (`MAX_OBS_PER_SESSION`, default 500). Sources: [src/state/schema.ts:6](), [src/config.ts:145]()
### 2.2 Memories
A `Memory` is a promoted, versioned fact with these key fields:
| Field | Type | Meaning |
|-------|------|---------|
| `type` | `"pattern" \| "preference" \| "architecture" \| "bug" \| "workflow" \| "fact"` | Semantic category |
| `strength` | `number` (default 7) | Recall priority weight; higher = retained longer |
| `version` | `number` | Monotonically incremented on supersession |
| `isLatest` | `boolean` | Only `true` entries participate in search |
| `forgetAfter` | `ISO string \| undefined` | TTL; absent means permanent |
| `parentId` / `supersedes` | optional IDs | Immutable chain of versions |
| `relatedIds` | optional IDs | Soft cross-memory links |
| `sourceObservationIds` | optional IDs | Provenance back to raw events |
Sources: [src/types.ts:81-101]()
When a new memory is saved and its Jaccard similarity to an existing memory exceeds 0.7, the old memory is marked `isLatest: false`; the new one takes the predecessor's version plus one and a `parentId` pointing to it. This creates an **immutable version chain** rather than overwriting.
Sources: [src/functions/remember.ts:52-98]()
---
## 3. Interaction Surfaces
All external interaction flows through one of two surfaces. Internally they call the same registered functions.
```mermaid
flowchart LR
subgraph "Agent / Claude Code"
H[Claude Code Hooks<br/>PreToolUse · PostToolUse<br/>SessionStart · Stop]
M[MCP Client<br/>npx @agentmemory/mcp]
R[REST Client<br/>curl / SDK / viewer]
end
subgraph "agentmemory Node process"
API[REST Triggers<br/>src/triggers/api.ts<br/>:3111]
MCP[MCP Endpoints<br/>src/mcp/server.ts]
FN[Registered Functions<br/>mem::observe<br/>mem::remember<br/>mem::smart-search<br/>…60+ more]
KV[StateKV<br/>src/state/kv.ts]
IDX[In-memory Indexes<br/>BM25 · VectorIndex]
end
subgraph "iii Engine"
STATE[iii-state<br/>KV file store]
QUEUE[iii-queue]
STREAM[iii-stream :3112]
end
H -->|HTTP POST| API
R -->|HTTP| API
M -->|HTTP| MCP
API --> FN
MCP --> FN
FN --> KV
FN --> IDX
KV --> STATE
IDX -->|persisted on shutdown| STATE
```
Sources: [src/index.ts:340-342](), [src/triggers/api.ts:1-5](), [src/mcp/server.ts:42-58]()
The REST surface at startup advertises **124 endpoints** and the MCP surface exposes **tools, 6 resources, and 3 prompts** — both are populated from the same registered function list. Sources: [src/index.ts:484-488]()
---
## 4. The Observation Lifecycle
Understanding observation flow prevents surprises about when data appears in search.
```mermaid
stateDiagram-v2
[*] --> RawCapture: hook fires (HookPayload)
RawCapture --> Dedup: DedupMap hash check
Dedup --> Dropped: duplicate within session
Dedup --> Strip: pass dedup
Strip --> SyntheticCompress: stripPrivateData + buildSyntheticCompression
Strip --> LLMCompress: if AGENTMEMORY_AUTO_COMPRESS=true
SyntheticCompress --> KVStore: kv.set(mem:obs:sessionId, obsId, obs)
LLMCompress --> KVStore
KVStore --> BM25Index: getSearchIndex().add(obs)
KVStore --> VectorIndex: embed + store (if provider configured)
BM25Index --> [*]: searchable immediately
VectorIndex --> [*]: searchable after embed()
```
Sources: [src/functions/observe.ts:36-130](), [src/config.ts:266-287]()
Key invariant: **BM25 indexing is synchronous with the KV write**; vector embedding is fire-and-forget. If the embedding provider is unavailable, the observation is still fully searchable via BM25.
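The ordering guarantee above can be sketched as follows. This is an illustration only: the `Map` and `Set` stores are hypothetical stand-ins for `StateKV`, `SearchIndex`, and `VectorIndex`, and `storeObservation` is not the project's function.

```ts
// Hypothetical in-memory stand-ins for the KV store and the two indexes.
interface Observation {
  id: string;
  text: string;
}
type Embedder = (text: string) => Promise<number[]>;

interface Stores {
  kv: Map<string, Observation>;
  bm25: Set<string>; // ids known to the lexical index
  vectors: Map<string, number[]>;
}

function storeObservation(obs: Observation, stores: Stores, embed?: Embedder): void {
  stores.kv.set(obs.id, obs);
  stores.bm25.add(obs.id); // synchronous: lexically searchable as soon as we return

  if (embed === undefined) return; // no provider configured
  // Fire-and-forget: the caller never waits for the embedding.
  embed(obs.text)
    .then((vector) => {
      stores.vectors.set(obs.id, vector);
    })
    .catch(() => {
      // Provider unavailable: the observation stays searchable via BM25.
    });
}
```

Because the embedding promise is never awaited, a slow or failing provider cannot delay or fail the write itself.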
---
## 5. The Triple-Stream Search Engine
Search is the most mechanically interesting part of agentmemory. Every `mem::smart-search` call runs three retrieval streams in parallel and fuses their ranks using Reciprocal Rank Fusion (RRF, k=60):
| Stream | Implementation | Default weight |
|--------|---------------|----------------|
| BM25 | `SearchIndex` (in-process) | 0.4 (`BM25_WEIGHT`) |
| Vector | `VectorIndex` cosine similarity | 0.6 (`VECTOR_WEIGHT`) |
| Graph | `GraphRetrieval` entity walk | 0.3 (`AGENTMEMORY_GRAPH_WEIGHT`) |
```ts
// src/state/hybrid-search.ts:82-115
const bm25Results = this.bm25.search(query, limit * 2);
queryEmbedding = await this.embeddingProvider.embed(query);
vectorResults = this.vector.search(queryEmbedding, limit * 2);
graphResults = await this.graphRetrieval.searchByEntities(entities, 2, limit);
```
After fusion, an optional cross-encoder rerank pass can be enabled via `RERANK_ENABLED=true`. The result type carries all three individual scores plus the combined score, which lets callers inspect retrieval provenance.
Sources: [src/state/hybrid-search.ts:22-127](), [src/types.ts:250-258]()
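As a rough illustration of weighted Reciprocal Rank Fusion, the sketch below adds `weight / (k + rank)` for every stream in which a document appears. This is one common formulation and is assumed here; the exact combination used in `src/state/hybrid-search.ts` may differ, and `weightedRrf` is a hypothetical name.

```ts
interface RankedStream {
  ids: string[]; // best match first
  weight: number;
}

// Weighted RRF: each stream contributes weight / (k + rank) per document.
function weightedRrf(streams: RankedStream[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const stream of streams) {
    stream.ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + stream.weight / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Example: "b" appears in two streams, so it outranks single-stream hits.
const fused = weightedRrf([
  { ids: ["a", "b"], weight: 0.4 }, // BM25
  { ids: ["b", "c"], weight: 0.6 }, // vector
  { ids: [], weight: 0.3 }, // graph (no hits)
]);
console.log(fused.map((r) => r.id)); // [ 'b', 'c', 'a' ]
```

An empty stream simply contributes nothing, which matches the idea that each stream can degrade independently.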
When no embedding provider is configured (no API key present), the vector stream is skipped: the system falls back to **BM25-only mode** and logs `BM25+Graph search active` at boot rather than `Triple-stream`. This is not an error state; BM25 alone is fully functional. Sources: [src/index.ts:480-482]()
---
## 6. The Knowledge Graph Layer
When `GRAPH_EXTRACTION_ENABLED=true`, agentmemory maintains a property graph of entities extracted from observations. Nodes and edges are first-class objects:
| Type | Key fields |
|------|-----------|
| `GraphNode` | `type` (file/function/concept/error/decision/pattern/person/…), `name`, `properties`, `aliases`, `stale` |
| `GraphEdge` | `type` (uses/imports/modifies/causes/fixes/depends_on/related_to/…), `weight`, `tcommit`, `tvalid`, `tvalidEnd`, `isLatest`, `supersededBy` |
| `EdgeContext` | `reasoning`, `sentiment`, `alternatives`, `situationalFactors`, `confidence` |
The `tvalid`/`tvalidEnd` fields on edges implement **bi-temporal graph modeling**: each edge records when it became valid and when it stopped being valid, enabling `mem::temporal-query` to answer "what was the relationship between X and Y as of commit sha abc?" without destroying history.
Sources: [src/types.ts:362-431](), [src/types.ts:833-851]()
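The validity-interval idea can be illustrated with a small "as of" filter. This is a sketch under assumptions: it treats `tvalid` and `tvalidEnd` as ISO timestamps and the interval as half-open, neither of which is confirmed by the source; `TemporalEdge` and `edgesAsOf` are hypothetical names.

```ts
// Hypothetical, trimmed-down edge shape for illustration.
interface TemporalEdge {
  source: string;
  target: string;
  type: string;
  tvalid: string; // assumed ISO timestamp: when the edge became valid
  tvalidEnd?: string; // assumed ISO timestamp: when it stopped being valid
}

// Return the edges that were valid at the instant `asOf` (tvalid <= asOf < tvalidEnd).
function edgesAsOf(edges: TemporalEdge[], asOf: string): TemporalEdge[] {
  const t = Date.parse(asOf);
  return edges.filter(
    (edge) =>
      Date.parse(edge.tvalid) <= t &&
      (edge.tvalidEnd === undefined || t < Date.parse(edge.tvalidEnd)),
  );
}
```

Closing an edge by setting `tvalidEnd` instead of deleting it is what keeps earlier answers reproducible.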
Graph search runs as the third stream in hybrid search: it extracts entity names from the query text, walks the graph from those anchors, and promotes observations linked to relevant graph nodes. Sources: [src/state/hybrid-search.ts:100-126]()
---
## 7. The Consolidation Memory Tiers
Beyond the basic observation/memory split, agentmemory implements a four-tier consolidation pipeline modeled loosely on cognitive memory systems:
```text
working      →  episodic  →  semantic          →  procedural
(slots,         (Memory      (SemanticMemory,     (ProceduralMemory,
 recent obs)     objects)     confirmed facts)     step sequences)
```
| Tier | Type | Distinctive field |
|------|------|------------------|
| Working | `MemorySlot` | `pinned`, `readOnly`, `scope` (project/global) |
| Episodic | `Memory` | `strength`, `forgetAfter`, `version` chain |
| Semantic | `SemanticMemory` | `confidence`, `accessCount`, `lastAccessedAt` |
| Procedural | `ProceduralMemory` | `steps[]`, `triggerCondition`, `frequency` |
The consolidation pipeline (`mem::consolidate-pipeline`) runs every 2 hours by default (`CONSOLIDATION_INTERVAL_MS`) when `CONSOLIDATION_ENABLED=true`. It promotes frequently-reinforced episodic memories into semantic facts and extracts procedural patterns from repeated action sequences.
Sources: [src/types.ts:439-472](), [src/index.ts:531-539]()
---
## 8. Decay, Retention, and Forgetting
Memories are not immortal. Several decay mechanisms run on timers:
| Timer | Default interval | What it does |
|-------|-----------------|--------------|
| `mem::auto-forget` | 1 hour | Evicts memories where `forgetAfter < now` |
| `mem::lesson-decay-sweep` | 24 hours | Reduces `Lesson.confidence` by `decayRate` |
| `mem::insight-decay-sweep` | 24 hours | Same for `Insight` objects |
| Consolidation pipeline | 2 hours | Promotes + prunes stale episodic items |
Retention scores (`RetentionScore`) combine salience, temporal decay, and reinforcement boost into a single float. The `source` field (`"episodic" | "semantic"`) on `RetentionScore` tells the eviction loop which KV scope to target for deletion. The field is absent on entries written before 0.8.10, so for those both scopes must be probed for backwards compatibility.
Sources: [src/types.ts:853-876](), [src/index.ts:499-539]()
---
## 9. The Orchestration Layer
Beyond memory, agentmemory exposes a coordination layer for multi-agent workflows:
| Concept | TypeScript type | Role |
|---------|----------------|------|
| `Action` | `Action` | Named task with status (`pending→active→done→blocked→cancelled`) |
| `ActionEdge` | `ActionEdge` | Dependency relationships (`requires`, `unlocks`, `gated_by`, `conflicts_with`) |
| `Lease` | `Lease` | Mutex-style lock on an action by one agent |
| `Routine` | `Routine` | Named multi-step plan promoted from `ProceduralMemory` |
| `Signal` | `Signal` | Typed agent-to-agent message (`info/request/response/alert/handoff`) |
| `Checkpoint` | `Checkpoint` | Blocking gate (`ci/approval/deploy/external/timer`) |
| `Sentinel` | `Sentinel` | Watcher that triggers on `webhook/timer/threshold/pattern/approval` |
| `Sketch` | `Sketch` | Ephemeral draft that can be promoted or discarded |
| `Crystal` | `Crystal` | Distilled narrative from completed action sets |
Agents acquire leases to claim exclusive ownership of actions, send signals to hand off work, and resolve checkpoints when conditions are met. This lets a swarm of agents coordinate through shared memory without a central orchestrator.
Sources: [src/types.ts:585-737]()
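A mutex-style lease can be sketched as below. The field names (`agentId`, `expiresAt`), the expiry behavior, and the `acquireLease` helper are all assumptions for illustration; the real `Lease` type in `src/types.ts` may carry different fields.

```ts
// Hypothetical lease shape; not the project's actual type.
interface LeaseSketch {
  actionId: string;
  agentId: string;
  expiresAt: number; // Unix ms
}

// Grant the lease unless another agent holds an unexpired one.
function acquireLease(
  leases: Map<string, LeaseSketch>,
  actionId: string,
  agentId: string,
  ttlMs: number,
  now: number,
): LeaseSketch | null {
  const current = leases.get(actionId);
  const heldByOther =
    current !== undefined && current.expiresAt > now && current.agentId !== agentId;
  if (heldByOther) return null;

  // Free, expired, or already ours: (re)issue the lease.
  const lease: LeaseSketch = { actionId, agentId, expiresAt: now + ttlMs };
  leases.set(actionId, lease);
  return lease;
}
```

An expiry on the lease is one way to keep a crashed agent from blocking an action forever.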
---
## 10. State Ownership and KV Schema
All persistent state is stored under `mem:*` keys in the iii-state file store. The `KV` constant in `src/state/schema.ts` is the authoritative list:
```ts
// src/state/schema.ts:3-50 (selected)
export const KV = {
  sessions: "mem:sessions",
  observations: (sessionId) => `mem:obs:${sessionId}`,
  memories: "mem:memories",
  graphNodes: "mem:graph:nodes",
  graphEdges: "mem:graph:edges",
  semantic: "mem:semantic",
  procedural: "mem:procedural",
  actions: "mem:actions",
  leases: "mem:leases",
  signals: "mem:signals",
  checkpoints: "mem:checkpoints",
  retentionScores: "mem:retention",
  slots: "mem:slots",
  state: "mem:state", // system counters (disk size, flags)
  ...
}
```
The `StateScope` interface (`src/types.ts:884-886`) types the `mem:state` scope, currently holding `"system:currentDiskSize": number`. Every other scope is typed by its corresponding interface. This pattern means any new key added to `KV` without a matching TypeScript type is a maintainability gap, not a runtime error.
Sources: [src/state/schema.ts:3-50](), [src/types.ts:882-888]()
---
## 11. Startup Invariants and Failure Modes
At startup, agentmemory enforces several hard invariants before declaring itself ready:
**Vector dimension guard.** If the on-disk vector index was written with a different embedding model than the currently configured provider, the process refuses to start with a descriptive error rather than silently corrupting cosine similarity scores (cross-dimension dot products return 0, causing all affected observations to disappear from search). Setting `AGENTMEMORY_DROP_STALE_INDEX=true` bypasses this by discarding the persisted index and rebuilding from live observations over time.
Sources: [src/index.ts:362-410]()
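The decision logic of the dimension guard can be sketched as a pure function. The function name, return values, and error message here are hypothetical; only the behavior (refuse on mismatch unless the drop flag is set) comes from the description above.

```ts
// Hypothetical sketch of the startup check; names and messages are illustrative.
function checkVectorIndex(
  persistedDim: number | undefined, // undefined: no persisted index on disk
  providerDim: number,
  dropStaleIndex: boolean, // mirrors AGENTMEMORY_DROP_STALE_INDEX=true
): "keep" | "drop" {
  if (persistedDim === undefined || persistedDim === providerDim) return "keep";
  if (dropStaleIndex) return "drop"; // discard and rebuild over time
  throw new Error(
    `Vector index dimension mismatch: index has ${persistedDim}, provider emits ${providerDim}`,
  );
}
```

Failing fast here trades availability for correctness: a mismatched index would otherwise make affected observations silently vanish from vector search.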
**BM25 rebuild is fire-and-forget.** If the BM25 index is empty at boot (first run or cache deleted), index rebuilding is intentionally **not** awaited. On a large corpus with a rate-limited embedding endpoint, rebuilding can take hours. Blocking on it would leave the viewer server unbound for the full duration. Instead, search degrades gracefully under partial coverage while the index fills in asynchronously.
Sources: [src/index.ts:412-432]()
**Provider detection is opt-in safe.** If no LLM API key is present, the provider resolves to `"noop"` and LLM-backed compression and summarization are disabled entirely. The agent-sdk fallback (which spawns Claude Code child sessions) is guarded behind `AGENTMEMORY_ALLOW_AGENT_SDK=true` because those child sessions inherit Claude Code's Stop hook, which calls agentmemory, producing infinite recursion.
Sources: [src/config.ts:100-132]()
**Unhandled rejections are suppressed, not fatal.** The top-level `unhandledRejection` handler (rate-limited to one log per minute) prevents a single slow `state::set` timeout from crashing the long-lived service. This is the correct tradeoff for a memory daemon: one failed write should not destroy a session's ongoing memory stream.
Sources: [src/index.ts:112-129]()
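A rate-limited rejection handler along these lines could look like the sketch below. `makeRateLimitedLogger` and its parameters are hypothetical; the injectable clock exists only to make the behavior easy to exercise.

```ts
// Hypothetical sketch: log at most one rejection per interval, count the rest.
function makeRateLimitedLogger(
  log: (message: string) => void,
  intervalMs = 60_000,
  now: () => number = () => Date.now(),
): (reason: unknown) => void {
  let lastLoggedAt = Number.NEGATIVE_INFINITY;
  let suppressed = 0;
  return (reason) => {
    const t = now();
    if (t - lastLoggedAt < intervalMs) {
      suppressed += 1; // swallow quietly inside the window
      return;
    }
    const suffix = suppressed > 0 ? ` (+${suppressed} suppressed)` : "";
    log(`unhandledRejection: ${String(reason)}${suffix}`);
    lastLoggedAt = t;
    suppressed = 0;
  };
}

// In a Node process this would be attached roughly as:
//   process.on("unhandledRejection", makeRateLimitedLogger(console.error));
```

Registering any `unhandledRejection` listener stops Node from terminating the process on an unhandled rejection, which is the property the daemon relies on.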
---
## Summary
agentmemory is a single long-lived Node process whose entire surface area — REST, MCP, and background timers — routes through one set of registered iii worker functions. Memories are structured objects with typed categories, strength scores, TTLs, version chains, and graph relationships, not raw text appended to a log. Search is a three-stream fusion (BM25 + vector + graph), each independently degradable. The startup sequence enforces dimension-consistency on vector indexes and refuses to silently corrupt search rather than accepting mismatched embeddings. Any mental model that treats agentmemory as "a thing that appends hook events to a file" will mispredict retention behavior, search scoring, and why some observations disappear from recall after an embedding provider change.
Sources: [src/index.ts:131-560]()
---
## 02. Memory Lifecycle — Write, Age, Decay, Forget
> How a memory is born (mem::remember), enriched with concepts and graph edges, ages through hot/warm/cold retention tiers via exponential decay, and is eventually evicted or crystallized into long-term patterns — all driven by access frequency, TTL, and consolidation pipelines.
- Page Markdown: https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/02-memory-lifecycle-write-age-decay-forget.md
- Generated: 2026-05-21T07:09:45.901Z
### Source Files
- `src/functions/remember.ts`
- `src/functions/retention.ts`
- `src/functions/evict.ts`
- `src/functions/consolidation-pipeline.ts`
- `src/functions/crystallize.ts`
- `src/functions/auto-forget.ts`
- `src/functions/access-tracker.ts`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/functions/remember.ts](src/functions/remember.ts)
- [src/functions/retention.ts](src/functions/retention.ts)
- [src/functions/evict.ts](src/functions/evict.ts)
- [src/functions/consolidation-pipeline.ts](src/functions/consolidation-pipeline.ts)
- [src/functions/crystallize.ts](src/functions/crystallize.ts)
- [src/functions/auto-forget.ts](src/functions/auto-forget.ts)
- [src/functions/access-tracker.ts](src/functions/access-tracker.ts)
- [src/types.ts](src/types.ts)
- [src/state/schema.ts](src/state/schema.ts)
</details>
# Memory Lifecycle — Write, Age, Decay, Forget
This page describes the full end-to-end journey of a memory in agentmemory: from its initial write through `mem::remember`, through the scoring pipeline that assigns it a hot/warm/cold retention tier, to its eventual eviction or crystallization into a durable long-term form. Understanding this lifecycle lets you predict which memories survive across sessions, why some are silently pruned, and how the system converts raw episodic records into compressed semantic or procedural knowledge.
The lifecycle is governed by two orthogonal mechanisms: a wall-clock TTL set at write time (`forgetAfter`) and a continuously decaying retention score computed from salience, age, and access frequency. Either one can end a memory independently. The consolidation pipeline can also rescue content by abstracting it into more durable forms (`SemanticMemory`, `ProceduralMemory`, and `Crystal`) that survive long after the original episodic record would have been pruned.
---
## 1. Birth — `mem::remember`
A memory is created by calling `mem::remember` (registered in `src/functions/remember.ts`). The function accepts `content`, an optional `type`, `concepts`, `files`, a `ttlDays` duration, and `sourceObservationIds`.
### Deduplication before creation
Before writing, `mem::remember` computes Jaccard similarity between the incoming content and every existing memory whose `isLatest === true`. If any existing memory exceeds the 0.7 similarity threshold, the new memory is treated as a supersession of that prior entry:
```ts
// src/functions/remember.ts:58-69
for (const existing of existingMemories) {
  if (existing.isLatest === false) continue;
  const similarity = jaccardSimilarity(
    lowerContent,
    existing.content.toLowerCase(),
  );
  if (similarity > 0.7) {
    supersededId = existing.id;
    ...
    break;
  }
}
```
The superseded memory has `isLatest` set to `false` and is kept in the KV store (as a historical version), while the new record carries `version: supersededVersion + 1`, `parentId`, and `supersedes: [supersededId]`.
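For intuition about the 0.7 threshold, here is a minimal word-level Jaccard similarity. The tokenization (lowercase, split on non-word characters) is an assumption made for this sketch; the `jaccardSimilarity` helper in the repository may tokenize differently.

```ts
// Assumed tokenization: lowercase words split on non-word characters.
function tokenize(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/\W+/)
      .filter((token) => token.length > 0),
  );
}

// Jaccard similarity = |A ∩ B| / |A ∪ B| over the two token sets.
function jaccardSimilarity(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  let intersection = 0;
  setA.forEach((token) => {
    if (setB.has(token)) intersection += 1;
  });
  const union = setA.size + setB.size - intersection;
  return union === 0 ? 0 : intersection / union; // two empty texts: treat as dissimilar
}

// 4 shared tokens out of 5 distinct tokens gives 0.8, above the 0.7 threshold.
console.log(jaccardSimilarity("use pnpm for installs", "use pnpm for all installs"));
```

Under this tokenization, inserting one word into a four-word memory is already enough to count as a supersession rather than a new memory.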
### Initial field values
The newly minted `Memory` object starts with `strength: 7`. The following type-weight mapping governs its salience at scoring time:
| `type` | Base salience |
|---------------|--------------|
| `architecture`| 0.9 |
| `preference` | 0.85 |
| `pattern` | 0.8 |
| `bug` | 0.7 |
| `workflow` | 0.6 |
| `fact` | 0.5 |
Sources: [src/functions/remember.ts:38-48](), [src/functions/retention.ts:100-119]()
### TTL stamp
If `ttlDays` is provided, a `forgetAfter` ISO timestamp is computed immediately:
```ts
// src/functions/remember.ts:92-94
memory.forgetAfter = new Date(Date.now() + data.ttlDays * 86400000).toISOString();
```
This timestamp is checked by both `mem::evict` and `mem::auto-forget` on every maintenance sweep.
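The TTL arithmetic and the expiry test a sweep would apply can be sketched together. `computeForgetAfter` and `isExpired` are hypothetical helper names; the `forgetAfter < now` comparison follows the description of `mem::auto-forget`.

```ts
const MS_PER_DAY = 86_400_000;

// Same arithmetic as the snippet above, with the clock passed in explicitly.
function computeForgetAfter(ttlDays: number, nowMs: number): string {
  return new Date(nowMs + ttlDays * MS_PER_DAY).toISOString();
}

// A memory without forgetAfter is permanent; otherwise it expires once the stamp is in the past.
function isExpired(memory: { forgetAfter?: string }, nowMs: number): boolean {
  return memory.forgetAfter !== undefined && Date.parse(memory.forgetAfter) < nowMs;
}

const createdAt = Date.UTC(2026, 0, 1); // 2026-01-01T00:00:00Z
console.log(computeForgetAfter(30, createdAt)); // 2026-01-31T00:00:00.000Z
```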
### Index registration
After the KV write, the memory is synchronously registered with the BM25 full-text index (`getSearchIndex().add(...)`) and asynchronously registered with the vector index (`vectorIndexAddGuarded`). An indexing failure is caught and logged but does not roll back the KV write — a restart-time rebuild will pick up the memory either way.
Sources: [src/functions/remember.ts:100-121]()
---
## 2. Access Tracking
Every time a memory is retrieved, `recordAccess` (in `src/functions/access-tracker.ts`) is called to update the memory's `AccessLog`:
```ts
export interface AccessLog {
  memoryId: string;
  count: number; // total lifetime accesses
  lastAt: string; // ISO timestamp of most recent access
  recent: number[]; // up to 20 recent Unix-ms timestamps (RECENT_CAP = 20)
}
```
Access records are written under the `mem:access` KV namespace, keyed by `memoryId`. The write is guarded by a per-memory keyed mutex (`mem:access:<memoryId>`) to prevent concurrent update races. The `recent` array is capped at 20 entries — older entries are dropped from the front — preserving a sliding window of recent access timestamps for use in the reinforcement boost calculation.
Sources: [src/functions/access-tracker.ts:6-80]()
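The sliding-window update can be sketched as a pure function over the previous log. `applyAccess` is a hypothetical name, and the KV read, write, and mutex around it are omitted; only the `AccessLog` shape and the cap of 20 come from the text above.

```ts
interface AccessLog {
  memoryId: string;
  count: number;
  lastAt: string;
  recent: number[];
}

const RECENT_CAP = 20;

// Produce the next log state for one access at `nowMs`.
function applyAccess(previous: AccessLog | undefined, memoryId: string, nowMs: number): AccessLog {
  const recent = [...(previous?.recent ?? []), nowMs].slice(-RECENT_CAP); // keep newest 20
  return {
    memoryId,
    count: (previous?.count ?? 0) + 1, // lifetime total is never capped
    lastAt: new Date(nowMs).toISOString(),
    recent,
  };
}
```

Note that `count` keeps growing while `recent` is bounded, so lifetime popularity and recency can be scored separately.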
---
## 3. Retention Scoring — Hot, Warm, Cold, Evictable
`mem::retention-score` (in `src/functions/retention.ts`) computes a `[0, 1]` score for every `isLatest` episodic memory and every semantic memory. This is the central mechanism that determines a memory's tier.
### The decay formula
```
score = min(1, salience × exp(−λ × ΔT_days) + boost)
```
where:
- **salience** = base type weight + min(0.2, accessCount × 0.02)
- **ΔT_days** = age in days since `createdAt`
- **λ (lambda)** = decay rate, default `0.01` (slow decay — a memory halves in ~69 days with no accesses)
- **boost** = Σ(σ / daysSinceAccess) across recent access timestamps, where **σ (sigma)** = `0.3`
The reinforcement boost is large for recent accesses and drops off as access recency fades. Each access event injects `σ / daysSinceAccess` into the score — a memory accessed yesterday contributes `0.3`, one accessed 30 days ago contributes `0.01`.
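The formula can be exercised with a small sketch. `retentionScore` is a hypothetical name, the salience value in the usage note is illustrative, and how the real function treats accesses less than a day old is not shown in the source.

```typescript
interface DecayParams {
  lambda: number; // decay rate per day
  sigma: number;  // reinforcement boost multiplier
}

// Hypothetical helper implementing
// score = min(1, salience * exp(-lambda * ageDays) + sum(sigma / daysSinceAccess)).
function retentionScore(
  salience: number,
  ageDays: number,
  daysSinceEachAccess: number[],
  params: DecayParams = { lambda: 0.01, sigma: 0.3 },
): number {
  const decayed = salience * Math.exp(-params.lambda * ageDays);
  const boost = daysSinceEachAccess.reduce((sum, d) => sum + params.sigma / d, 0);
  return Math.min(1, decayed + boost);
}
```

With an illustrative salience of 0.5, an unaccessed memory scores about 0.25 after 69.3 days (one half-life at λ = 0.01), while a single access one day ago adds 0.3 on top of the decayed term.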
### Default tier thresholds
| Tier | Score range |
|-------------|---------------------|
| **hot** | ≥ 0.7 |
| **warm** | ≥ 0.4 and < 0.7 |
| **cold** | ≥ 0.15 and < 0.4 |
| **evictable** | < 0.15 |
These thresholds are configurable via a `DecayConfig` object passed to the function. The defaults live in `retention.ts`:
```ts
// src/functions/retention.ts:19-27
const DEFAULT_DECAY: DecayConfig = {
  lambda: 0.01,
  sigma: 0.3,
  tierThresholds: { hot: 0.7, warm: 0.4, cold: 0.15 },
};
```
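Mapping a score to a tier with these thresholds might look like the following sketch. `classifyTier` is a hypothetical name; the lower bounds are inclusive, matching the table above.

```typescript
type Tier = "hot" | "warm" | "cold" | "evictable";

interface TierThresholds {
  hot: number;
  warm: number;
  cold: number;
}

// Hypothetical helper: checks thresholds from highest to lowest.
function classifyTier(
  score: number,
  t: TierThresholds = { hot: 0.7, warm: 0.4, cold: 0.15 },
): Tier {
  if (score >= t.hot) return "hot";
  if (score >= t.warm) return "warm";
  if (score >= t.cold) return "cold";
  return "evictable";
}
```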
### Batched writes
After scoring all memories, the results are flushed to the `mem:retention` KV namespace in a single parallel `Promise.all` batch to avoid O(n) sequential round-trips on stores with 1000+ memories.
Sources: [src/functions/retention.ts:80-288](), [src/types.ts:853-876]()
### State diagram of a memory's retention tier
```mermaid
stateDiagram-v2
[*] --> hot : created (strength=7, young)
hot --> warm : age grows, fewer accesses
warm --> cold : further decay
cold --> evictable : score < 0.15
evictable --> [*] : mem::retention-evict
warm --> hot : frequent accesses (boost)
cold --> warm : re-accessed
hot --> [*] : forgetAfter TTL reached
warm --> [*] : forgetAfter TTL reached
cold --> [*] : forgetAfter TTL reached
```
---
## 4. Eviction Paths
Three complementary eviction functions are described below (4a, 4b, 4c). They differ in scope and trigger.
### 4a. `mem::retention-evict` — score-based pruning
Reads all rows from `mem:retention`, filters those whose `score < threshold` (default: `cold` boundary, 0.15), sorts by ascending score (lowest first), and deletes up to `maxEvict` (default 50, capped at 1000). For each candidate, it:
1. Resolves the correct KV namespace (`mem:memories` for episodic, `mem:semantic` for semantic)
2. Deletes the memory record
3. Deletes the retention score row
4. Deletes the access log row via `deleteAccessLog`
Sources: [src/functions/retention.ts:291-407]()
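The candidate selection step (filter, sort ascending, cap) can be sketched as follows. `selectEvictionCandidates` and the `RetentionRow` shape are hypothetical; the deletes in steps 1 to 4 are not modeled.

```typescript
interface RetentionRow {
  memoryId: string;
  score: number;
}

// Hypothetical helper: picks which rows a score-based sweep would delete.
function selectEvictionCandidates(
  rows: RetentionRow[],
  threshold: number = 0.15,
  maxEvict: number = 50,
): RetentionRow[] {
  const cap = Math.min(Math.max(0, maxEvict), 1000); // hard upper bound of 1000
  return rows
    .filter((r) => r.score < threshold) // filter() copies, so the input is not reordered
    .sort((a, b) => a.score - b.score)  // lowest score first
    .slice(0, cap);
}
```

Sorting before capping means that when more rows qualify than `maxEvict` allows, the weakest memories go first and the rest wait for the next sweep.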
### 4b. `mem::evict` — structural eviction
`mem::evict` (in `src/functions/evict.ts`) handles five distinct eviction scenarios on each sweep:
| Scenario | Condition | Action |
|---|---|---|
| **Stale session** | Session age > `staleSessionDays` (default 30d) and no summary | Attempt `event::session::stopped` recovery; then delete session |
| **Low-importance observations** | Observation age > `lowImportanceMaxDays` (default 90d) and `importance < 3` | Delete observation |
| **Observation cap** | Project exceeds `maxObservationsPerProject` (default 10,000) | Evict lowest-importance observations |
| **Expired memories (TTL)** | `mem.forgetAfter` is past | Delete memory + access log |
| **Old non-latest versions** | `isLatest === false` and age > 90d | Delete superseded version |
All evictions emit an audit record in `mem:audit`.
Sources: [src/functions/evict.ts:15-346]()
### 4c. `mem::auto-forget` — autonomous cleanup
`mem::auto-forget` (in `src/functions/auto-forget.ts`) runs three sub-passes:
1. **TTL pass**: Deletes any memory whose `forgetAfter` has elapsed (identical to `mem::evict`'s TTL pass).
2. **Contradiction pass**: Compares all `isLatest` memories that share concepts using token-level Jaccard similarity. If similarity exceeds 0.9, the older memory has `isLatest` set to `false` (soft-delete, not hard-delete). This avoids redundant or conflicting facts accumulating.
3. **Low-value observation pass**: Deletes observations older than 180 days with `importance ≤ 2`.
Sources: [src/functions/auto-forget.ts:39-186]()
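Token-level Jaccard similarity, as used by the contradiction pass, can be sketched like this. The tokenizer (lowercase, split on non-word characters) is an assumption; the source does not show how `auto-forget.ts` tokenizes.

```typescript
// Assumed tokenizer: lowercase, split on runs of non-word characters.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
}

// Jaccard similarity = |A ∩ B| / |A ∪ B| over the two token sets.
function jaccardSimilarity(a: string, b: string): number {
  const ta = tokenSet(a);
  const tb = tokenSet(b);
  if (ta.size === 0 && tb.size === 0) return 0; // avoid 0/0 for empty inputs
  let intersection = 0;
  ta.forEach((t) => {
    if (tb.has(t)) intersection++;
  });
  return intersection / (ta.size + tb.size - intersection);
}
```

Two memories that differ by one token out of five distinct tokens score 0.6, well below the 0.9 threshold, so only near-identical statements are superseded.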
---
## 5. The Consolidation Pipeline
The consolidation pipeline (`mem::consolidate-pipeline` in `src/functions/consolidation-pipeline.ts`) transforms raw episodic content into three long-lived derived types. It is guarded by the `CONSOLIDATION_ENABLED` config flag and runs as a triggered function (usually post-session or on a schedule).
The pipeline supports four named tiers, all run when `tier === "all"`:
### Tier: `semantic`
Requires at least 5 session summaries. Takes the 20 most recent `SessionSummary` records, prompts the provider with `SEMANTIC_MERGE_SYSTEM`, and parses `<fact confidence="0.9">...</fact>` XML tags from the response. Each extracted fact either updates an existing `SemanticMemory` (incrementing `accessCount`, raising `confidence`) or creates a new one with `strength = confidence`.
```ts
// src/functions/consolidation-pipeline.ts:86-118
while ((match = factRegex.exec(response)) !== null) {
  const fact = match[2].trim();
  const existing = existingSemantic.find(s => s.fact.toLowerCase() === fact.toLowerCase());
  if (existing) {
    existing.accessCount++;
    existing.confidence = Math.max(existing.confidence, confidence);
    await kv.set(KV.semantic, existing.id, existing);
  } else {
    const sem: SemanticMemory = { id: generateId("sem"), fact, confidence, strength: confidence, ... };
    await kv.set(KV.semantic, sem.id, sem);
  }
}
```
### Tier: `reflect`
Delegates to `mem::reflect`, which clusters recent memories and surfaces cross-session patterns.
### Tier: `procedural`
Requires at least 2 recurring `pattern`-type memories (those with `sessionIds.length >= 2`). Prompts the provider to extract `<procedure name="..." trigger="..."><step>...</step></procedure>` blocks. New procedures are stored in `mem:procedural`; existing ones have `frequency` incremented and `strength` raised by `+0.1`.
### Tier: `decay`
Applies strength decay to all semantic and procedural memories using `applyDecay`, which multiplies `strength` by `0.9` for each full decay period elapsed since the last access (`strength × 0.9^n`). The period length in days comes from `getConsolidationDecayDays()`:
```ts
// src/functions/consolidation-pipeline.ts:21-43
function applyDecay(items, decayDays) {
  for (const item of items) {
    const daysSince = (now - new Date(item.lastAccessedAt || item.updatedAt).getTime()) / 86400000;
    if (daysSince > decayDays) {
      const periods = Math.floor(daysSince / decayDays);
      item.strength = Math.max(0.1, item.strength * Math.pow(0.9, periods));
    }
  }
}
```
Note: this decay operates on `strength` (a field on the record) rather than the separate `RetentionScore`. The minimum strength is clamped to `0.1` — semantic and procedural memories are never fully zeroed out by time alone; they require eviction via `mem::retention-evict` if their retention score falls below the threshold.
Sources: [src/functions/consolidation-pipeline.ts:1-270]()
---
## 6. Crystallization — Locking Completed Work Into Long-Term Patterns
Crystallization converts a completed chain of `Action` records into a compact `Crystal` that captures what was accomplished, key outcomes, affected files, and extracted lessons.
`mem::crystallize` (in `src/functions/crystallize.ts`) requires all referenced actions to have `status === "done"` or `"cancelled"`. It sends the action chain to the provider using `CRYSTALLIZE_SYSTEM` prompt and parses a JSON response into a `CrystalDigest`. The resulting `Crystal` is written to `mem:crystals`. Each lesson extracted from the digest is separately stored via `mem::lesson-save` at confidence `0.6`.
```ts
// src/functions/crystallize.ts:60-92
const crystal: Crystal = {
  id: generateId("crys"),
  narrative: digest.narrative,
  keyOutcomes: digest.keyOutcomes,
  filesAffected: digest.filesAffected,
  lessons: digest.lessons,
  sourceActionIds: data.actionIds,
  ...
};
await kv.set(KV.crystals, crystal.id, crystal);
// Propagate lessons
await Promise.all(digest.lessons.map(lesson => sdk.trigger({ function_id: "mem::lesson-save", ... })));
// Mark source actions as crystallized
for (const action of actions) {
  await kv.set(KV.actions, action.id, { ...action, crystallizedInto: crystal.id });
}
```
`mem::auto-crystallize` scans all `done` actions older than `olderThanDays` (default 7) that have no `crystallizedInto` field, groups them by `parentId ?? project ?? "_ungrouped"`, and crystallizes each group in sequence.
A crystal is a terminal form — it has no retention score and is not subject to the normal decay pipeline. Its extracted `Lesson` records carry their own `decayRate` field and can be independently reinforced or decayed over time.
Sources: [src/functions/crystallize.ts:18-229](), [src/types.ts:727-754]()
---
## 7. Full Lifecycle Overview
```text
┌─────────────────────────────────────────────────────────────────┐
│  WRITE                                                          │
│  mem::remember → Memory{isLatest=true, strength=7, ttl?}        │
│  ↓ Jaccard dedup → supersedes older version (isLatest=false)    │
│  ↓ BM25 + vector index registration                             │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│  ACCESS TRACKING                                                │
│  recordAccess → AccessLog{count, lastAt, recent[20]}            │
│  drives reinforcement boost in retention scoring                │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│  RETENTION SCORING (mem::retention-score)                       │
│  score = min(1, salience × e^(−λ·ΔT) + Σ(σ/daysSinceAccess))    │
│  Tiers: hot ≥0.7 | warm ≥0.4 | cold ≥0.15 | evictable <0.15     │
└──────────┬────────────────────────────┬─────────────────────────┘
           │ score < 0.15               │ score survives
           ▼                            ▼
┌──────────────────┐     ┌──────────────────────────────────┐
│ EVICTION         │     │ CONSOLIDATION PIPELINE           │
│ retention-evict  │     │ semantic: facts → SemanticMemory │
│ evict (TTL,      │     │ procedural: patterns → Procedure │
│   stale session, │     │ reflect: cluster → insights      │
│   cap, non-latest│     │ decay: strength × 0.9^n          │
│ auto-forget      │     └──────────────┬───────────────────┘
│   (TTL, contra-  │                    │
│   diction, low-  │                    ▼
│   value obs)     │     ┌──────────────────────────────────┐
└──────────────────┘     │ CRYSTALLIZATION                  │
                         │ completed actions → Crystal      │
                         │ + Lessons (confidence 0.6)       │
                         └──────────────────────────────────┘
```
---
## 8. Configuration Reference
| Parameter | Default | Where set | Effect |
|---|---|---|---|
| `lambda` | `0.01` | `DecayConfig` | Exponential decay rate (per day) |
| `sigma` | `0.3` | `DecayConfig` | Reinforcement boost multiplier |
| `hot` threshold | `0.7` | `DecayConfig.tierThresholds` | Minimum score for hot tier |
| `warm` threshold | `0.4` | `DecayConfig.tierThresholds` | Minimum score for warm tier |
| `cold` threshold | `0.15` | `DecayConfig.tierThresholds` | Eviction boundary |
| `staleSessionDays` | `30` | `EvictionConfig` / `mem:config/eviction` | Session age before eviction |
| `lowImportanceMaxDays` | `90` | `EvictionConfig` | Max age for low-importance observations |
| `lowImportanceThreshold` | `3` | `EvictionConfig` | Importance cutoff for old observations |
| `maxObservationsPerProject` | `10,000` | `EvictionConfig` | Per-project observation cap |
| `CONSOLIDATION_ENABLED` | (unset) | env var | Gates the full consolidation pipeline |
| `olderThanDays` | `7` | `mem::auto-crystallize` | Age before auto-crystallization |
| Contradiction threshold | `0.9` | `auto-forget.ts` constant | Jaccard similarity for near-duplicate detection |
---
## 9. Failure Modes
| Failure | Behavior |
|---|---|
| BM25 index write fails on `mem::remember` | Logged as warning; KV write already succeeded; memory is invisible to BM25 search until restart-time rebuild |
| `mem::retention-score` not run before `mem::retention-evict` | Evict has no scores to work with; `allScores` returns empty; nothing is evicted |
| Pre-0.8.10 retention row has no `source` field | Evict probes both `KV.memories` and `KV.semantic` to find the record; adds one extra KV round-trip per candidate |
| Stale session recovery (`event::session::stopped`) fails | Session is skipped; no eviction; logged as warning |
| Consolidation pipeline disabled (`CONSOLIDATION_ENABLED` unset) | All tiers return `{ skipped: true }` immediately |
| `mem::crystallize` called on an action that is neither `done` nor `cancelled` | Returns error immediately; no partial write |
Sources: [src/functions/remember.ts:106-114](), [src/functions/retention.ts:326-365](), [src/functions/evict.ts:56-80]()
---
The memory lifecycle in agentmemory is thus a continuous balance: every access event injects a reinforcement boost that slows decay; the absence of access lets the exponential curve push a memory toward the evictable tier; and the consolidation pipeline provides an escape hatch by which frequently-recurring patterns survive indefinitely as semantic facts, procedural workflows, or crystallized action digests regardless of their episodic retention score. Sources: [src/functions/retention.ts:80-94](), [src/functions/consolidation-pipeline.ts:21-43](), [src/functions/crystallize.ts:60-92]()
---
## 03. State Layer & Hybrid Search
> All persistent data lives in the iii-engine KV store (StateKV wrapping state::get/set/list triggers); reads go through HybridSearch which fuses BM25 (weight 0.4), vector cosine (weight 0.6), and graph traversal (weight 0.3) via Reciprocal Rank Fusion, with an optional reranker pass on top.
- Page Markdown: https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/03-state-layer-hybrid-search.md
- Generated: 2026-05-21T07:09:30.705Z
### Source Files
- `src/state/kv.ts`
- `src/state/hybrid-search.ts`
- `src/state/search-index.ts`
- `src/state/vector-index.ts`
- `src/state/reranker.ts`
- `src/state/index-persistence.ts`
- `src/functions/graph.ts`
- `src/functions/graph-retrieval.ts`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/state/kv.ts](src/state/kv.ts)
- [src/state/hybrid-search.ts](src/state/hybrid-search.ts)
- [src/state/search-index.ts](src/state/search-index.ts)
- [src/state/vector-index.ts](src/state/vector-index.ts)
- [src/state/reranker.ts](src/state/reranker.ts)
- [src/state/index-persistence.ts](src/state/index-persistence.ts)
- [src/state/schema.ts](src/state/schema.ts)
- [src/functions/graph.ts](src/functions/graph.ts)
- [src/functions/graph-retrieval.ts](src/functions/graph-retrieval.ts)
</details>
# State Layer & Hybrid Search
The state layer is the persistence and retrieval backbone of agentmemory. All agent observations, memories, graph nodes and edges, and index snapshots flow through a single abstraction—`StateKV`—which delegates to the iii-engine KV store via typed trigger calls. On top of this foundation sits `HybridSearch`, a retrieval engine that fuses three independent ranking signals (BM25 keyword search, dense vector cosine similarity, and knowledge-graph traversal) using Reciprocal Rank Fusion (RRF), with an optional cross-encoder reranking pass as a final stage.
Understanding this layer is essential for predicting how memory recall behaves under partial failures (e.g., no embedding provider, no graph entities), how indexes survive process restarts, and what invariants must hold for search quality to remain stable across model or provider swaps.
---
## StateKV: The Persistence Interface
`StateKV` is a thin, typed wrapper around the iii-engine SDK. Every read and write goes through `sdk.trigger()`, routing to named iii-engine functions: `state::get`, `state::set`, `state::update`, `state::delete`, and `state::list`. There is no local in-process cache—the engine is the authoritative store.
```typescript
// src/state/kv.ts
export class StateKV {
  async get<T = unknown>(scope: string, key: string): Promise<T | null> {
    return this.sdk.trigger<{ scope: string; key: string }, T | null>({
      function_id: 'state::get',
      payload: { scope, key },
    })
  }
  async set<T = unknown>(scope: string, key: string, value: T): Promise<T> {
    return this.sdk.trigger<{ scope: string; key: string; value: T }, T>({
      function_id: 'state::set',
      payload: { scope, key, value },
    })
  }
  // ...list, delete, update
}
```
The `scope` parameter maps to a namespace key defined in `KV` constants (e.g., `"mem:obs:<sessionId>"`, `"mem:graph:nodes"`, `"mem:index:bm25"`), making every stored record addressable by its logical category.
Sources: [src/state/kv.ts:1-47]()
### KV Namespace Map
All scopes are declared as constants in `src/state/schema.ts`. Key namespaces:
| KV Key | Content |
|---|---|
| `mem:obs:<sessionId>` | `CompressedObservation` records per session |
| `mem:memories` | Saved long-term `Memory` records |
| `mem:index:bm25` | Serialized BM25 index (`"data"`) + vector index (`"vectors"`) |
| `mem:graph:nodes` | `GraphNode` entities |
| `mem:graph:edges` | `GraphEdge` relationships |
| `mem:sessions` | Session metadata |
| `mem:summaries` | Session summaries |
| `mem:relations` | Explicit entity relations |
Sources: [src/state/schema.ts:1-50]()
---
## Index Persistence
The BM25 and vector indexes are in-memory structures that are serialized and stored back to KV on a debounced schedule. `IndexPersistence` manages this lifecycle.
```typescript
// src/state/index-persistence.ts
const DEBOUNCE_MS = 5000;
export class IndexPersistence {
  scheduleSave(): void {
    // Debounces saves; resets timer on each call
    this.timer = setTimeout(() => {
      this.save().catch((err) => this.logFailure(err));
    }, DEBOUNCE_MS);
  }
  async save(): Promise<void> {
    await this.kv.set(KV.bm25Index, "data", this.bm25.serialize());
    if (this.vector && this.vector.size > 0) {
      await this.kv.set(KV.bm25Index, "vectors", this.vector.serialize());
    }
  }
}
```
Both `SearchIndex` and `VectorIndex` implement `serialize()` / `static deserialize()` methods. `SearchIndex` serializes to JSON (v2 schema: entries, inverted index, per-doc term counts, total doc length). `VectorIndex` encodes each `Float32Array` embedding as a base64 string.
A key invariant: on restore, `VectorIndex.validateDimensions()` checks that all stored embeddings match the current provider's expected dimension. If any mismatch is found—possible when a provider swap occurs mid-session—the index is rejected and rebuilt from scratch rather than producing corrupted results. Failures during `state::set` are throttled to one log line per 60 seconds to avoid noise under iii-engine queue pressure.
Sources: [src/state/index-persistence.ts:1-95](), [src/state/vector-index.ts:77-90]()
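The dimension guard on restore can be sketched as below. `findDimensionMismatch` and `restoreOrRebuild` are hypothetical names, and this is not the body of `VectorIndex.validateDimensions()`; it only illustrates the all-or-nothing decision described above.

```typescript
// Hypothetical helper: index of the first embedding whose length differs
// from the provider's expected dimension, or -1 if all match.
function findDimensionMismatch(embeddings: Float32Array[], expectedDim: number): number {
  return embeddings.findIndex((e) => e.length !== expectedDim);
}

// Hypothetical helper: one mismatch is enough to reject the whole snapshot.
function restoreOrRebuild(embeddings: Float32Array[], expectedDim: number): "restore" | "rebuild" {
  return findDimensionMismatch(embeddings, expectedDim) === -1 ? "restore" : "rebuild";
}
```

Rejecting the whole snapshot is the safe choice here: cosine similarity between vectors of different lengths is undefined, so a partially valid index would return misleading scores.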
---
## BM25 Search Index
`SearchIndex` implements BM25+ scoring entirely in-process. It maintains:
- `entries`: map from `obsId` → `{ obsId, sessionId, termCount }`
- `invertedIndex`: term → set of `obsId`s
- `docTermCounts`: per-doc term frequency map
- `totalDocLength`: needed for average document length
The BM25 parameters are fixed at `k1 = 1.2` and `b = 0.75`—standard Okapi BM25 defaults.
### Tokenization Pipeline
Before indexing or querying, text goes through:
1. Unicode-aware cleaning: strip non-letter/non-digit characters
2. CJK detection: if the token contains CJK characters, it is split via `segmentCjk()` (bigram + unigram fallback); otherwise it is passed through the Porter-family `stem()` function
3. Synonym expansion at query time: synonyms from a bundled table receive a weight factor of 0.7 (vs. 1.0 for exact terms)
4. Prefix matching: a sorted term array supports binary-search prefix expansion during search, scored at half-IDF weight
Text is extracted from `CompressedObservation` fields: `title`, `subtitle`, `narrative`, `facts[]`, `concepts[]`, `files[]`, and `type`.
Sources: [src/state/search-index.ts:19-137](), [src/state/search-index.ts:212-258]()
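For orientation, the per-term score under classic Okapi BM25 with these parameters can be sketched as below. This is the textbook formula, not the project's BM25+ variant: the extra lower-bound term, synonym weights, and prefix weights are omitted, and the `ln(1 + …)` IDF form is an assumption.

```typescript
// Classic Okapi BM25 contribution of one query term to one document.
// Assumption: IDF = ln(1 + (N - n + 0.5) / (n + 0.5)), which is always non-negative.
function bm25TermScore(
  termFreq: number,   // occurrences of the term in the document
  docLength: number,  // number of terms in the document
  avgDocLength: number,
  docCount: number,   // N: documents in the index
  docFreq: number,    // n: documents containing the term
  k1: number = 1.2,
  b: number = 0.75,
): number {
  const idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  return (idf * termFreq * (k1 + 1)) / (termFreq + k1 * lengthNorm);
}
```

`k1` controls how quickly repeated occurrences saturate, and `b` controls how strongly long documents are penalized, which is why `totalDocLength` has to be tracked alongside the inverted index.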
---
## Vector Index
`VectorIndex` holds dense embeddings in a `Map<obsId, { embedding: Float32Array; sessionId }>`. Search runs a brute-force scan using cosine similarity:
```
cosineSimilarity(a, b) = dot(a, b) / (||a|| · ||b||)
```
A partial min-heap maintains the top-`limit` results without sorting the full list. Embeddings are only computed when an `EmbeddingProvider` is configured; if the provider is absent or throws, the vector stream is silently skipped and `HybridSearch` falls back to BM25 only.
Sources: [src/state/vector-index.ts:9-63]()
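A brute-force cosine scan can be sketched as follows. For brevity this version sorts all scores instead of maintaining the bounded min-heap the source describes, so it illustrates the scoring rather than the top-`limit` optimization. `topKByCosine` is a hypothetical name.

```typescript
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // a zero vector has no direction
}

// Hypothetical helper: full sort stands in for the heap-based selection.
function topKByCosine(
  query: Float32Array,
  index: Map<string, Float32Array>,
  limit: number,
): { obsId: string; score: number }[] {
  const scored: { obsId: string; score: number }[] = [];
  index.forEach((embedding, obsId) => {
    scored.push({ obsId, score: cosineSimilarity(query, embedding) });
  });
  return scored.sort((x, y) => y.score - x.score).slice(0, limit);
}
```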
---
## HybridSearch: Three-Stream RRF Fusion
`HybridSearch` coordinates all three retrieval signals in `tripleStreamSearch()`. The nominal weights at construction time are:
| Stream | Default weight |
|---|---|
| BM25 (`bm25Weight`) | 0.4 |
| Vector cosine (`vectorWeight`) | 0.6 |
| Graph traversal (`graphWeight`) | 0.3 |
These nominal weights sum to 1.3, not 1.0. At query time the weight of any stream that returned no results is dropped and the remaining weights are **re-normalized**, so the effective weights always sum to 1.0:
```typescript
// src/state/hybrid-search.ts:197-206
const totalW = effectiveBm25W + effectiveVectorW + effectiveGraphW;
if (totalW > 0) {
  effectiveBm25W /= totalW;
  effectiveVectorW /= totalW;
  effectiveGraphW /= totalW;
}
```
This means: in a text-only deployment with no embedding provider and no matched graph entities, BM25 receives the full weight of 1.0.
### RRF Score Formula
Each candidate observation is scored using Reciprocal Rank Fusion with `RRF_K = 60`:
```
combinedScore =
w_bm25 * 1/(60 + bm25Rank)
+ w_vector * 1/(60 + vectorRank)
+ w_graph * 1/(60 + graphRank)
```
Observations absent from a stream receive `rank = Infinity`, which collapses `1/(60 + ∞)` to zero—a clean zero-contribution without special casing.
Sources: [src/state/hybrid-search.ts:20-20](), [src/state/hybrid-search.ts:194-219]()
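Weight re-normalization and the RRF sum can be combined in a short sketch. `effectiveWeights` and `rrfCombinedScore` are hypothetical names, and 1-based ranks are an assumption; the source does not state whether ranks start at 0 or 1.

```typescript
const RRF_K = 60;

interface StreamWeights {
  bm25: number;
  vector: number;
  graph: number;
}

// Hypothetical helper: zero out streams with no results, then normalize to sum 1.
function effectiveWeights(
  nominal: StreamWeights,
  hasResults: { bm25: boolean; vector: boolean; graph: boolean },
): StreamWeights {
  const w: StreamWeights = {
    bm25: hasResults.bm25 ? nominal.bm25 : 0,
    vector: hasResults.vector ? nominal.vector : 0,
    graph: hasResults.graph ? nominal.graph : 0,
  };
  const total = w.bm25 + w.vector + w.graph;
  if (total > 0) {
    w.bm25 /= total;
    w.vector /= total;
    w.graph /= total;
  }
  return w;
}

// Hypothetical helper: a missing rank is treated as Infinity, contributing 0.
function rrfCombinedScore(
  w: StreamWeights,
  ranks: { bm25?: number; vector?: number; graph?: number },
): number {
  const term = (weight: number, rank: number | undefined): number =>
    weight * (1 / (RRF_K + (rank ?? Infinity)));
  return term(w.bm25, ranks.bm25) + term(w.vector, ranks.vector) + term(w.graph, ranks.graph);
}
```

With all three streams active the effective weights are 0.4/1.3, 0.6/1.3 and 0.3/1.3 (roughly 0.31, 0.46, 0.23). With only BM25 active, the BM25 weight becomes 1.0.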
### Search Flow
```text
query
  │
  ├─► BM25 SearchIndex.search() (always)
  ├─► EmbeddingProvider.embed()
  │     └─► VectorIndex.search() (if provider + index non-empty)
  └─► extractEntitiesFromQuery()
        └─► GraphRetrieval.searchByEntities() (if entities found)
        └─► expandFromChunks() (from top-5 vector hits)
  │
  ├─► Merge into per-obsId rank map
  ├─► Re-normalize weights
  ├─► Compute RRF combinedScore
  ├─► diversifyBySession() (max 3 results per session)
  ├─► enrichResults() (fetch full Observation from KV)
  └─► rerank() (optional, env-gated)
        └─► Xenova/ms-marco-MiniLM-L-6-v2
```
Sources: [src/state/hybrid-search.ts:77-240]()
### Query Expansion
`searchWithExpansion()` accepts a `QueryExpansion` object containing `reformulations`, `temporalConcretizations`, and `entityExtractions`. It runs `tripleStreamSearch` in parallel for every expanded query string via `Promise.all`, then keeps only the highest-scoring result per `obsId` across all result sets before re-ranking.
Sources: [src/state/hybrid-search.ts:42-75]()
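The "keep the best score per `obsId`" merge can be sketched as below. `mergeBestPerObservation` and the `ExpandedHit` shape are hypothetical; the parallel `tripleStreamSearch` calls are represented by already-resolved result sets.

```typescript
interface ExpandedHit {
  obsId: string;
  combinedScore: number;
}

// Hypothetical helper: merges result sets from several expanded queries,
// keeping the highest-scoring hit for each observation.
function mergeBestPerObservation(resultSets: ExpandedHit[][]): ExpandedHit[] {
  const best = new Map<string, ExpandedHit>();
  for (const results of resultSets) {
    for (const hit of results) {
      const current = best.get(hit.obsId);
      if (!current || hit.combinedScore > current.combinedScore) {
        best.set(hit.obsId, hit);
      }
    }
  }
  return Array.from(best.values()).sort((a, b) => b.combinedScore - a.combinedScore);
}
```

Taking the maximum rather than the sum means an observation is not rewarded simply for matching many reformulations of the same query.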
---
## Graph Retrieval
The graph component gives `HybridSearch` a third signal grounded in extracted entity relationships. `GraphRetrieval` operates over two KV scopes: `mem:graph:nodes` and `mem:graph:edges`.
### Entity Extraction and Matching
Entity names are extracted from the query string by `extractEntitiesFromQuery()` (from `src/functions/query-expansion.ts`). Node matching is case-insensitive substring overlap in both directions (query entity contains node name, or node name contains query entity).
### Graph Traversal: Dijkstra over Edge Weights
Traversal replaced a prior BFS with a Dijkstra implementation using `cost = 1 / max(edge.weight, 0.01)`. This prioritizes higher-weight edges (stronger relationships) and runs in O((V + E) log V) using an inline binary min-heap. Traversal is bounded by `maxDepth` (default 2 hops).
Score for a path of `pathLength` hops with edge weights `w₁…wₙ` (so `n = pathLength`):
```
score = avgWeight(w₁, …, wₙ) * (1 / pathLength)
```
Direct matches on the start node score 1.0. Stale nodes and edges (`.stale === true`) are pre-filtered before traversal.
```typescript
// src/functions/graph-retrieval.ts:82-88
const avgWeight = edgeWeights.length > 0
  ? edgeWeights.reduce((a, b) => a + b, 0) / edgeWeights.length
  : 0.5;
const score = avgWeight * (1 / pathLength);
```
### Graph Expansion from Vector Hits
After standard entity search, the top-5 vector results are used as seeds: `expandFromChunks()` looks up which graph nodes cite those observations, then traverses 1 hop from each. This bridges the vector and graph signals—a semantically close result can pull in structurally related observations even if the query contained no named entities.
Sources: [src/functions/graph-retrieval.ts:44-156]()
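The cost function and path score can be illustrated with a compact Dijkstra sketch. Several simplifications are assumptions, not the project's behavior: edges are treated as undirected, a sorted array stands in for the binary min-heap, and nodes at `maxDepth` hops are simply not expanded. `traverseWeighted` and the type names are hypothetical.

```typescript
interface WeightedEdge { from: string; to: string; weight: number; }
interface ReachedNode { nodeId: string; pathLength: number; score: number; }

function traverseWeighted(start: string, edges: WeightedEdge[], maxDepth: number = 2): ReachedNode[] {
  // Build an adjacency list (assumed undirected).
  const adj = new Map<string, { to: string; weight: number }[]>();
  const link = (a: string, b: string, weight: number): void => {
    const list = adj.get(a) ?? [];
    list.push({ to: b, weight });
    adj.set(a, list);
  };
  edges.forEach((e) => { link(e.from, e.to, e.weight); link(e.to, e.from, e.weight); });

  const frontier = [{ nodeId: start, cost: 0, weights: [] as number[] }];
  const done = new Map<string, ReachedNode>();
  while (frontier.length > 0) {
    frontier.sort((a, b) => a.cost - b.cost); // stand-in for a min-heap pop
    const cur = frontier.shift()!;
    if (done.has(cur.nodeId)) continue; // already settled via a cheaper path
    const pathLength = cur.weights.length;
    const avgWeight = cur.weights.reduce((s, w) => s + w, 0) / Math.max(pathLength, 1);
    // The start node is a direct match and scores 1.0.
    const score = pathLength === 0 ? 1.0 : avgWeight * (1 / pathLength);
    done.set(cur.nodeId, { nodeId: cur.nodeId, pathLength, score });
    if (pathLength >= maxDepth) continue; // hop bound reached
    for (const next of adj.get(cur.nodeId) ?? []) {
      if (done.has(next.to)) continue;
      frontier.push({
        nodeId: next.to,
        cost: cur.cost + 1 / Math.max(next.weight, 0.01), // strong edges are cheap
        weights: [...cur.weights, next.weight],
      });
    }
  }
  return Array.from(done.values());
}
```

In a graph with edges A-B (1.0), B-C (0.5) and A-C (0.1), node C is reached through B (cost 1 + 2 = 3) rather than directly (cost 10), giving it a score of 0.75 × 1/2 = 0.375.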
---
## Session Diversification
Before enrichment, `diversifyBySession()` caps results to 3 observations per session ID. If fewer than the requested limit are selected in the first pass, remaining observations are admitted without the cap. This prevents a single prolific session from dominating every recall result.
Sources: [src/state/hybrid-search.ts:242-276]()
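The two-pass behaviour can be sketched as follows. `diversifyBySessionSketch` is a hypothetical stand-in for the real function, and it assumes the input is already sorted by descending score.

```typescript
interface RankedObservation {
  obsId: string;
  sessionId: string;
  combinedScore: number;
}

function diversifyBySessionSketch(
  results: RankedObservation[],
  limit: number,
  maxPerSession: number = 3,
): RankedObservation[] {
  const selected: RankedObservation[] = [];
  const overflow: RankedObservation[] = [];
  const perSession = new Map<string, number>();

  // First pass: admit results in rank order, at most maxPerSession per session.
  for (const r of results) {
    const used = perSession.get(r.sessionId) ?? 0;
    if (selected.length < limit && used < maxPerSession) {
      selected.push(r);
      perSession.set(r.sessionId, used + 1);
    } else {
      overflow.push(r);
    }
  }
  // Second pass: if the limit is not filled, admit the rest without the cap.
  for (const r of overflow) {
    if (selected.length >= limit) break;
    selected.push(r);
  }
  return selected;
}
```

The second pass means the cap only reorders results when there are enough alternatives; a store with a single session still returns a full page.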
---
## Reranker (Optional)
When `RERANK_ENABLED=true`, `rerank()` applies a cross-encoder pass over the top 20 RRF candidates using `Xenova/ms-marco-MiniLM-L-6-v2` (quantized, loaded lazily via `@xenova/transformers`). Input pairs are formed as `"<query> [SEP] <title> <narrative>"` truncated to 512 characters. The reranker reassigns `combinedScore` to the cross-encoder logit; the tail beyond the reranker window is appended unmodified.
If the model fails to load (missing optional dependency, incompatible runtime), `pipelineUnavailable` is set to `true` and all subsequent calls return results unchanged. This makes the reranker fully optional with no impact on the core retrieval path.
Sources: [src/state/reranker.ts:1-74]()
---
## Failure Modes and Invariants
| Condition | Behavior |
|---|---|
| No embedding provider | Vector stream skipped; BM25 weight re-normalized to 1.0 |
| No graph entities extracted | Graph stream skipped; weights re-normalized across BM25 + vector |
| Vector embed throws | `try/catch` swallows error; falls through to BM25-only |
| Graph search throws | `try/catch` swallows; graph weight set to 0 |
| Reranker model missing | `pipelineUnavailable = true`; RRF order preserved |
| KV `state::set` timeout | Logged (throttled to 1/min); index remains in memory, retried on next debounce |
| Wrong vector dimension on restore | `validateDimensions()` rejects index; rebuilt from scratch |
| Observation in index but deleted from KV | `enrichResults()` `.catch(() => null)` drops it silently |
The design ensures every path degrades gracefully: a fully offline or embedding-free deployment still retrieves results via BM25 alone, and graph or reranker failures never propagate exceptions to the caller.
Sources: [src/state/hybrid-search.ts:91-98](), [src/state/hybrid-search.ts:105-115](), [src/state/index-persistence.ts:77-94](), [src/state/vector-index.ts:77-90]()
---
## 04. LLM & Embedding Providers — BYOK Design
> agentmemory is fully provider-neutral: LLM calls (summarize, compress, graph-extract) route through a resilient fallback chain across Anthropic, OpenAI, OpenRouter, MiniMax, or a noop stub; embeddings (text and image/CLIP) are independently switchable across OpenAI, Cohere, Gemini, Voyage, or a local Xenova model — the system degrades to BM25-only if no embedding key is set.
- Page Markdown: https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/04-llm-embedding-providers-byok-design.md
- Generated: 2026-05-21T07:09:04.983Z
### Source Files
- `src/providers/index.ts`
- `src/providers/fallback-chain.ts`
- `src/providers/circuit-breaker.ts`
- `src/providers/resilient.ts`
- `src/providers/embedding/index.ts`
- `src/providers/embedding/local.ts`
- `src/providers/noop.ts`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/providers/index.ts](src/providers/index.ts)
- [src/providers/fallback-chain.ts](src/providers/fallback-chain.ts)
- [src/providers/circuit-breaker.ts](src/providers/circuit-breaker.ts)
- [src/providers/resilient.ts](src/providers/resilient.ts)
- [src/providers/noop.ts](src/providers/noop.ts)
- [src/providers/anthropic.ts](src/providers/anthropic.ts)
- [src/providers/embedding/index.ts](src/providers/embedding/index.ts)
- [src/providers/embedding/local.ts](src/providers/embedding/local.ts)
- [src/providers/embedding/voyage.ts](src/providers/embedding/voyage.ts)
- [src/providers/embedding/clip.ts](src/providers/embedding/clip.ts)
- [src/config.ts](src/config.ts)
- [src/types.ts](src/types.ts)
</details>
# LLM & Embedding Providers — BYOK Design
agentmemory ships zero hard dependencies on any particular model vendor. Both the LLM subsystem (compression, summarization, graph extraction) and the embedding subsystem (text vectors, image/CLIP vectors) are independently switchable purely via environment variables. You bring your own key (BYOK) for whichever provider you prefer, and the system degrades gracefully rather than failing hard when no key is present.
This page explains how provider selection, the resilient fallback chain, the circuit breaker, and the embedding dimension guard all fit together — and what breaks if you misconfigure them.
---
## LLM Provider System
### Provider Types and Auto-Detection
The system recognises seven LLM provider types (including the `noop` stub), declared in the `ProviderType` union:
```typescript
// src/types.ts:146
export type ProviderType =
  "agent-sdk" | "anthropic" | "gemini" | "openrouter" | "minimax" | "openai" | "noop";
```
On startup, `detectProvider()` in `src/config.ts` reads environment variables in priority order and returns the first matching provider config. The detection order is:
| Priority | Provider | Key(s) Required | Default Model |
|----------|----------|-----------------|---------------|
| 1 | `openai` | `OPENAI_API_KEY` (and `OPENAI_API_KEY_FOR_LLM != "false"`) | `gpt-4o-mini` |
| 2 | `minimax` | `MINIMAX_API_KEY` | `MiniMax-M2.7` |
| 3 | `anthropic` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` |
| 4 | `gemini` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | `gemini-2.5-flash` |
| 5 | `openrouter` | `OPENROUTER_API_KEY` | `anthropic/claude-sonnet-4-20250514` |
| 6 | `noop` | _(no key found)_ | — |
| 7 | `agent-sdk` | `AGENTMEMORY_ALLOW_AGENT_SDK=true` (opt-in only) | `claude-sonnet-4-20250514` |
Sources: [src/config.ts:50-132]()
The env file at `~/.agentmemory/.env` is merged with `process.env`, so keys can be set either globally or per-project.
### The `MemoryProvider` Interface
Every LLM provider implements a small interface:
```typescript
// src/types.ts:148-153
export interface MemoryProvider {
  name: string;
  compress(systemPrompt: string, userPrompt: string): Promise<string>;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
  describeImage?(imageData: string, mimeType: string, prompt: string): Promise<string>;
}
```
`compress` and `summarize` are the two entry points for LLM-backed memory operations. `describeImage` is an optional extension only the Anthropic provider currently implements. Sources: [src/providers/anthropic.ts:16-42]()
### Factory Functions
`src/providers/index.ts` exports two factory functions:
- **`createProvider(config)`** — wraps a single base provider in a `ResilientProvider` (circuit breaker only).
- **`createFallbackProvider(config, fallbackConfig)`** — constructs an ordered list of base providers from `FALLBACK_PROVIDERS`, then wraps the list in `FallbackChainProvider`, and finally wraps that in `ResilientProvider`.
```typescript
// src/providers/index.ts:32-58
export function createFallbackProvider(
  config: ProviderConfig,
  fallbackConfig: FallbackConfig,
): ResilientProvider {
  if (fallbackConfig.providers.length === 0) {
    return createProvider(config);
  }
  const providers: MemoryProvider[] = [createBaseProvider(config)];
  for (const providerType of fallbackConfig.providers) {
    if (providerType === config.provider) continue;
    try {
      providers.push(createBaseProvider({ ...config, provider: providerType }));
    } catch {
      // skip unavailable fallback providers
    }
  }
  if (providers.length > 1) {
    return new ResilientProvider(new FallbackChainProvider(providers));
  }
  return new ResilientProvider(providers[0]);
}
```
Fallback providers whose keys are absent are silently skipped at construction time (the `try/catch` at lines 43–51), so a partially configured `FALLBACK_PROVIDERS` list does not crash startup.
### Gemini as OpenRouter-Compatible
Gemini is not wired through its own SDK. Instead, `createBaseProvider` instantiates an `OpenRouterProvider` pointed at Google's OpenAI-compatible endpoint:
```typescript
// src/providers/index.ts:76-89
case "gemini": {
  return new OpenRouterProvider(
    geminiKey,
    config.model,
    config.maxTokens,
    "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
  );
}
```
The same endpoint-swapping approach is open to users: the `OPENAI_BASE_URL` and `ANTHROPIC_BASE_URL` overrides let you point the OpenAI and Anthropic providers at a local proxy or a compatible endpoint. Sources: [src/providers/index.ts:61-118]()
---
## Resilience Layer
### Architecture Overview
```text
caller (compress / summarize)
│
▼
┌─────────────────────────────┐
│ ResilientProvider │ ← circuit breaker gate
│ ┌───────────────────────┐ │
│ │ FallbackChainProvider │ │ ← ordered provider list (when FALLBACK_PROVIDERS set)
│ │ Provider A │ │
│ │ Provider B │ │
│ │ Provider C … │ │
│ └───────────────────────┘ │
│ ─ or ─ │
│ ┌───────────────────────┐ │
│ │ Single base provider │ │
│ └───────────────────────┘ │
└─────────────────────────────┘
```
### FallbackChainProvider
`FallbackChainProvider` tries each provider in order and returns the first success. If all throw, it re-throws the last error:
```typescript
// src/providers/fallback-chain.ts:18-30
private async tryAll(fn: (p: MemoryProvider) => Promise<string>): Promise<string> {
let lastError: Error | null = null;
for (const provider of this.providers) {
try {
return await fn(provider);
} catch (err) {
lastError = err instanceof Error ? err : new Error(String(err));
}
}
throw lastError || new Error("No providers available");
}
```
The chain's `name` is a human-readable trace like `fallback(anthropic -> openrouter -> noop)`. Sources: [src/providers/fallback-chain.ts:1-31]()
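To make the ordering behavior concrete, here is a minimal, self-contained sketch of the same first-success loop with two stub providers. `SketchProvider`, `firstSuccess`, and the stub names are hypothetical stand-ins for illustration, not the project's actual classes.

```typescript
// Reduced stand-in for the provider interface; the real one has more members.
interface SketchProvider {
  name: string;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Hypothetical helper mirroring the tryAll loop: the first success wins,
// otherwise the last error seen is re-thrown.
async function firstSuccess(
  providers: SketchProvider[],
  fn: (p: SketchProvider) => Promise<string>,
): Promise<string> {
  let lastError: Error | null = null;
  for (const provider of providers) {
    try {
      return await fn(provider);
    } catch (err) {
      lastError = err instanceof Error ? err : new Error(String(err));
    }
  }
  throw lastError ?? new Error("No providers available");
}

// A provider that always fails, followed by one that works.
const failing: SketchProvider = {
  name: "primary",
  summarize: async () => {
    throw new Error("rate limited");
  },
};
const working: SketchProvider = {
  name: "secondary",
  summarize: async (_sys, user) => `summary of: ${user}`,
};

// Trace string built in the same style as the chain's `name`.
const chainName = `fallback(${[failing, working].map((p) => p.name).join(" -> ")})`;
```

Calling `firstSuccess([failing, working], (p) => p.summarize("sys", "hello"))` resolves with the second provider's result; with only `failing` in the list, the call rejects with that provider's error.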
### CircuitBreaker
`CircuitBreaker` prevents hammering a failing provider. Its state machine has three states:
```
closed ──(failures >= threshold)──► open
▲ │
│ (success in half-open) │ (recoveryTimeoutMs elapsed)
└──────── half-open ◄──────────────┘
```
Default parameters (all overridable via constructor options):
| Parameter | Default | Meaning |
|-----------|---------|---------|
| `failureThreshold` | 3 | Failures within the window to open |
| `failureWindowMs` | 60 000 ms | Window over which failures are counted |
| `recoveryTimeoutMs` | 30 000 ms | How long to stay open before trying half-open |
When the breaker is open, `ResilientProvider.call()` throws `"circuit_breaker_open"` immediately without touching the network. Sources: [src/providers/circuit-breaker.ts:1-82](), [src/providers/resilient.ts:12-24]()
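The state machine and defaults above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the contents of `src/providers/circuit-breaker.ts`: the class name, method names, the injectable clock, and the choice to re-open immediately on a failed half-open probe are all assumptions of this sketch.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold?: number;
  failureWindowMs?: number;
  recoveryTimeoutMs?: number;
  now?: () => number; // injectable clock, used here only to make the sketch testable
}

class SketchCircuitBreaker {
  private state: BreakerState = "closed";
  private failures: number[] = []; // timestamps of recent failures
  private openedAt = 0;
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly recoveryMs: number;
  private readonly now: () => number;

  constructor(opts: BreakerOptions = {}) {
    this.threshold = opts.failureThreshold ?? 3;
    this.windowMs = opts.failureWindowMs ?? 60_000;
    this.recoveryMs = opts.recoveryTimeoutMs ?? 30_000;
    this.now = opts.now ?? Date.now;
  }

  /** True if a call may proceed; moves open -> half-open once the recovery timeout has elapsed. */
  allow(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.recoveryMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  /** A success closes the breaker and clears the failure history. */
  recordSuccess(): void {
    this.state = "closed";
    this.failures = [];
  }

  recordFailure(): void {
    const t = this.now();
    // Assumption of this sketch: a failed probe in half-open re-opens immediately.
    if (this.state === "half-open") {
      this.trip(t);
      return;
    }
    // Keep only failures inside the counting window, then add this one.
    this.failures = this.failures.filter((f) => t - f < this.windowMs);
    this.failures.push(t);
    if (this.failures.length >= this.threshold) this.trip(t);
  }

  getState(): BreakerState {
    return this.state;
  }

  private trip(t: number): void {
    this.state = "open";
    this.openedAt = t;
    this.failures = [];
  }
}
```

A wrapper in the style of `ResilientProvider` would call `allow()` before each request, throw a fast error when it returns `false`, and report the outcome through `recordSuccess()` or `recordFailure()`.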
### NoopProvider — Safe Zero-Key Default
When no LLM key is found and `AGENTMEMORY_ALLOW_AGENT_SDK` is not `true`, `detectProvider` returns `provider: "noop"`. The `NoopProvider` always returns empty strings:
```typescript
// src/providers/noop.ts:10-20
export class NoopProvider implements MemoryProvider {
name = "noop";
async compress(): Promise<string> { return ""; }
async summarize(): Promise<string> { return ""; }
}
```
Callers that receive an empty string are expected to short-circuit rather than store the empty result. Defaulting to `noop` instead of the `agent-sdk` provider is what avoids the Stop-hook recursion loop documented in issue #149. Sources: [src/providers/noop.ts:1-20]()
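The short-circuit expected of callers might look like the sketch below. `summarizeAndStore` and its `store` callback are hypothetical names introduced for illustration; only the empty-string behavior of the noop provider comes from the source.

```typescript
interface SummaryProvider {
  name: string;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Stand-in with the same observable behavior as NoopProvider.summarize.
const noop: SummaryProvider = { name: "noop", summarize: async () => "" };

// Hypothetical caller: `store` stands in for whatever persists a summary.
// Returns true only when something was actually stored.
async function summarizeAndStore(
  provider: SummaryProvider,
  transcript: string,
  store: (summary: string) => void,
): Promise<boolean> {
  const summary = await provider.summarize("Summarize this session.", transcript);
  if (summary.trim() === "") return false; // empty result: skip the write entirely
  store(summary);
  return true;
}
```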
---
## Embedding Provider System
Text embeddings and image/CLIP embeddings are **independently** configured. Neither is required; the system falls back to BM25-only search if no embedding key is set.
### Text Embedding — Auto-Detection
`detectEmbeddingProvider()` in `src/config.ts` checks keys in this order, with `EMBEDDING_PROVIDER` as a manual override:
| Priority | Provider | Key Required | Dimensions |
|----------|----------|--------------|------------|
| 0 (override) | _any_ | `EMBEDDING_PROVIDER=<name>` | varies |
| 1 | `gemini` | `GEMINI_API_KEY` | — |
| 2 | `openai` | `OPENAI_API_KEY` | — |
| 3 | `voyage` | `VOYAGE_API_KEY` | 1024 |
| 4 | `cohere` | `COHERE_API_KEY` | — |
| 5 | `openrouter` | `OPENROUTER_API_KEY` | — |
| 6 | `local` | _(forced via `EMBEDDING_PROVIDER=local`)_ | 384 |
| — | _(none)_ | no key found | BM25 only |
`createEmbeddingProvider()` returns `null` when no provider is detected, which signals the search layer to skip vector indexing entirely. Sources: [src/config.ts:197-210](), [src/providers/embedding/index.ts:30-50]()
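The priority table can be read as a simple ordered lookup. The sketch below encodes that order; the function name `pickEmbeddingProvider`, the `Env` type, and passing the environment as an argument are assumptions made for illustration and are not taken from `src/config.ts`.

```typescript
type Env = Record<string, string | undefined>;

// Order copied from the priority table above.
const KEY_ORDER: ReadonlyArray<readonly [string, string]> = [
  ["gemini", "GEMINI_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["voyage", "VOYAGE_API_KEY"],
  ["cohere", "COHERE_API_KEY"],
  ["openrouter", "OPENROUTER_API_KEY"],
];

// Hypothetical detector: returns a provider name, or null for BM25-only operation.
function pickEmbeddingProvider(env: Env): string | null {
  const forced = env["EMBEDDING_PROVIDER"]?.trim();
  if (forced) return forced; // manual override wins over key detection
  for (const [provider, envKey] of KEY_ORDER) {
    if (env[envKey]) return provider; // first key found decides
  }
  return null; // nothing configured: the search layer skips vector indexing
}
```

Note that `local` never appears in the key scan because it needs no key; in this sketch it is reachable only through the override, matching the table.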
### Local Xenova Provider (No Key Required)
Setting `EMBEDDING_PROVIDER=local` activates `LocalEmbeddingProvider`, which uses `@xenova/transformers` (an optional peer dependency) to run `Xenova/all-MiniLM-L6-v2` in-process. Vectors are 384-dimensional:
```typescript
// src/providers/embedding/local.ts:13-51
export class LocalEmbeddingProvider implements EmbeddingProvider {
readonly name = "local";
readonly dimensions = 384;
// ... lazy-loads @xenova/transformers on first call
private async getExtractor() {
transformers = await import("@xenova/transformers");
this.extractor = await transformers.pipeline(
"feature-extraction", "Xenova/all-MiniLM-L6-v2"
);
}
}
```
No API key and no billing. The model file is downloaded and cached on first use; after that, embedding runs in-process without network calls. Sources: [src/providers/embedding/local.ts:13-52]()
### Image/CLIP Embeddings
Image embeddings are a separate opt-in path enabled by `AGENTMEMORY_IMAGE_EMBEDDINGS=true`. The `ClipEmbeddingProvider` runs `Xenova/clip-vit-base-patch32` (512-dimensional) locally via the same `@xenova/transformers` peer dependency:
```typescript
// src/providers/embedding/clip.ts:24-54
export class ClipEmbeddingProvider implements EmbeddingProvider {
readonly name = "clip";
readonly dimensions = 512;
async embedImage(src: string): Promise<Float32Array> {
// accepts "data:<mime>;base64,..." or a file path
const image = await loadImage(t, src);
const output = await extractor(image);
return normalize(output.data ?? new Float32Array(output.tolist()[0] || []));
}
}
```
Within the CLIP provider, text and image inputs go through the same CLIP model but different pipeline tasks (`feature-extraction` vs `image-feature-extraction`), so both land in a shared embedding space suitable for cross-modal retrieval. Sources: [src/providers/embedding/clip.ts:22-81](), [src/providers/embedding/index.ts:23-28]()
### Dimension Guard — Preventing Silent Corruption
Every embedding provider is wrapped with `withDimensionGuard()` before being returned from `createEmbeddingProvider`. This guard catches dimension mismatches at the boundary rather than letting wrong-length vectors silently corrupt the vector index (which returns `0` from cosine similarity on a length mismatch instead of throwing):
```typescript
// src/providers/embedding/index.ts:56-80
export function withDimensionGuard(provider: EmbeddingProvider): EmbeddingProvider {
const expected = provider.dimensions;
const check = (v: Float32Array, where: string): Float32Array => {
if (v.length !== expected) {
throw new Error(
`Embedding dimension mismatch in ${provider.name}.${where}: ` +
`expected ${expected}, got ${v.length}`
);
}
return v;
};
// prototype chain preserved so instanceof checks keep working
const wrapped = Object.create(provider) as EmbeddingProvider;
wrapped.embed = async (t) => check(await provider.embed(t), "embed");
wrapped.embedBatch = async (ts) => { ... };
...
}
```
Sources: [src/providers/embedding/index.ts:52-80]()
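A small sketch shows why the guard matters. It pairs a cosine function that returns `0` on a length mismatch (the behavior described for the vector index) with a standalone dimension check. Both functions are illustrative reconstructions under that description, and `assertDimensions` is a hypothetical name rather than the project's API.

```typescript
// Cosine similarity that silently returns 0 when the lengths differ.
function cosine(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) return 0; // no error, just a zero score
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical standalone check that fails loudly at the provider boundary.
function assertDimensions(v: Float32Array, expected: number, where: string): Float32Array {
  if (v.length !== expected) {
    throw new Error(
      `Embedding dimension mismatch in ${where}: expected ${expected}, got ${v.length}`,
    );
  }
  return v;
}

const stored = new Float32Array([1, 0, 0]);
const wrongLength = new Float32Array([1, 0]);

// Without a guard, the mismatch looks exactly like "unrelated content".
const silentScore = cosine(stored, wrongLength);
```

Here `silentScore` is `0`, which a ranking step cannot tell apart from a genuinely unrelated vector, whereas `assertDimensions(wrongLength, 3, "demo.embed")` throws before the vector could reach an index.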
---
## Configuration Reference
All keys can be placed in `~/.agentmemory/.env`, which is loaded first and then merged with `process.env`.
### LLM Provider Variables
| Variable | Purpose |
|----------|---------|
| `ANTHROPIC_API_KEY` | Enable Anthropic provider; supports `ANTHROPIC_BASE_URL` override |
| `OPENAI_API_KEY` | Enable OpenAI provider; supports `OPENAI_BASE_URL`, `OPENAI_MODEL` |
| `OPENAI_API_KEY_FOR_LLM` | Set to `"false"` to reserve the key for embeddings only |
| `GEMINI_API_KEY` / `GOOGLE_API_KEY` | Enable Gemini via OpenAI-compatible endpoint |
| `OPENROUTER_API_KEY` | Enable OpenRouter; supports `OPENROUTER_MODEL` |
| `MINIMAX_API_KEY` | Enable MiniMax (raw-fetch, Anthropic-compatible API) |
| `AGENTMEMORY_ALLOW_AGENT_SDK` | Set `"true"` to permit the `agent-sdk` fallback (risk: recursion loop) |
| `FALLBACK_PROVIDERS` | Comma-separated list, e.g. `anthropic,openrouter` |
| `MAX_TOKENS` | Token budget for LLM calls (default: 4096) |
| `AGENTMEMORY_AUTO_COMPRESS` | Set `"true"` to enable per-observation LLM compression (off by default) |
### Embedding Provider Variables
| Variable | Purpose |
|----------|---------|
| `EMBEDDING_PROVIDER` | Force a specific provider (`gemini`, `openai`, `voyage`, `cohere`, `openrouter`, `local`) |
| `VOYAGE_API_KEY` | Enable Voyage AI (model: `voyage-code-3`, 1024-dim) |
| `COHERE_API_KEY` | Enable Cohere embeddings |
| `AGENTMEMORY_IMAGE_EMBEDDINGS` | Set `"true"` to enable CLIP image embedding (requires `@xenova/transformers`) |
| `BM25_WEIGHT` | Hybrid search BM25 weight (default: 0.4) |
| `VECTOR_WEIGHT` | Hybrid search vector weight (default: 0.6) |
---
## Failure Modes and Invariants
| Scenario | Behavior |
|----------|---------|
| No LLM key set | `NoopProvider` returns `""` — no compression/summarization, no crash |
| `AGENTMEMORY_ALLOW_AGENT_SDK=true` without a real key | `agent-sdk` spawns Claude child sessions — risks infinite Stop-hook recursion (#149) |
| Fallback provider key absent at startup | Provider silently dropped from chain; no runtime error |
| All providers in the chain fail | `FallbackChainProvider` re-throws the last error; `CircuitBreaker` records the failure |
| CircuitBreaker open | Calls fail fast with `"circuit_breaker_open"` — no network requests until `recoveryTimeoutMs` |
| No embedding key and `EMBEDDING_PROVIDER` unset | `createEmbeddingProvider` returns `null`; search uses BM25 only |
| Wrong embedding dimensions returned by provider | `withDimensionGuard` throws immediately — bad vector is never written to the index |
| `EMBEDDING_PROVIDER=local` | `@xenova/transformers` is a peer dep; install it manually or get a clear error |
---
## Summary
agentmemory's provider system is a two-layer BYOK design: the LLM layer (auto-detected from keys in priority order, wrapped in a FallbackChain and a CircuitBreaker) and the embedding layer (independently auto-detected, with a local Xenova fallback that needs no key). The `NoopProvider` and a `null` embedding result are the safe degraded states — the system continues to function for capture and BM25 search even with no API keys configured at all. The `withDimensionGuard` wrapper in `src/providers/embedding/index.ts:56-80` is the critical invariant that keeps the vector index from accumulating silent corruption when a provider's output changes shape.
---
## 05. Hooks & MCP — How Agents Connect
> Agents connect via two surfaces: (1) MCP server exposing 53 tools over stdio or HTTP transport, and (2) shell hooks (prompt-submit, post-tool-use, session-start/end, pre-compact, stop) that fire as thin HTTP POSTs to the local REST API at :3111 — the hooks are agent-installed scripts that bridge the agent runtime event stream into the memory server without requiring code changes inside the agent.
- Page Markdown: https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/05-hooks-mcp-how-agents-connect.md
- Generated: 2026-05-21T07:09:22.567Z
### Source Files
- `src/hooks/prompt-submit.ts`
- `src/hooks/post-tool-use.ts`
- `src/hooks/session-start.ts`
- `src/hooks/session-end.ts`
- `src/hooks/pre-compact.ts`
- `src/mcp/server.ts`
- `src/mcp/tools-registry.ts`
- `src/cli/connect/claude-code.ts`
- `src/triggers/api.ts`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/hooks/prompt-submit.ts](src/hooks/prompt-submit.ts)
- [src/hooks/post-tool-use.ts](src/hooks/post-tool-use.ts)
- [src/hooks/session-start.ts](src/hooks/session-start.ts)
- [src/hooks/session-end.ts](src/hooks/session-end.ts)
- [src/hooks/pre-compact.ts](src/hooks/pre-compact.ts)
- [src/hooks/pre-tool-use.ts](src/hooks/pre-tool-use.ts)
- [src/hooks/stop.ts](src/hooks/stop.ts)
- [src/hooks/sdk-guard.ts](src/hooks/sdk-guard.ts)
- [src/hooks/subagent-start.ts](src/hooks/subagent-start.ts)
- [src/mcp/server.ts](src/mcp/server.ts)
- [src/mcp/tools-registry.ts](src/mcp/tools-registry.ts)
- [src/cli/connect/claude-code.ts](src/cli/connect/claude-code.ts)
</details>
# Hooks & MCP — How Agents Connect
agentmemory provides two distinct integration surfaces for connecting an agent runtime to the memory server. The first is a **Model Context Protocol (MCP) server** that exposes a tool and resource catalog over stdio or HTTP transport, allowing an agent to call memory operations directly as first-class tools during its reasoning loop. The second is a **set of shell hook scripts** (thin Node.js processes installed alongside the agent) that send HTTP POSTs to the local REST API on port 3111 whenever the agent runtime emits lifecycle events such as prompt submission, tool execution, session start, and session end.
These two surfaces serve different roles and are designed to coexist without coupling. Hooks operate passively in the background, bridging the agent's event stream into the memory server without any code change inside the agent or its prompts. MCP gives the agent explicit read/write access to memory during inference, so it can query past context or persist decisions at its own discretion. Understanding both is necessary to predict what memory is recorded, when it is available, and what would break if either surface were disabled.
---
## The Hook Layer
### How Hooks Work
Every hook is a standalone compiled `.mjs` script installed into the agent runtime's plugin directory. When the agent runtime fires a lifecycle event, it spawns the hook as a child process and writes a JSON object to the child's `stdin`. The hook reads `stdin`, parses the JSON, sends (typically) a single HTTP POST to the memory server, and exits. Hooks are fire-and-forget by design, except for the few whose stdout the runtime reads: timeouts are capped tightly, and all `catch` blocks are empty.
```
Agent Runtime → hook process (stdin JSON) → POST :3111/agentmemory/*
```
Each hook shares the same connection constants:
```typescript
// src/hooks/prompt-submit.ts:9-10
const REST_URL = process.env["AGENTMEMORY_URL"] || "http://localhost:3111";
const SECRET = process.env["AGENTMEMORY_SECRET"] || "";
```
Authentication is bearer-token based when `AGENTMEMORY_SECRET` is set; otherwise the requests are unauthenticated. The server validates these tokens with a timing-safe compare to prevent timing-oracle attacks.
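Putting the pieces together, the request a hook prepares might be sketched as below. `buildHookRequest` is a hypothetical helper that only builds the request and does not send it; forwarding the stdin payload unchanged is a simplification, since the real hooks add fields such as `hookType`.

```typescript
type Env = Record<string, string | undefined>;

interface HookRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: parses hook stdin and prepares (but does not send) the POST.
function buildHookRequest(rawStdin: string, path: string, env: Env): HookRequest | null {
  let payload: unknown;
  try {
    payload = JSON.parse(rawStdin);
  } catch {
    return null; // malformed input: give up quietly, as a fire-and-forget hook would
  }
  const baseUrl = env["AGENTMEMORY_URL"] || "http://localhost:3111";
  const secret = env["AGENTMEMORY_SECRET"] || "";
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (secret) headers["Authorization"] = `Bearer ${secret}`; // only when a secret is set
  return { url: `${baseUrl}${path}`, method: "POST", headers, body: JSON.stringify(payload) };
}
```

A hook would then pass the result to `fetch` with a timeout signal and swallow any rejection.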
### SDK Recursion Guard
Every hook checks a recursion guard before doing anything else. When agentmemory uses its own Claude Agent SDK provider to run LLM calls (e.g., during session summarization), the spawned child session inherits the parent's hook environment. Without a guard, the child's `Stop` hook would call `/agentmemory/summarize`, which would spawn another SDK session, and so on indefinitely.
Two signals break the loop:
```typescript
// src/hooks/sdk-guard.ts:20-26
export function isSdkChildContext(payload: unknown): boolean {
if (process.env.AGENTMEMORY_SDK_CHILD === "1") return true;
if (!payload || typeof payload !== "object") return false;
const p = payload as Record<string, unknown>;
if (p["entrypoint"] === "sdk-ts") return true;
return false;
}
```
The `AGENTMEMORY_SDK_CHILD=1` env var is set by the SDK provider before spawning; `payload.entrypoint === "sdk-ts"` is written by Claude Code into hook stdin when the session originates from the Agent SDK. If either condition is true, the hook exits without making any HTTP call.
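The guard-first pattern can be illustrated with a variant that takes the environment as a parameter, which makes the outcomes easy to exercise. `isSdkChild` and `runGuardedHook` are hypothetical names for this sketch, and `send` stands in for the HTTP POST.

```typescript
type Env = Record<string, string | undefined>;

// Same two checks as isSdkChildContext, with env passed in instead of read from process.env.
function isSdkChild(payload: unknown, env: Env): boolean {
  if (env["AGENTMEMORY_SDK_CHILD"] === "1") return true;
  if (!payload || typeof payload !== "object") return false;
  return (payload as Record<string, unknown>)["entrypoint"] === "sdk-ts";
}

// Hypothetical hook body: the guard runs before any side effect.
function runGuardedHook(
  payload: unknown,
  env: Env,
  send: (p: unknown) => void,
): "skipped" | "sent" {
  if (isSdkChild(payload, env)) return "skipped";
  send(payload);
  return "sent";
}
```

Either signal alone is enough to skip: a payload with `entrypoint: "sdk-ts"` is skipped even when the environment variable is absent, and vice versa.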
### Hook Inventory
| Hook file | Fires on | REST endpoint | Notes |
|---|---|---|---|
| `session-start.ts` | Session begins | `POST /agentmemory/session/start` | Registers session; optionally writes project context to stdout |
| `prompt-submit.ts` | User submits a prompt | `POST /agentmemory/observe` (`hookType: "prompt_submit"`) | Records prompt text + cwd |
| `pre-tool-use.ts` | Before each tool call | `POST /agentmemory/enrich` | **No-op by default**; opt-in via `AGENTMEMORY_INJECT_CONTEXT=true` |
| `post-tool-use.ts` | After each tool call | `POST /agentmemory/observe` (`hookType: "post_tool_use"`) | Records tool name, input, truncated output; extracts base64 images |
| `stop.ts` | Agent stops responding | `POST /agentmemory/summarize` | 120s timeout; skipped in SDK child contexts |
| `pre-compact.ts` | Before context compaction | `POST /agentmemory/context` (budget 1500) | Writes context block to stdout for Claude Code to prepend |
| `session-end.ts` | Session terminates | `POST /agentmemory/session/end` | Optionally triggers consolidation pipeline and Claude bridge sync |
| `subagent-start.ts` | Subagent spawns | `POST /agentmemory/observe` (`hookType: "subagent_start"`) | 800ms timeout cap |
Sources: [src/hooks/session-start.ts:18-88](), [src/hooks/prompt-submit.ts:36-51](), [src/hooks/pre-tool-use.ts:9-23](), [src/hooks/post-tool-use.ts:40-59](), [src/hooks/stop.ts:43-51](), [src/hooks/pre-compact.ts:49-65](), [src/hooks/session-end.ts:36-76](), [src/hooks/subagent-start.ts:43-58]()
### Session Start: Two Paths
`session-start.ts` has two distinct execution paths depending on `AGENTMEMORY_INJECT_CONTEXT`:
**Default path** (`INJECT_CONTEXT=false`): calls `session/start` fire-and-forget with an 800ms timeout. The response is never read. This is pure telemetry.
**Inject path** (`INJECT_CONTEXT=true`): awaits the response and writes `result.context` to `stdout`. Claude Code reads PreToolUse and session-start stdout and prepends it to the next model turn, so the project context becomes the agent's first piece of context.
```typescript
// src/hooks/session-start.ts:62-71
if (!INJECT_CONTEXT) {
fetch(url, {
...init,
signal: AbortSignal.timeout(REGISTER_TIMEOUT_MS), // 800ms
}).catch(() => {});
return;
}
```
The comment in the code explains why injection is off by default: on Claude Pro, this silently added ~1000 tokens per tool-touch and burned entire allocations in a few messages (issue #143).
### Pre-Tool-Use: Default No-Op
The `pre-tool-use.ts` hook exits immediately unless `AGENTMEMORY_INJECT_CONTEXT=true`:
```typescript
// src/hooks/pre-tool-use.ts:36-38
if (!INJECT_CONTEXT) return;
```
When enabled, it fires only on file-touching tools (`Edit`, `Write`, `Read`, `Glob`, `Grep`), calls `/agentmemory/enrich` with the file paths and search terms, and writes any returned context to stdout — which Claude Code injects before the tool runs.
### Post-Tool-Use: Image Extraction
`post-tool-use.ts` includes base64 image detection and extraction logic. If the tool output is or contains a base64-encoded PNG/JPEG (detected by data URI prefix or magic bytes), it extracts the image separately and substitutes `"[image data extracted]"` in the text payload to avoid bloating the observation record:
```typescript
// src/hooks/post-tool-use.ts:62-68
function isBase64Image(val: unknown): val is string {
return typeof val === "string" && (
val.startsWith("data:image/") ||
val.startsWith("iVBORw0KGgo") || // PNG magic
val.startsWith("/9j/") // JPEG magic
);
}
```
Tool output is also truncated to 8000 characters before being sent.
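The two steps (image substitution and truncation) can be combined in a short sketch. `prepareToolOutput` and `PreparedOutput` are hypothetical, and the sketch only handles the case where the whole output is an image string; detecting an image embedded inside a larger output is left out.

```typescript
const MAX_OUTPUT_CHARS = 8000; // truncation limit stated in the text
const IMAGE_PLACEHOLDER = "[image data extracted]";

function isBase64Image(val: unknown): val is string {
  return typeof val === "string" && (
    val.startsWith("data:image/") ||
    val.startsWith("iVBORw0KGgo") || // PNG magic bytes, base64-encoded
    val.startsWith("/9j/")           // JPEG magic bytes, base64-encoded
  );
}

interface PreparedOutput {
  text: string;   // what goes into the observation's text payload
  image?: string; // the extracted image data, when present
}

// Hypothetical helper for a string tool output.
function prepareToolOutput(output: string): PreparedOutput {
  if (isBase64Image(output)) {
    return { text: IMAGE_PLACEHOLDER, image: output };
  }
  return { text: output.slice(0, MAX_OUTPUT_CHARS) };
}
```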
### Session End: Optional Pipeline Triggers
When `CONSOLIDATION_ENABLED=true`, `session-end.ts` makes two additional calls after closing the session: one to `/agentmemory/crystals/auto` (auto-crystallization) and one to `/agentmemory/consolidate-pipeline`. When `CLAUDE_MEMORY_BRIDGE=true`, it additionally calls `/agentmemory/claude-bridge/sync` to write memories back to Claude Code's native `MEMORY.md` file.
Sources: [src/hooks/session-end.ts:46-76]()
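As a compact illustration of the flag-driven behavior, the sketch below lists the endpoints a session-end run would call for a given environment. `sessionEndCalls` is a hypothetical helper; the endpoint paths and flag names come from the text, and the ordering is an assumption based on the description.

```typescript
type Env = Record<string, string | undefined>;

// Returns the endpoints session-end would POST to, in order.
function sessionEndCalls(env: Env): string[] {
  const calls = ["/agentmemory/session/end"]; // always closes the session first
  if (env["CONSOLIDATION_ENABLED"] === "true") {
    calls.push("/agentmemory/crystals/auto", "/agentmemory/consolidate-pipeline");
  }
  if (env["CLAUDE_MEMORY_BRIDGE"] === "true") {
    calls.push("/agentmemory/claude-bridge/sync");
  }
  return calls;
}
```

With neither flag set the hook makes one call; with both set it makes four.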
---
## The MCP Layer
### Transport and Registration
The MCP server is registered as an HTTP transport over the same local REST API at `:3111`. Three endpoint groups form the MCP protocol surface:
| Endpoint | Method | Function ID | Purpose |
|---|---|---|---|
| `/agentmemory/mcp/tools` | GET | `mcp::tools::list` | Enumerate available tools |
| `/agentmemory/mcp/call` | POST | `mcp::tools::call` | Invoke a tool by name |
| `/agentmemory/mcp/resources` | GET | `mcp::resources::list` | List resources |
| `/agentmemory/mcp/resources/read` | POST | `mcp::resources::read` | Read a resource by URI |
| `/agentmemory/mcp/prompts` | GET | `mcp::prompts::list` | List prompt templates |
| `/agentmemory/mcp/prompts/get` | POST | `mcp::prompts::get` | Render a prompt template |
Sources: [src/mcp/server.ts:60-71](), [src/mcp/server.ts:1260-1264](), [src/mcp/server.ts:1313-1317](), [src/mcp/server.ts:1545-1552]()
Each endpoint is registered as an `sdk.registerFunction` + `sdk.registerTrigger` pair using the `iii-sdk` runtime. The tool call dispatcher is a large `switch` statement that maps tool names to internal `sdk.trigger()` calls — essentially proxying MCP tool calls into the internal function bus.
### Tool Visibility: `AGENTMEMORY_TOOLS`
The tools exposed to the agent are controlled by `getVisibleTools()`:
```typescript
// src/mcp/tools-registry.ts:944-948
export function getVisibleTools(): McpToolDef[] {
const mode = process.env["AGENTMEMORY_TOOLS"] || "core";
if (mode === "all") return getAllTools();
return getAllTools().filter((t) => ESSENTIAL_TOOLS.has(t.name));
}
```
In `core` mode (default), only 8 tools are exposed: `memory_save`, `memory_recall`, `memory_consolidate`, `memory_smart_search`, `memory_sessions`, `memory_diagnose`, `memory_lesson_save`, and `memory_reflect`. Setting `AGENTMEMORY_TOOLS=all` exposes all tools across every version tier.
Sources: [src/mcp/tools-registry.ts:920-948]()
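The filtering logic is easy to exercise in isolation. The sketch below re-creates it with the environment passed as a parameter and a three-tool catalog; `visibleTools`, `ToolDef`, and `catalog` are hypothetical stand-ins, while the eight essential names are the ones listed above.

```typescript
interface ToolDef {
  name: string;
}
type Env = Record<string, string | undefined>;

// The eight core-mode tool names listed in the text.
const ESSENTIAL = new Set([
  "memory_save", "memory_recall", "memory_consolidate", "memory_smart_search",
  "memory_sessions", "memory_diagnose", "memory_lesson_save", "memory_reflect",
]);

function visibleTools(all: ToolDef[], env: Env): ToolDef[] {
  const mode = env["AGENTMEMORY_TOOLS"] || "core";
  if (mode === "all") return all;
  // Any value other than "all" behaves like core mode.
  return all.filter((t) => ESSENTIAL.has(t.name));
}

const catalog: ToolDef[] = [
  { name: "memory_save" },
  { name: "memory_recall" },
  { name: "memory_graph_query" }, // a V040 tool, hidden in core mode
];
```

One consequence worth noting: because only the exact value `all` widens the list, a misspelled mode falls back to the core set rather than failing.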
### Tool Tiers
Tools are organized into version tiers that reflect the feature they belong to:
| Tier constant | Representative tools | Feature area |
|---|---|---|
| `CORE_TOOLS` | `memory_recall`, `memory_save`, `memory_file_history`, `memory_smart_search`, `memory_commit_lookup` | Core memory + commit tracing |
| `V040_TOOLS` | `memory_graph_query`, `memory_consolidate`, `memory_team_share`, `memory_audit`, `memory_governance_delete` | Knowledge graph, team, governance |
| `V050_TOOLS` | `memory_action_create/update`, `memory_frontier`, `memory_next`, `memory_lease`, `memory_signal_send/read` | Action graph, multi-agent coordination |
| `V051_TOOLS` | `memory_sentinel_create/trigger`, `memory_sketch_create/promote`, `memory_crystallize`, `memory_diagnose/heal` | Event-driven sentinels, ephemeral graphs |
| `V061_TOOLS` | `memory_verify` | Memory provenance |
| `V070_TOOLS` | `memory_lesson_save/recall`, `memory_obsidian_export` | Lessons, Obsidian export |
| `V073_TOOLS` | `memory_reflect`, `memory_insight_list` | Reflection, synthesized insights |
| `V010_SLOTS_TOOLS` | `memory_slot_list/get/create/append/replace/delete` | Persistent editable slots |
Sources: [src/mcp/tools-registry.ts:11-941]()
### MCP Resources
Six URI-addressable resources are available for read-only inspection by agents that support MCP resource access:
| URI | Content |
|---|---|
| `agentmemory://status` | Session count, memory count, health status |
| `agentmemory://project/{name}/profile` | Top concepts, file patterns, conventions |
| `agentmemory://project/{name}/recent` | Last 5 session summaries |
| `agentmemory://memories/latest` | Top 10 latest memories by type and strength |
| `agentmemory://graph/stats` | Node and edge counts by type |
| `agentmemory://team/{id}/profile` | Team shared item count |
Sources: [src/mcp/server.ts:1266-1304]()
### MCP Prompt Templates
Three prompt templates are registered for agents that support the MCP prompts capability:
- `recall_context` — searches observations + memories for a task description and returns a pre-formatted context block
- `session_handoff` — generates a handoff summary from a session ID for continuing work in a new session
- `detect_patterns` — runs pattern detection across sessions and formats the result
Sources: [src/mcp/server.ts:1554-1590]()
### Authentication
Both the hooks and MCP endpoints share the same bearer-token authentication model. When `AGENTMEMORY_SECRET` is configured, every request must include `Authorization: Bearer <secret>`. The server uses `timingSafeCompare` (constant-time string comparison) to prevent timing attacks. MCP endpoints that are called without a valid token receive a `401 { error: "unauthorized" }` response. Hook scripts read the secret from `AGENTMEMORY_SECRET` and include it in every request.
Sources: [src/mcp/server.ts:47-58](), [src/hooks/prompt-submit.ts:12-16]()
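For readers unfamiliar with constant-time comparison, here is one common way to build it on Node's `crypto.timingSafeEqual`. This is a sketch of the general technique, not the project's `timingSafeCompare`; the function names and the hash-then-compare approach are assumptions of this example.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hashing both sides first yields equal-length buffers, which timingSafeEqual requires.
function secretsMatch(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Hypothetical check of an Authorization header value against the configured secret.
function isAuthorized(authHeader: string | undefined, secret: string): boolean {
  if (!secret) return true; // no secret configured: requests are unauthenticated
  if (!authHeader || !authHeader.startsWith("Bearer ")) return false;
  return secretsMatch(authHeader.slice("Bearer ".length), secret);
}
```

A plain `===` comparison can return early at the first differing character, which is the timing signal a constant-time comparison is meant to remove.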
---
## Connection to Claude Code
The `agentmemory connect claude-code` CLI command writes the MCP server entry into `~/.claude.json` under `mcpServers.agentmemory`:
```typescript
// src/cli/connect/claude-code.ts:44-90
servers["agentmemory"] = AGENTMEMORY_MCP_BLOCK;
next.mcpServers = servers;
writeJsonAtomic(CLAUDE_JSON, next);
```
The adapter detects an existing installation (`entryMatches`) to avoid duplicate entries, backs up the existing file before modifying it, and verifies the write succeeded by re-reading the file. On first install it also creates the `~/.claude/` directory if it does not exist.
The install note from the adapter clarifies the dual-surface design:
```typescript
// src/cli/connect/claude-code.ts:38-39
protocolNote:
"→ Using MCP. Hooks are also available — see docs/claude-code.md.",
```
Sources: [src/cli/connect/claude-code.ts:33-91]()
---
## Data Flow Overview
```text
Claude Code (agent runtime)
│
├── [session-start event] ──────────────────────→ POST :3111/agentmemory/session/start
│ └── if INJECT_CONTEXT: writes project context to stdout (prepended to first turn)
│
├── [user submits prompt] ──────────────────────→ POST :3111/agentmemory/observe
│ hookType: "prompt_submit"
│
├── [before tool call] ─────────────────────────→ POST :3111/agentmemory/enrich (only if INJECT_CONTEXT=true)
│ writes file history to stdout for model context injection
│
├── [after tool call] ──────────────────────────→ POST :3111/agentmemory/observe
│ hookType: "post_tool_use" (tool name + input + truncated output)
│
├── [before context compaction] ────────────────→ POST :3111/agentmemory/context
│ writes condensed memory context to stdout for Claude Code to include
│
├── [agent stops] ──────────────────────────────→ POST :3111/agentmemory/summarize
│ triggers session summarization (120s timeout)
│
├── [session ends] ─────────────────────────────→ POST :3111/agentmemory/session/end
│ → (optional) POST :3111/agentmemory/crystals/auto
│ → (optional) POST :3111/agentmemory/consolidate-pipeline
│ → (optional) POST :3111/agentmemory/claude-bridge/sync
│
└── [MCP tool call by agent] ───────────────────→ GET/POST :3111/agentmemory/mcp/*
Agent invokes memory_recall, memory_save, etc. as explicit tool calls
```
---
## Key Invariants and Failure Modes
**Hooks never block the agent indefinitely.** Every hook uses `AbortSignal.timeout()` with explicit caps (800ms–3000ms for telemetry paths, 30–120s for session-end and stop). A slow or unreachable server causes hooks to fail silently rather than stall the agent.
**Hooks are fire-and-forget except where stdout is read.** `session-start` and `pre-compact` write to stdout, which Claude Code reads synchronously. These two hooks add latency to the agent's hot path if the server is slow; the 1500ms and 5000ms caps respectively bound that exposure.
**Pre-tool-use injection is a token consumption footgun.** When enabled via `AGENTMEMORY_INJECT_CONTEXT=true`, the hook fires on every file-touching tool call and prepends up to 4000 chars of context to each tool turn. On subscription-capped plans this has materially burned allocations (issue #143). The default is off.
**The SDK recursion guard is a hard invariant.** If `isSdkChildContext()` is not checked and a hook re-enters the SDK provider, the result is unbounded token consumption and ghost session accumulation. All hooks check this guard unconditionally as their first operation. Sources: [src/hooks/sdk-guard.ts:1-26]()
**MCP tool visibility is all-or-nothing.** `AGENTMEMORY_TOOLS=all` exposes all 53+ tools across every version tier; the default exposes only the 8 essential tools. There is no per-tool or per-tier selection. Agents that enumerate the tool list will see a very different surface depending on this setting.
**Tool calls that depend on optional features degrade gracefully.** Tools like `memory_graph_query`, `memory_consolidate`, `memory_team_share`, and `memory_mesh_sync` catch internal errors and return a `200` response with a descriptive error string (e.g., `"Knowledge graph not enabled. Set GRAPH_EXTRACTION_ENABLED=true"`) rather than a `5xx`. The MCP protocol sees a success, and the agent sees a human-readable explanation. Sources: [src/mcp/server.ts:449-461](), [src/mcp/server.ts:477-489]()
---
## Summary
The hook layer and MCP layer are architecturally independent: hooks run passively in the background, require no agent-side code, and tolerate server unavailability silently; MCP requires the agent to be MCP-aware and gives it active, synchronous control over memory operations. Together they ensure that even agents with no explicit memory calls accumulate observation history through hooks, while agents that understand the MCP protocol can query and shape their memory in real time. The shared `:3111` REST API and bearer-token authentication model mean both surfaces can be secured and deployed identically regardless of which agent runtime is in use.
Sources: [src/mcp/server.ts:42-65](), [src/hooks/session-start.ts:62-88]()
---
## 06. Invariants, Failure Modes & Safe-Change Rules
> The core invariants: state lives exclusively in iii-engine (no local SQLite or Postgres required); the worker suppresses unhandledRejection to survive iii SDK 30s timeouts under write bursts; BM25 index is always present so search never fully fails; circuit-breakers isolate provider outages; and the sdk-guard hook prevents recursive hook invocations. Safe-change rules: embedding dimension changes require index migration; adding a new function requires both registerXxx in index.ts and a tools-registry entry; provider fallback order is config-driven, not hardcoded.
- Page Markdown: https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/pages/06-invariants-failure-modes-safe-change-rules.md
- Generated: 2026-05-21T07:09:23.551Z
### Source Files
- `src/index.ts`
- `src/providers/circuit-breaker.ts`
- `src/hooks/sdk-guard.ts`
- `src/functions/migrate-vector-index.ts`
- `src/health/monitor.ts`
- `src/health/thresholds.ts`
- `src/mcp/tools-registry.ts`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/index.ts](src/index.ts)
- [src/providers/circuit-breaker.ts](src/providers/circuit-breaker.ts)
- [src/providers/resilient.ts](src/providers/resilient.ts)
- [src/providers/fallback-chain.ts](src/providers/fallback-chain.ts)
- [src/providers/index.ts](src/providers/index.ts)
- [src/hooks/sdk-guard.ts](src/hooks/sdk-guard.ts)
- [src/functions/migrate-vector-index.ts](src/functions/migrate-vector-index.ts)
- [src/functions/search.ts](src/functions/search.ts)
- [src/state/vector-index.ts](src/state/vector-index.ts)
- [src/health/monitor.ts](src/health/monitor.ts)
- [src/health/thresholds.ts](src/health/thresholds.ts)
- [src/mcp/tools-registry.ts](src/mcp/tools-registry.ts)
</details>
# Invariants, Failure Modes & Safe-Change Rules
This page documents the structural invariants that agentmemory relies on to stay correct, the failure modes each invariant guards against, and the rules contributors must follow to modify the system safely. Understanding these constraints lets you predict what will break (and why) before touching any subsystem.
The design philosophy is *graceful degradation*: a crashed embedding provider must not stop BM25 search; a slow iii-engine write must not kill the worker process; a recursive hook invocation must not burn tokens in an infinite loop. Each of the invariants below is the concrete mechanism that enforces one of those goals.
---
## Core Invariants
### 1. State lives exclusively in iii-engine (StateKV)
All persistent data — observations, memories, sessions, actions, leases, signals — is stored through `StateKV`, which delegates every read and write to the iii-engine. There is no local SQLite, no local Postgres, and no file-level persistence beyond the optional in-process BM25/vector snapshot written through `IndexPersistence` (itself backed by `kv.set`).
This means:
- The worker is stateless beyond what iii-engine holds. A process restart replays nothing locally; state survives because iii-engine holds it.
- The optional `IndexPersistence` is a read-through cache of the hot search index, not a source of truth. Losing it forces a `rebuildIndex` pass but does not lose any memories.
Sources: [src/index.ts:188-213]()
---
### 2. `unhandledRejection` is suppressed to survive iii SDK 30s timeouts
Under sustained write load (for example, Claude Code hooks firing across many projects), `state::set` can occasionally exceed the SDK's 30-second internal timeout and produce a rejected Promise that no call-site `.catch()` catches. Without a process-level handler, Node.js would terminate the long-lived worker.
The worker installs a throttled global handler at startup:
```typescript
// src/index.ts:119-129
let lastUnhandledLogAt = 0;
process.on("unhandledRejection", (reason) => {
  const now = Date.now();
  if (now - lastUnhandledLogAt < 60_000) return;
  lastUnhandledLogAt = now;
  const r = reason as { code?: string; function_id?: string; message?: string };
  console.warn(
    `[agentmemory] unhandledRejection (suppressed):`,
    r?.code ? `${r.code} ${r.function_id ?? ""} ${r.message ?? ""}`.trim() : reason,
  );
});
```
The handler **logs at most once per minute** (the throttle prevents log storms during write bursts) and then lets the process **continue**. The `.catch()` at the individual call site has already surfaced the error; the global handler is only the last safety net.
**Failure mode if removed:** A single timeout under load would terminate the process with exit code 1 (Node.js's default for unhandled rejections since v15), destroying the long-lived worker and requiring a manual restart.
Sources: [src/index.ts:112-129]()
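The throttle in the handler can be isolated and exercised without a real clock. The following is a minimal sketch, not the code in `src/index.ts`: `makeThrottledReporter`, its `intervalMs` parameter, and the injected `now` and `log` callbacks are hypothetical names introduced only for illustration.

```typescript
// Hypothetical helper: the same throttle idea as the handler, with the clock
// and the log sink injected so the behavior is deterministic.
type LogSink = (message: string) => void;

function makeThrottledReporter(
  intervalMs: number,
  now: () => number,
  log: LogSink,
): (reason: unknown) => void {
  let lastLogAt = Number.NEGATIVE_INFINITY; // nothing logged yet
  return (reason: unknown): void => {
    const t = now();
    if (t - lastLogAt < intervalMs) return; // inside the window: swallow silently
    lastLogAt = t;
    log(`unhandledRejection (suppressed): ${String(reason)}`);
  };
}

// Simulated burst: three rejections inside one minute, one more after it.
let fakeClock = 0;
const loggedLines: string[] = [];
const report = makeThrottledReporter(60_000, () => fakeClock, (m) => loggedLines.push(m));

report("timeout A"); // first rejection: logged
fakeClock = 10_000;
report("timeout B"); // suppressed
fakeClock = 59_999;
report("timeout C"); // suppressed
fakeClock = 60_000;
report("timeout D"); // window elapsed: logged
```

In either form the reporter never rethrows, which is the property that keeps the worker alive.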
---
### 3. BM25 index is always present — search never fully fails
`getSearchIndex()` in `src/functions/search.ts` uses lazy initialization with a module-level singleton and never returns `null`:
```typescript
// src/functions/search.ts:16-19
let index: SearchIndex | null = null
export function getSearchIndex(): SearchIndex {
  if (!index) index = new SearchIndex()
  return index
}
```
The vector index, by contrast, is `null` when no embedding provider is configured. The worker logs `BM25-only mode` at boot. `HybridSearch` is constructed with both; it short-circuits vector scoring when the vector index is absent. This means:
- BM25 keyword search always works, even with zero providers configured.
- Semantic (cosine) search is layered on top when an embedding provider is present.
- A provider outage at runtime degrades to BM25 without a crash.
Sources: [src/functions/search.ts:12-19](), [src/index.ts:193-195, 324-334]()
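The short-circuit described above can be sketched as follows. This is an illustration under stated assumptions, not the real `HybridSearch`: the `KeywordIndex` and `SemanticIndex` interfaces and the `hybridScores` function are hypothetical, and a plain weighted sum (0.4 keyword, 0.6 vector, the weights quoted in this wiki's overview) stands in for the Reciprocal Rank Fusion the real system is described as using.

```typescript
// Hypothetical shapes; the real SearchIndex / VectorIndex APIs may differ.
interface KeywordIndex {
  search(query: string): Map<string, number>; // id -> BM25-style score
}
interface SemanticIndex {
  search(query: string): Map<string, number>; // id -> cosine-style score
}

function hybridScores(
  keyword: KeywordIndex,
  semantic: SemanticIndex | null,
  query: string,
): Map<string, number> {
  const keywordHits = keyword.search(query);
  // Short-circuit: with no vector index, keyword scores are the whole result.
  if (semantic === null) return keywordHits;

  const combined = new Map<string, number>();
  keywordHits.forEach((score, id) => combined.set(id, 0.4 * score));
  semantic.search(query).forEach((score, id) => {
    combined.set(id, (combined.get(id) ?? 0) + 0.6 * score);
  });
  return combined;
}

// Demo data: fixed scores, regardless of the query text.
const demoKeyword: KeywordIndex = {
  search: () => new Map([["obs-1", 1.0], ["obs-2", 0.5]]),
};
const demoSemantic: SemanticIndex = {
  search: () => new Map([["obs-2", 1.0]]),
};

const bm25Only = hybridScores(demoKeyword, null, "auth bug");
const fused = hybridScores(demoKeyword, demoSemantic, "auth bug");
```

With `semantic` set to `null` the function returns the keyword scores untouched, which mirrors the BM25-only mode the worker logs at boot.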
---
### 4. Circuit-breakers isolate provider outages
Every LLM provider call is wrapped in a `ResilientProvider`, which owns a `CircuitBreaker` instance. The breaker has three states:
```
closed → (≥3 failures within 60s) → open → (30s recovery) → half-open → (1 success) → closed
half-open → (failure) → open
```
```typescript
// src/providers/circuit-breaker.ts:23-30
constructor(opts?: CircuitBreakerOptions) {
  this.failureThreshold = Math.max(1, Math.floor(positiveFinite(opts?.failureThreshold, 3)));
  this.failureWindowMs = positiveFinite(opts?.failureWindowMs, 60_000);
  this.recoveryTimeoutMs = positiveFinite(opts?.recoveryTimeoutMs, 30_000);
}
```
When the circuit is open, `ResilientProvider.call()` throws `"circuit_breaker_open"` immediately without making a network call. Functions that call compress/summarize (`registerCompressFunction`, `registerSummarizeFunction`, etc.) propagate this error rather than hanging on a downed provider.
When multiple fallback providers are configured (via `AGENTMEMORY_FALLBACK_PROVIDERS`), `FallbackChainProvider` tries each one in order. `ResilientProvider` wraps the entire chain, so a call counts as a failure toward the circuit-breaker only after every provider in the chain has been exhausted.
Sources: [src/providers/circuit-breaker.ts:13-82](), [src/providers/resilient.ts:4-37](), [src/providers/fallback-chain.ts:4-31]()
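The state transitions above can be traced with a small model. This is a sketch using the documented defaults (3 failures, 60 s window, 30 s recovery), not the class in `src/providers/circuit-breaker.ts`: `SketchBreaker`, its method names, and the injected clock are hypothetical, and details such as resetting the failure list on success are assumptions.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class SketchBreaker {
  private failureTimes: number[] = [];
  private openedAt: number | null = null;
  private readonly now: () => number;
  private readonly failureThreshold = 3;
  private readonly failureWindowMs = 60_000;
  private readonly recoveryTimeoutMs = 30_000;

  constructor(now: () => number) {
    this.now = now; // injected clock keeps the sketch deterministic
  }

  state(): BreakerState {
    if (this.openedAt === null) return "closed";
    const waited = this.now() - this.openedAt;
    return waited >= this.recoveryTimeoutMs ? "half-open" : "open";
  }

  recordFailure(): void {
    const t = this.now();
    if (this.state() === "half-open") {
      this.openedAt = t; // failed probe: reopen and restart the recovery timer
      return;
    }
    // Keep only failures that are still inside the sliding window.
    this.failureTimes = this.failureTimes.filter((f) => t - f < this.failureWindowMs);
    this.failureTimes.push(t);
    if (this.failureTimes.length >= this.failureThreshold) this.openedAt = t;
  }

  recordSuccess(): void {
    this.openedAt = null; // assumption: any success fully closes the breaker
    this.failureTimes = [];
  }
}

let breakerClock = 0;
const breaker = new SketchBreaker(() => breakerClock);
breaker.recordFailure();
breakerClock = 1_000;
breaker.recordFailure();
const afterTwoFailures = breaker.state();
breakerClock = 2_000;
breaker.recordFailure();
const afterThreeFailures = breaker.state();
breakerClock = 32_000; // 30 s after the breaker opened
const afterRecoveryWait = breaker.state();
breaker.recordFailure(); // the half-open probe fails
const afterFailedProbe = breaker.state();
breakerClock = 62_000;
breaker.recordSuccess(); // the next probe succeeds
const afterGoodProbe = breaker.state();
```

While the state is `open`, a wrapper like `ResilientProvider` can reject immediately instead of making a network call, which is what keeps a downed provider from stalling callers.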
---
### 5. The sdk-guard hook prevents recursive hook invocations
When agentmemory spawns a Claude session internally (e.g., via the agent-sdk provider for compress/summarize), the child Claude Code session inherits all parent hook scripts. If a child session's hooks fire and call back into `/agentmemory/*`, the result is unbounded recursion that burns tokens and creates ghost sessions.
Either of two signals identifies an SDK-child context, and hook scripts must check both:
```typescript
// src/hooks/sdk-guard.ts:20-26
export function isSdkChildContext(payload: unknown): boolean {
  if (process.env.AGENTMEMORY_SDK_CHILD === "1") return true;
  if (!payload || typeof payload !== "object") return false;
  const p = payload as Record<string, unknown>;
  if (p["entrypoint"] === "sdk-ts") return true;
  return false;
}
```
Signal 1 — `AGENTMEMORY_SDK_CHILD=1`: set by the agent-sdk provider before spawning `query()`, inherited by all child processes.
Signal 2 — `payload.entrypoint === "sdk-ts"`: written by Claude Code into hook stdin when the session was launched by the Agent SDK.
**Any hook script must call `isSdkChildContext(payload)` before doing any work and return silently when it is true.**
Sources: [src/hooks/sdk-guard.ts:1-26]()
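A hook body that follows this rule might look like the sketch below. It is illustrative only: `isSdkChild` is a variant of the guard that takes the environment as a parameter (so the example does not depend on the real process environment), and `runHook`, `forward`, and the `"cli"` entrypoint value are hypothetical.

```typescript
// Variant of the guard with the environment passed in. The two checks
// mirror the ones in isSdkChildContext.
type EnvLike = Record<string, string | undefined>;

function isSdkChild(payload: unknown, env: EnvLike): boolean {
  if (env["AGENTMEMORY_SDK_CHILD"] === "1") return true;
  if (!payload || typeof payload !== "object") return false;
  return (payload as Record<string, unknown>)["entrypoint"] === "sdk-ts";
}

// Hypothetical hook body: `forward` stands in for the HTTP POST to the REST API.
function runHook(
  payload: unknown,
  env: EnvLike,
  forward: (p: unknown) => void,
): "skipped" | "forwarded" {
  if (isSdkChild(payload, env)) return "skipped"; // guard first, before any work
  forward(payload);
  return "forwarded";
}

const forwardedPayloads: unknown[] = [];
const recordPayload = (p: unknown): void => {
  forwardedPayloads.push(p);
};

const normalRun = runHook({ entrypoint: "cli" }, {}, recordPayload);
const sdkByPayload = runHook({ entrypoint: "sdk-ts" }, {}, recordPayload);
const sdkByEnv = runHook({ entrypoint: "cli" }, { AGENTMEMORY_SDK_CHILD: "1" }, recordPayload);
```

Only the first call reaches `forward`; the other two return before doing any work, which is what breaks the recursion.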
---
## Failure Modes
### Embedding dimension mismatch corrupts search silently
`cosineSimilarity` in `VectorIndex` returns `0` when the two arrays have different lengths:
```typescript
// src/state/vector-index.ts:9-11
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) return 0;
  ...
}
```
A mismatch between a stored vector's dimension and the query vector's dimension causes that observation to score zero on every query — it silently disappears from results without an error. The system guards against this at two sites:
1. **Write site** (`vectorIndexAddGuarded`): checks whether `embedding.length !== ep.dimensions` before calling `vi.add()`. On a mismatch it logs a warning and skips the item.
2. **Persistence load** (`src/index.ts:368-409`): `VectorIndex.validateDimensions()` walks every persisted vector and refuses to restore the index if any mismatches are found. The worker either throws a fatal error (forcing operator action) or discards the stale index when `AGENTMEMORY_DROP_STALE_INDEX=true`.
Sources: [src/state/vector-index.ts:9-11, 77-90](), [src/index.ts:362-409](), [src/functions/search.ts:55-87]()
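The silent zero and the write-site guard can be reproduced in a few lines. This is a sketch, not the code in `src/state/vector-index.ts` or `src/functions/search.ts`: the body of `cosine` after the length check is a standard cosine-similarity implementation filled in for illustration, and `addGuarded` with its `Map`-based store is hypothetical.

```typescript
function cosine(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) return 0; // same early return as the source excerpt
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical write-site guard: stores nothing and returns false on a mismatch.
function addGuarded(
  store: Map<string, Float32Array>,
  id: string,
  embedding: Float32Array,
  expectedDims: number,
): boolean {
  if (embedding.length !== expectedDims) return false;
  store.set(id, embedding);
  return true;
}

const stored3d = new Float32Array([1, 0, 0]);
const query3d = new Float32Array([1, 0, 0]);
const query4d = new Float32Array([1, 0, 0, 0]);

const sameDims = cosine(stored3d, query3d); // identical vectors
const mixedDims = cosine(stored3d, query4d); // mismatch: 0, with no error raised

const vectorStore = new Map<string, Float32Array>();
const acceptedWrite = addGuarded(vectorStore, "obs-1", stored3d, 3);
const rejectedWrite = addGuarded(vectorStore, "obs-2", query4d, 3);
```

The mismatched comparison is indistinguishable from a genuinely unrelated vector, which is why the check has to happen at write and load time rather than at query time.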
### Index rebuild blocks the viewer server if awaited
`rebuildIndex` iterates every observation across every session and awaits an embedding provider call per record. On a large corpus with a rate-limited endpoint this can take hours. The worker fires it as a **fire-and-forget** void:
```typescript
// src/index.ts:423-431
void rebuildIndex(kv)
  .then((indexCount) => {
    if (indexCount > 0) {
      bootLog(`Search index rebuilt: ${indexCount} entries`);
      indexPersistence.scheduleSave();
    }
  })
  .catch((err) => {
    console.warn(`[agentmemory] Failed to rebuild search index:`, err);
  });
```
If this were awaited, the viewer server would remain unbound for the duration of the rebuild. Because it is not, the viewer starts immediately, and search returns partial results until the rebuild completes.
Sources: [src/index.ts:412-431]()
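The ordering effect of not awaiting the rebuild can be seen in a toy model. The names here (`slowRebuild`, `bootWorker`, `bootEvents`) are hypothetical stand-ins, and the single `await Promise.resolve()` replaces what would be many embedding-provider calls in the real `rebuildIndex`.

```typescript
const bootEvents: string[] = [];

// Stand-in for rebuildIndex: resolves asynchronously with an entry count.
async function slowRebuild(): Promise<number> {
  bootEvents.push("rebuild started");
  await Promise.resolve(); // yields; a real rebuild would await provider calls here
  bootEvents.push("rebuild finished");
  return 42;
}

function bootWorker(): void {
  // Fire-and-forget: errors are still handled, but nothing waits on the result.
  void slowRebuild()
    .then((count) => {
      bootEvents.push(`indexed ${count}`);
    })
    .catch(() => {
      bootEvents.push("rebuild failed");
    });
  bootEvents.push("viewer started"); // runs before the rebuild finishes
}

bootWorker();
// Snapshot taken synchronously, before any pending promise callbacks run.
const eventsRightAfterBoot = bootEvents.slice();
```

At the moment `bootWorker()` returns, the rebuild has started but not finished, and the viewer is already up. The `.catch()` matters: without it, a failed rebuild would surface as an unhandled rejection.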
### Health monitor thresholds
`evaluateHealth` in `src/health/thresholds.ts` classifies the worker into three states: `healthy`, `degraded`, or `critical`. Default thresholds:
| Metric | Warn threshold | Critical threshold |
|---|---|---|
| Event-loop lag | 100 ms | 500 ms |
| CPU usage | 80% | 90% |
| Heap usage | 80% | 95% |
| Engine connection | reconnecting | disconnected / failed |
KV connectivity is actively probed each cycle via a `set`+`get` round-trip with a 5-second timeout. A `kv_probe_failed` alert is raised if either the write or read times out.
Sources: [src/health/thresholds.ts:13-21, 33-80](), [src/health/monitor.ts:48-64]()
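The table can be read as a classification function. The sketch below is not `evaluateHealth`: `classifyHealth`, `HealthSample`, and the `"connected"` engine state are hypothetical, and two behaviors are assumptions, namely that thresholds are inclusive and that the overall state is the worst of the individual metrics.

```typescript
type HealthLevel = "healthy" | "degraded" | "critical";

interface HealthSample {
  eventLoopLagMs: number;
  cpuPercent: number;
  heapPercent: number;
  engine: "connected" | "reconnecting" | "disconnected" | "failed";
}

// Assumption: a value equal to a threshold already counts as crossing it.
function levelFor(value: number, warn: number, critical: number): HealthLevel {
  if (value >= critical) return "critical";
  if (value >= warn) return "degraded";
  return "healthy";
}

const severity: Record<HealthLevel, number> = { healthy: 0, degraded: 1, critical: 2 };

function classifyHealth(s: HealthSample): HealthLevel {
  const engineLevel: HealthLevel =
    s.engine === "connected" ? "healthy" : s.engine === "reconnecting" ? "degraded" : "critical";
  const levels: HealthLevel[] = [
    levelFor(s.eventLoopLagMs, 100, 500),
    levelFor(s.cpuPercent, 80, 90),
    levelFor(s.heapPercent, 80, 95),
    engineLevel,
  ];
  // Assumption: the worst individual level decides the overall state.
  return levels.reduce((worst, l) => (severity[l] > severity[worst] ? l : worst));
}

const calmSample = classifyHealth({
  eventLoopLagMs: 20, cpuPercent: 35, heapPercent: 50, engine: "connected",
});
const laggySample = classifyHealth({
  eventLoopLagMs: 150, cpuPercent: 35, heapPercent: 50, engine: "connected",
});
const heapPressureSample = classifyHealth({
  eventLoopLagMs: 20, cpuPercent: 35, heapPercent: 96, engine: "reconnecting",
});
```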
---
## Safe-Change Rules
### Embedding dimension changes require index migration
Switching to an embedding provider that declares a different `dimensions` value makes the existing persisted index incompatible. The system refuses to load cross-dimension indexes at startup. Safe procedure:
1. Run `mem::migrate-vector-index` (backed by `migrateVectorIndex` in `src/functions/migrate-vector-index.ts`). It re-embeds every memory and every session's observations against the new provider in a fresh `VectorIndex`, with per-session isolation so one bad session does not abort the rest.
2. Inspect `MigrateVectorIndexResult.failed` and `failedSessions` before swapping the live index.
3. Only then switch the `AGENTMEMORY_EMBEDDING_PROVIDER` env var.
Do **not** set `AGENTMEMORY_DROP_STALE_INDEX=true` on a production install unless you accept losing all vector search history. The flag is meant for development resets.
Sources: [src/functions/migrate-vector-index.ts:44-152](), [src/index.ts:362-409]()
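Step 2 of the procedure can be expressed as a small gate. This is a sketch under assumptions: only the field names `failed` and `failedSessions` come from the text above; their types (a count and a list of session ids), the `MigrationOutcome` interface, and the `migrationProblems` helper are hypothetical.

```typescript
// Assumed shape: the real MigrateVectorIndexResult may carry more fields
// or use different types for these two.
interface MigrationOutcome {
  failed: number;
  failedSessions: string[];
}

function migrationProblems(result: MigrationOutcome): string[] {
  const problems: string[] = [];
  if (result.failed > 0) {
    problems.push(`${result.failed} item(s) failed to re-embed`);
  }
  if (result.failedSessions.length > 0) {
    problems.push(`sessions with failures: ${result.failedSessions.join(", ")}`);
  }
  return problems; // empty means nothing was reported as failed
}

const cleanRun = migrationProblems({ failed: 0, failedSessions: [] });
const partialRun = migrationProblems({ failed: 2, failedSessions: ["s-7"] });
```

An operator script could refuse to swap the live index whenever the returned list is non-empty.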
### Adding a new function requires two registrations
Every new capability must be registered in two places:
1. **`src/index.ts`**: call `registerXxxFunction(sdk, kv, ...)` inside `main()`. Without this, the iii-engine never knows the function exists.
2. **`src/mcp/tools-registry.ts`**: add a `McpToolDef` entry to the appropriate version array (`CORE_TOOLS`, `V050_TOOLS`, etc.) and include it in `getAllTools()`. Without this, the MCP surface (used by agents and the `npx @agentmemory/mcp` adapter) does not expose the tool.
The two registrations are deliberately separate: an internal function can exist without an MCP surface, but an MCP tool with no backing function will fail silently or throw at invocation time.
Sources: [src/index.ts:204-306](), [src/mcp/tools-registry.ts:931-948]()
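One way to catch a missing registration early is a consistency check between the two lists. The sketch below is purely illustrative: `ToolEntry`, `findUnbackedTools`, the tool names, and every function id other than `mem::remember` are hypothetical, and the real `McpToolDef` shape may not carry a function id at all.

```typescript
// Hypothetical pairing of an MCP tool name with the function id it invokes.
interface ToolEntry {
  toolName: string;
  functionId: string;
}

// Returns the names of MCP tools whose backing function was never registered.
function findUnbackedTools(registered: Set<string>, tools: ToolEntry[]): string[] {
  return tools.filter((t) => !registered.has(t.functionId)).map((t) => t.toolName);
}

const registeredIds = new Set<string>(["mem::remember", "mem::search"]);
const toolEntries: ToolEntry[] = [
  { toolName: "memory_remember", functionId: "mem::remember" },
  { toolName: "memory_export", functionId: "mem::export" },
];

const unbacked = findUnbackedTools(registeredIds, toolEntries);
```

The reverse direction is not checked on purpose, since an internal function without an MCP surface is allowed.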
### Provider fallback order is config-driven, not hardcoded
`createFallbackProvider` in `src/providers/index.ts` reads the primary provider from config and appends fallbacks from `loadFallbackConfig()` (driven by `AGENTMEMORY_FALLBACK_PROVIDERS`). The chain order is:
```
primary provider → fallback[0] → fallback[1] → ...
```
`FallbackChainProvider.tryAll()` tries each in sequence; the first success wins. Do not hardcode a fallback inside any individual provider implementation — add it to the config-driven chain instead. This keeps the fallback topology observable and operator-controlled without code changes.
Sources: [src/providers/index.ts:32-59](), [src/providers/fallback-chain.ts:18-30]()
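The chain order and the first-success rule can be sketched as follows. This is not the code in `src/providers/`: `chainOrder` and `firstSuccess` are hypothetical, the comma-separated format of `AGENTMEMORY_FALLBACK_PROVIDERS` is an assumption, and the attempts are synchronous stand-ins for what would be asynchronous provider calls.

```typescript
// Assumption: the env var holds a comma-separated list of provider names.
function chainOrder(primary: string, fallbackEnv: string | undefined): string[] {
  const fallbacks = (fallbackEnv ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return [primary, ...fallbacks];
}

type Attempt<T> = () => T; // synchronous stand-in for a provider call

// Tries each attempt in order; the first one that does not throw wins.
function firstSuccess<T>(attempts: Attempt<T>[]): T {
  const errors: string[] = [];
  for (const attempt of attempts) {
    try {
      return attempt();
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}

const providerOrder = chainOrder("anthropic", " openai, openrouter ,");

const providerCalls: string[] = [];
const chainAnswer = firstSuccess<string>([
  () => {
    providerCalls.push("anthropic");
    throw new Error("rate limited");
  },
  () => {
    providerCalls.push("openai");
    return "summary from openai";
  },
  () => {
    providerCalls.push("openrouter");
    return "never reached";
  },
]);
```

Because the order is derived entirely from configuration, an operator can see and change the fallback topology without touching any provider implementation.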
### New hook scripts must guard against SDK-child recursion
Any new Claude Code hook script that calls an agentmemory endpoint must import `isSdkChildContext` from `src/hooks/sdk-guard.ts` and return without action when it returns `true`. Omitting this guard causes the hook to fire inside agent-sdk child sessions, potentially triggering the exact endpoint that spawned the child session and creating an unbounded invocation loop.
Sources: [src/hooks/sdk-guard.ts:1-26]()
---
## Boundary Summary
```text
┌────────────────────────────────────────────────────────────┐
│ agentmemory worker │
│ │
│ ┌──────────────┐ ┌────────────────────────────────┐ │
│ │ BM25 index │ │ VectorIndex (optional) │ │
│ │ (always) │ │ null when no embedder set │ │
│ └──────┬───────┘ └───────────────┬────────────────┘ │
│ │ │ guarded write │
│ └────────────┬───────────────┘ (dim check) │
│ ▼ │
│ HybridSearch (BM25 + vector + graph) │
│ │ │
│ ┌───────────────────▼────────────────────────────────┐ │
│ │ StateKV → iii-engine │ │
│ │ (source of truth; no local DB) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ResilientProvider (CircuitBreaker) │ │
│ │ └─ FallbackChainProvider (config-driven order) │ │
│ │ └─ anthropic | openai | openrouter | ... │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ process.on("unhandledRejection") ← suppresses SDK │
│ 30s timeout rejections (log-throttled, never rethrow) │
└────────────────────────────────────────────────────────────┘
↑ hook invocations
│ isSdkChildContext() guard prevents recursion
└─ sdk-guard.ts
```
---
## Summary
The five invariants — iii-engine as the sole state store, suppressed `unhandledRejection` for timeout survival, always-present BM25 for search availability, circuit-breakers for provider isolation, and the sdk-guard hook for recursion prevention — collectively ensure that no single component failure terminates the worker or corrupts search. Contributors must respect the dimension-migration rule when changing embedding providers, the dual-registration rule when adding functions, and the config-driven fallback rule when modifying provider topology. The system is intentionally designed so that the degraded path (BM25-only, no LLM provider) still provides useful recall.
Sources: [src/index.ts:112-129, 193-195, 362-409](), [src/hooks/sdk-guard.ts:1-26]()
---