# LLM & Embedding Providers — BYOK Design

> agentmemory is fully provider-neutral: LLM calls (summarize, compress, graph-extract) go through a circuit-breaker-protected provider, optionally backed by a fallback chain, across Anthropic, OpenAI, Gemini, OpenRouter, MiniMax, or a noop stub. Embeddings (text and image/CLIP) are switchable independently across OpenAI, Cohere, Gemini, Voyage, OpenRouter, or a local Xenova model, and the system degrades to BM25-only search if no embedding provider is configured.

- Repository: rohitg00/agentmemory
- GitHub: https://github.com/rohitg00/agentmemory
- Human wiki: https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc
- Complete Markdown: https://grok-wiki.com/public/wiki/rohitg00-agentmemory-94f173bce1dc/llms-full.txt

## Source Files

- `src/providers/index.ts`
- `src/providers/fallback-chain.ts`
- `src/providers/circuit-breaker.ts`
- `src/providers/resilient.ts`
- `src/providers/embedding/index.ts`
- `src/providers/embedding/local.ts`
- `src/providers/noop.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [src/providers/index.ts](src/providers/index.ts)
- [src/providers/fallback-chain.ts](src/providers/fallback-chain.ts)
- [src/providers/circuit-breaker.ts](src/providers/circuit-breaker.ts)
- [src/providers/resilient.ts](src/providers/resilient.ts)
- [src/providers/noop.ts](src/providers/noop.ts)
- [src/providers/anthropic.ts](src/providers/anthropic.ts)
- [src/providers/embedding/index.ts](src/providers/embedding/index.ts)
- [src/providers/embedding/local.ts](src/providers/embedding/local.ts)
- [src/providers/embedding/voyage.ts](src/providers/embedding/voyage.ts)
- [src/providers/embedding/clip.ts](src/providers/embedding/clip.ts)
- [src/config.ts](src/config.ts)
- [src/types.ts](src/types.ts)
</details>


agentmemory ships zero hard dependencies on any particular model vendor. Both the LLM subsystem (compression, summarization, graph extraction) and the embedding subsystem (text vectors, image/CLIP vectors) are independently switchable purely via environment variables. You bring your own key (BYOK) for whichever provider you prefer, and the system degrades gracefully rather than failing hard when no key is present.

This page explains how provider selection, the resilient fallback chain, the circuit breaker, and the embedding dimension guard all fit together — and what breaks if you misconfigure them.

---

## LLM Provider System

### Provider Types and Auto-Detection

The system recognises seven LLM provider types, declared in the `ProviderType` union:

```typescript
// src/types.ts:146
export type ProviderType =
  "agent-sdk" | "anthropic" | "gemini" | "openrouter" | "minimax" | "openai" | "noop";
```

On startup, `detectProvider()` in `src/config.ts` reads environment variables in priority order and returns the first matching provider config. The detection order is:

| Priority | Provider | Key(s) Required | Default Model |
|----------|----------|-----------------|---------------|
| 1 | `openai` | `OPENAI_API_KEY` (and `OPENAI_API_KEY_FOR_LLM != "false"`) | `gpt-4o-mini` |
| 2 | `minimax` | `MINIMAX_API_KEY` | `MiniMax-M2.7` |
| 3 | `anthropic` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` |
| 4 | `gemini` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | `gemini-2.5-flash` |
| 5 | `openrouter` | `OPENROUTER_API_KEY` | `anthropic/claude-sonnet-4-20250514` |
| 6 | `agent-sdk` | no key found and `AGENTMEMORY_ALLOW_AGENT_SDK=true` (opt-in only) | `claude-sonnet-4-20250514` |
| 7 | `noop` | _(no key found; default)_ | — |

Sources: `src/config.ts:50-132`

The env file at `~/.agentmemory/.env` is merged with `process.env`, so keys can be set either in that file or in the shell environment.
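
To make the precedence concrete, here is a minimal sketch of the detection order, written against an explicit env record instead of `process.env` so it can be tested. `detectLlmProviderSketch` is a hypothetical name, not the real `detectProvider()`, and it assumes that `agent-sdk` is chosen only when no key matched and the opt-in flag is set.

```typescript
// Hypothetical sketch of the LLM detection order from the table above.
// Not the actual detectProvider() in src/config.ts; names are illustrative.
type LlmProvider =
  | "agent-sdk" | "anthropic" | "gemini" | "openrouter" | "minimax" | "openai" | "noop";

type Env = Record<string, string | undefined>;

function detectLlmProviderSketch(env: Env): LlmProvider {
  // 1. OpenAI, unless the key is reserved for embeddings only
  if (env.OPENAI_API_KEY && env.OPENAI_API_KEY_FOR_LLM !== "false") return "openai";
  // 2-5. Remaining keyed providers; the first match wins
  if (env.MINIMAX_API_KEY) return "minimax";
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.GEMINI_API_KEY || env.GOOGLE_API_KEY) return "gemini";
  if (env.OPENROUTER_API_KEY) return "openrouter";
  // No key: agent-sdk only when explicitly allowed, otherwise the safe noop stub
  return env.AGENTMEMORY_ALLOW_AGENT_SDK === "true" ? "agent-sdk" : "noop";
}
```

Keeping the function pure means the `~/.agentmemory/.env` merge can happen before the call, and each priority rule can be checked in isolation.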

### The `MemoryProvider` Interface

Every LLM provider implements a small interface:

```typescript
// src/types.ts:148-153
export interface MemoryProvider {
  name: string;
  compress(systemPrompt: string, userPrompt: string): Promise<string>;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
  describeImage?(imageData: string, mimeType: string, prompt: string): Promise<string>;
}
```

`compress` and `summarize` are the two entry points for LLM-backed memory operations. `describeImage` is an optional extension that only the Anthropic provider currently implements. Sources: `src/providers/anthropic.ts:16-42`
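
Because the interface is this small, a test double is easy to write. The sketch below is hypothetical (`StaticProvider` is not part of agentmemory): it returns canned text and records each call, which is handy for exercising code that consumes a `MemoryProvider` without touching the network.

```typescript
// Local copy of the interface shown above, so this sketch is self-contained.
interface MemoryProvider {
  name: string;
  compress(systemPrompt: string, userPrompt: string): Promise<string>;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
  describeImage?(imageData: string, mimeType: string, prompt: string): Promise<string>;
}

// Hypothetical test double: returns a fixed reply and records every call.
class StaticProvider implements MemoryProvider {
  readonly calls: Array<{ op: string; systemPrompt: string; userPrompt: string }> = [];

  constructor(public name: string, private readonly reply: string) {}

  async compress(systemPrompt: string, userPrompt: string): Promise<string> {
    this.calls.push({ op: "compress", systemPrompt, userPrompt });
    return this.reply;
  }

  async summarize(systemPrompt: string, userPrompt: string): Promise<string> {
    this.calls.push({ op: "summarize", systemPrompt, userPrompt });
    return this.reply;
  }
  // describeImage is optional, so omitting it still satisfies the interface.
}
```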

### Factory Functions

`src/providers/index.ts` exports two factory functions:

- **`createProvider(config)`**: wraps a single base provider in a `ResilientProvider` (circuit breaker only).
- **`createFallbackProvider(config, fallbackConfig)`**: builds an ordered list of base providers (the primary provider first, then each entry in `FALLBACK_PROVIDERS`), wraps the list in `FallbackChainProvider` when it holds more than one provider, and finally wraps the result in `ResilientProvider`.

```typescript
// src/providers/index.ts:32-58
export function createFallbackProvider(
  config: ProviderConfig,
  fallbackConfig: FallbackConfig,
): ResilientProvider {
  if (fallbackConfig.providers.length === 0) {
    return createProvider(config);
  }
  const providers: MemoryProvider[] = [createBaseProvider(config)];
  for (const providerType of fallbackConfig.providers) {
    if (providerType === config.provider) continue;
    try {
      providers.push(createBaseProvider({ ...config, provider: providerType }));
    } catch {
      // skip unavailable fallback providers
    }
  }
  if (providers.length > 1) {
    return new ResilientProvider(new FallbackChainProvider(providers));
  }
  return new ResilientProvider(providers[0]);
}
```

Fallback providers whose keys are absent are silently skipped at construction time (the `try/catch` at lines 43–51), so a partially configured `FALLBACK_PROVIDERS` list does not crash startup.
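
The construction rules (primary first, duplicates of the primary skipped, failing fallbacks dropped) can be sketched generically. `buildProviderList` and its `factory` parameter are hypothetical; the sketch assumes, as the text does, that creating a provider without its key throws.

```typescript
// Hypothetical sketch of the list-building step in createFallbackProvider().
type ProviderName = string;

function buildProviderList<P>(
  primary: ProviderName,
  fallbacks: ProviderName[],
  factory: (name: ProviderName) => P, // assumed to throw when a key is missing
): P[] {
  // The primary is not guarded: if it cannot be built, the error propagates.
  const providers: P[] = [factory(primary)];
  for (const name of fallbacks) {
    if (name === primary) continue; // mirrors the `providerType === config.provider` check
    try {
      providers.push(factory(name));
    } catch {
      // silently skip unavailable fallback providers
    }
  }
  return providers;
}
```

A one-element result corresponds to the `new ResilientProvider(providers[0])` branch of the excerpt; only a longer list gets a `FallbackChainProvider`.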

### Gemini via the OpenAI-Compatible Endpoint

Gemini is not wired through its own SDK. Instead, `createBaseProvider` instantiates an `OpenRouterProvider` pointed at Google's OpenAI-compatible endpoint:

```typescript
// src/providers/index.ts:76-89
case "gemini": {
  return new OpenRouterProvider(
    geminiKey,
    config.model,
    config.maxTokens,
    "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
  );
}
```

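Because the endpoint URL is a constructor argument, the same OpenAI-compatible client can serve any vendor that speaks that wire format. In the same spirit, the `OPENAI_BASE_URL` and `ANTHROPIC_BASE_URL` overrides let you point the OpenAI or Anthropic provider at a local proxy or another compatible endpoint. Sources: `src/providers/index.ts:61-118`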
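
Reusing one client for OpenRouter and Gemini works because both endpoints accept the OpenAI chat-completions wire format. A minimal sketch of such a request, assuming bearer-token auth and the standard `model` / `messages` / `max_tokens` fields; `buildChatRequest` is hypothetical and is not the actual `OpenRouterProvider` code.

```typescript
// Endpoint taken from the Gemini case in createBaseProvider.
const GEMINI_ENDPOINT =
  "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions";

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds an OpenAI-style chat-completions request.
function buildChatRequest(
  endpoint: string,
  apiKey: string,
  model: string,
  maxTokens: number,
  systemPrompt: string,
  userPrompt: string,
): ChatRequest {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        max_tokens: maxTokens,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userPrompt },
        ],
      }),
    },
  };
}

// Usage (network call not executed here):
// const { url, init } = buildChatRequest(GEMINI_ENDPOINT, key, "gemini-2.5-flash", 4096, sys, user);
// const text = (await (await fetch(url, init)).json()).choices[0].message.content;
```

Only the URL and key differ between vendors, which is why the Gemini case can pass Google's endpoint as the fourth constructor argument.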

---

## Resilience Layer

### Architecture Overview

```text
 caller (compress / summarize)
        │
        ▼
 ┌─────────────────────────────┐
 │  ResilientProvider          │  ← circuit breaker gate
 │  ┌───────────────────────┐  │
 │  │ FallbackChainProvider │  │  ← ordered provider list (when FALLBACK_PROVIDERS set)
 │  │  Provider A           │  │
 │  │  Provider B           │  │
 │  │  Provider C …         │  │
 │  └───────────────────────┘  │
 │   ─ or ─                    │
 │  ┌───────────────────────┐  │
 │  │ Single base provider  │  │
 │  └───────────────────────┘  │
 └─────────────────────────────┘
```

### FallbackChainProvider

`FallbackChainProvider` tries each provider in order and returns the first success. If all throw, it re-throws the last error:

```typescript
// src/providers/fallback-chain.ts:18-30
private async tryAll(fn: (p: MemoryProvider) => Promise<string>): Promise<string> {
  let lastError: Error | null = null;
  for (const provider of this.providers) {
    try {
      return await fn(provider);
    } catch (err) {
      lastError = err instanceof Error ? err : new Error(String(err));
    }
  }
  throw lastError || new Error("No providers available");
}
```

The chain's `name` is a human-readable trace such as `fallback(anthropic -> openrouter -> noop)`. Sources: `src/providers/fallback-chain.ts:1-31`
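
A self-contained version of the chain, modeled on the excerpt above, shows the ordering and error semantics end to end. `FallbackChainSketch` is a hypothetical name, and the interface is redeclared locally (without the optional `describeImage`) so the block compiles on its own.

```typescript
interface MemoryProvider {
  name: string;
  compress(systemPrompt: string, userPrompt: string): Promise<string>;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Sketch modeled on the excerpt; not a copy of fallback-chain.ts.
class FallbackChainSketch implements MemoryProvider {
  readonly name: string;

  constructor(private readonly providers: MemoryProvider[]) {
    // Human-readable trace, e.g. "fallback(anthropic -> openrouter -> noop)"
    this.name = `fallback(${providers.map((p) => p.name).join(" -> ")})`;
  }

  compress(systemPrompt: string, userPrompt: string): Promise<string> {
    return this.tryAll((p) => p.compress(systemPrompt, userPrompt));
  }

  summarize(systemPrompt: string, userPrompt: string): Promise<string> {
    return this.tryAll((p) => p.summarize(systemPrompt, userPrompt));
  }

  // First success wins; if every provider throws, rethrow the last error.
  private async tryAll(fn: (p: MemoryProvider) => Promise<string>): Promise<string> {
    let lastError: Error | null = null;
    for (const provider of this.providers) {
      try {
        return await fn(provider);
      } catch (err) {
        lastError = err instanceof Error ? err : new Error(String(err));
      }
    }
    throw lastError ?? new Error("No providers available");
  }
}
```

Since providers are tried sequentially, a slow failure in an early provider adds its full latency before the next one is attempted.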

### CircuitBreaker

`CircuitBreaker` prevents hammering a failing provider. Its state machine has three states:

```text
 closed ──(failures >= threshold)──► open
    ▲                                  │
    │ (success in half-open)           │ (recoveryTimeoutMs elapsed)
    └──────── half-open ◄──────────────┘
```

Default parameters (all overridable via constructor options):

| Parameter | Default | Meaning |
|-----------|---------|---------|
| `failureThreshold` | 3 | Failures within the window that open the breaker |
| `failureWindowMs` | 60 000 ms | Window over which failures are counted |
| `recoveryTimeoutMs` | 30 000 ms | How long to stay open before trying half-open |

When the breaker is open, `ResilientProvider.call()` throws `"circuit_breaker_open"` immediately without touching the network. Sources: `src/providers/circuit-breaker.ts:1-82`, `src/providers/resilient.ts:12-24`
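
The state machine and defaults above can be sketched as a small class. This is not `src/providers/circuit-breaker.ts`: `CircuitBreakerSketch`, `callThroughBreaker`, and the injectable `now` clock are hypothetical, failures are assumed to be counted in a sliding window, and a failed probe in half-open is assumed to reopen the breaker, a transition the diagram does not show.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold?: number;  // failures within the window that open the breaker
  failureWindowMs?: number;   // sliding window for counting failures
  recoveryTimeoutMs?: number; // how long to stay open before allowing a probe
  now?: () => number;         // injectable clock (assumption, for testing)
}

class CircuitBreakerSketch {
  private state: BreakerState = "closed";
  private failures: number[] = []; // timestamps of recent failures
  private openedAt = 0;
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly recoveryMs: number;
  private readonly now: () => number;

  constructor(opts: BreakerOptions = {}) {
    this.threshold = opts.failureThreshold ?? 3;
    this.windowMs = opts.failureWindowMs ?? 60_000;
    this.recoveryMs = opts.recoveryTimeoutMs ?? 30_000;
    this.now = opts.now ?? Date.now;
  }

  // Current state; moves open -> half-open once the recovery timeout elapses.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.recoveryMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  isAllowed(): boolean {
    return this.getState() !== "open";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = [];
  }

  recordFailure(): void {
    const t = this.now();
    if (this.getState() === "half-open") {
      this.trip(t); // assumption: a failed probe reopens immediately
      return;
    }
    this.failures = this.failures.filter((f) => t - f < this.windowMs);
    this.failures.push(t);
    if (this.failures.length >= this.threshold) this.trip(t);
  }

  private trip(t: number): void {
    this.state = "open";
    this.openedAt = t;
    this.failures = [];
  }
}

// Gate in the spirit of ResilientProvider.call(): fail fast while open.
async function callThroughBreaker<T>(breaker: CircuitBreakerSketch, fn: () => Promise<T>): Promise<T> {
  if (!breaker.isAllowed()) throw new Error("circuit_breaker_open");
  try {
    const result = await fn();
    breaker.recordSuccess();
    return result;
  } catch (err) {
    breaker.recordFailure();
    throw err;
  }
}
```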

### NoopProvider — Safe Zero-Key Default

When no LLM key is found and `AGENTMEMORY_ALLOW_AGENT_SDK` is not `true`, `detectProvider` returns `provider: "noop"`. The `NoopProvider` always returns empty strings:

```typescript
// src/providers/noop.ts:10-20
export class NoopProvider implements MemoryProvider {
  name = "noop";
  async compress(): Promise<string> { return ""; }
  async summarize(): Promise<string> { return ""; }
}
```

Callers that receive an empty string are expected to short-circuit rather than store the empty result. Together, the `noop` default (instead of the opt-in `agent-sdk` provider) and this short-circuit avoid the Stop-hook recursion loop documented in issue #149. Sources: `src/providers/noop.ts:1-20`
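
A minimal sketch of that caller-side contract, with a hypothetical `summarizeOrSkip` helper that returns `null` instead of an empty summary. Treating whitespace-only output as empty is an assumption.

```typescript
interface MemoryProvider {
  name: string;
  compress(systemPrompt: string, userPrompt: string): Promise<string>;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

class NoopProvider implements MemoryProvider {
  name = "noop";
  async compress(): Promise<string> { return ""; }
  async summarize(): Promise<string> { return ""; }
}

// Hypothetical caller: an empty result means "nothing to store".
async function summarizeOrSkip(
  provider: MemoryProvider,
  systemPrompt: string,
  userPrompt: string,
): Promise<string | null> {
  const text = await provider.summarize(systemPrompt, userPrompt);
  if (text.trim() === "") return null; // short-circuit: never persist an empty summary
  return text;
}
```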

---

## Embedding Provider System

Text embeddings and image/CLIP embeddings are **independently** configured. Neither is required: if no text embedding provider is configured, search falls back to BM25 only.

### Text Embedding — Auto-Detection

`detectEmbeddingProvider()` in `src/config.ts` checks keys in this order, with `EMBEDDING_PROVIDER` as a manual override:

| Priority | Provider | Key Required | Dimensions |
|----------|----------|--------------|------------|
| 0 (override) | _any_ | `EMBEDDING_PROVIDER=<name>` | varies |
| 1 | `gemini` | `GEMINI_API_KEY` | — |
| 2 | `openai` | `OPENAI_API_KEY` | — |
| 3 | `voyage` | `VOYAGE_API_KEY` | 1024 |
| 4 | `cohere` | `COHERE_API_KEY` | — |
| 5 | `openrouter` | `OPENROUTER_API_KEY` | — |
| (override only) | `local` | none; selected only via `EMBEDDING_PROVIDER=local` | 384 |
| — | _(none)_ | no key found | BM25 only |

`createEmbeddingProvider()` returns `null` when no provider is detected, which tells the search layer to skip vector indexing entirely. Sources: `src/config.ts:197-210`, `src/providers/embedding/index.ts:30-50`
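
The detection order can be sketched the same way as the LLM side. `detectEmbeddingSketch` is hypothetical; it assumes an unrecognized `EMBEDDING_PROVIDER` value is ignored and detection falls through to the key checks, which the source does not state.

```typescript
type EmbeddingName = "gemini" | "openai" | "voyage" | "cohere" | "openrouter" | "local";

const KNOWN: readonly EmbeddingName[] = ["gemini", "openai", "voyage", "cohere", "openrouter", "local"];

// Hypothetical sketch of the table's order; null means "BM25 only".
function detectEmbeddingSketch(env: Record<string, string | undefined>): EmbeddingName | null {
  const forced = env.EMBEDDING_PROVIDER;
  // Priority 0: explicit override (assumption: unknown values are ignored)
  if (forced && (KNOWN as readonly string[]).includes(forced)) return forced as EmbeddingName;
  if (env.GEMINI_API_KEY) return "gemini";
  if (env.OPENAI_API_KEY) return "openai";
  if (env.VOYAGE_API_KEY) return "voyage";
  if (env.COHERE_API_KEY) return "cohere";
  if (env.OPENROUTER_API_KEY) return "openrouter";
  return null; // "local" is never auto-selected; it must be forced
}
```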

### Local Xenova Provider (No Key Required)

Setting `EMBEDDING_PROVIDER=local` activates `LocalEmbeddingProvider`, which uses `@xenova/transformers` (an optional peer dependency) to run `Xenova/all-MiniLM-L6-v2` in-process. Vectors are 384-dimensional:

```typescript
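// src/providers/embedding/local.ts:13-51 (condensed)
export class LocalEmbeddingProvider implements EmbeddingProvider {
  readonly name = "local";
  readonly dimensions = 384;
  private extractor: any = null; // cached feature-extraction pipeline

  // Lazy-loads @xenova/transformers on first call, then reuses the cached pipeline
  private async getExtractor() {
    if (!this.extractor) {
      const transformers = await import("@xenova/transformers");
      this.extractor = await transformers.pipeline(
        "feature-extraction",
        "Xenova/all-MiniLM-L6-v2",
      );
    }
    return this.extractor;
  }
  // embed() / embedBatch() omitted
}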
```

No API key and no per-call billing. The model weights are downloaded once and cached on first use; after that, embedding runs entirely in-process with no network calls. Sources: `src/providers/embedding/local.ts:13-52`

### Image/CLIP Embeddings

Image embeddings are a separate opt-in path enabled by `AGENTMEMORY_IMAGE_EMBEDDINGS=true`. The `ClipEmbeddingProvider` runs `Xenova/clip-vit-base-patch32` (512-dimensional) locally via the same `@xenova/transformers` peer dependency:

```typescript
// src/providers/embedding/clip.ts:24-54
export class ClipEmbeddingProvider implements EmbeddingProvider {
  readonly name = "clip";
  readonly dimensions = 512;

  async embedImage(src: string): Promise<Float32Array> {
    // accepts "data:<mime>;base64,..." or a file path
    const image = await loadImage(t, src);
    const output = await extractor(image);
    return normalize(output.data ?? new Float32Array(output.tolist()[0] || []));
  }
}
```

Within the CLIP provider, text and images are embedded by the same model through different pipeline tasks (`feature-extraction` for text, `image-feature-extraction` for images), so both kinds of vector land in CLIP's shared 512-dimensional space, which is what cross-modal retrieval needs. Sources: `src/providers/embedding/clip.ts:22-81`, `src/providers/embedding/index.ts:23-28`
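
The CLIP excerpt calls a `normalize` helper whose body is not shown. Assuming it performs L2 normalization (the usual treatment for CLIP vectors), a minimal sketch looks like this; `l2Normalize` and `dot` are hypothetical names. On unit-length vectors, cosine similarity reduces to a plain dot product.

```typescript
// Hypothetical helper; assumes the excerpt's normalize() is L2 normalization.
function l2Normalize(v: Float32Array): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  const out = new Float32Array(v.length);
  if (norm === 0) return out; // leave zero vectors as zeros
  for (let i = 0; i < v.length; i++) out[i] = v[i] / norm;
  return out;
}

// Cosine similarity of two unit vectors; throws on a length mismatch.
function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}
```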

### Dimension Guard — Preventing Silent Corruption

Every embedding provider is wrapped with `withDimensionGuard()` before `createEmbeddingProvider` returns it. The guard rejects wrong-length vectors at the boundary; without it they would silently corrupt the vector index, because the index's cosine similarity returns `0` on a length mismatch instead of throwing:

```typescript
// src/providers/embedding/index.ts:56-80
export function withDimensionGuard(provider: EmbeddingProvider): EmbeddingProvider {
  const expected = provider.dimensions;
  const check = (v: Float32Array, where: string): Float32Array => {
    if (v.length !== expected) {
      throw new Error(
        `Embedding dimension mismatch in ${provider.name}.${where}: ` +
        `expected ${expected}, got ${v.length}`
      );
    }
    return v;
  };
  // prototype chain preserved so instanceof checks keep working
  const wrapped = Object.create(provider) as EmbeddingProvider;
  wrapped.embed    = async (t)  => check(await provider.embed(t), "embed");
  wrapped.embedBatch = async (ts) => { ... };
  ...
}
```

Sources: `src/providers/embedding/index.ts:52-80`
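
Because the excerpt elides the batch path and the return value, here is a self-contained sketch of the same guard pattern. `withDimensionGuardSketch` is hypothetical, the `EmbeddingProvider` shape is reduced to the members used here, and checking each batch element with an indexed label is an assumption about the elided `embedBatch` wrapper.

```typescript
interface EmbeddingProvider {
  readonly name: string;
  readonly dimensions: number;
  embed(text: string): Promise<Float32Array>;
  embedBatch(texts: string[]): Promise<Float32Array[]>;
}

function withDimensionGuardSketch(provider: EmbeddingProvider): EmbeddingProvider {
  const expected = provider.dimensions;
  const check = (v: Float32Array, where: string): Float32Array => {
    if (v.length !== expected) {
      throw new Error(
        `Embedding dimension mismatch in ${provider.name}.${where}: expected ${expected}, got ${v.length}`,
      );
    }
    return v;
  };
  // Object.create keeps `provider` on the prototype chain, so name, dimensions,
  // any other methods, and instanceof checks still resolve through it.
  const wrapped = Object.create(provider) as EmbeddingProvider;
  wrapped.embed = async (t) => check(await provider.embed(t), "embed");
  wrapped.embedBatch = async (ts) =>
    (await provider.embedBatch(ts)).map((v, i) => check(v, `embedBatch[${i}]`));
  return wrapped;
}
```

Since `wrapped` inherits from the original instance, an `instanceof` check against the provider's class still succeeds, and `name` and `dimensions` resolve without being copied.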

---

## Configuration Reference

All keys can be placed in `~/.agentmemory/.env`, which is loaded first and then merged with `process.env`:

### LLM Provider Variables

| Variable | Purpose |
|----------|---------|
| `ANTHROPIC_API_KEY` | Enable Anthropic provider; supports `ANTHROPIC_BASE_URL` override |
| `OPENAI_API_KEY` | Enable OpenAI provider; supports `OPENAI_BASE_URL`, `OPENAI_MODEL` |
| `OPENAI_API_KEY_FOR_LLM` | Set to `"false"` to reserve the key for embeddings only |
| `GEMINI_API_KEY` / `GOOGLE_API_KEY` | Enable Gemini via OpenAI-compatible endpoint |
| `OPENROUTER_API_KEY` | Enable OpenRouter; supports `OPENROUTER_MODEL` |
| `MINIMAX_API_KEY` | Enable MiniMax (raw-fetch, Anthropic-compatible API) |
| `AGENTMEMORY_ALLOW_AGENT_SDK` | Set `"true"` to permit the `agent-sdk` fallback (risk: recursion loop) |
| `FALLBACK_PROVIDERS` | Comma-separated list, e.g. `anthropic,openrouter` |
| `MAX_TOKENS` | Token budget for LLM calls (default: 4096) |
| `AGENTMEMORY_AUTO_COMPRESS` | Set `"true"` to enable per-observation LLM compression (off by default) |
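
As an illustration of how the comma-separated `FALLBACK_PROVIDERS` value and the `MAX_TOKENS` default might be parsed, here is a hedged sketch. Both functions are hypothetical; trimming whitespace, dropping unknown provider names, and falling back to 4096 on non-numeric input are assumptions.

```typescript
type ProviderType =
  "agent-sdk" | "anthropic" | "gemini" | "openrouter" | "minimax" | "openai" | "noop";

const PROVIDER_TYPES: readonly ProviderType[] = [
  "agent-sdk", "anthropic", "gemini", "openrouter", "minimax", "openai", "noop",
];

// Hypothetical parser: split on commas, trim, keep only known provider names.
function parseFallbackProviders(raw: string | undefined): ProviderType[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((s) => s.trim())
    .filter((s): s is ProviderType => (PROVIDER_TYPES as readonly string[]).includes(s));
}

// Hypothetical parser: use 4096 unless MAX_TOKENS is a positive integer.
function parseMaxTokens(raw: string | undefined): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : 4096;
}
```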

### Embedding Provider Variables

| Variable | Purpose |
|----------|---------|
| `EMBEDDING_PROVIDER` | Force a specific provider (`gemini`, `openai`, `voyage`, `cohere`, `openrouter`, `local`) |
| `VOYAGE_API_KEY` | Enable Voyage AI (model: `voyage-code-3`, 1024-dim) |
| `COHERE_API_KEY` | Enable Cohere embeddings |
| `AGENTMEMORY_IMAGE_EMBEDDINGS` | Set `"true"` to enable CLIP image embedding (requires `@xenova/transformers`) |
| `BM25_WEIGHT` | Hybrid search BM25 weight (default: 0.4) |
| `VECTOR_WEIGHT` | Hybrid search vector weight (default: 0.6) |
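
The two weights suggest a weighted blend of lexical and semantic scores. A minimal sketch, assuming a simple linear combination over scores already normalized to [0, 1]; the real fusion may differ (for example, a rank-based fusion), and `hybridScore` and `rank` are hypothetical.

```typescript
interface Scored { id: string; bm25: number; vector: number | null }

// Hypothetical linear fusion; assumes both scores are normalized to [0, 1].
// A null vector score models search without an embedding provider (BM25 only).
function hybridScore(s: Scored, bm25Weight = 0.4, vectorWeight = 0.6): number {
  if (s.vector === null) return s.bm25;
  return bm25Weight * s.bm25 + vectorWeight * s.vector;
}

// Returns ids ordered by descending hybrid score, without mutating the input.
function rank(items: Scored[], bm25Weight = 0.4, vectorWeight = 0.6): string[] {
  return [...items]
    .sort((a, b) => hybridScore(b, bm25Weight, vectorWeight) - hybridScore(a, bm25Weight, vectorWeight))
    .map((s) => s.id);
}
```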

---

## Failure Modes and Invariants

| Scenario | Behavior |
|----------|---------|
| No LLM key set | `NoopProvider` returns `""` — no compression/summarization, no crash |
| `AGENTMEMORY_ALLOW_AGENT_SDK=true` without a real key | `agent-sdk` spawns Claude child sessions — risks infinite Stop-hook recursion (#149) |
| Fallback provider key absent at startup | Provider silently dropped from chain; no runtime error |
| All providers in the chain fail | `FallbackChainProvider` re-throws the last error; `CircuitBreaker` records the failure |
| CircuitBreaker open | Calls fail fast with `"circuit_breaker_open"` — no network requests until `recoveryTimeoutMs` |
| No embedding key and `EMBEDDING_PROVIDER` unset | `createEmbeddingProvider` returns `null`; search uses BM25 only |
| Wrong embedding dimensions returned by provider | `withDimensionGuard` throws immediately — bad vector is never written to the index |
| `EMBEDDING_PROVIDER=local` | `@xenova/transformers` is an optional peer dependency; install it manually, otherwise loading the local model fails with a clear error |

---

## Summary

agentmemory's provider system is a two-layer BYOK design. The LLM layer is auto-detected from keys in priority order and wrapped in a circuit breaker, with a `FallbackChainProvider` in between when `FALLBACK_PROVIDERS` is set. The embedding layer is auto-detected independently and offers an opt-in local Xenova provider that needs no key. `NoopProvider` and a `null` embedding provider are the safe degraded states: capture and BM25 search keep working even with no API keys configured at all. The `withDimensionGuard` wrapper in `src/providers/embedding/index.ts:56-80` is the critical invariant that keeps the vector index from silently accumulating corrupt entries when a provider's output changes shape.
