# Wire Shaping & Model Resolution — Context Budget Enforcement

> wire-shaping.ts enforces a byte budget (15 MB trigger, 80% target) and a token budget (200k default effectiveContext) on the dispatched message list. Eviction advances the WireGuardHorizon, trimming oldest messages in one block to minimize prompt-cache invalidations. Images get a fixed 1,600-token estimate to prevent base64 byte inflation from triggering early eviction. Model resolution (resolver.ts, catalog.ts, duet-gateway.ts) abstracts Anthropic/OpenAI/OpenRouter/Duet Gateway behind a single resolveModelName call. This page explains the two-gate eviction system, cache-miss cost model, and BYOK/BYOC model routing.

- Repository: dzhng/duet-agent
- GitHub: https://github.com/dzhng/duet-agent
- Human wiki: https://grok-wiki.com/public/wiki/dzhng-duet-agent-82dbe2572d3a
- Complete Markdown: https://grok-wiki.com/public/wiki/dzhng-duet-agent-82dbe2572d3a/llms-full.txt

## Source Files

- `src/turn-runner/wire-shaping.ts`
- `src/turn-runner/state-compaction.ts`
- `src/model-resolution/resolver.ts`
- `src/model-resolution/catalog.ts`
- `src/model-resolution/duet-gateway.ts`
- `evals/thread-context-loss.eval.ts`
- `evals/context-overflow-recovery.eval.ts`
- `evals/prompt-cache.eval.ts`

---

Relevant source files

The following files were used as context for generating this wiki page:

- [src/turn-runner/wire-shaping.ts](src/turn-runner/wire-shaping.ts)
- [src/turn-runner/state-compaction.ts](src/turn-runner/state-compaction.ts)
- [src/memory/observational.ts](src/memory/observational.ts)
- [src/model-resolution/resolver.ts](src/model-resolution/resolver.ts)
- [src/model-resolution/catalog.ts](src/model-resolution/catalog.ts)
- [src/model-resolution/duet-gateway.ts](src/model-resolution/duet-gateway.ts)
- [src/turn-runner/turn-runner.ts](src/turn-runner/turn-runner.ts)
- [evals/thread-context-loss.eval.ts](evals/thread-context-loss.eval.ts)
- [evals/prompt-cache.eval.ts](evals/prompt-cache.eval.ts)

# Wire Shaping & Model Resolution — Context Budget Enforcement

The duet-agent runner maintains a context window that is bounded by two independent constraints: a **token budget** that governs normal text-and-tool sessions, and a **byte budget** that prevents image-heavy threads from inflating serialized request sizes past practical limits. These two gates are implemented in `wire-shaping.ts` and called from the observational memory transform.

Separately, a three-level model resolution chain (`resolver.ts`, `catalog.ts`, `duet-gateway.ts`) normalizes user-supplied model names into concrete provider-specific model objects, abstracting Anthropic, OpenAI, OpenRouter, Vercel AI Gateway, and the Duet hosted gateway behind a single `resolveModelName` call.

Both systems interact at a critical seam: model resolution determines the hard context-window limit that `effectiveContext` is clamped against, and wire-shaping determines how many messages actually reach the provider. Understanding these two mechanisms together is essential for predicting when the system will evict messages, what it will preserve, and how cost-sensitive prompt-cache behavior is protected during eviction.

---

## The Two-Gate Eviction System

### Gate 1: Token Budget (`messageTokens`)

The primary eviction gate tracks estimated token consumption across the dispatched message list. The `DEFAULT_EFFECTIVE_CONTEXT` is **200,000 tokens**, a target well below frontier-model windows (Sources: [src/memory/observational.ts:62](src/memory/observational.ts)). This value is clamped to the model's hard `contextWindow` at runtime:

```ts
// src/turn-runner/turn-runner.ts (resolveEffectiveContext)
protected resolveEffectiveContext(modelWindow?: number): number {
  const userValue = this.config.effectiveContext ?? DEFAULT_EFFECTIVE_CONTEXT;
  return modelWindow !== undefined ? Math.min(userValue, modelWindow) : userValue;
}
```

Within `effectiveContext`, three ratios govern how the budget is allocated:

| Segment | Ratio | Purpose |
|---|---|---|
| `messageTokens` | 60% | Raw-message tail compaction trigger |
| `observationTokens` | 32.5% | Local memory-pack ceiling |
| `globalContextTokenBudget` | 7.5% | Cross-session global pack ceiling |

Sources: [src/memory/observational.ts:73-80](src/memory/observational.ts)

When `messageTokens` is exceeded, the observational context transform calls `findEvictionHorizon` to advance the `WireGuardHorizon` until the token estimate for the post-horizon message slice falls back below the trigger.

### Gate 2: Byte Budget (`WIRE_BYTE_TRIGGER`)

The byte gate exists specifically because **image attachments break the token gate**. A single 2 MB inline base64 image would produce a naive `ceil(bytes/4) ≈ 500k` token estimate, which would evict earlier user messages from the wire prematurely. The actual per-image provider charge (Claude's vision tops out near 1,568 tokens; OpenAI high-detail tiles are bounded) is far lower and unrelated to the base64 payload length.

Two constants define this gate:

```ts
// src/turn-runner/wire-shaping.ts
export const WIRE_BYTE_TRIGGER = 15 * 1024 * 1024; // 15 MB
export const WIRE_BYTE_TARGET = Math.floor(WIRE_BYTE_TRIGGER * 0.8); // 12 MB
```

Sources: [src/turn-runner/wire-shaping.ts:18-28](src/turn-runner/wire-shaping.ts)

The target is 80% of the trigger so that a single block-eviction leaves room for several more turns of growth before tripping the gate again.

### Image Token Estimate: Preventing Base64 Inflation

To avoid false-positive token evictions on image messages, image blocks are counted at a fixed **1,600 tokens** regardless of their actual base64 payload size:

```ts
// src/turn-runner/wire-shaping.ts
export const IMAGE_WIRE_TOKEN_ESTIMATE = 1_600;

function calculateMessageTokens(msg: AgentMessage): number {
  // ...
  for (const block of content) {
    if (isImageBlock(block)) total += IMAGE_WIRE_TOKEN_ESTIMATE;
    else if (isTextBlock(block)) total += Math.ceil(block.text.length / 4);
    // ...
  }
}
```

Sources: [src/turn-runner/wire-shaping.ts:43-169](src/turn-runner/wire-shaping.ts)

Byte size is counted separately (raw `block.data.length`) and compared only against `WIRE_BYTE_TRIGGER`, not against the token budget. This dual-counting model assigns each signal to its appropriate gate.

---

## The WireGuardHorizon: Sticky Eviction to Protect Prompt-Cache

### Why a Sticky Horizon

The runner calls `transformContext` on every turn against the **full** untransformed message history. A naïve design that re-computed the eviction cut each turn would advance the cut by one message every turn, invalidating the provider's cached prefix each time. Prompt-cache entries are keyed to an exact token prefix; any change forces a cold miss.

The `WireGuardHorizon` pins the cut to a monotonically-advancing timestamp:

```ts
// src/turn-runner/wire-shaping.ts
export interface WireGuardHorizon {
  /** Messages with timestamp <= evictionHorizon are dropped from the wire. */
  evictionHorizon: number;
}
```

Sources: [src/turn-runner/wire-shaping.ts:66-69](src/turn-runner/wire-shaping.ts)

Once the horizon is set, every subsequent turn produces the same shape (the same evicted prefix, the same surviving tail) until the budget is exceeded again and the horizon must advance. Advancing the horizon invalidates the cache once; not advancing it preserves the cache across every turn in between.

The horizon is held in memory on the runner instance and resets on session resume. This is intentional: provider-side prompt caches typically do not survive resume gaps anyway, so persisting the horizon across a restart would buy at most one avoided cache miss.

Sources: [src/turn-runner/turn-runner.ts:1656](src/turn-runner/turn-runner.ts)

### Block Eviction vs. Incremental Trim

`findEvictionHorizon` walks messages oldest-first and advances the horizon past whole messages until the caller-supplied predicate (`satisfiesBudget`) returns true. This produces a **block eviction**, a single large jump, rather than one-message-at-a-time trimming. A comment in the source explains the tradeoff:

> "One large block-evict per crossing is far cheaper for prompt caching than incrementally trimming on every turn (each advance invalidates the cached prefix once, so fewer advances = fewer invalidations)."

Sources: [src/turn-runner/wire-shaping.ts:22-28](src/turn-runner/wire-shaping.ts)

### `applyEvictionHorizon`: Orphan Cleanup

After finding the horizon, `applyEvictionHorizon` removes messages at or before it and also drops any orphaned `toolResult` or `assistant` messages that end up at the new head, because both Anthropic and OpenAI require the first message to have `role: user`:

```ts
// src/turn-runner/wire-shaping.ts:182-193
export function applyEvictionHorizon(messages: AgentMessage[], horizon: number): AgentMessage[] {
  if (horizon <= 0) return messages;
  let firstKept = 0;
  while (firstKept < messages.length && messageTimestamp(messages[firstKept]!) <= horizon) firstKept += 1;
  while (firstKept < messages.length && messages[firstKept]!.role !== "user") firstKept += 1;
  // ...
}
```

Sources: [src/turn-runner/wire-shaping.ts:182-193](src/turn-runner/wire-shaping.ts)

The `MIN_HISTORY_TAIL = 1` constant ensures at least the most recent user message always survives eviction: this message is the actor's current prompt and cannot be dropped.

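
The block-eviction walk and head cleanup described above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the code in `wire-shaping.ts`: the `Msg` and `Block` types and the `estimateTokens`, `estimateBytes`, `applyHorizon`, and `findHorizon` helpers are hypothetical stand-ins, the budgets in the example are toy values, and the sketch assumes strictly increasing timestamps and a transcript that ends with a user message.

```typescript
// Hypothetical, simplified message shape; the real AgentMessage type is richer.
type Block = { type: "text"; text: string } | { type: "image"; data: string };
interface Msg {
  role: "user" | "assistant" | "toolResult";
  timestamp: number;
  content: Block[];
}

const IMAGE_TOKENS = 1_600; // fixed per-image estimate, as described above

// Token gate signal: images cost a flat amount, text costs ceil(chars / 4).
function estimateTokens(messages: Msg[]): number {
  let total = 0;
  for (const m of messages)
    for (const b of m.content)
      total += b.type === "image" ? IMAGE_TOKENS : Math.ceil(b.text.length / 4);
  return total;
}

// Byte gate signal: raw payload length, counted separately from tokens.
function estimateBytes(messages: Msg[]): number {
  let total = 0;
  for (const m of messages)
    for (const b of m.content)
      total += b.type === "image" ? b.data.length : b.text.length;
  return total;
}

// Drop messages at or before the horizon, then skip non-user heads.
function applyHorizon(messages: Msg[], horizon: number): Msg[] {
  if (horizon <= 0) return messages;
  let first = 0;
  while (first < messages.length && messages[first]!.timestamp <= horizon) first += 1;
  while (first < messages.length && messages[first]!.role !== "user") first += 1;
  return messages.slice(first);
}

// Walk oldest-first, advancing past whole messages until the predicate holds.
// The loop never advances past the final message.
function findHorizon(messages: Msg[], current: number, ok: (kept: Msg[]) => boolean): number {
  let horizon = current;
  for (let i = 0; i < messages.length - 1; i++) {
    if (ok(applyHorizon(messages, horizon))) break;
    horizon = Math.max(horizon, messages[i]!.timestamp);
  }
  return horizon;
}

// Toy transcript: four 8-character texts (2 tokens each) and one image (1,600 tokens).
const history: Msg[] = [
  { role: "user", timestamp: 1, content: [{ type: "text", text: "12345678" }] },
  { role: "assistant", timestamp: 2, content: [{ type: "text", text: "12345678" }] },
  { role: "user", timestamp: 3, content: [{ type: "image", data: "iVBORw0KGgo=" }] }, // placeholder payload
  { role: "assistant", timestamp: 4, content: [{ type: "text", text: "12345678" }] },
  { role: "user", timestamp: 5, content: [{ type: "text", text: "12345678" }] },
];

// Both gates feed one predicate; the limits here are toy values, not the real budgets.
const withinBudget = (kept: Msg[]) =>
  estimateTokens(kept) <= 1_605 && estimateBytes(kept) <= 1_000;

const horizon = findHorizon(history, 0, withinBudget);
const wire = applyHorizon(history, horizon);
console.log(horizon, wire.map((m) => m.timestamp));
```

In this run the walk stops after advancing past the first message, yet the kept slice begins at the third message, because the assistant reply left at the head is dropped as an orphan.
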

---

## Gate Interaction Diagram

```text
Per-turn dispatch pipeline
═══════════════════════════════════════════════════════════════════

Full transcript (all messages, all time)
            │
            ▼
┌─────────────────────────┐
│  applyEvictionHorizon   │  Drops messages ≤ WireGuardHorizon.evictionHorizon
│  (wire-shaping.ts)      │  + strips orphaned tool-result / assistant heads
└───────────┬─────────────┘
            │  Dispatched slice
            ▼
┌───────────────────────────────────────┐
│  Gate 1: Token check                  │
│  calculateWireTokens(slice)           │  Images → 1,600 fixed tokens
│  vs. 0.60 × effectiveContext          │  Text/tools → ceil(chars/4)
└───────────┬─────────┬─────────────────┘
         OK │         │ Over budget
            │         └──► findEvictionHorizon → advance horizon → retry
            │
┌───────────▼───────────────────────────┐
│  Gate 2: Byte check                   │
│  calculateWireBytes(slice)            │  Images → raw base64 length
│  vs. WIRE_BYTE_TRIGGER (15 MB)        │  Text → UTF-16 length
└───────────┬─────────┬─────────────────┘
         OK │         │ Over budget
            │         └──► findEvictionHorizon → advance to 80% target → retry
            │
            ▼
     Provider API call
```

---

## State-Level Compaction: A Separate Concern

`state-compaction.ts` handles a distinct problem: the **in-memory and on-disk TurnState** can grow without bound as tool calls accumulate across a long session. Its ceiling is a **100 MB JSON serialized size** limit (`DEFAULT_STATE_MAX_BYTES`), controlled per-runner via `autoStateCompaction.maxBytes`.

Unlike wire-shaping, state compaction operates on the stored transcript, not the dispatched slice. It also performs the same `toolResult`/`assistant` head-fix to ensure the post-eviction message list starts with a `user` role.

Sources: [src/turn-runner/state-compaction.ts:14-98](src/turn-runner/state-compaction.ts)

These two layers are orthogonal:

| Layer | File | Trigger | Target | Scope |
|---|---|---|---|---|
| Wire shaping | `wire-shaping.ts` | 15 MB of wire bytes, or 60% of `effectiveContext` (120k tokens at the 200k default) | Bytes: 80% of the trigger (12 MB); tokens: back below the trigger | Dispatched message slice only |
| State compaction | `state-compaction.ts` | 100 MB JSON size | ≥1 message retained | Full stored transcript |

---

## Model Resolution: `resolveModelName`

### Entry Point

All model references in the runner ultimately pass through a single function:

```ts
// src/model-resolution/resolver.ts:40-63
export function resolveModelName(model: string): Model {
  model = resolveModelReference(model);
  const separator = model.indexOf(":");
  if (separator === -1) throw new Error("Models must use provider:modelId syntax");
  const rawProvider = model.slice(0, separator);
  const rawModelId = model.slice(separator + 1);
  const provider = resolveProviderShorthand(rawProvider) ?? rawProvider;
  const modelId = isKnownProvider(provider)
    ? canonicalizeProviderModelId(provider, rawModelId)
    : rawModelId;
  if (provider === "duet-gateway") {
    const resolved = resolveDuetGatewayModel(modelId);
    if (!resolved) throw new Error(`Unknown duet-gateway model: ${modelId}`);
    return resolved;
  }
  return getModel(provider, modelId);
}
```

Sources: [src/model-resolution/resolver.ts:40-63](src/model-resolution/resolver.ts)

The flow is: normalize shorthand → parse `provider:modelId` → canonicalize aliases → dispatch to Duet gateway branch or standard `getModel`.

### Provider Order and Inference

`PROVIDER_ORDER` defines both the resolution priority and the env-var inference order.
`duet-gateway` must precede `vercel-ai-gateway` because the CLI shims `DUET_API_KEY` into `AI_GATEWAY_API_KEY` at startup, and `duet-gateway` must win before the Vercel entry reads that variable:

```ts
// src/model-resolution/catalog.ts:38-47
export const PROVIDER_ORDER: readonly ProviderPreference[] = [
  {
    provider: "duet-gateway",
    customEnvVar: () => process.env[DUET_GATEWAY_API_KEY_ENV] ? DUET_GATEWAY_API_KEY_ENV : null,
  },
  { provider: "vercel-ai-gateway" },
  { provider: "openrouter" },
  { provider: "anthropic" },
  { provider: "openai" },
];
```

Sources: [src/model-resolution/catalog.ts:38-47](src/model-resolution/catalog.ts)

When no `--model` flag is passed, `resolveCliModel` walks `PROVIDER_ORDER` and infers the provider from whichever env var is present first. This makes the system BYOK-friendly: set `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `OPENROUTER_API_KEY`, or `DUET_API_KEY` and the right provider is selected automatically.

### The Catalog: Shorthands and Aliases

`catalog.ts` defines `MODEL_DEFINITIONS`, a list of shorthand names, aliases, and per-provider model IDs. For example, the shorthand `opus-4.7` resolves differently per provider:

| Provider | Resolved model ID |
|---|---|
| `duet-gateway` | `anthropic/claude-opus-4.7` |
| `vercel-ai-gateway` | `anthropic/claude-opus-4.7` |
| `openrouter` | `anthropic/claude-opus-4.7` |
| `anthropic` | `claude-opus-4-7` |

Sources: [src/model-resolution/catalog.ts:65-80](src/model-resolution/catalog.ts)

This alias table lets users pass `claude-opus-4-7`, `claude-opus-4.7`, or bare `opus-4.7` interchangeably. `canonicalizeModelName` normalizes everything to the canonical shorthand; `canonicalizeProviderModelId` maps it to the correct provider-specific ID.
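
The two steps above, inferring a provider from the environment and then mapping a shorthand to a provider-specific ID, can be pictured as two small lookups. The sketch below is illustrative only: `ENV_ORDER`, `inferProvider`, `CATALOG`, `toShorthand`, and `toProviderId` are hypothetical names, the catalog shape is an assumption rather than the real `MODEL_DEFINITIONS` structure, and the pairing of `duet-gateway` with `DUET_API_KEY` and the pass-through of unknown names are assumed behavior. Only the `opus-4.7` IDs and the provider order come from this page.

```typescript
type Provider = "duet-gateway" | "vercel-ai-gateway" | "openrouter" | "anthropic" | "openai";

// Inference order mirrors PROVIDER_ORDER; the provider/env-var pairing is an assumption.
const ENV_ORDER: ReadonlyArray<[Provider, string]> = [
  ["duet-gateway", "DUET_API_KEY"],
  ["vercel-ai-gateway", "AI_GATEWAY_API_KEY"],
  ["openrouter", "OPENROUTER_API_KEY"],
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
];

// Return the first provider whose env var is set, or undefined if none is.
function inferProvider(env: Record<string, string | undefined>): Provider | undefined {
  for (const [provider, envVar] of ENV_ORDER) {
    if (env[envVar]) return provider;
  }
  return undefined;
}

// Hypothetical catalog entry shape; not the real MODEL_DEFINITIONS type.
interface ModelDefinition {
  shorthand: string;
  aliases: string[];
  ids: Partial<Record<Provider, string>>;
}

const CATALOG: ModelDefinition[] = [
  {
    shorthand: "opus-4.7",
    aliases: ["claude-opus-4-7", "claude-opus-4.7"],
    ids: {
      "duet-gateway": "anthropic/claude-opus-4.7",
      "vercel-ai-gateway": "anthropic/claude-opus-4.7",
      openrouter: "anthropic/claude-opus-4.7",
      anthropic: "claude-opus-4-7",
    },
  },
];

// Normalize a shorthand or alias to the canonical shorthand.
function toShorthand(name: string): string | undefined {
  for (const def of CATALOG) {
    if (def.shorthand === name || def.aliases.indexOf(name) !== -1) return def.shorthand;
  }
  return undefined;
}

// Map a shorthand or alias to the ID a given provider expects.
function toProviderId(provider: Provider, name: string): string {
  const shorthand = toShorthand(name);
  for (const def of CATALOG) {
    if (def.shorthand === shorthand) return def.ids[provider] ?? name;
  }
  return name; // assumption: names outside the catalog pass through unchanged
}

const provider = inferProvider({ DUET_API_KEY: "example", AI_GATEWAY_API_KEY: "example" });
console.log(provider, toProviderId("anthropic", "claude-opus-4.7"));
```

With both `DUET_API_KEY` and `AI_GATEWAY_API_KEY` present, the ordered walk picks `duet-gateway` first, which is the property the ordering exists to guarantee.
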

Default models are also separated by use case:

```ts
export const DEFAULT_CLI_MODEL = "opus-4.7";
export const DEFAULT_CLI_MEMORY_MODEL = "gpt-5.4-mini";
```

Sources: [src/model-resolution/catalog.ts:30-31](src/model-resolution/catalog.ts)

The memory model is intentionally cheaper: it runs observational summarization, not actor turns.

### Duet Gateway: BYOC Routing

The `duet-gateway` provider is a thin proxy layer over Vercel AI Gateway. It reuses Vercel's model registry and request/response contract verbatim, overriding only `baseUrl` to point at the Duet-hosted proxy:

```ts
// src/model-resolution/duet-gateway.ts:40-48
export function resolveDuetGatewayModel(modelId: string): Model | undefined {
  forceDuetGatewayAuth();
  const upstream = getModel("vercel-ai-gateway", modelId);
  if (!upstream) return undefined;
  return { ...upstream, baseUrl: getDuetGatewayBaseUrl() };
}
```

Sources: [src/model-resolution/duet-gateway.ts:40-48](src/model-resolution/duet-gateway.ts)

Auth is handled by `forceDuetGatewayAuth`, which overwrites `AI_GATEWAY_API_KEY` with the `DUET_API_KEY` before every Duet request. This is intentionally stricter than the startup shim (`shimDuetApiKeyToAiGateway`), which avoids clobbering an existing Vercel key. The Duet proxy rejects Vercel-issued `vck_...` tokens with an opaque 500, so the force-overwrite is necessary when the user has both keys set.

The base URL is `${DUET_APP_BASE_URL}/api/v1/ai-gateway`, resolved via `resolveDuetAppBaseUrl()`. Users override the app origin with `DUET_APP_BASE_URL`; the gateway path is fixed.

Sources: [src/model-resolution/duet-gateway.ts:1-78](src/model-resolution/duet-gateway.ts)

---

## Cache Miss Cost Model

The `prompt-cache.eval.ts` eval verifies that `cacheRead > 0` on the second turn of a resumed session, confirming that the serialized `TurnState` reconstructs an exact byte-for-byte prefix.
The `WireGuardHorizon` mechanism is the key invariant that makes this possible across turns: as long as the horizon does not advance, the dispatched slice is content-deterministic and the provider finds its cached entry.

Sources: [evals/prompt-cache.eval.ts:33-53](evals/prompt-cache.eval.ts)

When the horizon does advance (either from a token/byte budget crossing or from a provider context-overflow recovery), the runner treats that cache miss as an opportunity to also refresh the frozen memory pack, piggybacking on the already-invalidated prefix rather than paying a second miss.

Sources: [src/turn-runner/turn-runner.ts:1641-1644](src/turn-runner/turn-runner.ts)

The thread-context-loss eval (`evals/thread-context-loss.eval.ts`) documents the original failure mode this system corrects: image-heavy threads where base64 byte inflation triggered premature eviction of early user messages, making the model appear to "forget" prior conversation. The fix (routing images to `IMAGE_WIRE_TOKEN_ESTIMATE` for token accounting while isolating raw bytes for the byte gate) resolved the mismatch.

Sources: [evals/thread-context-loss.eval.ts:20-45](evals/thread-context-loss.eval.ts)

---

## Summary

Wire shaping enforces a two-gate eviction system: a token gate (60% of `effectiveContext`, default 200k) for normal sessions, and a byte gate (15 MB trigger, 80% target) that catches image-payload inflation the token gate would miscount. Both gates advance the same `WireGuardHorizon`, which pins the eviction cut to a monotonically-increasing timestamp so subsequent turns reconstruct an identical dispatched prefix and re-hit the provider's prompt cache.

Model resolution normalizes any user-supplied shorthand, alias, or `provider:modelId` string into a concrete `Model` object through a five-provider catalog, with `duet-gateway` taking priority when `DUET_API_KEY` is set and reusing Vercel AI Gateway's transport and model registry with only the base URL and auth token swapped.

Sources: [src/turn-runner/wire-shaping.ts:1-214](src/turn-runner/wire-shaping.ts), [src/model-resolution/resolver.ts:40-63](src/model-resolution/resolver.ts)
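
As a worked check of the figures in the summary, the budget split can be computed directly. This is a minimal sketch under stated assumptions: `splitBudget` is a hypothetical helper, not a function in the repository, and it uses integer per-mille ratios (600, 325, 75) to avoid floating-point rounding, which may differ from how the source derives the same values.

```typescript
const DEFAULT_EFFECTIVE_CONTEXT = 200_000;
const WIRE_BYTE_TRIGGER = 15 * 1024 * 1024;
const WIRE_BYTE_TARGET = Math.floor(WIRE_BYTE_TRIGGER * 0.8);

// Hypothetical helper: clamp the requested context to the model window,
// then split it with the 60% / 32.5% / 7.5% ratios from the table above.
function splitBudget(userValue?: number, modelWindow?: number) {
  const requested = userValue ?? DEFAULT_EFFECTIVE_CONTEXT;
  const effectiveContext =
    modelWindow !== undefined ? Math.min(requested, modelWindow) : requested;
  return {
    effectiveContext,
    messageTokens: Math.floor((effectiveContext * 600) / 1000),
    observationTokens: Math.floor((effectiveContext * 325) / 1000),
    globalContextTokenBudget: Math.floor((effectiveContext * 75) / 1000),
  };
}

const defaults = splitBudget();
const clamped = splitBudget(undefined, 128_000); // a model with a 128k window

console.log(defaults); // 120,000 / 65,000 / 15,000 tokens
console.log(clamped); // 76,800 / 41,600 / 9,600 tokens
console.log(WIRE_BYTE_TRIGGER, WIRE_BYTE_TARGET); // 15,728,640 and 12,582,912 bytes
```

At the 200k default the token gate therefore trips at 120,000 estimated tokens, and a model with a smaller window lowers all three segments proportionally.
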