# Pi Agent Harness — Socratic Exploration Wiki

> A four-package TypeScript monorepo that assembles a self-extensible coding agent: a unified multi-provider LLM layer, a provider-neutral agent loop, a terminal UI library, and the interactive CLI that binds them. What makes it worth studying is the deliberate separation of concerns—each layer is independently publishable and replaceable—and the extension system that lets the agent modify its own tools at runtime.

## Context Links

- [Agent index](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/llms.txt)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234)
- [GitHub repository](https://github.com/earendil-works/pi)

## Repository Metadata

- Repository: earendil-works/pi

- Generated: 2026-05-22T23:31:33.949Z
- Updated: 2026-05-22T23:54:42.183Z
- Runtime: Claude Code
- Format: Socratic Exploration
- Pages: 12

## Page Index

- 01. [The First Question: Why Is a Simple Script Not Enough?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/01-the-first-question-why-is-a-simple-script-not-enough.md) - What problem does pi actually solve, and why does the answer demand four separate packages instead of one? This page traces the repo from its name down to its monorepo shape, asking at every step which assumption would break if the system were simpler. The answer reveals the architectural bet: that provider neutrality, session persistence, and a live extension system are inseparable from any serious coding agent.
- 02. [Why Four Packages? Where Does Each Layer End?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/02-why-four-packages-where-does-each-layer-end.md) - The repo ships four independently publishable npm packages: pi-ai, pi-agent-core, pi-tui, and pi-coding-agent. Each boundary is a deliberate seam. This page asks: which concerns forced each split, what can cross each boundary, and what would break if two packages were merged? Reading package.json exports, tsconfig.build.json, and the import graph answers these questions concretely.
- 03. [BYOK / BYOC: What Does Provider Neutrality Actually Cost?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/03-byok-byoc-what-does-provider-neutrality-actually-cost.md) - Pi promises that users bring their own keys and providers. This page probes what that claim requires in practice: how env-api-keys.ts discovers credentials, how the OAuth flow is handled, how the add-llm-provider skill teaches the agent to register new providers at runtime, and what invariants must hold for every provider adapter. The skill file .pi/skills/add-llm-provider.md is the primary non-README evidence.
- 04. [The Registry: How Does a New Provider Become Callable?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/04-the-registry-how-does-a-new-provider-become-callable.md) - pi-ai uses a runtime registry pattern: providers are registered by API string key, and the agent loop calls them through that registry without knowing which concrete module it will hit. This page traces registerApiProvider, registerBuiltInApiProviders, and the lazy-load pattern in register-builtins.ts that defers heavy provider modules until first use—exposing the tradeoff between startup time and call overhead.
- 05. [Nine Providers, One Interface: What Must Every Adapter Guarantee?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/05-nine-providers-one-interface-what-must-every-adapter-guarantee.md) - The providers/ directory contains adapters for Anthropic, OpenAI (completions, responses, Codex), Azure OpenAI, Google AI, Google Vertex, Mistral, Amazon Bedrock, Cloudflare, and a faux test provider. This page asks what the common stream/streamSimple contract is, where adapters diverge (Bedrock's Node-only constraint, GitHub Copilot's custom headers, OpenAI prompt-cache specifics), and what the faux provider reveals about testability.
- 06. [Streaming All the Way Down: What Happens Between Token and Tool Call?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/06-streaming-all-the-way-down-what-happens-between-token-and-tool-call.md) - Every LLM call returns an AsyncIterable of AssistantMessageEvents. This page follows an event from the provider stream through AssistantMessageEventStream (utils/event-stream.ts), into the agent loop's stream function, and up to the TUI renderer. It asks: where is backpressure applied, how are partial tool-call arguments accumulated before validation, and what does the overflow utility guard against?
- 07. [The Loop: What Is the Minimal Unit of Agent Work?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/07-the-loop-what-is-the-minimal-unit-of-agent-work.md) - packages/agent/src/agent-loop.ts implements the turn-based cycle: add prompt → call LLM → emit events → execute pending tool calls → repeat until stop. This page asks what each AgentEvent type signals, how tool execution mode (sequential vs. parallel) is chosen, and what the difference is between runAgentLoop and runAgentLoopContinue. The test files agent-loop.test.ts and agent.test.ts show which invariants the authors actually enforce.
- 08. [AgentSession: What State Must Survive a Model Switch or Session Resume?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/08-agentsession-what-state-must-survive-a-model-switch-or-session-resume.md) - AgentSession (core/agent-session.ts) is the shared abstraction across interactive, print, and RPC modes. It owns session persistence, model/thinking-level management, bash execution, and auto-compaction triggers. This page asks: what is serialized to disk, which events drive session persistence, how session branching works, and why AgentSession is deliberately mode-agnostic.
- 09. [Compaction: When the Context Window Is the Enemy, What Gets Thrown Away?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/09-compaction-when-the-context-window-is-the-enemy-what-gets-thrown-away.md) - As conversations grow, token counts approach the context window limit. The compaction subsystem (core/compaction/) answers: when to compact, which messages to summarize, how branch-level summaries differ from turn-level summaries, and how the agent resumes coherently after a compaction round. The tests agent-session-compaction.test.ts and harness/compaction.test.ts reveal the boundary conditions.
- 10. [Built-In Tools: What Can the Agent Actually Do to a Filesystem?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/10-built-in-tools-what-can-the-agent-actually-do-to-a-filesystem.md) - The coding agent ships six built-in tools: Read, Write, Edit, Bash, Grep/Find, and Ls. This page asks how each tool definition wraps the underlying operation (tool-definition-wrapper.ts), what file-mutation-queue.ts serializes to prevent concurrent edits, how bash.ts sandboxes commands, and what output-accumulator.ts does to keep large tool results from overflowing the context. The tools/ directory is the system's ground-level action surface.
- 11. [pi-tui: Why Build a Terminal UI Library from Scratch?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/11-pi-tui-why-build-a-terminal-ui-library-from-scratch.md) - packages/tui implements its own terminal rendering engine with differential output, an undo stack, a kill ring, Emacs-style key bindings, fuzzy search, and inline image display (Kitty/Sixel). This page asks what constraints made off-the-shelf libraries insufficient, how the virtual terminal model in terminal.ts avoids screen-flicker, and what stdin-buffer.ts does to handle raw key events. The regression tests expose the edge cases that forced custom code.
- 12. [Three Modes, One AgentSession: What Changes Between Interactive, Print, and RPC?](https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/12-three-modes-one-agentsession-what-changes-between-interactive-print-and-rpc.md) - The coding agent runs in three surface modes: interactive (full TUI), print (stdout-only for scripting), and RPC (JSONL protocol for IDE integration). All three share AgentSession; each adds its own I/O adapter. This page examines rpc-mode.ts and rpc-types.ts to understand the JSONL protocol, contrasts it with interactive-mode.ts component wiring, and asks what the RPC mode reveals about the true API surface of the agent.

## Source File Index

- `.pi/skills/add-llm-provider.md`
- `AGENTS.md`
- `package.json`
- `packages/agent/package.json`
- `packages/agent/src/agent-loop.ts`
- `packages/agent/src/agent.ts`
- `packages/agent/src/types.ts`
- `packages/agent/test/agent-loop.test.ts`
- `packages/agent/test/agent.test.ts`
- `packages/agent/test/harness/compaction.test.ts`
- `packages/ai/package.json`
- `packages/ai/src/api-registry.ts`
- `packages/ai/src/env-api-keys.ts`
- `packages/ai/src/index.ts`
- `packages/ai/src/models.ts`
- `packages/ai/src/oauth.ts`
- `packages/ai/src/providers/amazon-bedrock.ts`
- `packages/ai/src/providers/anthropic.ts`
- `packages/ai/src/providers/faux.ts`
- `packages/ai/src/providers/github-copilot-headers.ts`
- `packages/ai/src/providers/openai-prompt-cache.ts`
- `packages/ai/src/providers/register-builtins.ts`
- `packages/ai/src/providers/transform-messages.ts`
- `packages/ai/src/session-resources.ts`
- `packages/ai/src/stream.ts`
- `packages/ai/src/types.ts`
- `packages/ai/src/utils/event-stream.ts`
- `packages/ai/src/utils/json-parse.ts`
- `packages/ai/src/utils/overflow.ts`
- `packages/coding-agent/package.json`
- `packages/coding-agent/src/core/agent-session-runtime.ts`
- `packages/coding-agent/src/core/agent-session-services.ts`
- `packages/coding-agent/src/core/agent-session.ts`
- `packages/coding-agent/src/core/auth-guidance.ts`
- `packages/coding-agent/src/core/auth-storage.ts`
- `packages/coding-agent/src/core/compaction/branch-summarization.ts`
- `packages/coding-agent/src/core/compaction/compaction.ts`
- `packages/coding-agent/src/core/compaction/utils.ts`
- `packages/coding-agent/src/core/index.ts`
- `packages/coding-agent/src/core/sdk.ts`
- `packages/coding-agent/src/core/session-manager.ts`
- `packages/coding-agent/src/core/tools/bash.ts`
- `packages/coding-agent/src/core/tools/edit.ts`
- `packages/coding-agent/src/core/tools/file-mutation-queue.ts`
- `packages/coding-agent/src/core/tools/output-accumulator.ts`
- `packages/coding-agent/src/core/tools/tool-definition-wrapper.ts`
- `packages/coding-agent/src/main.ts`
- `packages/coding-agent/src/modes/index.ts`
- `packages/coding-agent/src/modes/interactive/interactive-mode.ts`
- `packages/coding-agent/src/modes/print-mode.ts`
- `packages/coding-agent/src/modes/rpc/jsonl.ts`
- `packages/coding-agent/src/modes/rpc/rpc-mode.ts`
- `packages/coding-agent/src/modes/rpc/rpc-types.ts`
- `packages/coding-agent/test/suite/agent-session-compaction.test.ts`
- `packages/coding-agent/test/suite/agent-session-runtime.test.ts`
- `packages/coding-agent/test/tools.test.ts`
- `packages/tui/package.json`
- `packages/tui/src/kill-ring.ts`
- `packages/tui/src/stdin-buffer.ts`
- `packages/tui/src/terminal-image.ts`
- `packages/tui/src/terminal.ts`
- `packages/tui/src/tui.ts`
- `packages/tui/test/tui-render.test.ts`
- `README.md`

---

## 01. The First Question: Why Is a Simple Script Not Enough?

> What problem does pi actually solve, and why does the answer demand four separate packages instead of one? This page traces the repo from its name down to its monorepo shape, asking at every step which assumption would break if the system were simpler. The answer reveals the architectural bet: that provider neutrality, session persistence, and a live extension system are inseparable from any serious coding agent.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/01-the-first-question-why-is-a-simple-script-not-enough.md
- Generated: 2026-05-22T23:28:09.094Z

### Source Files

- `README.md`
- `package.json`
- `packages/coding-agent/src/main.ts`
- `packages/coding-agent/src/core/index.ts`
- `AGENTS.md`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [README.md](README.md)
- [package.json](package.json)
- [packages/coding-agent/package.json](packages/coding-agent/package.json)
- [packages/agent/package.json](packages/agent/package.json)
- [packages/ai/package.json](packages/ai/package.json)
- [packages/tui/package.json](packages/tui/package.json)
- [packages/coding-agent/src/main.ts](packages/coding-agent/src/main.ts)
- [packages/coding-agent/src/core/index.ts](packages/coding-agent/src/core/index.ts)
- [packages/coding-agent/src/core/extensions/types.ts](packages/coding-agent/src/core/extensions/types.ts)
- [packages/coding-agent/src/core/agent-session-services.ts](packages/coding-agent/src/core/agent-session-services.ts)
- [packages/coding-agent/src/core/sdk.ts](packages/coding-agent/src/core/sdk.ts)
- [packages/coding-agent/src/core/session-manager.ts](packages/coding-agent/src/core/session-manager.ts)
- [packages/ai/src/providers/register-builtins.ts](packages/ai/src/providers/register-builtins.ts)
- [AGENTS.md](AGENTS.md)
</details>

# The First Question: Why Is a Simple Script Not Enough?

At first glance, pi looks like a coding assistant you could prototype in an afternoon: pipe a prompt to an LLM, get shell commands back, execute them. So why does this repository contain four separately published npm packages and a monorepo build pipeline rather than a single file or a straightforward CLI wrapper?

The answer is not complexity for its own sake. Each package boundary in this repo encodes a distinct architectural necessity — a place where "make it simpler" would cause something real to break. This page traces those boundaries from the repo's own name down to the event-driven extension system, asking at every step which assumption would collapse if the system were flatter.

---

## What is the simplest version of this tool?

Start with the dumbest possible coding agent: read a prompt, call one LLM API, pipe the output to a shell, print the result. This single-function version is easy to write. It also has immediate, concrete failure modes:

- It is permanently coupled to one model provider. Swapping from OpenAI to Anthropic requires rewriting the network layer.
- It has no memory. Each invocation starts from scratch. Long tasks — refactoring a module, writing and iterating on tests — cannot be split across time.
- It cannot be extended without forking the source. Any feature not baked in at build time is absent entirely.
- It has no interactive feedback loop. The user cannot see streaming output, cannot approve or redirect tool calls in real time, cannot cancel a runaway bash command.

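Pi's four-package shape answers these failure modes. Provider coupling, missing session state, and the lack of terminal feedback each map to a dedicated library package, while the extension surface lives in the core of the CLI package that binds the libraries together.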

Sources: [README.md:23-56]()

---

## How the monorepo boundary map answers those failure modes

```text
┌──────────────────────────────────────┬─────────────────────────────────┐
│ What breaks in the simple script     │ Which package fixes it          │
├──────────────────────────────────────┼─────────────────────────────────┤
│ Single-provider coupling             │ @earendil-works/pi-ai           │
│ No persistent memory / session state │ @earendil-works/pi-agent-core   │
│ No live extension surface            │ (extension system in            │
│                                      │  pi-coding-agent core)          │
│ No interactive terminal rendering    │ @earendil-works/pi-tui          │
│ (orchestration + CLI glue)           │ @earendil-works/pi-coding-agent │
└──────────────────────────────────────┴─────────────────────────────────┘
```

The coding agent `package.json` shows the dependency direction explicitly: the CLI depends on all three libraries (`@earendil-works/pi-agent-core`, `@earendil-works/pi-ai`, and `@earendil-works/pi-tui`), while each library has no upward dependency on the CLI.

Sources: [packages/coding-agent/package.json:42-48]()

---

## Why provider neutrality demands its own package

### The simplest version would hardcode one SDK

A script that imports `@anthropic-ai/sdk` directly and calls `messages.create()` works fine until users want GPT-4o, Gemini, Mistral, or a corporate proxy. At that point the script needs conditional branches, provider-specific token formats, and divergent error handling — all tangled with the agent logic.

### What pi-ai actually does

`@earendil-works/pi-ai` packages eight provider implementations behind a uniform `streamSimple` interface, registered via lazy-loaded modules to avoid paying the import cost of unused SDKs:

```typescript
// packages/ai/src/providers/register-builtins.ts
interface LazyProviderModule<TApi, TOptions, TSimpleOptions> {
  stream: (model: Model<TApi>, context: Context, options?: TOptions) => AsyncIterable<AssistantMessageEvent>;
  streamSimple: (model: Model<TApi>, context: Context, options?: TSimpleOptions) => AsyncIterable<AssistantMessageEvent>;
}
```

The `package.json` for `pi-ai` lists the provider SDKs as direct dependencies (`@anthropic-ai/sdk`, `openai`, `@google/genai`, `@mistralai/mistralai`, `@aws-sdk/client-bedrock-runtime`), all pinned to exact versions. This is the only package that should ever know those SDK shapes exist.

Sources: [packages/ai/src/providers/register-builtins.ts:23-33](), [packages/ai/package.json:69-79]()
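
The lazy-registration idea described above can be sketched in isolation. This is a minimal illustration, not the code in `register-builtins.ts`: `LazyRegistry`, `StreamEvent`, `Loader`, and the `"echo"` provider are hypothetical names, and the event shape is simplified. It shows one plausible way to defer a provider module until first use and then reuse the loaded module on later calls.

```typescript
// Hypothetical sketch: none of these names are pi-ai's actual API.
type StreamEvent = { type: "text"; delta: string } | { type: "done" };

interface ProviderModule {
  streamSimple(prompt: string): AsyncIterable<StreamEvent>;
}

type Loader = () => Promise<ProviderModule>;

class LazyRegistry {
  private loaders = new Map<string, Loader>();
  private loaded = new Map<string, Promise<ProviderModule>>();

  register(api: string, loader: Loader): void {
    this.loaders.set(api, loader);
    this.loaded.delete(api); // a re-registration replaces any cached module
  }

  // Resolve the module on first use and cache the promise so that
  // concurrent callers share a single load.
  private load(api: string): Promise<ProviderModule> {
    let pending = this.loaded.get(api);
    if (!pending) {
      const loader = this.loaders.get(api);
      if (!loader) throw new Error(`No provider registered for api "${api}"`);
      pending = loader();
      this.loaded.set(api, pending);
    }
    return pending;
  }

  async *streamSimple(api: string, prompt: string): AsyncIterable<StreamEvent> {
    const mod = await this.load(api);
    yield* mod.streamSimple(prompt);
  }
}

// Usage: a fake provider that echoes the prompt; loadCount tracks loader runs.
let loadCount = 0;
const registry = new LazyRegistry();
registry.register("echo", async () => {
  loadCount++;
  return {
    async *streamSimple(prompt: string): AsyncIterable<StreamEvent> {
      yield { type: "text", delta: prompt };
      yield { type: "done" };
    },
  };
});

async function collect(api: string, prompt: string): Promise<string> {
  let text = "";
  for await (const event of registry.streamSimple(api, prompt)) {
    if (event.type === "text") text += event.delta;
  }
  return text;
}
```

In this sketch the caller only ever names the API string, so startup pays for the `Map` entries rather than for every SDK import. The cost moves to the first call for each provider.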

### Why separation matters for extension authors

Extensions can register entirely new providers at runtime without touching the core:

```typescript
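// From ExtensionAPI in packages/coding-agent/src/core/extensions/types.ts.
// `pi` is the ExtensionAPI object an extension receives at load time.
pi.registerProvider("my-proxy", {
  baseUrl: "https://proxy.example.com",
  apiKey: "PROXY_API_KEY",
  api: "anthropic-messages",
  models: [{ id: "claude-sonnet-4-20250514" /* remaining model fields elided */ }]
});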
```

If the provider abstraction were embedded in the CLI, extensions could not safely add models. The `pi-ai` package boundary is what makes BYOK/BYOC (bring-your-own-key / bring-your-own-credentials) possible at runtime, not just at build time.

Sources: [packages/coding-agent/src/core/extensions/types.ts:1254-1292]()

---

## Why session persistence demands its own package

### What "no memory" actually costs

Without session persistence, every resumed conversation starts from token zero. The user's previous error messages, the agent's previous tool outputs, partial file edits — all gone. For toy tasks this is acceptable. For multi-step coding tasks that span hours (refactoring, debugging, writing a test suite), it is a show-stopper.

### What pi-agent-core actually stores

`@earendil-works/pi-agent-core` is described as a "general-purpose agent with transport abstraction, state management, and attachment support." The session file format is defined by `SessionManager`, which the source path cited below places in `packages/coding-agent/src/core/session-manager.ts`. It stores messages as an append-only JSONL tree, versioned (`CURRENT_SESSION_VERSION = 3`), with typed entries for messages, model changes, compaction summaries, branch summaries, and arbitrary extension-defined custom entries:

```typescript
// packages/coding-agent/src/core/session-manager.ts (excerpt)
export interface SessionHeader {
  type: "session";
  version?: number;
  id: string;
  timestamp: string;
  cwd: string;
  parentSession?: string;
}
```

The `cwd` field in the session header is architectural: sessions are bound to the working directory they were created in. When you resume a session from a different project, `main.ts` detects the mismatch via `getMissingSessionCwdIssue()` and prompts you to fork rather than silently corrupt the context.

Session branching (fork, tree navigation, compaction) is managed entirely within this layer, invisible to the LLM layer and the UI layer.

Sources: [packages/coding-agent/src/core/session-manager.ts:1-37](), [packages/coding-agent/src/main.ts:507-519]()
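
To make "append-only JSONL tree" concrete, here is a small sketch under stated assumptions. The `Entry` shape and the `id` / `parentId` field names are hypothetical (the excerpt above shows only the session header), and the helpers `appendEntry`, `parseLog`, and `branchTo` are illustrative names. The point is that a fork is simply a new line whose parent is an older entry, so nothing already written has to change.

```typescript
// Hypothetical entry shape: `id` and `parentId` are assumed field names.
interface Entry {
  type: string;
  id: string;
  parentId: string | null;
  text?: string;
}

// Append-only: a new entry is serialized as one JSON line added at the end.
function appendEntry(log: string, entry: Entry): string {
  return log + JSON.stringify(entry) + "\n";
}

function parseLog(log: string): Entry[] {
  return log
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Entry);
}

// Walk parent links from a leaf back to the root, then reverse to get
// the root-to-leaf history of that branch.
function branchTo(entries: Entry[], leafId: string): Entry[] {
  const byId = new Map(entries.map((e) => [e.id, e] as const));
  const path: Entry[] = [];
  let current = byId.get(leafId);
  while (current) {
    path.push(current);
    current = current.parentId === null ? undefined : byId.get(current.parentId);
  }
  return path.reverse();
}

// Usage: "b" and "c" are two alternative continuations of "a".
let log = "";
log = appendEntry(log, { type: "message", id: "a", parentId: null, text: "fix the bug" });
log = appendEntry(log, { type: "message", id: "b", parentId: "a", text: "first attempt" });
log = appendEntry(log, { type: "message", id: "c", parentId: "a", text: "second attempt" });
const branchIds = branchTo(parseLog(log), "c").map((e) => e.id);
```

Resuming a session in this model means picking a leaf and replaying only its branch, while the sibling branch stays on disk for later navigation.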

### Why it's a separate published package

The `package.json` for `pi-agent-core` gives its description as "General-purpose agent with transport abstraction, state management, and attachment support". It is published independently so that other tools (e.g., [earendil-works/pi-chat](https://github.com/earendil-works/pi-chat)) can reuse the agent runtime without inheriting the interactive TUI or the coding-specific tools.

Sources: [packages/agent/package.json:2-3](), [README.md:57]()

---

## Why a live extension system is inseparable from a serious agent

### The question the simple script cannot answer

What happens when users want: a Jira tool, a custom compaction strategy, a vim-keybindings editor, a corporate OAuth provider, or a status bar widget showing open PRs? In a simple script, each of these requires a fork or a config flag. There is no principled answer.

### How the extension API is structured

The extension system in `pi-coding-agent` is a first-class event bus, not a plugin loader bolted on afterward. Extensions receive a typed `ExtensionAPI` object at load time and can:

- Subscribe to the full agent lifecycle (`session_start`, `before_agent_start`, `turn_start`, `turn_end`, `tool_call`, `tool_result`, `agent_end`, and more)
- Register LLM-callable tools with TypeBox parameter schemas, custom renderers, and per-tool execution modes
- Register slash commands, keyboard shortcuts, and CLI flags
- Register or override LLM providers at runtime via `registerProvider`
- Mutate tool inputs in-flight by modifying `event.input` in place within a `tool_call` handler

```typescript
// packages/coding-agent/src/core/extensions/types.ts:1084-1135 (ExtensionAPI interface)
export interface ExtensionAPI {
  on(event: "session_start", handler: ExtensionHandler<SessionStartEvent>): void;
  on(event: "tool_call", handler: ExtensionHandler<ToolCallEvent, ToolCallEventResult>): void;
  registerTool<TParams extends TSchema>(tool: ToolDefinition<TParams>): void;
  registerProvider(name: string, config: ProviderConfig): void;
  // ...
}
```

This is not a narrow plugin point — it is the full observability surface of a running agent turn. Extensions can cancel operations (`block: true` in `ToolCallEventResult`), replace system prompts, inject messages before turns, and drive UI components.

Sources: [packages/coding-agent/src/core/extensions/types.ts:1084-1140](), [packages/coding-agent/src/core/extensions/types.ts:820-826]()
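
The two `tool_call` behaviors mentioned above, mutating `event.input` in place and returning `block: true`, can be sketched with a stripped-down bus. The types here are hypothetical simplifications, not the real `ToolCallEvent` or `ToolCallEventResult`, and the dispatch order (registration order, first block wins) is an assumption made for the example.

```typescript
// Hypothetical shapes modeled on the prose; not the real ExtensionAPI types.
interface ToolCallEvent {
  toolName: string;
  input: Record<string, unknown>;
}

interface ToolCallResult {
  block?: boolean;
  reason?: string;
}

type ToolCallHandler = (event: ToolCallEvent) => ToolCallResult | void;

class ToolCallBus {
  private handlers: ToolCallHandler[] = [];

  on(handler: ToolCallHandler): void {
    this.handlers.push(handler);
  }

  // Run handlers in registration order. Each one sees mutations made by
  // earlier handlers; the first `block: true` result stops dispatch.
  dispatch(event: ToolCallEvent): ToolCallResult {
    for (const handler of this.handlers) {
      const result = handler(event);
      if (result && result.block) return result;
    }
    return { block: false };
  }
}

// Usage: one handler normalizes input in place, another vetoes risky commands.
const bus = new ToolCallBus();
bus.on((event) => {
  const command = event.input["command"];
  if (event.toolName === "bash" && typeof command === "string") {
    event.input["command"] = command.trim();
  }
});
bus.on((event) => {
  const command = event.input["command"];
  if (event.toolName === "bash" && typeof command === "string" && command.startsWith("rm -rf")) {
    return { block: true, reason: "destructive command" };
  }
  return undefined;
});

const risky: ToolCallEvent = { toolName: "bash", input: { command: "  rm -rf /tmp/x  " } };
const verdict = bus.dispatch(risky);
```

Because the first handler trims the command before the second one inspects it, the veto fires even though the raw input began with whitespace. Ordering is therefore part of the contract in a design like this.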

### Why "reload without restart" matters

Extensions can be reloaded at runtime via `ctx.reload()` from a command handler. The `ExtensionRuntimeState.invalidate()` method marks stale extension instances so their pending handlers throw rather than silently operating on old state. This means the extension system must track lifecycle epochs, not just loaded modules — a complexity that belongs in a dedicated subsystem, not scattered through the CLI entry point.

Sources: [packages/coding-agent/src/core/extensions/types.ts:1451-1466]()
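
One way to picture "lifecycle epochs" is a counter that every bound handler captures when it is created. The sketch below is an assumption about the general technique, not a copy of `ExtensionRuntimeState`: `RuntimeState` and `bind` are hypothetical names, and only `invalidate()` echoes a name from the text.

```typescript
// Hypothetical sketch of epoch tracking; the real ExtensionRuntimeState may differ.
class RuntimeState {
  private epoch = 0;

  // Bump the epoch so everything bound under the previous one becomes stale.
  invalidate(): void {
    this.epoch++;
  }

  // Wrap a handler so it only runs while the epoch it was bound under is current.
  bind<A extends unknown[], R>(handler: (...args: A) => R): (...args: A) => R {
    const boundEpoch = this.epoch;
    return (...args: A): R => {
      if (boundEpoch !== this.epoch) {
        throw new Error("stale extension handler: runtime was reloaded");
      }
      return handler(...args);
    };
  }
}

// Usage: a handler works until a reload, then fails loudly instead of
// acting on old state.
const state = new RuntimeState();
const greet = state.bind((name: string) => `hello ${name}`);
const beforeReload = greet("pi");

state.invalidate(); // simulate a reload

let staleMessage = "";
try {
  greet("pi");
} catch (err) {
  staleMessage = (err as Error).message;
}

const greetAgain = state.bind((name: string) => `hello again ${name}`);
const afterReload = greetAgain("pi");
```

Throwing on stale calls trades a visible error for the harder-to-debug alternative of an old extension instance quietly writing to state that a newer instance now owns.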

---

## Why the TUI is its own package

### Differential rendering is not trivial

A simple `console.log` output model breaks the moment you want streaming token output, inline diffs, expandable tool call results, and a persistent input editor in the same terminal window simultaneously. These require a retained component tree and differential rendering — knowing which terminal cells changed and repainting only those.

`@earendil-works/pi-tui` is described as a "Terminal User Interface library with differential rendering for efficient text-based applications." It is the rendering substrate for the interactive mode and is imported directly into `main.ts` for the session-picker and the missing-session-cwd prompt, before the full agent session is even created.

```typescript
// packages/coding-agent/src/main.ts:395-416
const ui = new TUI(new ProcessTerminal(), settingsManager.getShowHardwareCursor());
ui.setClearOnShrink(settingsManager.getClearOnShrink());
const selector = new ExtensionSelectorComponent(
  formatMissingSessionCwdPrompt(issue),
  ["Continue", "Cancel"],
  (option) => finish(option === "Continue" ? issue.fallbackCwd : undefined),
  // ...remaining arguments elided
);
```

Extensions use `ExtensionUIContext` to call TUI primitives directly — overlays, footers, headers, custom editor components — without depending on the coding agent's internal rendering logic. The TUI package boundary is what allows extension authors to build rich UI without coupling to the CLI internals.

Sources: [packages/tui/package.json:2-11](), [packages/coding-agent/src/main.ts:395-416](), [packages/coding-agent/src/core/extensions/types.ts:124-275]()
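
As a rough illustration of differential rendering, the sketch below diffs two frames line by line and emits ANSI sequences only for rows that changed. It is a deliberately simplified model and not the algorithm in `pi-tui`: `diffLines`, `toAnsi`, and `LinePatch` are hypothetical names, and it ignores wrapping, wide characters, and cursor state.

```typescript
// Minimal line-level diff; a real renderer would also handle cursor
// movement, wrapping, and cell-level changes.
interface LinePatch {
  row: number; // zero-based row index within the frame
  text: string;
}

function diffLines(previous: string[], next: string[]): LinePatch[] {
  const patches: LinePatch[] = [];
  const rows = Math.max(previous.length, next.length);
  for (let row = 0; row < rows; row++) {
    const before = previous[row] ?? "";
    const after = next[row] ?? "";
    // An empty string stands for "clear this row" when the frame shrinks.
    if (before !== after) patches.push({ row, text: after });
  }
  return patches;
}

// Turn patches into ANSI output: move to the row (1-based), erase the
// whole line, then write the new text.
function toAnsi(patches: LinePatch[]): string {
  return patches.map((p) => `\x1b[${p.row + 1};1H\x1b[2K${p.text}`).join("");
}

// Usage: only the streaming line and the new tool line are repainted.
const frame1 = ["> prompt", "thinking", ""];
const frame2 = ["> prompt", "thinking.", "tool: bash"];
const patches = diffLines(frame1, frame2);
const output = toAnsi(patches);
```

The unchanged prompt row produces no output at all, which is the property that keeps a persistent input editor from flickering while tokens stream in above or below it.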

---

## The four-mode run model: why flat CLI cannot host all of them

`main.ts` resolves one of four `AppMode` values before any agent work begins: `interactive`, `print`, `json`, or `rpc`. Each mode requires a different output contract:

| Mode | How it runs | What it needs |
|---|---|---|
| `interactive` | Full TUI, streaming, user input | `pi-tui` + full extension surface |
| `print` | Non-interactive, stdout text | No TUI, piped stdin handled |
| `json` | Non-interactive, structured JSON | Same as print, different serialization |
| `rpc` | JSONL protocol over stdin/stdout | Stdin consumed for protocol, not user input |

The `ExtensionUIContext` interface has a mode-specific implementation per `AppMode`. This means extension code does not need to know which mode it is running in: it calls `ctx.ui.select()` and gets behavior appropriate to that mode. A simple script cannot offer this abstraction; it hardcodes either terminal output or JSON output.

Sources: [packages/coding-agent/src/main.ts:97-113](), [packages/coding-agent/src/core/extensions/types.ts:124-275]()
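
The mode-agnostic UI idea can be sketched with a single-method interface and two implementations. Everything here is a hypothetical simplification: the real `ExtensionUIContext` is much larger, `PrintUI`, `JsonLineUI`, and `confirmDeploy` are made-up names, the JSON message shape is invented, and `select` is synchronous only to keep the example short.

```typescript
// Hypothetical, simplified UI contract; the real ExtensionUIContext is larger.
interface UIContext {
  select(title: string, options: string[]): string | undefined;
}

// A non-interactive mode has no way to ask, so it declines to choose.
class PrintUI implements UIContext {
  select(): string | undefined {
    return undefined;
  }
}

// A protocol-driven mode: emit the question as one JSON line and read the
// answer from a reply callback supplied by the host process.
class JsonLineUI implements UIContext {
  private write: (line: string) => void;
  private readReply: () => string | undefined;

  constructor(write: (line: string) => void, readReply: () => string | undefined) {
    this.write = write;
    this.readReply = readReply;
  }

  select(title: string, options: string[]): string | undefined {
    this.write(JSON.stringify({ type: "select", title, options }));
    const reply = this.readReply();
    // Ignore replies that are not one of the offered options.
    return reply !== undefined && options.includes(reply) ? reply : undefined;
  }
}

// Extension code is written once against the interface.
function confirmDeploy(ui: UIContext): string {
  const choice = ui.select("Deploy now?", ["Yes", "No"]);
  return choice === "Yes" ? "deploying" : "skipped";
}

// Usage: the same extension function runs under both implementations.
const sent: string[] = [];
const rpcUI = new JsonLineUI(
  (line) => {
    sent.push(line);
  },
  () => "Yes",
);
const printResult = confirmDeploy(new PrintUI());
const rpcResult = confirmDeploy(rpcUI);
```

The extension function never branches on the mode. What differs is only which object is handed to it, which is the property the paragraph above attributes to the per-mode `ExtensionUIContext` implementations.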

---

## The architectural bet, stated plainly

The repo's shape encodes a single wager: that provider neutrality (pi-ai), session persistence (pi-agent-core), live extensibility (extension system in coding-agent core), and an interactive rendering layer (pi-tui) are not independent features that can be added incrementally to a simple script. Each one requires the others to be correct. An extension that registers a new LLM provider must be able to persist that provider choice into the session file. A session resumed from disk must render its history correctly in the TUI. A tool call blocked by an extension must still emit the right JSON in RPC mode.

The packages are separated not to enforce organizational hygiene but to make each concern independently testable, independently publishable, and independently reusable — as proven by `pi-chat` reusing `pi-agent-core` without the coding-specific tools.

Sources: [README.md:22-57](), [packages/agent/package.json:2-7]()

---

## Summary

The gap between "a script that calls an LLM" and "a serious coding agent" is exactly the gap between four missing properties: provider neutrality, durable session state, a live extension surface, and a rendering layer that survives streaming. The `earendil-works/pi` monorepo does not split into four packages as an abstraction exercise; it splits because each property requires its own isolation boundary to be testable, reusable, and safe to evolve without breaking the others. `pi-ai` owns the provider contract; `pi-agent-core` owns the session and agent loop; the extension system in `pi-coding-agent/core` owns the live plugin surface; and `pi-tui` owns the terminal rendering model. Remove any one of them and the architectural bet collapses back into a well-dressed script.

Sources: [packages/coding-agent/src/core/agent-session-services.ts:67-75]()

---

## 02. Why Four Packages? Where Does Each Layer End?

> The repo ships four independently publishable npm packages: pi-ai, pi-agent-core, pi-tui, and pi-coding-agent. Each boundary is a deliberate seam. This page asks: which concerns forced each split, what can cross each boundary, and what would break if two packages were merged? Reading package.json exports, tsconfig.build.json, and the import graph answers these questions concretely.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/02-why-four-packages-where-does-each-layer-end.md
- Generated: 2026-05-22T23:31:01.202Z

### Source Files

- `packages/ai/package.json`
- `packages/agent/package.json`
- `packages/tui/package.json`
- `packages/coding-agent/package.json`
- `packages/coding-agent/src/core/sdk.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/ai/package.json](packages/ai/package.json)
- [packages/agent/package.json](packages/agent/package.json)
- [packages/tui/package.json](packages/tui/package.json)
- [packages/coding-agent/package.json](packages/coding-agent/package.json)
- [packages/coding-agent/src/core/sdk.ts](packages/coding-agent/src/core/sdk.ts)
- [packages/ai/src/index.ts](packages/ai/src/index.ts)
- [packages/ai/src/types.ts](packages/ai/src/types.ts)
- [packages/ai/src/stream.ts](packages/ai/src/stream.ts)
- [packages/agent/src/index.ts](packages/agent/src/index.ts)
- [packages/agent/src/agent.ts](packages/agent/src/agent.ts)
- [packages/agent/src/types.ts](packages/agent/src/types.ts)
- [packages/agent/src/node.ts](packages/agent/src/node.ts)
- [packages/agent/src/harness/types.ts](packages/agent/src/harness/types.ts)
- [packages/tui/src/tui.ts](packages/tui/src/tui.ts)
- [packages/coding-agent/src/modes/interactive/interactive-mode.ts](packages/coding-agent/src/modes/interactive/interactive-mode.ts)
</details>

# Why Four Packages? Where Does Each Layer End?

This repository ships four independently publishable npm packages: `@earendil-works/pi-ai`, `@earendil-works/pi-agent-core`, `@earendil-works/pi-tui`, and `@earendil-works/pi-coding-agent`. Each split is a boundary between concerns that cannot be safely merged without forcing unwanted coupling on downstream consumers. This page traces the dependency graph from the bottom up, names the concern each package owns exclusively, and asks what would break if any two packages were folded together.

The question is not "what does each package do?" but rather "what does each package _not_ see?"—because the constraint each package avoids owning is precisely what forces the split.

---

## The Dependency Lattice

Before examining any individual layer, map what depends on what:

```text
@earendil-works/pi-coding-agent
  └── @earendil-works/pi-agent-core
  └── @earendil-works/pi-ai
  └── @earendil-works/pi-tui

@earendil-works/pi-agent-core
  └── @earendil-works/pi-ai

@earendil-works/pi-tui
  (no pi-* deps)

@earendil-works/pi-ai
  (no pi-* deps)
```

Two packages sit at the base and share no cross-dependency: `pi-ai` and `pi-tui`. They are orthogonal concerns — LLM I/O vs. terminal rendering. `pi-agent-core` uses `pi-ai` but is unaware of `pi-tui`. Only `pi-coding-agent` pulls all three together.
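
The lattice is small enough to check mechanically. The sketch below encodes the edges listed above as plain data and derives one valid build order with a depth-first topological sort. The names `piDeps` and `buildOrder` are illustrative, and the result is only one order that satisfies the edges, not necessarily the order any repo script uses.

```ts
// Dependency edges copied from the lattice above: package -> pi-* dependencies.
const piDeps: Record<string, string[]> = {
  "pi-ai": [],
  "pi-tui": [],
  "pi-agent-core": ["pi-ai"],
  "pi-coding-agent": ["pi-agent-core", "pi-ai", "pi-tui"],
};

// Depth-first topological sort; throws if the graph contains a cycle.
function buildOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  const visitState = new Map<string, "visiting" | "done">();

  const visit = (pkg: string): void => {
    const seen = visitState.get(pkg);
    if (seen === "done") return;
    if (seen === "visiting") throw new Error(`dependency cycle at ${pkg}`);
    visitState.set(pkg, "visiting");
    for (const dep of deps[pkg] ?? []) visit(dep);
    visitState.set(pkg, "done");
    order.push(pkg); // a package is emitted only after all of its dependencies
  };

  for (const pkg of Object.keys(deps)) visit(pkg);
  return order;
}

const piBuildOrder = buildOrder(piDeps);
```

Because every package is emitted after its dependencies, `pi-coding-agent` always lands last, which matches its role as the only package that sees the other three.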

Sources: [packages/agent/package.json:32](), [packages/coding-agent/package.json:42-44]()

---

## Layer 1 — `pi-ai`: The Provider Abstraction

### What concern owns this layer?

`pi-ai` answers a single question: *given a `Model` descriptor and a `Context`, how do you get an `AssistantMessageEventStream` back?* Nothing about agents, nothing about terminals.

The package exports a registry of concrete provider adapters (Anthropic, OpenAI, Google, Bedrock, Mistral, Azure, etc.), a generated model catalogue, OAuth helpers, and the two primary streaming entry points: `stream()` and `streamSimple()`. Every call resolves its provider through a registry lookup keyed on `model.api` (`resolveApiProvider(model.api)` in the excerpt below), which keeps the call site provider-neutral.

```ts
// packages/ai/src/stream.ts:43-50
export function streamSimple<TApi extends Api>(
  model: Model<TApi>,
  context: Context,
  options?: SimpleStreamOptions,
): AssistantMessageEventStream {
  const provider = resolveApiProvider(model.api);
  return provider.streamSimple(model, context, options);
}
```

The `exports` field in `package.json` exposes each provider as a separate sub-path (`./anthropic`, `./google`, `./bedrock-provider`, etc.), so consumers can import only the providers they need and keep unused provider SDKs out of their bundles.

Sources: [packages/ai/src/stream.ts:43-50](), [packages/ai/package.json:8-52]()

### What would break if `pi-ai` were merged into `pi-agent-core`?

Any consumer wanting only a lightweight LLM client (a one-off completion script, an image-generation wrapper, a test harness) would be forced to take the agent loop, session storage, compaction logic, and skill-loading machinery. The cost also runs the other way: a new or updated provider adapter could no longer be swapped in or published without releasing the agent package alongside it.

---

## Layer 2 — `pi-tui`: The Terminal Rendering Layer

### What concern owns this layer?

`pi-tui` owns everything the terminal sees: differential rendering, keyboard input, ANSI escape sequences, Kitty protocol image display, Unicode/East-Asian width accounting, the editor component, autocomplete, keybindings, and a minimal Markdown renderer. Its `package.json` lists **zero** `pi-*` dependencies; it imports nothing from the LLM stack.

```ts
// packages/tui/src/tui.ts:39-56
export interface Component {
  render(width: number): string[];
  handleInput?(data: string): void;
  wantsKeyRelease?: boolean;
  // ...
}
```

The `Component` interface is the entire contract: a component renders to an array of strings and optionally handles input. The TUI drives differential updates from those arrays without knowing whether the content came from an LLM, a static string, or a file diff.
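
A minimal sketch of that contract and of line-level diffing follows. It assumes only the `render(width): string[]` shape from the excerpt; `SketchStatusLine` and `changedLineIndices` are hypothetical names, and the real renderer's diffing strategy may differ.

```ts
// Local copy of the minimal contract from the excerpt above.
interface SketchComponent {
  render(width: number): string[];
}

// A hypothetical component: one status line, truncated to the given width.
class SketchStatusLine implements SketchComponent {
  private text: string;

  constructor(text: string) {
    this.text = text;
  }

  setText(text: string): void {
    this.text = text;
  }

  render(width: number): string[] {
    // Naive truncation; real width accounting must handle wide characters.
    return [this.text.slice(0, width)];
  }
}

// Returns the indices of lines that differ between two rendered frames.
function changedLineIndices(previous: string[], next: string[]): number[] {
  const changed: number[] = [];
  const count = Math.max(previous.length, next.length);
  for (let i = 0; i < count; i++) {
    if (previous[i] !== next[i]) changed.push(i);
  }
  return changed;
}

const sketchTitle = "pi session";
const sketchStatus = new SketchStatusLine("thinking");
const frameBefore = [sketchTitle, ...sketchStatus.render(20)];
sketchStatus.setText("done");
const frameAfter = [sketchTitle, ...sketchStatus.render(20)];
const dirtyLines = changedLineIndices(frameBefore, frameAfter);
```

Only the second line changed between the two frames, so a differential renderer built on this idea would rewrite one line and leave the title untouched.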

Sources: [packages/tui/src/tui.ts:39-56](), [packages/tui/package.json:38-41]()

### What would break if `pi-tui` were merged into `pi-coding-agent`?

`pi-tui` is independently publishable and usable as a general terminal-UI library. Merging it into the coding agent would make it impossible to use the TUI in any project that doesn't also want the full agent stack. It would also blur a separation the lower layers rely on: `pi-agent-core` deliberately avoids importing `pi-tui`, so rendering stays the caller's concern and the non-interactive and RPC modes never have to exercise terminal rendering machinery.

---

## Layer 3 — `pi-agent-core`: The Agent Loop and Harness

### What concern owns this layer?

`pi-agent-core` owns the LLM agent loop and the infrastructure required to run it reliably: message state, tool dispatch, context compaction, session persistence (JSONL and in-memory repos), skill loading, system-prompt assembly, and transport negotiation. It depends on `pi-ai` (for `Model`, `streamSimple`, message types, `ThinkingLevel`, `Transport`) but knows nothing about terminals or the coding-specific tool set.

```ts
// packages/agent/src/types.ts:24-26
export type StreamFn = (
  ...args: Parameters<typeof streamSimple>
) => ReturnType<typeof streamSimple> | Promise<ReturnType<typeof streamSimple>>;
```

`StreamFn` is the only connection the agent loop has to actual provider I/O. The caller (the coding agent's `sdk.ts`) wires in the real `streamSimple` call along with auth resolution and attribution headers, but the loop itself is provider-agnostic.
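
The injection pattern can be sketched in a few lines. This is a deliberately reduced model: `SketchStreamFn` streams plain strings synchronously, whereas the real `StreamFn` mirrors `streamSimple` and yields typed events. All names below are hypothetical.

```ts
// Hypothetical, simplified stand-in: real streams carry typed events, not strings.
type SketchStreamFn = (prompt: string) => Iterable<string>;

// The loop only knows the function type, never a concrete provider.
function runSketchTurn(streamFn: SketchStreamFn, prompt: string): string {
  let reply = "";
  for (const chunk of streamFn(prompt)) reply += chunk;
  return reply;
}

// A test double that the caller can wire in.
const fauxStream: SketchStreamFn = function* (prompt) {
  yield "echo: ";
  yield prompt;
};

// A caller can wrap another stream function, for example to record calls
// or attach headers, without the loop noticing.
const sketchCallLog: string[] = [];
const loggedStream: SketchStreamFn = (prompt) => {
  sketchCallLog.push(prompt);
  return fauxStream(prompt);
};

const sketchReply = runSketchTurn(loggedStream, "hi");
```

The wrapper is the analogue of what the text describes for `sdk.ts`: the caller decorates the stream call, and the loop stays unaware of it.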

The `./node` sub-path export adds `NodeExecutionEnv` for callers that need Node.js-specific environment integration without bundling it unconditionally.

Sources: [packages/agent/src/types.ts:24-26](), [packages/agent/src/node.ts:1-2](), [packages/agent/package.json:13-17]()

### What would break if `pi-agent-core` were merged into `pi-coding-agent`?

`pi-agent-core`'s harness — compaction, session repos, skill system, prompt templates — is generic enough to power agents beyond the coding use-case. Embedding it inside the coding agent would mean any embedding application would have to import `cross-spawn`, `diff`, `glob`, `highlight.js`, WASM image processing, and the full TUI just to run an agent loop. The library surface would also become impossible to test in isolation, because the harness tests (`test:harness`) currently run against `pi-agent-core` alone without the heavy coding-agent dependencies.

---

## Layer 4 — `pi-coding-agent`: The Composition Layer

### What concern owns this layer?

`pi-coding-agent` is the only package that sees all three lower layers simultaneously. It owns:

- **Coding tools** (`read`, `bash`, `edit`, `write`, `grep`, `find`, `ls`) — file system operations specific to a coding workflow
- **Session configuration** — auth storage, model registry, settings manager, resource loader, extension runner
- **Rendering** — interactive TUI mode (implements `Component` from `pi-tui`), print mode, and RPC mode
- **CLI entry point** (`dist/cli.js` via `bin.pi`)
- **`createAgentSession()`** — the top-level factory that wires all three lower layers into a running session

The `sdk.ts` file is the architectural join-point. It imports `Agent` from `pi-agent-core`, `streamSimple` and `Model` from `pi-ai`, and passes neither directly to `pi-tui`. The interactive mode (`interactive-mode.ts`) then imports from all three simultaneously to wire together rendering, agent events, and LLM output.

```ts
// packages/coding-agent/src/core/sdk.ts:2-3
import { Agent, type AgentMessage, type ThinkingLevel } from "@earendil-works/pi-agent-core";
import { clampThinkingLevel, type Message, type Model, streamSimple } from "@earendil-works/pi-ai";
```

```ts
// packages/coding-agent/src/modes/interactive/interactive-mode.ts:10-49
import type { AgentMessage } from "@earendil-works/pi-agent-core";
import { getProviders, type Model, type OAuthProviderId, ... } from "@earendil-works/pi-ai";
import type { AutocompleteItem, EditorComponent, ... } from "@earendil-works/pi-tui";
import { TUI, Container, Markdown, Text, ... } from "@earendil-works/pi-tui";
```

Sources: [packages/coding-agent/src/core/sdk.ts:2-3](), [packages/coding-agent/src/modes/interactive/interactive-mode.ts:10-49]()

---

## Boundary Summary

```text
┌─────────────────────────────────────────────────────┐
│  pi-coding-agent                                    │
│  • CLI entry point (bin: pi)                        │
│  • Coding tools (read/bash/edit/write/grep/…)       │
│  • Interactive / print / RPC modes                  │
│  • createAgentSession() factory                     │
│  deps: pi-ai + pi-agent-core + pi-tui               │
└───────────┬───────────┬──────────────┬──────────────┘
            │           │              │
   ┌────────▼───┐  ┌────▼──────────┐  │
   │ pi-ai      │  │ pi-agent-core │  │
   │ • Models   │  │ • Agent loop  │  │
   │ • Providers│◄─┤ • Compaction  │  │
   │ • OAuth    │  │ • Sessions    │  │
   │ • stream() │  │ • Skills      │  │
   └────────────┘  └───────────────┘  │
                                       │
                       ┌───────────────▼──┐
                       │ pi-tui            │
                       │ • Terminal render │
                       │ • Diff render     │
                       │ • Editor / keys   │
                       │ (no pi-* deps)    │
                       └──────────────────┘
```

| Package | Owned concern | Forbidden dependency |
|---|---|---|
| `pi-ai` | Provider abstraction, streaming, model catalogue | Nothing from the agent or TUI |
| `pi-tui` | Terminal rendering, keyboard, Markdown display | Nothing from the LLM or agent layer |
| `pi-agent-core` | Agent loop, state, tools interface, session, compaction | `pi-tui` (rendering is caller's concern) |
| `pi-coding-agent` | Coding tools, modes, CLI, session wiring | — (is the composition root) |

---

## What the Split Enables

Each boundary is a publish-time contract. A consumer embedding only an LLM call takes `pi-ai` alone. A project building a terminal dashboard takes `pi-tui` alone. A project building a custom agent (not a coding agent) takes `pi-agent-core` + `pi-ai` and supplies its own tool set and renderer. Only the `pi` CLI binary needs all four.

The `build:binary` script in `pi-coding-agent` makes this concrete: it builds the three dependency packages in order before compiling the coding agent, precisely because none of the lower packages knows about the one above it.

Sources: [packages/coding-agent/package.json:34]()

---

## 03. BYOK / BYOC: What Does Provider Neutrality Actually Cost?

> Pi promises that users bring their own keys and providers. This page probes what that claim requires in practice: how env-api-keys.ts discovers credentials, how the OAuth flow is handled, how the add-llm-provider skill teaches the agent to register new providers at runtime, and what invariants must hold for every provider adapter. The skill file .pi/skills/add-llm-provider.md is the primary non-README evidence.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/03-byok-byoc-what-does-provider-neutrality-actually-cost.md
- Generated: 2026-05-22T23:28:37.306Z

### Source Files

- `packages/ai/src/env-api-keys.ts`
- `packages/ai/src/oauth.ts`
- `.pi/skills/add-llm-provider.md`
- `packages/coding-agent/src/core/auth-storage.ts`
- `packages/coding-agent/src/core/auth-guidance.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/ai/src/env-api-keys.ts](packages/ai/src/env-api-keys.ts)
- [packages/ai/src/oauth.ts](packages/ai/src/oauth.ts)
- [packages/ai/src/utils/oauth/index.ts](packages/ai/src/utils/oauth/index.ts)
- [packages/ai/src/utils/oauth/types.ts](packages/ai/src/utils/oauth/types.ts)
- [packages/ai/src/api-registry.ts](packages/ai/src/api-registry.ts)
- [packages/ai/src/providers/register-builtins.ts](packages/ai/src/providers/register-builtins.ts)
- [packages/coding-agent/src/core/auth-storage.ts](packages/coding-agent/src/core/auth-storage.ts)
- [packages/coding-agent/src/core/auth-guidance.ts](packages/coding-agent/src/core/auth-guidance.ts)
- [packages/coding-agent/src/core/resolve-config-value.ts](packages/coding-agent/src/core/resolve-config-value.ts)
- [.pi/skills/add-llm-provider.md](.pi/skills/add-llm-provider.md)
</details>

# BYOK / BYOC: What Does Provider Neutrality Actually Cost?

Pi makes a strong architectural promise: you bring your own keys and your own provider. No vendor lock-in, no proxied credentials, no forced model choice. But a claim like that only holds if every part of the stack — credential discovery, token lifecycle, provider dispatch, and the extension contract — actually enforces it uniformly. This page probes what provider neutrality costs in practice: the exact credential-resolution chain, how OAuth fits into it, what the `add-llm-provider` skill requires of any new adapter, and which invariants must hold at each boundary.

This matters because BYOK/BYOC is not a UI toggle; it is an engineering commitment that runs from a handful of environment-variable lookups all the way to a locked JSON file on disk. Understanding each layer reveals where the system is genuinely open and where hidden constraints encode provider-specific assumptions.

---

## What is the simplest version of BYOK?

The floor-level design is just `process.env`. Every API-key provider has one canonical environment variable (e.g. `OPENAI_API_KEY`, `GEMINI_API_KEY`), and if that variable is set the system uses it. No login flow, no stored file, and no involvement from the agent.

`env-api-keys.ts` encodes this mapping as a plain string dictionary:

```ts
// packages/ai/src/env-api-keys.ts:101-131
const envMap: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  "azure-openai-responses": "AZURE_OPENAI_API_KEY",
  deepseek: "DEEPSEEK_API_KEY",
  google: "GEMINI_API_KEY",
  "google-vertex": "GOOGLE_CLOUD_API_KEY",
  groq: "GROQ_API_KEY",
  // ...26 providers total
};
```

`findEnvKeys(provider)` returns the variable names that are actually populated; `getEnvApiKey(provider)` returns the first live value. Both are pure reads — no side effects, no caching between calls for standard providers.
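
The two lookups can be approximated as follows. This sketch is not the real module: it takes the environment as a parameter so it runs anywhere, it uses a three-provider subset, and it assumes each provider may list several candidate variable names (the excerpt above shows a single name per provider). `sketchFindEnvKeys` and `sketchGetEnvApiKey` are hypothetical names.

```ts
type EnvLike = Record<string, string | undefined>;

// Hypothetical subset; each provider lists its candidate variables in order.
const sketchEnvMap: Record<string, string[]> = {
  openai: ["OPENAI_API_KEY"],
  google: ["GEMINI_API_KEY"],
  groq: ["GROQ_API_KEY"],
};

// Names of the candidate variables that are set and non-empty.
function sketchFindEnvKeys(provider: string, env: EnvLike): string[] {
  return (sketchEnvMap[provider] ?? []).filter((name) => !!env[name]);
}

// Value of the first populated variable, or undefined when none is set.
function sketchGetEnvApiKey(provider: string, env: EnvLike): string | undefined {
  const [first] = sketchFindEnvKeys(provider, env);
  return first === undefined ? undefined : env[first];
}

// An empty string counts as "not set" in this sketch.
const demoEnv: EnvLike = { OPENAI_API_KEY: "sk-demo", GROQ_API_KEY: "" };
```

Both functions are pure reads over the passed-in environment, which mirrors the no-side-effects property described above.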

Sources: [packages/ai/src/env-api-keys.ts:91-151]()

---

## Where does the simple version break down?

Two categories of providers cannot be reduced to a single environment variable:

**1. Ambient credential providers (Vertex AI, Amazon Bedrock)**

Neither expects a bare API key. Vertex requires `GOOGLE_APPLICATION_CREDENTIALS` (or the `~/.config/gcloud/application_default_credentials.json` fallback) *plus* `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`. Bedrock accepts six distinct AWS credential forms, including a named profile, IAM keys, a bearer token, ECS task roles, and IRSA. Both checks return the sentinel string `"<authenticated>"` when all preconditions are met, signalling "auth exists" without exposing a real secret.

```ts
// packages/ai/src/env-api-keys.ts:167-207
if (provider === "google-vertex") {
  const hasCredentials = hasVertexAdcCredentials();
  const hasProject = !!( process.env.GOOGLE_CLOUD_PROJECT || ... );
  const hasLocation = !!( process.env.GOOGLE_CLOUD_LOCATION || ... );
  if (hasCredentials && hasProject && hasLocation) {
    return "<authenticated>";
  }
}
```

**2. OAuth providers (Anthropic, GitHub Copilot, OpenAI Codex)**

These cannot be satisfied by any environment variable. They require an interactive login that writes `refresh`/`access`/`expires` credentials to `auth.json`. The OAuth credential surface is materially different from a static API key.

Sources: [packages/ai/src/env-api-keys.ts:61-210]()

---

## The five-source credential priority chain

`AuthStorage.getApiKey()` in `auth-storage.ts` defines the full resolution order. Reading it from top to bottom is the only way to understand which credential wins when multiple sources are present:

```text
1. Runtime override     --api-key CLI flag (in-memory, not persisted)
2. Stored api_key       auth.json, type: "api_key"
3. Stored oauth         auth.json, type: "oauth" (auto-refreshed with file lock)
4. Environment variable process.env / /proc/self/environ fallback
5. Fallback resolver    models.json custom providers, injected by setFallbackResolver()
```

```ts
// packages/coding-agent/src/core/auth-storage.ts:462-522
async getApiKey(providerId: string, options?: { includeFallback?: boolean }): Promise<string | undefined> {
  const runtimeKey = this.runtimeOverrides.get(providerId);
  if (runtimeKey) return runtimeKey;                    // 1

  const cred = this.data[providerId];
  if (cred?.type === "api_key") return resolveConfigValue(cred.key);  // 2

  if (cred?.type === "oauth") { /* refresh logic */ }   // 3

  const envKey = getEnvApiKey(providerId);
  if (envKey) return envKey;                            // 4

  return this.fallbackResolver?.(providerId) ?? undefined;  // 5
}
```

The `source` field on `AuthStatus` mirrors this: `"stored"`, `"runtime"`, `"environment"`, `"fallback"`, or `"models_json_key"` / `"models_json_command"`.
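
A compact way to see the ordering is to model the chain as a list of lazily evaluated sources. The sketch below is an assumption-heavy simplification: stored `api_key` and `oauth` credentials are collapsed into one `stored` source, the OAuth refresh is omitted, and `SketchAuthInputs` and `sketchResolveKey` are hypothetical names.

```ts
type KeySource = "runtime" | "stored" | "environment" | "fallback";

interface ResolvedKey {
  key: string;
  source: KeySource;
}

// Hypothetical inputs; the real AuthStorage keeps these as internal state.
interface SketchAuthInputs {
  runtimeOverrides: Map<string, string>;
  stored: Record<string, string | undefined>; // api_key and oauth collapsed
  env: Record<string, string | undefined>; // provider id -> key found in env
  fallback?: (providerId: string) => string | undefined;
}

function sketchResolveKey(providerId: string, inputs: SketchAuthInputs): ResolvedKey | undefined {
  // Ordered thunks: later sources are only consulted when earlier ones miss.
  const chain: Array<[KeySource, () => string | undefined]> = [
    ["runtime", () => inputs.runtimeOverrides.get(providerId)],
    ["stored", () => inputs.stored[providerId]],
    ["environment", () => inputs.env[providerId]],
    ["fallback", () => inputs.fallback?.(providerId)],
  ];
  for (const [source, read] of chain) {
    const key = read();
    if (key) return { key, source };
  }
  return undefined;
}

const sketchInputs: SketchAuthInputs = {
  runtimeOverrides: new Map([["openai", "sk-from-flag"]]),
  stored: { openai: "sk-from-auth-json", groq: "gsk-stored" },
  env: { openai: "sk-from-env", google: "g-from-env" },
  fallback: (id) => (id === "my-proxy" ? "proxy-key" : undefined),
};
```

With these inputs, `openai` resolves from the runtime override even though stored and environment values also exist, which is the "first source wins" behavior the chain describes.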

Sources: [packages/coding-agent/src/core/auth-storage.ts:349-368, 462-522]()

---

## Secret resolution beyond raw strings

Step 2 above passes `cred.key` through `resolveConfigValue()` before returning it. This is non-obvious: an API key stored in `auth.json` is not necessarily a literal string. `resolveConfigValue` supports three forms:

| Value form | Resolution |
|---|---|
| Starts with `!` | Execute the rest as a shell command, return trimmed stdout (cached per process) |
| Name of a set environment variable (no prefix, e.g. `MY_CUSTOM_ENV_VAR`) | Return `process.env[config]` |
| Anything else | Literal string |

This means `auth.json` can store `"!op read op://vault/key"` or `"MY_CUSTOM_ENV_VAR"` as a key, and the agent will resolve them at call time. The same logic applies to custom HTTP headers via `resolveHeaders()`.
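
The resolution rules can be sketched without touching a real shell. In the version below the command runner and the environment are injected, which is an assumption made for testability; `makeSketchResolver` is a hypothetical name and the real function reads `process.env` and spawns a process directly.

```ts
type CommandRunner = (command: string) => string;
type EnvSnapshot = Record<string, string | undefined>;

// Hypothetical factory: dependencies are injected so nothing is executed for real.
function makeSketchResolver(env: EnvSnapshot, run: CommandRunner): (config: string) => string {
  const commandCache = new Map<string, string>();
  return (config) => {
    if (config.startsWith("!")) {
      const command = config.slice(1);
      const cached = commandCache.get(command);
      if (cached !== undefined) return cached; // run each command at most once
      const output = run(command).trim();
      commandCache.set(command, output);
      return output;
    }
    // A bare value is tried as an environment variable name, then used literally.
    return env[config] || config;
  };
}

let sketchRunCount = 0;
const sketchResolve = makeSketchResolver(
  { MY_CUSTOM_ENV_VAR: "sk-from-env" },
  (command) => {
    sketchRunCount += 1;
    return `secret-for(${command})\n`;
  },
);
```

Note the consequence of the last rule: a literal key that happens to equal the name of a set environment variable would be replaced by that variable's value.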

Sources: [packages/coding-agent/src/core/resolve-config-value.ts:17-23]()

---

## The OAuth flow: a deeper contract

Three providers (Anthropic, GitHub Copilot, OpenAI Codex) go through an OAuth dance rather than a bare API key exchange. The flow is defined by the `OAuthProviderInterface`:

```ts
// packages/ai/src/utils/oauth/types.ts:54-72
export interface OAuthProviderInterface {
  readonly id: OAuthProviderId;
  readonly name: string;

  login(callbacks: OAuthLoginCallbacks): Promise<OAuthCredentials>;

  refreshToken(credentials: OAuthCredentials): Promise<OAuthCredentials>;

  getApiKey(credentials: OAuthCredentials): string;

  modifyModels?(models: Model<Api>[], credentials: OAuthCredentials): Model<Api>[];
}
```

The `OAuthLoginCallbacks` interface is the boundary between the auth library and the CLI/UI layer. It provides hooks for browser-redirect (`onAuth`), device-code display (`onDeviceCode`), text prompts (`onPrompt`), and interactive selectors (`onSelect`). The underlying transport — PKCE, device-code flow — is an implementation detail of each provider module.

After login, credentials are stored in `auth.json` as `{ type: "oauth", refresh, access, expires, ...provider-specific }`. When the access token expires, `AuthStorage.refreshOAuthTokenWithLock()` re-reads the file under a `proper-lockfile` advisory lock, which prevents race conditions when multiple Pi processes run concurrently:

```ts
// packages/coding-agent/src/core/auth-storage.ts:407-450
private async refreshOAuthTokenWithLock(providerId) {
  return await this.storage.withLockAsync(async (current) => {
    // re-read file under lock
    // if already refreshed by another process, use those creds
    // otherwise call provider.refreshToken(), persist, return
  });
}
```
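
The "re-check under the lock" step in those comments is the part that prevents a double refresh. The sketch below illustrates it with an in-process stand-in: `SketchCredFile` is hypothetical, its lock is a simple flag rather than a cross-process file lock, and everything is synchronous for brevity.

```ts
interface SketchCreds {
  access: string;
  expires: number; // epoch milliseconds
}

// In-process stand-in for the locked auth file. A real lock spans processes.
class SketchCredFile {
  private creds: SketchCreds;
  private locked = false;

  constructor(initial: SketchCreds) {
    this.creds = initial;
  }

  withLock<T>(fn: (current: SketchCreds, save: (next: SketchCreds) => void) => T): T {
    if (this.locked) throw new Error("lock already held");
    this.locked = true;
    try {
      return fn(this.creds, (next) => {
        this.creds = next;
      });
    } finally {
      this.locked = false;
    }
  }
}

// Re-check expiry under the lock; refresh only if nobody else already did.
function sketchRefreshIfExpired(
  file: SketchCredFile,
  now: number,
  refresh: (old: SketchCreds) => SketchCreds,
): SketchCreds {
  return file.withLock((current, save) => {
    if (current.expires > now) return current;
    const fresh = refresh(current);
    save(fresh);
    return fresh;
  });
}

let sketchRefreshCalls = 0;
const sketchFile = new SketchCredFile({ access: "old", expires: 1000 });
const sketchDoRefresh = (_old: SketchCreds): SketchCreds => {
  sketchRefreshCalls += 1;
  return { access: "new", expires: 10000 };
};

// Two callers both observed the token as expired at t = 2000.
const firstCaller = sketchRefreshIfExpired(sketchFile, 2000, sketchDoRefresh);
const secondCaller = sketchRefreshIfExpired(sketchFile, 2000, sketchDoRefresh);
```

The second caller finds fresh credentials once it holds the lock and returns them without calling the refresh function again.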

Sources: [packages/ai/src/utils/oauth/types.ts:43-72](), [packages/coding-agent/src/core/auth-storage.ts:407-450]()

---

## The provider dispatch layer: `Api` vs `Provider`

A key design tension: credentials are keyed by `provider` string, but the stream dispatch is keyed by `api` string. A single `provider` (e.g. `"openai"`) may map to multiple `api` protocols (`"openai-completions"`, `"openai-responses"`). `api-registry.ts` stores a map from `Api` to stream functions:

```ts
// packages/ai/src/api-registry.ts:40
const apiProviderRegistry = new Map<string, RegisteredApiProvider>();
```

Registration happens at module load time in `register-builtins.ts`:

```ts
// packages/ai/src/providers/register-builtins.ts:406
registerBuiltInApiProviders();
```

Every registration wraps the stream function with a type guard that throws on `api` mismatch, so a Gemini model cannot accidentally be dispatched through the Anthropic adapter. Provider modules are loaded lazily via dynamic `import()` to keep the browser/Vite bundle clean and avoid loading AWS SDKs in a browser context:

```ts
// packages/ai/src/providers/register-builtins.ts:89-92
const importNodeOnlyProvider = (specifier: string): Promise<unknown> => {
  const runtimeSpecifier = import.meta.url.endsWith(".js") ? specifier.replace(/\.ts$/, ".js") : specifier;
  return import(runtimeSpecifier);
};
```
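
Put together, the registry, the mismatch guard, and lazy loading can be sketched as below. This is a synchronous toy: the real code loads modules with an asynchronous dynamic `import()`, and `sketchRegister`, `sketchDispatch`, and the `"faux-api"` key are hypothetical names chosen for the example.

```ts
interface SketchModel {
  id: string;
  api: string;
}
type SketchStream = (model: SketchModel, prompt: string) => string;

const sketchRegistry = new Map<string, SketchStream>();

// Registers a loader; the provider body is only loaded on the first call.
function sketchRegister(api: string, load: () => SketchStream): void {
  let loaded: SketchStream | undefined;
  sketchRegistry.set(api, (model, prompt) => {
    // Guard: refuse models that belong to a different api.
    if (model.api !== api) throw new Error(`Mismatched api: ${model.api} (expected ${api})`);
    if (!loaded) loaded = load();
    return loaded(model, prompt);
  });
}

// Call sites look up by model.api and never import a provider directly.
function sketchDispatch(model: SketchModel, prompt: string): string {
  const entry = sketchRegistry.get(model.api);
  if (!entry) throw new Error(`No provider registered for api: ${model.api}`);
  return entry(model, prompt);
}

let sketchLoadCount = 0;
sketchRegister("faux-api", () => {
  sketchLoadCount += 1; // stands in for a dynamic import()
  return (model, prompt) => `${model.id} says: ${prompt}`;
});
```

Registration is cheap and happens up front, while the cost of loading the provider body is paid once, on the first dispatch that needs it.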

Sources: [packages/ai/src/api-registry.ts:40-98](), [packages/ai/src/providers/register-builtins.ts:89-92, 345-406]()

---

## Architecture: how the layers compose

```text
┌─────────────────────────────────────────────────────────────┐
│  CLI / UI layer                                             │
│  --api-key, /login, /model                                  │
└──────────────────┬──────────────────────────────────────────┘
                   │  AuthStorage.getApiKey(providerId)
┌──────────────────▼──────────────────────────────────────────┐
│  auth-storage.ts  (coding-agent)                            │
│  Priority chain: runtime → stored api_key → stored oauth    │
│                → env var → fallback resolver                 │
│  auth.json  ← FileAuthStorageBackend (proper-lockfile)      │
└──────────────────┬──────────────────────────────────────────┘
                   │  getEnvApiKey / findEnvKeys
┌──────────────────▼──────────────────────────────────────────┐
│  env-api-keys.ts  (packages/ai)                             │
│  envMap[provider] → env var name                            │
│  Vertex ADC: file-based credential detection                │
│  Bedrock: six AWS credential forms                          │
└──────────────────┬──────────────────────────────────────────┘
                   │  resolveConfigValue(key)
┌──────────────────▼──────────────────────────────────────────┐
│  resolve-config-value.ts  (coding-agent)                    │
│  "!cmd" → shell exec (cached)                               │
│  "VAR" → process.env first, then literal                    │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  OAuth layer  (packages/ai/src/utils/oauth/)                │
│  OAuthProviderInterface: login / refreshToken / getApiKey   │
│  Built-ins: anthropic, github-copilot, openai-codex         │
│  registerOAuthProvider() / unregisterOAuthProvider()        │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  API dispatch layer  (packages/ai/src/)                     │
│  api-registry: Map<Api, {stream, streamSimple}>             │
│  register-builtins: lazy dynamic import per provider        │
│  Keyed by Api protocol, not by provider name                │
└─────────────────────────────────────────────────────────────┘
```

---

## What the `add-llm-provider` skill requires: the real cost

The `.pi/skills/add-llm-provider.md` skill file is the clearest statement of what provider neutrality actually demands from each new adapter. It specifies seven distinct touch-points across the codebase, not one or two:

| Step | File(s) | What must be done |
|---|---|---|
| 1 | `packages/ai/src/types.ts` | Add to `Api` union, create options interface, add to `ApiOptionsMap`, add to `KnownProvider` |
| 2 | `packages/ai/src/providers/<name>.ts` | Implement `stream()`, `streamSimple()`, message/tool conversion, standardized event emission |
| 3 | `packages/ai/package.json`, `src/index.ts`, `register-builtins.ts`, `env-api-keys.ts` | Subpath export, lazy registration, credential detection |
| 4 | `packages/ai/scripts/generate-models.ts` | Fetch/parse models, map to `Model` interface |
| 5 | `packages/ai/test/` | Full test matrix: stream, tokens, abort, empty, context overflow, unicode, tool calls, cross-provider handoff |
| 6 | `packages/coding-agent/` | Default model ID, display name, CLI arg docs, README, providers.md |
| 7 | `packages/ai/README.md`, `CHANGELOG.md` | Provider table entry, auth docs, env vars |

The test requirement in step 5 is deliberately exhaustive: the `cross-provider-handoff.test.ts` requirement means every new provider must prove it can receive a conversation that started with a different provider. This is the mechanical guarantee that the "bring your own" claim holds at the protocol level, not just the auth level.

Sources: [.pi/skills/add-llm-provider.md:1-58]()

---

## Invariants every adapter must uphold

Synthesizing what the code enforces:

1. **Standardized event stream.** Every `stream()` implementation must emit typed events: `text`, `tool_call`, `thinking`, `usage`, `stop`, `error`. The registry dispatches blindly; callers never know the underlying SDK.

2. **`api` key identity.** A provider module is registered under exactly one `api` string and dispatched only for models whose `model.api` matches. Mismatches throw at the `wrapStream` boundary.

3. **Credential isolation.** Auth is keyed by `provider` string; stream dispatch is keyed by `api` string. A provider can share an `api` protocol with another (e.g., multiple providers using `"openai-completions"`) without credential bleed, because `getApiKey` receives the `provider` string, not the `api` string.

4. **No top-level Node imports in `env-api-keys.ts`.** The comment at line 1 is explicit: `// NEVER convert to top-level imports - breaks browser/Vite builds`. This makes the credential-detection layer safe in browser bundles.

5. **Locked writes for OAuth refresh.** Any provider using OAuth must tolerate concurrent process refresh attempts. The `proper-lockfile` advisory lock in `FileAuthStorageBackend.withLockAsync` serializes these; a provider that bypasses `AuthStorage` and writes `auth.json` directly would break this invariant.

Sources: [packages/ai/src/env-api-keys.ts:1-4](), [packages/ai/src/api-registry.ts:42-78](), [packages/coding-agent/src/core/auth-storage.ts:122-170]()

---

## What is closed: the static `KnownProvider` type

One place where BYOK/BYOC has a real seam: `KnownProvider` in `types.ts` is a static TypeScript union. Adding a first-class provider means a code change and rebuild; no plugin manifest extends the type union, the generated model catalogue, or the test matrix without touching the core package. The `Provider` type is `KnownProvider | string`, so the credential and dispatch layers will accept arbitrary strings at runtime, but the type system treats unknown providers as opaque. The `add-llm-provider` skill exists precisely because of this gap: fully typed extensibility is a code contribution, not a configuration contribution.

The fallback resolver (`setFallbackResolver`) and `models.json` provide a partial escape hatch for custom API endpoints that share an existing `api` protocol (e.g., an OpenAI-compatible endpoint at a different base URL), but they do not add new protocol implementations.
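
The difference between a known and an opaque provider is easy to see in a type-level sketch. The three-member list is a hypothetical subset of the real union, and `isSketchKnownProvider` and `describeSketchProvider` are illustrative helpers, not functions from the codebase.

```ts
// Hypothetical three-member subset of the real union.
const SKETCH_KNOWN_PROVIDERS = ["openai", "google", "groq"] as const;
type SketchKnownProvider = (typeof SKETCH_KNOWN_PROVIDERS)[number];

// Mirrors the `KnownProvider | string` shape described above.
type SketchProvider = SketchKnownProvider | string;

// Runtime check that narrows an arbitrary string to the static union.
function isSketchKnownProvider(provider: SketchProvider): provider is SketchKnownProvider {
  return SKETCH_KNOWN_PROVIDERS.some((known) => known === provider);
}

function describeSketchProvider(provider: SketchProvider): string {
  return isSketchKnownProvider(provider)
    ? `known at compile time: ${provider}`
    : `accepted at runtime, opaque to the type system: ${provider}`;
}
```

Any string type-checks as a `SketchProvider`, so a custom endpoint name compiles fine, but only members of the static list get editor completion and exhaustiveness checking.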

Sources: [.pi/skills/add-llm-provider.md:10-15](), [packages/coding-agent/src/core/auth-storage.ts:238-244]()

---

## Summary

Provider neutrality in Pi is real but structural: it holds because each layer (env-var discovery, OAuth token lifecycle, file-locked credential storage, lazy API dispatch, and a mandatory test matrix) enforces the same interface regardless of which provider sits behind it. The actual cost is the seven-step checklist in `.pi/skills/add-llm-provider.md`, where the heaviest line items are the standardized event stream, the `KnownProvider`/`Api` type additions, and the cross-provider-handoff test. The system's openness has one clear limit: there is no plugin manifest for first-class providers, so adding one to the typed union and the model catalogue requires source changes to `packages/ai`. Everything else (auth sources, credential resolution including shell commands via the `!` prefix, OAuth provider registration, and API dispatch) is pluggable at runtime via the registries in `api-registry.ts` and `oauth/index.ts`.

Sources: [packages/ai/src/utils/oauth/index.ts:35-89]()

---

## 04. The Registry: How Does a New Provider Become Callable?

> pi-ai uses a runtime registry pattern: providers are registered by API string key, and the agent loop calls them through that registry without knowing which concrete module it will hit. This page traces registerApiProvider, registerBuiltInApiProviders, and the lazy-load pattern in register-builtins.ts that defers heavy provider modules until first use—exposing the tradeoff between startup time and call overhead.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/04-the-registry-how-does-a-new-provider-become-callable.md
- Generated: 2026-05-22T23:28:29.560Z

### Source Files

- `packages/ai/src/api-registry.ts`
- `packages/ai/src/providers/register-builtins.ts`
- `packages/ai/src/types.ts`
- `packages/ai/src/models.ts`
- `packages/ai/src/index.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/ai/src/api-registry.ts](packages/ai/src/api-registry.ts)
- [packages/ai/src/providers/register-builtins.ts](packages/ai/src/providers/register-builtins.ts)
- [packages/ai/src/types.ts](packages/ai/src/types.ts)
- [packages/ai/src/models.ts](packages/ai/src/models.ts)
- [packages/ai/src/index.ts](packages/ai/src/index.ts)
- [packages/ai/src/stream.ts](packages/ai/src/stream.ts)
- [packages/ai/src/providers/faux.ts](packages/ai/src/providers/faux.ts)
- [packages/ai/src/utils/event-stream.ts](packages/ai/src/utils/event-stream.ts)
- [packages/coding-agent/src/core/model-registry.ts](packages/coding-agent/src/core/model-registry.ts)
- [packages/coding-agent/examples/extensions/custom-provider-gitlab-duo/test.ts](packages/coding-agent/examples/extensions/custom-provider-gitlab-duo/test.ts)
</details>

# The Registry: How Does a New Provider Become Callable?

The `@earendil-works/pi-ai` package separates *what model to call* from *how to call it*. Models carry a plain string `api` field (e.g. `"anthropic-messages"`, `"openai-responses"`). At call time, the `stream()` and `streamSimple()` entry points look that string up in a runtime `Map` to find the concrete streaming implementation. Nothing in the call path imports provider modules directly — every provider reaches the agent loop through one registry lookup.

This indirection is what makes the library provider-neutral. A custom provider (GitLab Duo, a local proxy, a test double) becomes callable by calling `registerApiProvider` with a matching `api` string. The agent loop does not need to change. This page traces the full lifecycle: how the registry is structured, how built-in providers self-register lazily, the precise cost of that laziness, and how third-party and test providers plug in via the same path.

---

## The Registry Data Structure

The registry is a module-level `Map` in `api-registry.ts`:

```ts
// packages/ai/src/api-registry.ts:40
const apiProviderRegistry = new Map<string, RegisteredApiProvider>();
```

Each entry stores the provider's internal representation and an optional `sourceId` used for bulk removal:

```ts
// packages/ai/src/api-registry.ts:35-38
type RegisteredApiProvider = {
  provider: ApiProviderInternal;
  sourceId?: string;
};
```

`ApiProviderInternal` normalises the two stream functions to a common signature `(model, context, options?) => AssistantMessageEventStream`, erasing provider-specific option types behind the registry boundary.

The Map key is the `api` string — a plain `string` value from the `Api` type union. This is the entire "address" of a provider at runtime. A `Model<TApi>` carries `api: TApi`; that value is the lookup key.

Sources: [packages/ai/src/api-registry.ts:35-45]()

---

## `registerApiProvider`: The One Entry Point

```ts
// packages/ai/src/api-registry.ts:66-78
export function registerApiProvider<TApi extends Api, TOptions extends StreamOptions>(
  provider: ApiProvider<TApi, TOptions>,
  sourceId?: string,
): void {
  apiProviderRegistry.set(provider.api, {
    provider: {
      api: provider.api,
      stream: wrapStream(provider.api, provider.stream),
      streamSimple: wrapStreamSimple(provider.api, provider.streamSimple),
    },
    sourceId,
  });
}
```

Two things happen here that are easy to overlook:

**1. Type erasure via `wrapStream`.** Each `ApiProvider<TApi, TOptions>` carries strongly-typed stream functions. The registry stores `ApiStreamFunction`, a uniformly-typed wrapper. `wrapStream` closes over the provider's `api` key and validates that the model handed to the function at call time has a matching `api`; if not, it throws immediately rather than making a confused upstream call.

```ts
// packages/ai/src/api-registry.ts:42-52
function wrapStream<TApi extends Api, TOptions extends StreamOptions>(
  api: TApi,
  stream: StreamFunction<TApi, TOptions>,
): ApiStreamFunction {
  return (model, context, options) => {
    if (model.api !== api) {
      throw new Error(`Mismatched api: ${model.api} expected ${api}`);
    }
    return stream(model as Model<TApi>, context, options as TOptions);
  };
}
```

**2. Map semantics imply last-write-wins.** Registering a provider for an already-registered `api` key silently replaces the old entry, so a later registration can override a built-in. `resetApiProviders` is the way back from such overrides: it calls `clearApiProviders()` and then re-registers all built-ins, returning the registry to a known state.
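
The two behaviours above can be reproduced in a few lines. This is a minimal sketch rather than the library's code: `SketchModel`, `registerSketch`, and `callSketch` are hypothetical stand-ins, and a "stream" is reduced to a plain string so the example stays synchronous.

```typescript
// Hypothetical, simplified stand-ins for Model and the registry; not the pi-ai API.
type SketchModel = { api: string; id: string };
type SketchStreamFn = (model: SketchModel) => string;

const sketchRegistry = new Map<string, SketchStreamFn>();

// Wrap the provider function so it rejects models addressed to a different api.
function registerSketch(api: string, stream: SketchStreamFn): void {
  sketchRegistry.set(api, (model) => {
    if (model.api !== api) {
      throw new Error(`Mismatched api: ${model.api} expected ${api}`);
    }
    return stream(model);
  });
}

// Look the provider up by model.api, then call it.
function callSketch(model: SketchModel): string {
  const fn = sketchRegistry.get(model.api);
  if (!fn) throw new Error(`No API provider registered for api: ${model.api}`);
  return fn(model);
}

registerSketch("demo-api", (m) => `first:${m.id}`);
registerSketch("demo-api", (m) => `second:${m.id}`); // silently replaces the first entry

const result = callSketch({ api: "demo-api", id: "m1" });
console.log(result); // prints "second:m1"
```

The second registration wins without any warning, which is convenient for overrides but also means a typo-free duplicate key can shadow a provider unnoticed.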

Sources: [packages/ai/src/api-registry.ts:42-78]()

---

## How the Agent Loop Calls Providers

`stream.ts` is the single file callers interact with. It imports `register-builtins.ts` as a side effect, then delegates every call through the registry:

```ts
// packages/ai/src/stream.ts:1-3
import "./providers/register-builtins.ts";
import { getApiProvider } from "./api-registry.ts";
```

```ts
// packages/ai/src/stream.ts:17-32
function resolveApiProvider(api: Api) {
  const provider = getApiProvider(api);
  if (!provider) {
    throw new Error(`No API provider registered for api: ${api}`);
  }
  return provider;
}

export function stream<TApi extends Api>(
  model: Model<TApi>,
  context: Context,
  options?: ProviderStreamOptions,
): AssistantMessageEventStream {
  const provider = resolveApiProvider(model.api);
  return provider.stream(model, context, options as StreamOptions);
}
```

`resolveApiProvider` does one `Map.get`. It never imports Anthropic, OpenAI, Google, or any other SDK module. That decoupling is intentional: callers import `stream` from `@earendil-works/pi-ai`; the concrete provider code only loads when the lazy `import()` inside `register-builtins.ts` resolves.

Sources: [packages/ai/src/stream.ts:1-59]()

---

## The Lazy-Load Pattern in `register-builtins.ts`

### What problem does it solve?

A naive implementation would `import` every provider SDK at module load time. Bedrock alone pulls in `@aws-sdk/client-bedrock-runtime`. If the caller only ever uses Anthropic, loading the AWS SDK wastes startup time and memory. The lazy pattern defers each provider module's import until the first call that actually needs it.

### How it works

Each provider gets a module-level `Promise` slot, initialised to `undefined`:

```ts
// packages/ai/src/providers/register-builtins.ts:94-123
let anthropicProviderModulePromise:
  | Promise<LazyProviderModule<"anthropic-messages", AnthropicOptions, SimpleStreamOptions>>
  | undefined;
// ... (one per provider)
```

A `load*` function populates that slot on first call, using the `||=` idiom to ensure the import fires exactly once regardless of concurrent calls:

```ts
// packages/ai/src/providers/register-builtins.ts:206-217
function loadAnthropicProviderModule() {
  anthropicProviderModulePromise ||= import("./anthropic.ts").then((module) => {
    const provider = module as AnthropicProviderModule;
    return {
      stream: provider.streamAnthropic,
      streamSimple: provider.streamSimpleAnthropic,
    };
  });
  return anthropicProviderModulePromise;
}
```
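
The once-only behaviour of `||=` is easy to check in isolation. The sketch below assumes a hypothetical loader whose body stands in for the dynamic `import()`; the counter exists only to make the number of load attempts observable.

```typescript
// Hypothetical loader; in register-builtins.ts the right-hand side is a dynamic import().
let loadCount = 0;
let modulePromise: Promise<{ label: string }> | undefined;

function loadModuleOnce(): Promise<{ label: string }> {
  // The right-hand side is evaluated only while modulePromise is still undefined.
  modulePromise ||= new Promise<{ label: string }>((resolve) => {
    loadCount += 1;
    resolve({ label: "demo-provider" });
  });
  return modulePromise;
}

const firstCall = loadModuleOnce();
const secondCall = loadModuleOnce();
console.log(firstCall === secondCall, loadCount); // prints "true 1"
```

Because the slot caches the promise itself, callers that arrive while the load is still in flight share it. The same property means a rejected promise would also stay cached unless the slot is cleared; the excerpt above does not show whether the real loaders do that.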

`createLazyStream` wraps a `loadModule` function into a `StreamFunction`. It returns an `AssistantMessageEventStream` immediately — the caller can start iterating right away — and asynchronously resolves the module promise, then forwards events from the inner provider stream:

```ts
// packages/ai/src/providers/register-builtins.ts:162-181
function createLazyStream<TApi extends Api, TOptions extends StreamOptions, TSimpleOptions extends SimpleStreamOptions>(
  loadModule: () => Promise<LazyProviderModule<TApi, TOptions, TSimpleOptions>>,
): StreamFunction<TApi, TOptions> {
  return (model, context, options) => {
    const outer = new AssistantMessageEventStream();

    loadModule()
      .then((module) => {
        const inner = module.stream(model, context, options);
        forwardStream(outer, inner);
      })
      .catch((error) => {
        const message = createLazyLoadErrorMessage(model, error);
        outer.push({ type: "error", reason: "error", error: message });
        outer.end(message);
      });

    return outer;
  };
}
```

`forwardStream` drives the async iteration of `inner` and pushes each event into `outer`:

```ts
// packages/ai/src/providers/register-builtins.ts:132-139
function forwardStream(target: AssistantMessageEventStream, source: AsyncIterable<AssistantMessageEvent>): void {
  (async () => {
    for await (const event of source) {
      target.push(event);
    }
    target.end();
  })();
}
```
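
Put together, the outer-stream-plus-forwarding idea can be sketched end to end. Everything here is an assumption made for illustration: `PushStream` is a hypothetical stand-in for `AssistantMessageEventStream`, the event shape is invented, and unlike the excerpt above this sketch also routes forwarding failures into the error event.

```typescript
// Hypothetical stand-in for AssistantMessageEventStream: a push-based async iterable.
class PushStream<T> implements AsyncIterable<T> {
  private queue: T[] = [];
  private waiters: Array<(result: IteratorResult<T>) => void> = [];
  private ended = false;

  push(value: T): void {
    if (this.ended) return;
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value, done: false });
    else this.queue.push(value);
  }

  end(): void {
    this.ended = true;
    for (const waiter of this.waiters.splice(0)) waiter({ value: undefined, done: true });
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: async (): Promise<IteratorResult<T>> => {
        if (this.queue.length > 0) return { value: this.queue.shift() as T, done: false };
        if (this.ended) return { value: undefined, done: true };
        return new Promise<IteratorResult<T>>((resolve) => this.waiters.push(resolve));
      },
    };
  }
}

type SketchEvent = { type: "text"; text: string } | { type: "error"; message: string };
type SketchModule = { stream: (prompt: string) => AsyncIterable<SketchEvent> };

// Return the outer stream synchronously; fill it once the module promise resolves.
function createLazyStreamSketch(loadModule: () => Promise<SketchModule>) {
  return (prompt: string): PushStream<SketchEvent> => {
    const outer = new PushStream<SketchEvent>();
    loadModule()
      .then(async (providerModule) => {
        for await (const event of providerModule.stream(prompt)) outer.push(event);
        outer.end();
      })
      .catch((error: unknown) => {
        // Load and forwarding failures both surface as a terminal error event.
        const message = error instanceof Error ? error.message : String(error);
        outer.push({ type: "error", message });
        outer.end();
      });
    return outer;
  };
}

// Hypothetical provider module that yields two text events.
async function* fakeProviderStream(prompt: string): AsyncGenerator<SketchEvent> {
  yield { type: "text", text: `echo:${prompt}` };
  yield { type: "text", text: "done" };
}

async function collect(stream: AsyncIterable<SketchEvent>): Promise<SketchEvent[]> {
  const events: SketchEvent[] = [];
  for await (const event of stream) events.push(event);
  return events;
}

const lazyStream = createLazyStreamSketch(async () => ({ stream: fakeProviderStream }));
collect(lazyStream("hi")).then((events) => console.log(events.map((e) => e.type)));
```

The caller holds a usable stream before the loader has resolved, and a failed load reaches it as an ordinary event instead of a rejected promise.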

Each provider's lazy stream functions are created once at module scope and exported; these are the values later passed to `registerApiProvider`:

```ts
// packages/ai/src/providers/register-builtins.ts:326-327
export const streamAnthropic = createLazyStream(loadAnthropicProviderModule);
export const streamSimpleAnthropic = createLazySimpleStream(loadAnthropicProviderModule);
```

### The Bedrock special case

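AWS Bedrock requires Node.js-specific modules. To support environments where the AWS SDK may not be available (e.g., browsers, Deno without Node compat), `register-builtins.ts` introduces a secondary escape hatch: a dedicated import helper that swaps a `.ts` specifier for `.js` when the file itself is running from compiled `.js` output, then performs the dynamic import: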

```ts
// packages/ai/src/providers/register-builtins.ts:89-92
const importNodeOnlyProvider = (specifier: string): Promise<unknown> => {
  const runtimeSpecifier = import.meta.url.endsWith(".js")
    ? specifier.replace(/\.ts$/, ".js")
    : specifier;
  return import(runtimeSpecifier);
};
```

An additional `setBedrockProviderModule` export lets the caller inject a pre-loaded Bedrock module, bypassing the dynamic import entirely:

```ts
// packages/ai/src/providers/register-builtins.ts:125-130
export function setBedrockProviderModule(module: BedrockProviderModule): void {
  bedrockProviderModuleOverride = {
    stream: module.streamBedrock,
    streamSimple: module.streamSimpleBedrock,
  };
}
```

Sources: [packages/ai/src/providers/register-builtins.ts:89-130, 162-217, 310-406]()

---

## `registerBuiltInApiProviders` and Module-Load Side Effect

`registerBuiltInApiProviders` registers all nine built-in providers. The lazy stream functions are already bound closures at this point — the actual provider modules have not loaded yet:

```ts
// packages/ai/src/providers/register-builtins.ts:345-399
export function registerBuiltInApiProviders(): void {
  registerApiProvider({ api: "anthropic-messages",      stream: streamAnthropic,           streamSimple: streamSimpleAnthropic });
  registerApiProvider({ api: "openai-completions",      stream: streamOpenAICompletions,   streamSimple: streamSimpleOpenAICompletions });
  registerApiProvider({ api: "mistral-conversations",   stream: streamMistral,             streamSimple: streamSimpleMistral });
  registerApiProvider({ api: "openai-responses",        stream: streamOpenAIResponses,     streamSimple: streamSimpleOpenAIResponses });
  registerApiProvider({ api: "azure-openai-responses",  stream: streamAzureOpenAIResponses, streamSimple: streamSimpleAzureOpenAIResponses });
  registerApiProvider({ api: "openai-codex-responses",  stream: streamOpenAICodexResponses, streamSimple: streamSimpleOpenAICodexResponses });
  registerApiProvider({ api: "google-generative-ai",    stream: streamGoogle,              streamSimple: streamSimpleGoogle });
  registerApiProvider({ api: "google-vertex",           stream: streamGoogleVertex,        streamSimple: streamSimpleGoogleVertex });
  registerApiProvider({ api: "bedrock-converse-stream", stream: streamBedrockLazy,         streamSimple: streamSimpleBedrockLazy });
}
```

Critically, the file ends with an unconditional call:

```ts
// packages/ai/src/providers/register-builtins.ts:406
registerBuiltInApiProviders();
```

This means that merely importing `register-builtins.ts` — which `stream.ts` does as a side-effect import — populates the registry. Any code that imports from `@earendil-works/pi-ai` transitively triggers this. The registry is ready before any user code runs, but no provider SDK has loaded yet.

Sources: [packages/ai/src/providers/register-builtins.ts:345-406]()

---

## Tradeoff: Startup Time vs. First-Call Overhead

```text
Module load (import register-builtins.ts)
  │
  ├─ registerBuiltInApiProviders() runs immediately
  │    └─ 9 × registerApiProvider(lazyStream, ...)
  │         └─ registry Map populated; provider modules NOT loaded
  │
First call to stream(model, context)  ← model.api = "anthropic-messages"
  │
  ├─ resolveApiProvider("anthropic-messages") → registry hit
  ├─ provider.stream(model, context) → createLazyStream closure fires
  │    ├─ outer = new AssistantMessageEventStream()   ← returned immediately
  │    └─ loadAnthropicProviderModule() starts
  │         ├─ import("./anthropic.ts") resolves (network/disk)
  │         └─ forwardStream(outer, inner) begins piping events
  │
  └─ caller starts iterating outer — first event arrives after module resolves

Subsequent calls (same api key)
  └─ loadAnthropicProviderModule() returns cached Promise immediately
       └─ module already loaded; forwardStream fires without import delay
```

| Phase | Cost | Notes |
|---|---|---|
| Module import | Low — 9 `Map.set` calls | No SDK loaded |
| First call per `api` | Dynamic `import()` round-trip | One-time per provider per process |
| Subsequent calls | Effectively zero | `\|\|=` guard returns the cached Promise |
| Error during load | Encoded in stream | `createLazyLoadErrorMessage` wraps the error as an `AssistantMessage` with `stopReason: "error"`, keeping the caller's async-iteration contract intact |

Sources: [packages/ai/src/providers/register-builtins.ts:162-204](), [packages/ai/src/stream.ts:17-23]()

---

## Registering a Custom Provider

The `Api` type is open-ended:

```ts
// packages/ai/src/types.ts:17
export type Api = KnownApi | (string & {});
```

This means any string is a valid `api` value (the registry lookup key, not a credential). A custom provider registers itself by calling the same function the built-ins use:

```ts
// packages/coding-agent/examples/extensions/custom-provider-gitlab-duo/test.ts:40-44
registerApiProvider({
  api: "gitlab-duo-api" as Api,
  stream: streamGitLabDuo,
  streamSimple: streamGitLabDuo,
});
```

Then a `Model<Api>` is constructed with `api: "gitlab-duo-api"`, and `stream(model, context, options)` routes to the GitLab Duo implementation without any change to `stream.ts`.
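
The `KnownApi | (string & {})` shape is a common TypeScript idiom: the literal members stay visible to editor autocomplete, while any other string is still accepted. A small sketch, with `KnownApiSketch`, `ApiSketch`, and `describeApi` as hypothetical names and only two literals listed for brevity:

```typescript
// Hypothetical two-member subset of the real KnownApi union.
type KnownApiSketch = "anthropic-messages" | "openai-responses";

// `string & {}` prevents the union from collapsing to plain `string`,
// so editors still suggest the known literals.
type ApiSketch = KnownApiSketch | (string & {});

function describeApi(api: ApiSketch): string {
  const known: readonly string[] = ["anthropic-messages", "openai-responses"];
  return known.includes(api) ? `built-in:${api}` : `custom:${api}`;
}

const builtInLabel = describeApi("anthropic-messages");
const customLabel = describeApi("gitlab-duo-api"); // accepted without a cast
console.log(builtInLabel, customLabel); // prints "built-in:anthropic-messages custom:gitlab-duo-api"
```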

The `sourceId` parameter supports lifecycle management: plugins pass a stable ID, and `unregisterApiProviders(sourceId)` removes all entries associated with that ID when the plugin is torn down.

```ts
// packages/ai/src/api-registry.ts:88-94
export function unregisterApiProviders(sourceId: string): void {
  for (const [api, entry] of apiProviderRegistry.entries()) {
    if (entry.sourceId === sourceId) {
      apiProviderRegistry.delete(api);
    }
  }
}
```
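
A short sketch of the teardown behaviour, assuming a hypothetical miniature registry in which handlers are plain functions. It shows that entries registered without a `sourceId` survive a plugin's removal.

```typescript
// Hypothetical miniature of the registry's sourceId bookkeeping; not the pi-ai API.
type SketchEntry = { handler: () => string; sourceId: string | undefined };
const sketchEntries = new Map<string, SketchEntry>();

function registerWithSource(api: string, handler: () => string, sourceId?: string): void {
  sketchEntries.set(api, { handler, sourceId });
}

// Remove every entry owned by one source, leaving all others untouched.
function unregisterBySource(sourceId: string): void {
  for (const [api, entry] of sketchEntries) {
    if (entry.sourceId === sourceId) sketchEntries.delete(api);
  }
}

registerWithSource("builtin-api", () => "builtin"); // no sourceId
registerWithSource("plugin-api-a", () => "a", "my-plugin");
registerWithSource("plugin-api-b", () => "b", "my-plugin");

unregisterBySource("my-plugin");
const remainingApis = Array.from(sketchEntries.keys());
console.log(remainingApis); // prints [ 'builtin-api' ]
```

Deleting from a `Map` while iterating it is well defined in JavaScript, so the single-pass loop is safe.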

Sources: [packages/ai/src/api-registry.ts:88-94](), [packages/coding-agent/examples/extensions/custom-provider-gitlab-duo/test.ts:40-44]()

---

## The Faux Provider: Same Path for Tests

Testing uses the same registration path. `registerFauxProvider` in `faux.ts` creates a controllable `stream` function that pops scripted responses from a queue, then calls `registerApiProvider` with an auto-generated `api` key and a `sourceId`:

```ts
// packages/ai/src/providers/faux.ts:470
registerApiProvider({ api, stream, streamSimple }, sourceId);
```

The test harness calls `fauxProvider.unregister()` to clean up, which calls `unregisterApiProviders(sourceId)`. Tests never touch the real provider modules; they go through exactly the same registry lookup as production code.
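
The scripted-queue idea can be illustrated without any of the real machinery. In this sketch `createScriptedStream` and `FauxEvent` are hypothetical names, events come from a synchronous generator instead of an `AssistantMessageEventStream`, and the chunk size is fixed so the output is deterministic.

```typescript
// Hypothetical event subset for a scripted test provider; not the faux.ts exports.
type FauxEvent =
  | { type: "start" }
  | { type: "text_start" }
  | { type: "text_delta"; delta: string }
  | { type: "text_end" }
  | { type: "done"; text: string }
  | { type: "error"; message: string };

function createScriptedStream(replies: string[], chunkSize: number) {
  const pending = [...replies];
  const step = Math.max(1, chunkSize); // guard against a non-positive chunk size
  // Each run consumes one scripted reply and emits it in fixed-size chunks.
  return function* scriptedStream(): Generator<FauxEvent> {
    const reply = pending.shift();
    if (reply === undefined) {
      yield { type: "error", message: "no scripted reply left" };
      return;
    }
    yield { type: "start" };
    yield { type: "text_start" };
    for (let i = 0; i < reply.length; i += step) {
      yield { type: "text_delta", delta: reply.slice(i, i + step) };
    }
    yield { type: "text_end" };
    yield { type: "done", text: reply };
  };
}

const scripted = createScriptedStream(["hello world"], 4);
const firstRunTypes = [...scripted()].map((e) => e.type).join(" ");
const secondRunTypes = [...scripted()].map((e) => e.type).join(" ");
console.log(firstRunTypes);
console.log(secondRunTypes); // prints "error"
```

An exhausted script surfaces as an `error` event here, in keeping with the rule that failures travel through the stream.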

Sources: [packages/ai/src/providers/faux.ts:391-499]()

---

## Architecture Summary

```text
┌─────────────────────────────────────────────────────────┐
│                  @earendil-works/pi-ai                   │
│                                                         │
│  stream.ts              api-registry.ts                 │
│  ┌──────────────┐       ┌──────────────────────────┐    │
│  │ stream()     │──────▶│  Map<string, Provider>   │    │
│  │ streamSimple()│      │                          │    │
│  │ complete()   │       │  "anthropic-messages" →  │    │
│  └──────────────┘       │    lazyStream(loadAnthr.) │    │
│        ▲                │  "openai-responses"    → │    │
│        │                │    lazyStream(loadOAI)   │    │
│  registerApiProvider()  │  "gitlab-duo-api"      → │    │
│  (public API)           │    streamGitLabDuo       │    │
│        │                │  "faux:abc123"         → │    │
│  ┌─────┴─────────┐      │    fauxStream            │    │
│  │Built-ins      │      └──────────────────────────┘    │
│  │register-      │                                      │
│  │builtins.ts    │       provider modules (lazy)        │
│  │(side effect   │      ┌──────────────────────────┐    │
│  │ on import)    │      │  anthropic.ts  (SDK)     │    │
│  └───────────────┘      │  openai-responses.ts     │    │
│                         │  google.ts               │    │
│  ┌───────────────┐      │  amazon-bedrock.ts       │    │
│  │Custom/Test    │      │  ... (loaded on 1st use) │    │
│  │Providers      │      └──────────────────────────┘    │
│  │(registerApi   │                                      │
│  │ Provider)     │                                      │
│  └───────────────┘                                      │
└─────────────────────────────────────────────────────────┘
```

The registry is a thin `Map<string, wrapped-stream-fn>`. It carries no provider logic. Every provider (built-in, third-party, or test) becomes callable by writing one `api` string key into that map. The lazy-load pattern in `register-builtins.ts` keeps the built-ins' startup cost fixed at nine `Map.set` calls, regardless of how heavy the underlying SDKs are or how many of them are ever used, while deferring SDK import costs to first actual use per provider. The `AssistantMessageEventStream` returned immediately by every lazy-wrapped call preserves the streaming contract even before the backing module has resolved.

Sources: [packages/ai/src/api-registry.ts:40](), [packages/ai/src/providers/register-builtins.ts:162-181, 406]()

---

## 05. Nine Providers, One Interface: What Must Every Adapter Guarantee?

> The providers/ directory contains adapters for Anthropic, OpenAI (completions, responses, Codex), Azure OpenAI, Google AI, Google Vertex, Mistral, Amazon Bedrock, Cloudflare, and a faux test provider. This page asks what the common stream/streamSimple contract is, where adapters diverge (Bedrock's Node-only constraint, GitHub Copilot's custom headers, OpenAI prompt-cache specifics), and what the faux provider reveals about testability.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/05-nine-providers-one-interface-what-must-every-adapter-guarantee.md
- Generated: 2026-05-22T23:28:30.136Z

### Source Files

- `packages/ai/src/providers/anthropic.ts`
- `packages/ai/src/providers/amazon-bedrock.ts`
- `packages/ai/src/providers/faux.ts`
- `packages/ai/src/providers/github-copilot-headers.ts`
- `packages/ai/src/providers/transform-messages.ts`
- `packages/ai/src/providers/openai-prompt-cache.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/ai/src/types.ts](packages/ai/src/types.ts)
- [packages/ai/src/providers/anthropic.ts](packages/ai/src/providers/anthropic.ts)
- [packages/ai/src/providers/amazon-bedrock.ts](packages/ai/src/providers/amazon-bedrock.ts)
- [packages/ai/src/providers/faux.ts](packages/ai/src/providers/faux.ts)
- [packages/ai/src/providers/github-copilot-headers.ts](packages/ai/src/providers/github-copilot-headers.ts)
- [packages/ai/src/providers/transform-messages.ts](packages/ai/src/providers/transform-messages.ts)
- [packages/ai/src/providers/openai-prompt-cache.ts](packages/ai/src/providers/openai-prompt-cache.ts)
- [packages/ai/src/providers/openai-responses.ts](packages/ai/src/providers/openai-responses.ts)
- [packages/ai/src/providers/register-builtins.ts](packages/ai/src/providers/register-builtins.ts)
- [packages/ai/src/utils/event-stream.ts](packages/ai/src/utils/event-stream.ts)
</details>

# Nine Providers, One Interface: What Must Every Adapter Guarantee?

The `packages/ai/src/providers/` directory contains adapters for nine distinct AI backends: Anthropic (native messages API), OpenAI (completions, responses, and Codex variants), Azure OpenAI Responses, Google AI, Google Vertex, Mistral, Amazon Bedrock, Cloudflare, and a faux test provider. Each adapter exposes a narrow, stable interface — `stream` and `streamSimple` — that the rest of the codebase calls without needing to know which provider sits underneath.

This page asks the Socratic questions: what is the minimum a provider must guarantee, where do individual adapters diverge from the common mold, and what does the existence of a faux provider reveal about how the system is tested and reasoned about?

---

## What Is the Common Contract?

### The StreamFunction type

Every provider adapter reduces to a single type:

```typescript
// packages/ai/src/types.ts:206-210
export type StreamFunction<TApi extends Api = Api, TOptions extends StreamOptions = StreamOptions> = (
  model: Model<TApi>,
  context: Context,
  options?: TOptions,
) => AssistantMessageEventStream;
```

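Three inputs (typed model, conversation context, options) and one output: an `AssistantMessageEventStream`. An adapter does not throw from the call site; every failure inside the adapter is encoded as a stream event. The registry layer in front of it can still throw synchronously, for an unregistered `api` or for a model whose `api` does not match the provider.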

Each provider registers two functions:

| Function | Options type | Purpose |
|---|---|---|
| `stream` | Provider-specific (e.g., `AnthropicOptions`, `BedrockOptions`) | Full control — enables provider-specific knobs |
| `streamSimple` | `SimpleStreamOptions` | Portable subset: `reasoning` level, budgets, standard options |

`streamSimple` is the portability surface. A caller using `streamSimple` can switch providers without learning each provider's native option vocabulary.

Sources: [packages/ai/src/types.ts:192-210]()

### The event stream protocol

`AssistantMessageEventStream` is a push-based async iterable. All adapters must emit events in a defined order:

```text
start → (text_start → text_delta* → text_end)*
      → (thinking_start → thinking_delta* → thinking_end)*
      → (toolcall_start → toolcall_delta* → toolcall_end)*
      → done | error
```

The `done` event carries the final `AssistantMessage`; the `error` event carries a partial one with `stopReason: "error" | "aborted"` and an `errorMessage`.

Crucially, providers never throw. If an error occurs mid-stream, the adapter catches it, sets `stopReason`, and emits `error` before calling `stream.end()`. Both the Anthropic and Bedrock adapters follow this pattern:

```typescript
// packages/ai/src/providers/anthropic.ts:692-703
} catch (error) {
  for (const block of output.content) {
    delete (block as { index?: number }).index;
    delete (block as { partialJson?: string }).partialJson;
  }
  output.stopReason = options?.signal?.aborted ? "aborted" : "error";
  output.errorMessage = error instanceof Error ? error.message : JSON.stringify(error);
  stream.push({ type: "error", reason: output.stopReason, error: output });
  stream.end();
}
```

The Bedrock adapter does the same: `stream.push({ type: "error" ... }); stream.end()` (amazon-bedrock.ts:259-262). The faux provider matches this too (faux.ts:311-323 for abort handling; faux.ts:381-385 for normal error propagation). The pattern is consistent across all three providers inspected.
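
Stripped of provider detail, the catch-and-emit pattern looks roughly like the sketch below. `runAdapterSketch` and the `Outcome` type are hypothetical; the real adapters push events onto a stream, whereas this version returns the terminal event so it can be inspected directly.

```typescript
// Hypothetical terminal events; field names loosely follow the prose above.
type Outcome =
  | { type: "done"; stopReason: "stop"; text: string }
  | { type: "error"; stopReason: "error" | "aborted"; errorMessage: string; partialText: string };

// Run a provider call and convert any exception into a terminal error event.
function runAdapterSketch(
  produce: (emit: (chunk: string) => void) => void,
  aborted = false,
): Outcome {
  let text = "";
  try {
    produce((chunk) => {
      text += chunk;
    });
    return { type: "done", stopReason: "stop", text };
  } catch (error) {
    return {
      type: "error",
      stopReason: aborted ? "aborted" : "error",
      errorMessage: error instanceof Error ? error.message : JSON.stringify(error),
      partialText: text, // whatever arrived before the failure is preserved
    };
  }
}

const outcome = runAdapterSketch((emit) => {
  emit("partial ");
  throw new Error("upstream 529");
});
console.log(outcome.type, outcome.stopReason); // prints "error error"
```

The consumer sees one code path for success and failure: read events until a terminal one arrives, then branch on its `type`.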

Sources: [packages/ai/src/utils/event-stream.ts:69-82](), [packages/ai/src/providers/anthropic.ts:692-703](), [packages/ai/src/providers/amazon-bedrock.ts:253-263]()

### Shared usage shape

Every adapter produces the same `usage` structure on every `AssistantMessage`:

```typescript
usage: {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens: number;
  cost: { input, output, cacheRead, cacheWrite, total };
}
```

If a provider does not report cache tokens (e.g., an older API), those fields remain 0. The shape is always present — consumers never need to guard for `undefined`.

Sources: [packages/ai/src/providers/anthropic.ts:455-471](), [packages/ai/src/providers/amazon-bedrock.ts:97-113]()

### What `transformMessages` does for every adapter

Before serializing messages to any provider's wire format, both Anthropic and Bedrock call `transformMessages`. This shared pre-processing step handles:

1. **Image downgrade**: if the target model does not include `"image"` in `model.input`, all images in user and tool-result messages are replaced with placeholder text.
2. **Cross-model thinking stripping**: redacted or signed thinking blocks are dropped or converted to plain text when replaying into a different model than the one that generated them.
3. **Tool call ID normalization**: OpenAI Responses API generates IDs that are 450+ characters long and contain `|` characters; Anthropic requires IDs matching `^[a-zA-Z0-9_-]+$` (max 64 chars). `normalizeToolCallId` re-encodes them.
4. **Orphaned tool call repair**: if an assistant message ended with tool calls but no matching results exist (e.g., after an aborted request), synthetic `toolResult` messages with `isError: true` are inserted so the conversation replay satisfies provider requirements.

```typescript
// packages/ai/src/providers/transform-messages.ts:64-67
export function transformMessages<TApi extends Api>(
  messages: Message[],
  model: Model<TApi>,
  normalizeToolCallId?: (id: string, model: Model<TApi>, source: AssistantMessage) => string,
): Message[]
```
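
As an illustration of the ID normalization step, a normalizer could sanitize and truncate as sketched below. This is an assumption about one possible strategy, not the adapters' actual `normalizeToolCallId` callbacks, and `normalizeToolCallIdSketch` is a hypothetical name.

```typescript
// Hypothetical normalizer targeting the pattern ^[a-zA-Z0-9_-]+$ with a 64-character cap.
const MAX_TOOL_CALL_ID_LENGTH = 64;

function normalizeToolCallIdSketch(id: string): string {
  // Replace every character outside [a-zA-Z0-9_-], then cap the length.
  const sanitized = id.replace(/[^a-zA-Z0-9_-]/g, "_");
  return sanitized.slice(0, MAX_TOOL_CALL_ID_LENGTH);
}

const longId = "call_abc|" + "x".repeat(450);
const normalizedId = normalizeToolCallIdSketch(longId);
console.log(normalizedId.length, /^[a-zA-Z0-9_-]+$/.test(normalizedId)); // prints "64 true"
```

The function is deterministic, so a tool call and its matching result map to the same ID. Plain truncation can collapse two long IDs that share a 64-character prefix; a production normalizer might hash instead, and the excerpt does not show which approach the adapters take.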

Sources: [packages/ai/src/providers/transform-messages.ts:64-220]()

---

## Where Adapters Diverge

### Amazon Bedrock: Node-only constraint

Bedrock is the only adapter with an explicit Node.js/Bun environment check:

```typescript
// packages/ai/src/providers/amazon-bedrock.ts:141-176
if (typeof process !== "undefined" && (process.versions?.node || process.versions?.bun)) {
  // ... region resolution, proxy setup, HTTP handler config
} else {
  // Non-Node environment (browser): fall back to us-east-1
  config.region = configuredRegion || ... || "us-east-1";
}
```

Several capabilities only work in Node:

- **HTTP proxy support**: `NodeHttpHandler` wraps proxy agents for HTTP(S) tunneling. Without this, Bedrock traffic cannot be routed through corporate proxies.
- **AWS credential chain**: profile-based auth (`AWS_PROFILE`) relies on reading `~/.aws/config`, which is not available in the browser.
- **SigV4 signing**: the AWS SDK's default signing uses Node-native crypto.

In the browser, the adapter falls back to `us-east-1` and relies on externally injected credentials. This is the only adapter with such explicit runtime branching.

Additionally, Bedrock has a `setBedrockProviderModule` escape hatch in `register-builtins.ts` (lines 125-130), which allows the Bedrock implementation to be overridden by an external module. This is useful for environments where dynamic import of the AWS SDK is impractical.

Sources: [packages/ai/src/providers/amazon-bedrock.ts:141-176](), [packages/ai/src/providers/register-builtins.ts:118-130]()

### Bedrock's content-block start divergence

Unlike Anthropic, where `contentBlockStart` events are sent for text blocks, Bedrock omits them for text. The adapter compensates by lazily creating text blocks on the first `contentBlockDelta`:

```typescript
// packages/ai/src/providers/amazon-bedrock.ts:379-391
if (delta?.text !== undefined) {
  // If no text block exists yet, create one,
  // as handleContentBlockStart is not sent for text blocks
  if (!block) {
    const newBlock: Block = { type: "text", text: "", index: contentBlockIndex };
    output.content.push(newBlock);
    ...
    stream.push({ type: "text_start", ... });
  }
```

This is a per-adapter normalization: the event consumer always sees `text_start` before `text_delta`, regardless of the wire protocol.
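
The same normalization can be shown on a toy wire protocol. `WireEvent` below is a hypothetical shape, not the AWS SDK's event types; the point is only that the first delta for an index opens the block when no explicit start was sent.

```typescript
// Hypothetical wire events that, like Bedrock text blocks, carry no explicit start.
type WireEvent = { kind: "delta"; index: number; text: string } | { kind: "stop"; index: number };
type OutEvent =
  | { type: "text_start"; index: number }
  | { type: "text_delta"; index: number; delta: string }
  | { type: "text_end"; index: number };

function normalizeTextEvents(wire: WireEvent[]): OutEvent[] {
  const openBlocks = new Set<number>();
  const out: OutEvent[] = [];
  for (const ev of wire) {
    if (ev.kind === "delta") {
      // First delta for this index: synthesize the missing text_start.
      if (!openBlocks.has(ev.index)) {
        openBlocks.add(ev.index);
        out.push({ type: "text_start", index: ev.index });
      }
      out.push({ type: "text_delta", index: ev.index, delta: ev.text });
    } else if (openBlocks.delete(ev.index)) {
      // Only close blocks that were actually opened.
      out.push({ type: "text_end", index: ev.index });
    }
  }
  return out;
}

const normalizedEvents = normalizeTextEvents([
  { kind: "delta", index: 0, text: "Hel" },
  { kind: "delta", index: 0, text: "lo" },
  { kind: "stop", index: 0 },
]);
console.log(normalizedEvents.map((e) => e.type).join(" "));
// prints "text_start text_delta text_delta text_end"
```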

Sources: [packages/ai/src/providers/amazon-bedrock.ts:379-395]()

### GitHub Copilot: custom dynamic headers

The GitHub Copilot provider runs on top of the Anthropic messages API (same wire format) but requires several additional headers that must be computed per-request, not just once at client construction. These are produced in `github-copilot-headers.ts`:

```typescript
// packages/ai/src/providers/github-copilot-headers.ts:23-37
export function buildCopilotDynamicHeaders(params: {
  messages: Message[];
  hasImages: boolean;
}): Record<string, string> {
  const headers: Record<string, string> = {
    "X-Initiator": inferCopilotInitiator(params.messages),
    "Openai-Intent": "conversation-edits",
  };
  if (params.hasImages) {
    headers["Copilot-Vision-Request"] = "true";
  }
  return headers;
}
```

`X-Initiator` is `"user"` when the last message is a user turn, `"agent"` otherwise. `Copilot-Vision-Request: true` is added only when images are present in the conversation. These headers are merged into the Anthropic client's `defaultHeaders` at each invocation:

```typescript
// packages/ai/src/providers/anthropic.ts:484-490
if (model.provider === "github-copilot") {
  const hasImages = hasCopilotVisionInput(context.messages);
  copilotDynamicHeaders = buildCopilotDynamicHeaders({
    messages: context.messages,
    hasImages,
  });
}
```

Bearer token auth (not API key auth) is used for the Copilot client, with `authToken: apiKey` instead of `apiKey: apiKey`.
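
The initiator rule described above is simple enough to sketch. `inferInitiatorSketch` is a hypothetical reimplementation based only on the prose ("user" when the last message is a user turn, "agent" otherwise); the real `inferCopilotInitiator` is not shown in the excerpt, and treating an empty conversation as "agent" is an assumption.

```typescript
// Hypothetical message shape; only the role matters for this sketch.
type SketchMessage = { role: "user" | "assistant" | "toolResult" };

// Assumed rule from the prose: "user" if the last message is a user turn, else "agent".
function inferInitiatorSketch(messages: SketchMessage[]): "user" | "agent" {
  const last = messages[messages.length - 1];
  return last?.role === "user" ? "user" : "agent";
}

function buildHeadersSketch(messages: SketchMessage[], hasImages: boolean): Record<string, string> {
  const headers: Record<string, string> = {
    "X-Initiator": inferInitiatorSketch(messages),
    "Openai-Intent": "conversation-edits",
  };
  if (hasImages) headers["Copilot-Vision-Request"] = "true";
  return headers;
}

const userTurnHeaders = buildHeadersSketch([{ role: "user" }], false);
const toolTurnHeaders = buildHeadersSketch(
  [{ role: "user" }, { role: "assistant" }, { role: "toolResult" }],
  true,
);
console.log(userTurnHeaders["X-Initiator"], toolTurnHeaders["X-Initiator"]); // prints "user agent"
```

Because the last message changes between the first request and each tool-result follow-up, the header has to be recomputed per request, which is why it cannot be fixed at client construction.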

Sources: [packages/ai/src/providers/github-copilot-headers.ts:1-37](), [packages/ai/src/providers/anthropic.ts:483-506]()

### OpenAI prompt cache: a 64-character key constraint

OpenAI prompt caching uses a cache key that must not exceed 64 Unicode code points. The `openai-prompt-cache.ts` utility enforces this:

```typescript
// packages/ai/src/providers/openai-prompt-cache.ts:1-8
export const OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH = 64;

export function clampOpenAIPromptCacheKey(key: string | undefined): string | undefined {
  if (key === undefined) return undefined;
  const chars = Array.from(key);
  if (chars.length <= OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH) return key;
  return chars.slice(0, OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH).join("");
}
```

Note that `Array.from(key)` iterates Unicode code points (not UTF-16 code units), so the cut never lands inside a surrogate pair, which matters for characters outside the Basic Multilingual Plane such as most emoji. This function is called by `openai-responses.ts` when constructing the OpenAI client with a session-based cache key.
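
The difference between counting code points and counting UTF-16 code units shows up as soon as the key contains emoji. The sketch repeats the clamp logic under a hypothetical name (`clampByCodePoints`) so it runs standalone, and contrasts it with a naive `slice`.

```typescript
const MAX_KEY_LENGTH = 64;

// Same logic as the clamp shown above, repeated so the example is self-contained.
function clampByCodePoints(key: string): string {
  const chars = Array.from(key); // one array element per Unicode code point
  return chars.length <= MAX_KEY_LENGTH ? key : chars.slice(0, MAX_KEY_LENGTH).join("");
}

const emojiKey = "😀".repeat(70); // 70 code points, 140 UTF-16 code units
const clampedKey = clampByCodePoints(emojiKey);

// Naive alternative: String.prototype.slice counts UTF-16 code units.
const naiveKey = ("a" + emojiKey).slice(0, MAX_KEY_LENGTH);

console.log(Array.from(clampedKey).length, clampedKey.length); // prints "64 128"
console.log(naiveKey.endsWith("\uD83D")); // prints "true": cut mid-pair, lone high surrogate
```

Code-point iteration still treats multi-code-point sequences (for example emoji joined with zero-width joiners) as separate elements, so a cut can fall between the parts of such a sequence even though each individual code point stays intact.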

Sources: [packages/ai/src/providers/openai-prompt-cache.ts:1-8]()

### Anthropic: stealth mode and OAuth identity headers

The Anthropic adapter has a "stealth mode" for OAuth token users. When the API key starts with `sk-ant-oat`, the adapter identifies itself as Claude Code:

```typescript
// packages/ai/src/providers/anthropic.ts:844-863
if (isOAuthToken(apiKey)) {
  const client = new Anthropic({
    ...
    defaultHeaders: mergeHeaders({
      "anthropic-beta": ["claude-code-20250219", "oauth-2025-04-20", ...betaFeatures].join(","),
      "user-agent": `claude-cli/${claudeCodeVersion}`,
      "x-app": "cli",
    }, ...),
  });
  return { client, isOAuthToken: true };
}
```

When `isOAuthToken` is true, tool names are translated to Claude Code's canonical casing (`Read`, `Write`, `Edit`, `Bash`, etc.) before being sent, and reversed when received. This allows the OAuth path to impersonate Claude Code's tool namespace exactly.

Sources: [packages/ai/src/providers/anthropic.ts:69-106](), [packages/ai/src/providers/anthropic.ts:844-864]()

### Cache control placement: last-block injection

Both Anthropic and Bedrock apply cache control markers at the same logical position — the last content block of the last user message in the conversation. For Anthropic this means appending `cache_control: { type: "ephemeral" }` to the final content block. For Bedrock this means appending a `cachePoint` block after the last user content block. Neither provider receives explicit per-block cache annotations from the caller; the adapters inject them deterministically.
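
For the Anthropic-style variant, the placement rule can be sketched as a pure function over a simplified message list. The message and block shapes are hypothetical; only the `cache_control: { type: "ephemeral" }` marker is taken from the text above.

```typescript
// Hypothetical, simplified shapes; real messages carry more block types and fields.
type SketchBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type SketchTurn = { role: "user" | "assistant"; content: SketchBlock[] };

// Return a copy in which the last block of the last user turn carries the marker.
function markLastUserBlock(turns: SketchTurn[]): SketchTurn[] {
  let lastUser = -1;
  turns.forEach((turn, i) => {
    if (turn.role === "user") lastUser = i;
  });
  return turns.map((turn, i): SketchTurn => {
    if (i !== lastUser) return turn;
    const lastBlock = turn.content.length - 1;
    return {
      ...turn,
      content: turn.content.map(
        (block, j): SketchBlock =>
          j === lastBlock ? { ...block, cache_control: { type: "ephemeral" } } : block,
      ),
    };
  });
}

const markedTurns = markLastUserBlock([
  { role: "user", content: [{ type: "text", text: "first question" }] },
  { role: "assistant", content: [{ type: "text", text: "answer" }] },
  { role: "user", content: [{ type: "text", text: "context" }, { type: "text", text: "follow-up" }] },
]);
console.log(JSON.stringify(markedTurns[2]?.content));
```

Only one block in the whole conversation is annotated, and the input array is left unmodified, which keeps the transformation safe to apply on every request.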

Sources: [packages/ai/src/providers/anthropic.ts:1136-1158](), [packages/ai/src/providers/amazon-bedrock.ts:762-773]()
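
A minimal sketch of last-block injection for the Anthropic-style case follows. The `TextBlock` and `Message` types are simplified stand-ins, and `markLastUserBlock` is a hypothetical name; the real adapter works on richer message types and may differ in detail.

```typescript
// Simplified stand-ins for an Anthropic-style message list (assumed shapes).
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}
interface Message {
  role: "user" | "assistant";
  content: TextBlock[];
}

// Returns a copy of the conversation in which the last content block of the
// last user message carries the cache marker. The input is not mutated.
function markLastUserBlock(messages: Message[]): Message[] {
  const lastUser = messages.map((m) => m.role).lastIndexOf("user");
  return messages.map((message, i) => {
    if (i !== lastUser || message.content.length === 0) return message;
    const lastBlock = message.content.length - 1;
    return {
      ...message,
      content: message.content.map((block, j) =>
        j === lastBlock ? { ...block, cache_control: { type: "ephemeral" as const } } : block,
      ),
    };
  });
}

const conversation: Message[] = [
  { role: "user", content: [{ type: "text", text: "Read the README." }] },
  { role: "assistant", content: [{ type: "text", text: "Done." }] },
  { role: "user", content: [{ type: "text", text: "Now" }, { type: "text", text: "summarize it." }] },
];

console.log(JSON.stringify(markLastUserBlock(conversation), null, 2));
```

Because the marker position is derived from the conversation alone, two calls with the same history produce the same placement, which is what makes the injection deterministic from the caller's point of view.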

---

## Provider × Feature Divergence Matrix

| Feature | Anthropic | Bedrock | OpenAI Responses | Copilot | Faux |
|---|---|---|---|---|---|
| Wire format | SSE (own decoder) | AWS SDK stream | OpenAI SDK stream | Anthropic SDK | In-memory |
| Auth | API key / Bearer / OAuth | SigV4 / Bearer token | API key | Bearer (OAuth) | None |
| Cache control | `cache_control` on blocks | `cachePoint` blocks | Session key (24h / default) | Via Anthropic | Simulated |
| Thinking | adaptive / budget / disabled | adaptive / budget (Claude only) | `reasoningEffort` string | Via Anthropic | Pass-through |
| Proxy support | Via `baseUrl` | `NodeHttpHandler` only | Via `baseUrl` | Via `baseUrl` | N/A |
| Custom headers | `defaultHeaders` merge | Not supported (SDK auth) | Client headers | Dynamic per-request | N/A |
| Node-only | No | Yes | No | No | No |
| Vision detection | `model.input` check | `model.input` check | `model.input` check | `hasCopilotVisionInput` + header | Supported |

---

## What the Faux Provider Reveals About Testability

### What problem does the faux provider solve?

Testing code that calls a real LLM is slow, expensive, and non-deterministic. But replacing the LLM with a mock at the HTTP layer means you're testing the wrong boundary — the real interface is the `AssistantMessageEventStream`, not HTTP responses.

The faux provider tests the streaming contract from the inside. It registers as a real provider via `registerApiProvider`, meaning callers cannot distinguish it from Anthropic at the call site. It emits the exact same event sequence: `start`, `text_start`, `text_delta` chunks, `text_end`, `done`. The chunk size is randomized between `min` and `max` token sizes to shake out partial-text handling.

### What the faux provider exposes

```typescript
// packages/ai/src/providers/faux.ts:96-101
export type FauxResponseFactory = (
  context: Context,
  options: StreamOptions | undefined,
  state: { callCount: number },
  model: Model<string>,
) => AssistantMessage | Promise<AssistantMessage>;
```

Tests can pass either a prebuilt `AssistantMessage` or a factory function. The factory receives the full `Context` and call count, so tests can assert on what the provider received (e.g., the accumulated tool results in `context.messages`), return different responses on each invocation, and simulate stateful multi-turn conversations.

The faux provider also simulates prompt caching:

```typescript
// packages/ai/src/providers/faux.ts:215-225
const previousPrompt = promptCache.get(sessionId);
if (previousPrompt) {
  const cachedChars = commonPrefixLength(previousPrompt, promptText);
  cacheRead = estimateTokens(previousPrompt.slice(0, cachedChars));
  cacheWrite = estimateTokens(promptText.slice(cachedChars));
  input = Math.max(0, promptTokens - cacheRead);
}
```

It tracks session prompts in memory and computes cache hits based on common prefix length — a faithful-enough model to test cache-aware billing logic without a real API.
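
The prefix-based accounting can be reproduced in isolation. In the sketch below, `estimateTokens` uses an assumed four-characters-per-token heuristic and `commonPrefixLength` is a straightforward reimplementation; neither is copied from `faux.ts`, and `simulateCacheHit` is a hypothetical wrapper around the excerpt's arithmetic.

```typescript
// Assumed heuristic: roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Number of leading UTF-16 code units the two strings share.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

interface SimulatedUsage {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

// Same arithmetic as the excerpt: the shared prefix counts as a cache read,
// the new suffix as a cache write, and the remainder as uncached input.
function simulateCacheHit(previousPrompt: string, promptText: string): SimulatedUsage {
  const promptTokens = estimateTokens(promptText);
  const cachedChars = commonPrefixLength(previousPrompt, promptText);
  const cacheRead = estimateTokens(previousPrompt.slice(0, cachedChars));
  const cacheWrite = estimateTokens(promptText.slice(cachedChars));
  const input = Math.max(0, promptTokens - cacheRead);
  return { input, cacheRead, cacheWrite };
}

// 8 shared characters, 4 new ones.
console.log(simulateCacheHit("aaaaaaaa", "aaaaaaaabbbb"));
// { input: 1, cacheRead: 2, cacheWrite: 1 }
```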

Token throughput can be rate-limited via `tokensPerSecond` to test streaming cancellation and abort handling. The abort signal is checked before every content block, making this a strong regression harness for abort-mid-stream edge cases.

Sources: [packages/ai/src/providers/faux.ts:96-101](), [packages/ai/src/providers/faux.ts:201-239](), [packages/ai/src/providers/faux.ts:296-389]()

---

## Lifecycle of a Streaming Request

```text
Caller
  │
  ▼
stream(model, context, options)          ← StreamFunction signature
  │
  ├─ transformMessages()                 ← shared pre-processing (image downgrade,
  │                                         thinking strip, tool ID normalization)
  │
  ├─ Provider-specific client setup
  │   Anthropic  → new Anthropic({ apiKey|authToken, defaultHeaders, betaFeatures })
  │   Bedrock    → new BedrockRuntimeClient({ region, credentials, requestHandler })
  │   OpenAI     → new OpenAI({ apiKey, baseURL, defaultHeaders })
  │   Faux       → in-memory queue
  │
  ├─ Wire call (SSE / AWS SDK stream / OpenAI stream / microtask)
  │
  ├─ Event normalization loop:
  │   content_block_start / delta / stop → text_start / text_delta / text_end
  │                                      → toolcall_start / toolcall_delta / toolcall_end
  │                                      → thinking_start / thinking_delta / thinking_end
  │
  └─ stream.push(done | error) + stream.end()
```

The normalization loop is the largest per-provider divergence point. Anthropic uses a hand-rolled SSE decoder (`iterateSseMessages` → `iterateAnthropicEvents`). Bedrock drives `for await (const item of response.stream!)` over the AWS SDK's async iterable. The OpenAI Responses adapter uses the OpenAI SDK's streaming client. All three ultimately produce the same event vocabulary before the events reach the caller.

Sources: [packages/ai/src/providers/anthropic.ts:347-445](), [packages/ai/src/providers/amazon-bedrock.ts:213-241]()

---

## Summary

Every adapter in `packages/ai/src/providers/` must satisfy four non-negotiable guarantees: return an `AssistantMessageEventStream` synchronously, emit all failures through the stream (never throw), populate the standard `usage` shape, and apply `transformMessages` before serializing conversation history. The shared `stream` / `streamSimple` pair is the boundary that makes providers interchangeable.

Where adapters diverge is in exactly the places forced by their host platforms: Bedrock's Node-only AWS SDK and proxy model, GitHub Copilot's per-request dynamic headers and bearer auth, Anthropic's OAuth stealth identity and per-block `cache_control`, and OpenAI's 64-character prompt cache key limit. The faux provider, by contrast, eliminates all of these specifics and reduces the adapter to its logical core — a push-based, abort-aware, cache-simulating event emitter — proving that the `StreamFunction` contract is both necessary and sufficient for testing the entire streaming pipeline without any real network call.

Sources: [packages/ai/src/providers/faux.ts:391-499](), [packages/ai/src/types.ts:199-210]()

---

## 06. Streaming All the Way Down: What Happens Between Token and Tool Call?

> Every LLM call returns an AsyncIterable of AssistantMessageEvents. This page follows an event from the provider stream through AssistantMessageEventStream (utils/event-stream.ts), into the agent loop's stream function, and up to the TUI renderer. It asks: where is backpressure applied, how are partial tool-call arguments accumulated before validation, and what does the overflow utility guard against?

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/06-streaming-all-the-way-down-what-happens-between-token-and-tool-call.md
- Generated: 2026-05-22T23:29:43.828Z

### Source Files

- `packages/ai/src/stream.ts`
- `packages/ai/src/utils/event-stream.ts`
- `packages/ai/src/utils/overflow.ts`
- `packages/ai/src/utils/json-parse.ts`
- `packages/ai/src/session-resources.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/ai/src/stream.ts](packages/ai/src/stream.ts)
- [packages/ai/src/utils/event-stream.ts](packages/ai/src/utils/event-stream.ts)
- [packages/ai/src/utils/overflow.ts](packages/ai/src/utils/overflow.ts)
- [packages/ai/src/utils/json-parse.ts](packages/ai/src/utils/json-parse.ts)
- [packages/ai/src/session-resources.ts](packages/ai/src/session-resources.ts)
- [packages/ai/src/types.ts](packages/ai/src/types.ts)
- [packages/ai/src/providers/anthropic.ts](packages/ai/src/providers/anthropic.ts)
- [packages/agent/src/agent-loop.ts](packages/agent/src/agent-loop.ts)
- [packages/ai/src/utils/validation.ts](packages/ai/src/utils/validation.ts)
- [packages/coding-agent/src/core/agent-session.ts](packages/coding-agent/src/core/agent-session.ts)
- [packages/coding-agent/src/modes/interactive/interactive-mode.ts](packages/coding-agent/src/modes/interactive/interactive-mode.ts)
</details>

# Streaming All the Way Down: What Happens Between Token and Tool Call?

Every LLM response in this codebase travels through four distinct layers before it has any effect: the provider HTTP stream, the `AssistantMessageEventStream` event bus, the agent loop's `streamAssistantResponse` function, and the TUI or extension subscriber. Each layer has a specific contract. This page traces a single event from raw network bytes to validated tool execution, asking along the way where backpressure lives, how partial JSON is assembled, and what the `overflow` utility actually guards against.

---

## What is the simplest version of this system?

The simplest design would be: call the LLM, get a string back, parse it, act on it. The complexity that replaces that design comes from two real requirements:

1. **Streaming renders text progressively** as the model generates it, which requires an event-at-a-time protocol rather than a single resolved value.
2. **Tool call arguments arrive as a JSON fragment stream**, so argument objects must be assembled, parsed, and validated incrementally before tool execution is safe.

The architecture resolves both by defining a typed event union (`AssistantMessageEvent`) and a generic queue (`EventStream<T, R>`) that lets producers push events at their own pace while consumers pull via `for await`.

---

## Layer 1: The Public Entry Point (`packages/ai/src/stream.ts`)

```ts
// packages/ai/src/stream.ts:25-32
export function stream<TApi extends Api>(
    model: Model<TApi>,
    context: Context,
    options?: ProviderStreamOptions,
): AssistantMessageEventStream {
    const provider = resolveApiProvider(model.api);
    return provider.stream(model, context, options as StreamOptions);
}
```

`stream()` resolves the correct provider from the API registry and immediately returns an `AssistantMessageEventStream`. The caller never sees a raw HTTP response — the provider adapter's job is to translate whatever wire format it receives into typed events and push them into the stream object before returning it.

This means: **the stream object is returned before any events exist in it**. The provider populates it asynchronously. Callers that iterate `for await (const event of stream)` suspend whenever the queue is empty and resume when the provider pushes the next event. That suspension paces the consumer to the producer; it does not slow the producer down, so it is not backpressure in the strict sense.

Sources: [packages/ai/src/stream.ts:25-32]()

---

## Layer 2: The Event Bus (`packages/ai/src/utils/event-stream.ts`)

### What does `EventStream<T, R>` actually implement?

```ts
// packages/ai/src/utils/event-stream.ts:4-67
export class EventStream<T, R = T> implements AsyncIterable<T> {
    private queue: T[] = [];
    private waiting: ((value: IteratorResult<T>) => void)[] = [];
    private done = false;
    // ...
    push(event: T): void { ... }
    end(result?: R): void { ... }
    async *[Symbol.asyncIterator](): AsyncIterator<T> { ... }
    result(): Promise<R> { ... }
}
```

The class maintains two parallel data structures: `queue` (events that arrived before anyone asked) and `waiting` (consumers that asked before events arrived). On each `push()`:

- If a consumer is suspended in `for await`, it is resumed immediately via `waiter({ value: event, done: false })`.
- If no consumer is waiting, the event is buffered in `queue`.

This is a **single-producer, single-consumer queue with direct hand-off**: an event goes straight to a waiting consumer when there is one and is buffered otherwise. It is not a true rendezvous channel, because `push()` never blocks. Any "backpressure" in this system is therefore implicit at best: the provider calls `push()` synchronously inside its event handler loop, and the agent loop's `await` on the next event only paces the consumer. There is no explicit flow control, buffer limit, or back-channel signal to slow the provider. If the provider pushes faster than the consumer processes, events accumulate in `queue`.
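
The queue-plus-waiters mechanism is small enough to sketch in full. `MiniEventStream` below is a simplified illustration written for this page, not the repo's `EventStream`: it omits the completion predicate, the result extractor, and `result()`, and the `collect` helper is likewise hypothetical.

```typescript
class MiniEventStream<T> implements AsyncIterable<T> {
  private queue: T[] = []; // events that arrived before anyone asked
  private waiting: ((result: IteratorResult<T>) => void)[] = []; // consumers that asked first
  private done = false;

  push(event: T): void {
    if (this.done) return;
    const waiter = this.waiting.shift();
    if (waiter) waiter({ value: event, done: false }); // hand off directly
    else this.queue.push(event); // nobody waiting: buffer without limit
  }

  end(): void {
    this.done = true;
    for (const waiter of this.waiting.splice(0)) waiter({ value: undefined, done: true });
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) {
      if (this.queue.length > 0) {
        yield this.queue.shift() as T;
      } else if (this.done) {
        return;
      } else {
        // Queue empty: park until the producer pushes or ends the stream.
        const next = await new Promise<IteratorResult<T>>((resolve) => this.waiting.push(resolve));
        if (next.done) return;
        yield next.value;
      }
    }
  }
}

async function collect<T>(stream: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of stream) out.push(item);
  return out;
}

const demo = new MiniEventStream<string>();
demo.push("a"); // no consumer yet, so this one is buffered
const pendingEvents = collect(demo); // consumer drains "a", then parks
setTimeout(() => {
  demo.push("b"); // consumer is parked, so this one is handed off directly
  demo.end();
}, 0);
pendingEvents.then((events) => console.log(events)); // [ 'a', 'b' ]
```

Note that `push()` returns immediately in both branches. Nothing in this design can make the producer wait, which is why the buffer is unbounded.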

### How does `AssistantMessageEventStream` specialize it?

```ts
// packages/ai/src/utils/event-stream.ts:69-83
export class AssistantMessageEventStream extends EventStream<AssistantMessageEvent, AssistantMessage> {
    constructor() {
        super(
            (event) => event.type === "done" || event.type === "error",
            (event) => {
                if (event.type === "done") return event.message;
                else if (event.type === "error") return event.error;
                throw new Error("Unexpected event type for final result");
            },
        );
    }
}
```

The two callbacks teach `EventStream` which event type terminates the stream and how to extract the final `AssistantMessage` from that terminal event. Callers can either iterate all events or skip directly to `stream.result()`, which returns a promise that resolves when the `done` or `error` event fires.

Sources: [packages/ai/src/utils/event-stream.ts:4-88]()

---

## Layer 3: Provider Adapters — How Raw Tokens Become Events

### The event type taxonomy

```ts
// packages/ai/src/types.ts:347-359
export type AssistantMessageEvent =
    | { type: "start"; partial: AssistantMessage }
    | { type: "text_start"; contentIndex: number; partial: AssistantMessage }
    | { type: "text_delta"; contentIndex: number; delta: string; partial: AssistantMessage }
    | { type: "text_end"; contentIndex: number; content: string; partial: AssistantMessage }
    | { type: "thinking_start"; contentIndex: number; partial: AssistantMessage }
    | { type: "thinking_delta"; contentIndex: number; delta: string; partial: AssistantMessage }
    | { type: "thinking_end"; contentIndex: number; content: string; partial: AssistantMessage }
    | { type: "toolcall_start"; contentIndex: number; partial: AssistantMessage }
    | { type: "toolcall_delta"; contentIndex: number; delta: string; partial: AssistantMessage }
    | { type: "toolcall_end"; contentIndex: number; toolCall: ToolCall; partial: AssistantMessage }
    | { type: "done"; reason: ...; message: AssistantMessage }
    | { type: "error"; reason: ...; error: AssistantMessage };
```

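Every non-terminal event carries `partial: AssistantMessage`, the in-progress message as it stood when the event was pushed. In the adapter excerpts below the same mutable `output` object is passed on every push, so `partial` is a live reference rather than a frozen copy. Either way, the consumer always sees the whole message so far without having to apply deltas itself. The terminal `done` and `error` events carry the final message as `message` and `error` respectively.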

### How partial tool-call arguments are accumulated (Anthropic provider example)

The Anthropic streaming protocol sends tool arguments as successive `input_json_delta` chunks. The provider maintains a mutable block object per content index:

```ts
// packages/ai/src/providers/anthropic.ts:604-616
} else if (event.delta.type === "input_json_delta") {
    const index = blocks.findIndex((b) => b.index === event.index);
    const block = blocks[index];
    if (block && block.type === "toolCall") {
        block.partialJson += event.delta.partial_json;
        block.arguments = parseStreamingJson(block.partialJson);
        stream.push({
            type: "toolcall_delta",
            contentIndex: index,
            delta: event.delta.partial_json,
            partial: output,
        });
    }
}
```

Two things happen on each delta, in order:
1. `partialJson` (a raw string scratch buffer) is appended with the new chunk.
2. `arguments` is re-parsed from `partialJson` via `parseStreamingJson`, giving a best-effort object representation of whatever JSON has arrived so far.

At `content_block_stop`, the final parse is committed and `partialJson` is deleted from the block before the `toolcall_end` event fires:

```ts
// packages/ai/src/providers/anthropic.ts:644-654
} else if (block.type === "toolCall") {
    block.arguments = parseStreamingJson(block.partialJson);
    delete (block as { partialJson?: string }).partialJson;
    stream.push({
        type: "toolcall_end",
        contentIndex: index,
        toolCall: block,
        partial: output,
    });
}
```

The `partialJson` scratch buffer is an internal implementation detail; downstream consumers only see the parsed `arguments` object. Other providers (OpenAI completions, Bedrock, Mistral, OpenAI Responses) follow the same pattern with the same field names.

Sources: [packages/ai/src/providers/anthropic.ts:564-655]()

---

## Layer 4: Partial JSON Parsing (`packages/ai/src/utils/json-parse.ts`)

The question "what does `parseStreamingJson` return when the JSON is incomplete?" has a definite answer:

```ts
// packages/ai/src/utils/json-parse.ts:104-124
export function parseStreamingJson<T = Record<string, unknown>>(partialJson: string | undefined): T {
    if (!partialJson || partialJson.trim() === "") {
        return {} as T;
    }
    try {
        return parseJsonWithRepair<T>(partialJson);
    } catch {
        try {
            const result = partialParse(partialJson);      // third-party partial-json library
            return (result ?? {}) as T;
        } catch {
            try {
                const result = partialParse(repairJson(partialJson));
                return (result ?? {}) as T;
            } catch {
                return {} as T;                            // always returns an object, never throws
            }
        }
    }
}
```

The function applies a three-level parse cascade:
1. Full JSON parse with repair (handles stray control characters and invalid backslash escapes).
2. `partial-json` library parse, which tolerates truncated JSON structures.
3. `partial-json` parse on the repaired string.

If all three fail, it returns `{}`. **This function never throws**. The goal is a best-effort live object for display during streaming; final correctness comes from the last parse at `toolcall_end`, which receives the complete JSON string.

The `repairJson` function specifically handles two malformation classes that appear in real LLM output: raw control characters inside strings (e.g., an actual newline character where JSON requires the two-character escape `\n`) and invalid escape sequences (e.g., `\p`, which is not a valid JSON escape).

Sources: [packages/ai/src/utils/json-parse.ts:1-124]()
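
The "try strict, then fall back, then give up with `{}`" shape of the cascade can be sketched without any dependencies. In the example below, `closeTruncatedJson` is a deliberately naive stand-in for the `partial-json` library (it only closes open strings, objects, and arrays), there is no repair step, and both function names are hypothetical.

```typescript
// Naive stand-in for a partial-JSON parser: append whatever closers are
// needed to balance an input that was cut off mid-stream.
function closeTruncatedJson(partial: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < partial.length; i++) {
    const ch = partial.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  return partial + (inString ? '"' : "") + closers.reverse().join("");
}

// Two-level cascade that always returns an object and never throws.
function parseBestEffort(partial: string | undefined): Record<string, unknown> {
  if (!partial || partial.trim() === "") return {};
  for (const candidate of [partial, closeTruncatedJson(partial)]) {
    try {
      const value: unknown = JSON.parse(candidate);
      if (value !== null && typeof value === "object") return value as Record<string, unknown>;
    } catch {
      // Fall through to the next, more lenient attempt.
    }
  }
  return {};
}

console.log(parseBestEffort('{"path": "src/in')); // { path: 'src/in' }
console.log(parseBestEffort('{"a": 1, "b')); // {}
```

The second call shows the limit of the naive closer: a key with no value cannot be balanced into valid JSON, so the function falls back to `{}` rather than throwing. A real partial-JSON parser would typically recover more of the prefix.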

---

## Layer 5: The Agent Loop (`packages/agent/src/agent-loop.ts`)

### From stream to validated tool call

`streamAssistantResponse` is the function that drives the provider stream, re-emits events as `AgentEvent`s, and returns the finalized `AssistantMessage`:

```ts
// packages/agent/src/agent-loop.ts:313-357
for await (const event of response) {
    switch (event.type) {
        case "start":
            partialMessage = event.partial;
            context.messages.push(partialMessage);
            addedPartial = true;
            await emit({ type: "message_start", message: { ...partialMessage } });
            break;

        case "text_start":
        case "text_delta":
        // ... (all mid-stream events)
        case "toolcall_delta":
        case "toolcall_end":
            if (partialMessage) {
                partialMessage = event.partial;
                context.messages[context.messages.length - 1] = partialMessage;
                await emit({
                    type: "message_update",
                    assistantMessageEvent: event,
                    message: { ...partialMessage },
                });
            }
            break;

        case "done":
        case "error": {
            const finalMessage = await response.result();
            // ...
            await emit({ type: "message_end", message: finalMessage });
            return finalMessage;
        }
    }
}
```

Three behaviors are worth noting:

1. **Eager partial message insertion**: on the `start` event, the partial message is pushed into `context.messages` immediately. Subsequent `message_update` events overwrite it in place (`context.messages[context.messages.length - 1] = partialMessage`). This is how context-aware compaction can observe the partial assistant turn.

2. **Event passthrough**: every `AssistantMessageEvent` becomes a `message_update` `AgentEvent`, carrying both the inner event and the current partial message snapshot. The TUI and extension subscribers receive both.

3. **Argument validation happens after streaming ends**: once `streamAssistantResponse` returns, `runLoop` calls `executeToolCalls`, which calls `prepareToolCall`, which calls `validateToolArguments`. The streaming phase never validates — only the `toolcall_end` event's finalized `arguments` object is validated.

### When is `validateToolArguments` called?

```ts
// packages/agent/src/agent-loop.ts:578-581
const preparedToolCall = prepareToolCallArguments(tool, toolCall);
const validatedArgs = validateToolArguments(tool, preparedToolCall);
```

`validateToolArguments` uses TypeBox's `Value.Convert` for coercion plus an AJV-style validator. If validation fails, `prepareToolCall` returns an `ImmediateToolCallOutcome` with an error string rather than a `PreparedToolCall` — this feeds an error result back to the LLM without executing anything.

Sources: [packages/agent/src/agent-loop.ts:275-368](), [packages/agent/src/agent-loop.ts:562-626]()
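
The "validate, then either prepare or short-circuit" decision can be sketched with a discriminated union. Everything here is illustrative: `ToolSpec`, `PrepareOutcome`, and `prepare` are hypothetical names, and a hand-rolled `typeof` check stands in for the TypeBox-based validation described above (no coercion is attempted).

```typescript
// Hypothetical, hand-rolled schema: field name -> expected primitive type.
type FieldType = "string" | "number" | "boolean";

interface ToolSpec {
  name: string;
  required: Record<string, FieldType>;
}

// Either the call is ready to execute, or an error result is produced
// immediately and the tool is never run.
type PrepareOutcome =
  | { kind: "prepared"; args: Record<string, unknown> }
  | { kind: "immediate"; isError: true; message: string };

function prepare(tool: ToolSpec, args: Record<string, unknown>): PrepareOutcome {
  const problems: string[] = [];
  for (const field of Object.keys(tool.required)) {
    const expected = tool.required[field];
    const actual = typeof args[field];
    if (actual !== expected) problems.push(`${field}: expected ${expected}, got ${actual}`);
  }
  if (problems.length > 0) {
    return {
      kind: "immediate",
      isError: true,
      message: `Invalid arguments for ${tool.name}: ${problems.join("; ")}`,
    };
  }
  return { kind: "prepared", args };
}

const readTool: ToolSpec = { name: "read", required: { path: "string" } };

console.log(prepare(readTool, { path: "src/index.ts" }).kind); // prepared
console.log(prepare(readTool, { path: 42 }).kind); // immediate
```

Returning the failure as data rather than throwing keeps the loop's control flow uniform: an invalid call yields a tool result like any other, and the model gets a chance to correct its arguments on the next turn.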

---

## Layer 6: The Overflow Guard (`packages/ai/src/utils/overflow.ts`)

`isContextOverflow` is not part of the streaming path itself — it is called after a stream completes to classify the returned `AssistantMessage`. It answers: "did this request fail because the context window was exceeded?"

The function handles three distinct cases:

| Case | Signal | Providers |
|------|--------|-----------|
| Error-based overflow | `stopReason === "error"` and `errorMessage` matches an `OVERFLOW_PATTERN` regex | Anthropic, OpenAI, Gemini, Groq, xAI, Mistral, Bedrock, llama.cpp, LM Studio, etc. |
| Silent overflow | `stopReason === "stop"` but `usage.input + usage.cacheRead > contextWindow` | z.ai |
| Truncation overflow | `stopReason === "length"` with `output === 0` and input fills `>= 99%` of `contextWindow` | Xiaomi MiMo |

```ts
// packages/ai/src/utils/overflow.ts:122-150
export function isContextOverflow(message: AssistantMessage, contextWindow?: number): boolean {
    if (message.stopReason === "error" && message.errorMessage) {
        const isNonOverflow = NON_OVERFLOW_PATTERNS.some((p) => p.test(message.errorMessage!));
        if (!isNonOverflow && OVERFLOW_PATTERNS.some((p) => p.test(message.errorMessage!))) {
            return true;
        }
    }
    if (contextWindow && message.stopReason === "stop") {
        const inputTokens = message.usage.input + message.usage.cacheRead;
        if (inputTokens > contextWindow) return true;
    }
    if (contextWindow && message.stopReason === "length" && message.usage.output === 0) {
        const inputTokens = message.usage.input + message.usage.cacheRead;
        if (inputTokens >= contextWindow * 0.99) return true;
    }
    return false;
}
```

The `NON_OVERFLOW_PATTERNS` exclusion list exists because some error messages match overflow patterns structurally but mean something different — for example, AWS Bedrock throttling errors contain the phrase "too many tokens" (a generic `OVERFLOW_PATTERNS` match) but are not context window errors. The exclusion list is checked first.

Sources: [packages/ai/src/utils/overflow.ts:33-151]()
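
The two usage-based branches are plain arithmetic and can be checked by hand. The sketch below isolates them in a hypothetical `isUsageOverflow` helper with stand-in types; it leaves out the regex-based error branch, and the token counts are made-up numbers chosen to hit each case.

```typescript
// Stand-in types covering only the fields the usage-based checks read.
interface Usage {
  input: number;
  cacheRead: number;
  output: number;
}
type StopReason = "stop" | "length" | "error";

// Mirrors the second and third branches of the excerpt above.
function isUsageOverflow(stopReason: StopReason, usage: Usage, contextWindow: number): boolean {
  const inputTokens = usage.input + usage.cacheRead;
  // Silent overflow: the provider reports success but accepted more input
  // than the window allows.
  if (stopReason === "stop") return inputTokens > contextWindow;
  // Truncation overflow: no output at all and the input fills >= 99% of the window.
  if (stopReason === "length" && usage.output === 0) return inputTokens >= contextWindow * 0.99;
  return false;
}

const contextWindowTokens = 200000;

// 150000 + 60000 = 210000 > 200000
console.log(isUsageOverflow("stop", { input: 150000, cacheRead: 60000, output: 12 }, contextWindowTokens)); // true
// 199000 >= 198000 and output is 0
console.log(isUsageOverflow("length", { input: 199000, cacheRead: 0, output: 0 }, contextWindowTokens)); // true
// A normal length stop with real output is not an overflow.
console.log(isUsageOverflow("length", { input: 199000, cacheRead: 0, output: 512 }, contextWindowTokens)); // false
```

Note that `cacheRead` counts toward the window: cached tokens are cheaper to bill, but they still occupy context.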

---

## End-to-End Event Sequence

```text
Provider HTTP stream
    │
    │  content_block_delta (input_json_delta)
    ▼
Provider adapter
    │  block.partialJson += delta
    │  block.arguments = parseStreamingJson(partialJson)  ← best-effort live object
    │  stream.push({ type: "toolcall_delta", ... })
    ▼
AssistantMessageEventStream.push()
    │  if consumer waiting → resume immediately
    │  else → queue.push(event)
    ▼
agent-loop: for await (const event of response)
    │  partialMessage = event.partial
    │  context.messages[last] = partialMessage           ← live context update
    │  emit({ type: "message_update", ... })
    ▼
AgentSession._emit() / extension runner
    │  subscribers (TUI, RPC clients) receive message_update
    ▼
--- content_block_stop: toolcall_end fires ---
    │  block.arguments = parseStreamingJson(partialJson) ← final parse
    │  delete block.partialJson
    │  stream.push({ type: "toolcall_end", toolCall: block, ... })
    ▼
streamAssistantResponse returns AssistantMessage
    ▼
executeToolCalls → prepareToolCall
    │  validateToolArguments(tool, toolCall)              ← TypeBox validation
    │  if invalid → ImmediateToolCallOutcome (error to LLM)
    │  if valid   → PreparedToolCall → executePreparedToolCall
    ▼
isContextOverflow(finalMessage, contextWindow?)           ← post-hoc classification
```

---

## Where is backpressure applied?

The honest answer: **it isn't, in the traditional sense**. The `EventStream` queue is unbounded and no signal flows back to the provider to slow it down. What the `await emit(...)` calls in `streamAssistantResponse` do provide is consumer-side pacing: because `emit` is awaited, the agent loop cannot dequeue the next event until every listener has finished with the current one. If a listener is slow (e.g., a TUI render takes time), the agent loop stalls on `await emit`, the `for await` consumer stalls with it, and incoming events pile up in `queue`. The producer keeps pushing as fast as the network delivers, so memory use is bounded only by the size of the response, not by any flow-control mechanism.

---

## Summary

The streaming pipeline is a layered push-pull system: providers push typed events into an unbounded hand-off queue, the agent loop pulls them with `for await` and re-emits them as `AgentEvent`s, and tool arguments are assembled from raw JSON fragments using a fault-tolerant parser cascade (`parseStreamingJson`) that never throws. Validation happens once, after `streamAssistantResponse` returns, using the fully assembled argument object from `toolcall_end`. The `isContextOverflow` utility sits outside the stream path and classifies a completed `AssistantMessage` post-hoc, covering provider-specific failure modes that range from explicit error messages to silent truncation detected only via token-usage ratios.

Sources: [packages/ai/src/utils/event-stream.ts:50-66](), [packages/agent/src/agent-loop.ts:310-368]()

---

## 07. The Loop: What Is the Minimal Unit of Agent Work?

> packages/agent/src/agent-loop.ts implements the turn-based cycle: add prompt → call LLM → emit events → execute pending tool calls → repeat until stop. This page asks what each AgentEvent type signals, how tool execution mode (sequential vs. parallel) is chosen, and what the difference is between runAgentLoop and runAgentLoopContinue. The test files agent-loop.test.ts and agent.test.ts show which invariants the authors actually enforce.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/07-the-loop-what-is-the-minimal-unit-of-agent-work.md
- Generated: 2026-05-22T23:28:09.391Z

### Source Files

- `packages/agent/src/agent-loop.ts`
- `packages/agent/src/agent.ts`
- `packages/agent/src/types.ts`
- `packages/agent/test/agent-loop.test.ts`
- `packages/agent/test/agent.test.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/agent/src/agent-loop.ts](packages/agent/src/agent-loop.ts)
- [packages/agent/src/agent.ts](packages/agent/src/agent.ts)
- [packages/agent/src/types.ts](packages/agent/src/types.ts)
- [packages/agent/test/agent-loop.test.ts](packages/agent/test/agent-loop.test.ts)
- [packages/agent/test/agent.test.ts](packages/agent/test/agent.test.ts)
</details>

# The Loop: What Is the Minimal Unit of Agent Work?

The agent loop in `packages/agent/src/agent-loop.ts` defines the fundamental rhythm of this runtime: receive a prompt, call the LLM, dispatch tool calls, collect results, and decide whether to continue. Everything else — state management, subscribers, queuing — is scaffolding around that cycle. This page examines what each turn is made of, what each event type signals to observers, how the loop decides between sequential and parallel tool execution, and what contract distinguishes `runAgentLoop` from `runAgentLoopContinue`.

Understanding this loop is the prerequisite for building anything on top of the `Agent` class: custom tools, context transformations, steering flows, or test harnesses all depend on being able to reason about where in this cycle their hooks fire and what invariants they can rely on.

---

## What is a Turn?

The loop operates at two granularities: the **run** (from first prompt to `agent_end`) and the **turn** (from a `turn_start` to its matching `turn_end`). A turn is precisely one LLM call plus all the tool calls that LLM response spawns. The turn ends after all tool results are back; then the loop decides whether to start another turn.

The outer structure in `runLoop` makes this visible:

```typescript
// packages/agent/src/agent-loop.ts:170-254
while (true) {
  let hasMoreToolCalls = true;
  while (hasMoreToolCalls || pendingMessages.length > 0) {
    // ... emit turn_start, inject steering messages, call LLM, execute tools, emit turn_end
  }
  const followUpMessages = (await config.getFollowUpMessages?.()) || [];
  if (followUpMessages.length > 0) { pendingMessages = followUpMessages; continue; }
  break;
}
await emit({ type: "agent_end", messages: newMessages });
```

The inner `while` loop handles tool chaining (the LLM returned tool calls → execute → feed back → LLM again). The outer `while` loop handles the case where new messages arrive from a follow-up queue after the agent would otherwise stop.

Sources: [packages/agent/src/agent-loop.ts:155-269]()
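
The nested-loop control flow can be exercised with a scripted stand-in for the LLM. This is a simulation under stated assumptions, not the real `runLoop`: `ScriptedReply` and `simulateRun` are hypothetical, replies only say how many tool calls they contain, and the events recorded are limited to run and turn boundaries.

```typescript
// Hypothetical scripted reply: only the number of tool calls matters here.
interface ScriptedReply {
  toolCalls: number;
}

function simulateRun(replies: ScriptedReply[], followUpBatches: string[][]): string[] {
  const events: string[] = ["agent_start"];
  const followUps = followUpBatches.slice(); // do not mutate the caller's array
  let pendingMessages: string[] = [];
  let nextReply = 0;

  while (true) {
    let hasMoreToolCalls = true;
    // Inner loop: one iteration per turn, repeated while tools keep chaining
    // or there are pending messages to deliver.
    while (hasMoreToolCalls || pendingMessages.length > 0) {
      events.push("turn_start");
      pendingMessages = []; // pending messages are injected into context here
      const reply = replies[nextReply++] ?? { toolCalls: 0 };
      hasMoreToolCalls = reply.toolCalls > 0; // tool results feed the next turn
      events.push("turn_end");
    }
    // Outer loop: a follow-up batch restarts the inner loop instead of stopping.
    const followUp = followUps.shift() ?? [];
    if (followUp.length > 0) {
      pendingMessages = followUp;
      continue;
    }
    break;
  }
  events.push("agent_end");
  return events;
}

// One tool-calling reply followed by a plain reply: two turns.
console.log(simulateRun([{ toolCalls: 2 }, { toolCalls: 0 }], []));
```

With no tool calls but one queued follow-up batch, the run also takes two turns: the first ends the inner loop normally, and the follow-up re-enters it.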

---

## The AgentEvent Taxonomy

Every observable occurrence in a run is surfaced as one of these event types. They form a strict lifecycle that observers can rely on.

| Event type | When it fires | Payload |
|---|---|---|
| `agent_start` | Once, at the very start of any run | none |
| `turn_start` | Before each LLM call (including subsequent turns for tool chains) | none |
| `message_start` | When a new message object is available (user, assistant, or toolResult) | `message: AgentMessage` |
| `message_update` | Each streaming delta from the LLM (text, thinking, toolcall chunks) | `message`, `assistantMessageEvent` |
| `message_end` | When a message is finalized and committed to context | `message: AgentMessage` |
| `tool_execution_start` | When a tool call is about to execute (before `execute()` is called) | `toolCallId`, `toolName`, `args` |
| `tool_execution_update` | Intermediate progress from a long-running tool | `toolCallId`, `toolName`, `args`, `partialResult` |
| `tool_execution_end` | When a tool call finishes (success or error) | `toolCallId`, `toolName`, `result`, `isError` |
| `turn_end` | After all tool results for that turn are ready | `message`, `toolResults` |
| `agent_end` | Once, the last event of any run | `messages: AgentMessage[]` — the full set of new messages |

The type definition is precise and exhaustive:

```typescript
// packages/agent/src/types.ts:403-418
export type AgentEvent =
  | { type: "agent_start" }
  | { type: "agent_end"; messages: AgentMessage[] }
  | { type: "turn_start" }
  | { type: "turn_end"; message: AgentMessage; toolResults: ToolResultMessage[] }
  | { type: "message_start"; message: AgentMessage }
  | { type: "message_update"; message: AgentMessage; assistantMessageEvent: AssistantMessageEvent }
  | { type: "message_end"; message: AgentMessage }
  | { type: "tool_execution_start"; toolCallId: string; toolName: string; args: any }
  | { type: "tool_execution_update"; toolCallId: string; toolName: string; args: any; partialResult: any }
  | { type: "tool_execution_end"; toolCallId: string; toolName: string; result: any; isError: boolean };
```

The `Agent` class consumes these events in `processEvents()` to update its mutable state. `message_end` is when a message is actually appended to `state.messages`; `message_start` only sets `state.streamingMessage`. The `tool_execution_start/end` pair maintains `state.pendingToolCalls` as a live set of in-flight tool call IDs.

Sources: [packages/agent/src/types.ts:396-418](), [packages/agent/src/agent.ts:509-556]()
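
The state transitions described above amount to a small reducer over events. The sketch below uses stand-in types with only the fields it needs; `applyEvent` is a hypothetical name, and clearing `streamingMessage` on `message_end` is an assumption, since the text only states when it is set.

```typescript
// Stand-in types; the real AgentMessage and AgentEvent carry more fields.
interface Msg {
  role: string;
  text: string;
}
type Ev =
  | { type: "message_start"; message: Msg }
  | { type: "message_end"; message: Msg }
  | { type: "tool_execution_start"; toolCallId: string }
  | { type: "tool_execution_end"; toolCallId: string };

interface State {
  messages: Msg[];
  streamingMessage: Msg | undefined;
  pendingToolCalls: Set<string>;
}

function applyEvent(state: State, event: Ev): void {
  switch (event.type) {
    case "message_start":
      state.streamingMessage = event.message; // visible, but not yet committed
      break;
    case "message_end":
      state.streamingMessage = undefined; // assumption: cleared on commit
      state.messages.push(event.message); // the only place messages grow
      break;
    case "tool_execution_start":
      state.pendingToolCalls.add(event.toolCallId);
      break;
    case "tool_execution_end":
      state.pendingToolCalls.delete(event.toolCallId);
      break;
  }
}

const state: State = { messages: [], streamingMessage: undefined, pendingToolCalls: new Set<string>() };
const assistant: Msg = { role: "assistant", text: "Running two tools." };

applyEvent(state, { type: "message_start", message: assistant });
applyEvent(state, { type: "message_end", message: assistant });
applyEvent(state, { type: "tool_execution_start", toolCallId: "tool-1" });
applyEvent(state, { type: "tool_execution_start", toolCallId: "tool-2" });
applyEvent(state, { type: "tool_execution_end", toolCallId: "tool-1" });

console.log(state.messages.length, Array.from(state.pendingToolCalls)); // 1 [ 'tool-2' ]
```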

### The guaranteed ordering invariant

Tests in `agent-loop.test.ts` pin down one non-obvious invariant for parallel mode: `tool_execution_end` fires in completion order (whichever tool finishes first), but `message_start/end` for the resulting `toolResult` messages always fires in the original **source order** (the order the LLM listed the tool calls):

```typescript
// packages/agent/test/agent-loop.test.ts:522-544
expect(toolExecutionEndIds).toEqual(["tool-2", "tool-1"]); // completion order
expect(toolResultIds).toEqual(["tool-1", "tool-2"]);        // source order
expect(turnToolResultIds).toEqual(["tool-1", "tool-2"]);    // source order in turn_end
```

Sources: [packages/agent/test/agent-loop.test.ts:452-545]()

---

## A Single Turn in Sequence

```text
turn_start
  │
  ├─ [inject steering messages → message_start/end for each]
  │
  ├─ LLM call (streaming)
  │     message_start (partial assistant message)
  │     message_update × N (text_delta, thinking_delta, toolcall_delta, …)
  │     message_end (final assistant message)
  │
  ├─ [if tool calls exist]
  │     tool_execution_start × M
  │     tool_execution_update × 0..N per tool
  │     tool_execution_end × M          ← completion order (parallel) or source order (sequential)
  │     message_start/end × M           ← always source order (toolResult messages)
  │
turn_end
```

Sources: [packages/agent/src/agent-loop.ts:155-269](), [packages/agent/test/agent-loop.test.ts:1051-1064]()

---

## How Tool Execution Mode Is Chosen

The loop uses a two-level decision to determine whether tool calls in a single assistant message execute sequentially or concurrently:

```typescript
// packages/agent/src/agent-loop.ts:381-388
async function executeToolCalls(...): Promise<ExecutedToolCallBatch> {
  const toolCalls = assistantMessage.content.filter((c) => c.type === "toolCall");
  const hasSequentialToolCall = toolCalls.some(
    (tc) => currentContext.tools?.find((t) => t.name === tc.name)?.executionMode === "sequential",
  );
  if (config.toolExecution === "sequential" || hasSequentialToolCall) {
    return executeToolCallsSequential(...);
  }
  return executeToolCallsParallel(...);
}
```

**Rule:** if `config.toolExecution` is `"sequential"` OR if **any** tool in the batch has `executionMode: "sequential"` on its definition, the entire batch runs sequentially. A single sequential tool is enough to force every other call in the same batch to run sequentially too.

| Scenario | Result |
|---|---|
| `config.toolExecution = "parallel"` (default), all tools have no `executionMode` | Parallel |
| `config.toolExecution = "parallel"`, one tool has `executionMode: "sequential"` | Sequential |
| `config.toolExecution = "sequential"`, all tools have `executionMode: "parallel"` | Sequential |
| `config.toolExecution = "parallel"`, all tools have `executionMode: "parallel"` | Parallel |

The tests enforce this precisely. A single `executionMode: "sequential"` tool mixed with a fast tool forces sequential execution even though the config default is parallel:

```typescript
// packages/agent/test/agent-loop.test.ts:736-821
// fast tool has no executionMode (defaults to parallel)
// slow tool has executionMode: "sequential"
// config has no toolExecution (defaults to parallel)
// → execution is sequential: fast tool does NOT start before slow tool finishes
expect(executionOrder[0]).toBe("slow:a");
```

Sources: [packages/agent/src/agent-loop.ts:373-515](), [packages/agent/test/agent-loop.test.ts:653-895]()
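
The decision table above can be reproduced with a small standalone function. This is a minimal sketch of the rule only: `resolveBatchMode`, `SketchTool`, and the tool names `read`, `grep`, and `edit` are hypothetical and chosen purely for illustration.

```typescript
type ExecutionMode = "sequential" | "parallel";

interface SketchTool {
  name: string;
  executionMode?: ExecutionMode; // omitted means the tool states no preference
}

// Returns the mode for one batch of tool calls (one assistant message).
function resolveBatchMode(
  configMode: ExecutionMode | undefined,
  tools: SketchTool[],
  calledToolNames: string[],
): ExecutionMode {
  const anySequential = calledToolNames.some(
    (name) => tools.find((tool) => tool.name === name)?.executionMode === "sequential",
  );
  return configMode === "sequential" || anySequential ? "sequential" : "parallel";
}

const sketchTools: SketchTool[] = [
  { name: "read" },
  { name: "grep", executionMode: "parallel" },
  { name: "edit", executionMode: "sequential" },
];

const readAndGrep = resolveBatchMode(undefined, sketchTools, ["read", "grep"]);
const readAndEdit = resolveBatchMode("parallel", sketchTools, ["read", "edit"]);
const forcedByConfig = resolveBatchMode("sequential", sketchTools, ["read", "grep"]);
```

Note that the check only looks at tools actually called in the batch: registering a sequential tool does not slow down batches that never call it.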

### The parallel execution strategy in detail

In `executeToolCallsParallel`, tool calls are **prepared** (validated, `beforeToolCall` called) one at a time in source order, then **executed** concurrently via `Promise.all`. The preparation phase is always serial so that `beforeToolCall` can see a consistent context. In the excerpt below, each prepared call is pushed as a deferred function (a thunk), and the thunks are only invoked inside the `Promise.all` call, so execution starts once the preparation loop has finished rather than interleaving with it.

```typescript
// packages/agent/src/agent-loop.ts:484-504
finalizedCalls.push(async () => {
  const executed = await executePreparedToolCall(preparation, signal, emit);
  const finalized = await finalizeExecutedToolCall(...);
  await emitToolExecutionEnd(finalized, emit);
  return finalized;
});
// ...
const orderedFinalizedCalls = await Promise.all(
  finalizedCalls.map((entry) => (typeof entry === "function" ? entry() : Promise.resolve(entry))),
);
```

Tool-result messages are then appended in source order after all tools have finished.

Sources: [packages/agent/src/agent-loop.ts:451-516]()
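
The relationship between completion order and source order can be seen in isolation. The sketch below follows the thunk-plus-`Promise.all` shape of the excerpt, but `runBatch`, `sleep`, and the timings are hypothetical, and hooks, validation, and event emission are left out.

```typescript
type BatchOutcome = { id: string; result: string };
type BatchCall = { id: string; run: () => Promise<string> };

async function runBatch(calls: BatchCall[], log: string[]): Promise<BatchOutcome[]> {
  const thunks: Array<() => Promise<BatchOutcome>> = [];
  // Phase 1: prepare serially, in source order. Nothing executes yet.
  for (const call of calls) {
    log.push(`prepare:${call.id}`);
    thunks.push(async () => {
      const result = await call.run();
      log.push(`end:${call.id}`); // recorded in completion order
      return { id: call.id, result };
    });
  }
  // Phase 2: start every prepared call, then wait for all of them.
  // Promise.all resolves to results in input (source) order.
  return Promise.all(thunks.map((thunk) => thunk()));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const batchLog: string[] = [];
const batchDone = runBatch(
  [
    { id: "tool-1", run: async () => { await sleep(20); return "slow result"; } },
    { id: "tool-2", run: async () => { await sleep(1); return "fast result"; } },
  ],
  batchLog,
);
```

Here `tool-2` finishes first, so its `end` entry is logged first, yet the resolved array still lists `tool-1` before `tool-2`. That is the same split the tests pin down between `tool_execution_end` and the `toolResult` messages.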

---

## The Tool Call Lifecycle: Prepare → Execute → Finalize

A single tool call passes through three internal phases before it becomes a `ToolResultMessage`:

1. **Prepare** (`prepareToolCall`): find the tool definition, optionally run `tool.prepareArguments()` to reshape raw LLM arguments, validate against the schema, and call `config.beforeToolCall`. If `beforeToolCall` returns `{ block: true }`, an error result is produced immediately without calling `execute`.

2. **Execute** (`executePreparedToolCall`): call `tool.execute()`. Errors thrown by the tool are caught and converted to error results — the loop does not propagate tool exceptions. Intermediate progress is emitted as `tool_execution_update` events.

3. **Finalize** (`finalizeExecutedToolCall`): call `config.afterToolCall` if set. The hook can override any field in the result: content, details, error flag, or the `terminate` hint.

The `terminate` hint is the mechanism for early loop exit: if **every** tool result in a batch sets `terminate: true`, `shouldTerminateToolBatch` returns true and the loop does not make another LLM call:

```typescript
// packages/agent/src/agent-loop.ts:544-546
function shouldTerminateToolBatch(finalizedCalls: FinalizedToolCallOutcome[]): boolean {
  return finalizedCalls.length > 0 && finalizedCalls.every((finalized) => finalized.result.terminate === true);
}
```

Sources: [packages/agent/src/agent-loop.ts:562-708](), [packages/agent/test/agent-loop.test.ts:1067-1117]()
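
The edge cases of the "every result must agree" rule are easy to miss, so here is a small worked example. `SketchOutcome` and `shouldTerminateSketch` are hypothetical stand-ins that mirror the predicate shown above with the result type reduced to the one relevant field.

```typescript
interface SketchOutcome {
  result: { terminate?: boolean };
}

// Same predicate as above, restated over the simplified type.
function shouldTerminateSketch(outcomes: SketchOutcome[]): boolean {
  return outcomes.length > 0 && outcomes.every((o) => o.result.terminate === true);
}

const allAgree = shouldTerminateSketch([
  { result: { terminate: true } },
  { result: { terminate: true } },
]);
const oneDisagrees = shouldTerminateSketch([
  { result: { terminate: true } },
  { result: {} }, // a missing hint counts as "keep going"
]);
const emptyBatch = shouldTerminateSketch([]);
```

An empty batch never terminates the loop, and a single result without the hint is enough to trigger another LLM call.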

---

## runAgentLoop vs. runAgentLoopContinue

These are the two entry points into `runLoop`. They differ in one respect: whether the caller is providing new messages or treating the existing context as ready. The rows below are consequences of that choice.

| | `runAgentLoop` | `runAgentLoopContinue` |
|---|---|---|
| Takes new prompt messages | Yes (`prompts: AgentMessage[]`) | No |
| Emits `message_start/end` for prompts | Yes, before the first LLM call | No |
| Appends prompts to context | Yes | No (context is used as-is) |
| Returns | New messages including prompts | New messages only (not pre-existing context) |
| Precondition on context | None | Last message must not be `role: "assistant"` |
| Throws if context is empty | No | Yes |

The precondition is enforced explicitly:

```typescript
// packages/agent/src/agent-loop.ts:70-75
if (context.messages.length === 0) {
  throw new Error("Cannot continue: no messages in context");
}
if (context.messages[context.messages.length - 1].role === "assistant") {
  throw new Error("Cannot continue from message role: assistant");
}
```

The reason: the LLM expects a `user` or `toolResult` message as the last entry before it responds. The check above only rejects a trailing `assistant` message. The comment in the public `agentLoopContinue` wrapper calls the rest "caller responsibility": the loop cannot fully validate the last message because `convertToLlm` (which maps `AgentMessage[]` to `Message[]`) might remap a custom message role at call time.

The test for `continue` confirms the invariant — only the new assistant message is returned, not the pre-existing user message that was already in context:

```typescript
// packages/agent/test/agent-loop.test.ts:1278-1287
const messages = await stream.result();
expect(messages.length).toBe(1);
expect(messages[0].role).toBe("assistant");
// Should NOT have user message events (that's the key difference from agentLoop)
```

Sources: [packages/agent/src/agent-loop.ts:31-143](), [packages/agent/test/agent-loop.test.ts:1233-1351]()

### How Agent wraps both entry points

The `Agent` class calls `runAgentLoop` from `runPromptMessages` and `runAgentLoopContinue` from `runContinuation`. The public `Agent.continue()` method contains extra logic: if the last transcript message is `assistant` and the steering or follow-up queue has items, it drains one batch from those queues and calls `runPromptMessages` instead of `runContinuation`. This prevents the "cannot continue from assistant" error while still processing queued input.

Sources: [packages/agent/src/agent.ts:338-411]()
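
The branching described above can be summarized as a small decision function. This is a sketch under stated assumptions: `planContinue` and its return labels are hypothetical, and the `"error"` outcomes stand for the cases where the precondition in `runAgentLoopContinue` would reject the call.

```typescript
type SketchRole = "user" | "assistant" | "toolResult";
type ContinuePlan = "runContinuation" | "runPromptMessages" | "error";

function planContinue(lastRole: SketchRole | undefined, queuedMessages: number): ContinuePlan {
  if (lastRole === undefined) return "error"; // empty context: nothing to continue from
  if (lastRole !== "assistant") return "runContinuation"; // context is ready as-is
  // Trailing assistant message: only queued input can move the conversation forward.
  return queuedMessages > 0 ? "runPromptMessages" : "error";
}

const afterToolResult = planContinue("toolResult", 0);
const assistantWithQueue = planContinue("assistant", 2);
const assistantNoQueue = planContinue("assistant", 0);
const emptyContext = planContinue(undefined, 0);
```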

---

## Steering and Follow-Up Message Injection

The loop polls two external queues at well-defined points:

- **Steering messages** (`getSteeringMessages`): polled after the current assistant turn finishes its tool calls, before starting the next LLM call. They are injected into context and events are emitted for them, then the loop immediately continues. This models "interrupt the agent mid-work."

- **Follow-up messages** (`getFollowUpMessages`): polled only when the inner loop would exit (no more tool calls, no pending steering). They re-enter the outer loop and force another turn. This models "queue the next request until the agent is done."

The `QueueMode` controls how many messages the `PendingMessageQueue` releases on each drain: `"all"` or `"one-at-a-time"` (the default for both queues in `Agent`):

```typescript
// packages/agent/src/agent.ts:211-213
this.steeringQueue = new PendingMessageQueue(options.steeringMode ?? "one-at-a-time");
this.followUpQueue = new PendingMessageQueue(options.followUpMode ?? "one-at-a-time");
```

Sources: [packages/agent/src/types.ts:39-44](), [packages/agent/src/agent.ts:118-152](), [packages/agent/src/agent-loop.ts:167-268]()
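
To make the two drain modes concrete, here is a minimal queue with the same two settings. `SketchQueue` is a hypothetical stand-in; the real `PendingMessageQueue` may expose different method names and extra behavior.

```typescript
type SketchQueueMode = "all" | "one-at-a-time";

class SketchQueue<T> {
  private items: T[] = [];
  private mode: SketchQueueMode;

  constructor(mode: SketchQueueMode) {
    this.mode = mode;
  }

  enqueue(item: T): void {
    this.items.push(item);
  }

  // Removes and returns the next batch according to the configured mode.
  drain(): T[] {
    return this.mode === "all" ? this.items.splice(0) : this.items.splice(0, 1);
  }
}

const oneAtATime = new SketchQueue<string>("one-at-a-time");
const drainAll = new SketchQueue<string>("all");
for (const text of ["first", "second", "third"]) {
  oneAtATime.enqueue(text);
  drainAll.enqueue(text);
}

const firstDrain = oneAtATime.drain();
const secondDrain = oneAtATime.drain();
const everything = drainAll.drain();
const afterEverything = drainAll.drain();
```

With `"one-at-a-time"`, three queued messages take three polls to deliver, so the agent gets a chance to respond to each one before seeing the next.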

---

## The AgentMessage Abstraction Boundary

The loop works exclusively in `AgentMessage[]` — a union of LLM message types plus any custom messages an app registers via declaration merging. The translation to `Message[]` (the type the LLM provider understands) happens exactly once per turn, inside `streamAssistantResponse`, via `config.convertToLlm`. A `transformContext` hook fires before that, at the `AgentMessage` level, where operations like context-window pruning belong.

```typescript
// packages/agent/src/agent-loop.ts:283-289
let messages = context.messages;
if (config.transformContext) {
  messages = await config.transformContext(messages, signal);
}
const llmMessages = await config.convertToLlm(messages);
```

This boundary means the loop's context never leaks provider-specific types. Custom message roles (notification banners, artifact metadata, UI-only annotations) stay in `AgentMessage[]` throughout and are filtered out by `convertToLlm` before the provider sees them.

The default `convertToLlm` in `Agent` is a simple role filter: it passes through only `user`, `assistant`, and `toolResult` messages, dropping anything else.

Sources: [packages/agent/src/agent-loop.ts:275-368](), [packages/agent/src/agent.ts:31-35](), [packages/agent/test/agent-loop.test.ts:131-183]()
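
A role filter of the kind described for the default `convertToLlm` fits in a few lines. The sketch below uses simplified message shapes; `SketchAgentMessage`, `convertToLlmSketch`, and the `notification` role are hypothetical and not taken from the source.

```typescript
type SketchAgentMessage = { role: string; content: string };

const LLM_ROLES = new Set(["user", "assistant", "toolResult"]);

// Keeps only the roles a provider understands and drops app-specific ones.
function convertToLlmSketch(messages: SketchAgentMessage[]): SketchAgentMessage[] {
  return messages.filter((message) => LLM_ROLES.has(message.role));
}

const transcript: SketchAgentMessage[] = [
  { role: "user", content: "Run the tests" },
  { role: "notification", content: "Build finished" }, // UI-only message
  { role: "assistant", content: "Running them now" },
];

const llmView = convertToLlmSketch(transcript);
```

The original `transcript` array is left untouched, which matches the idea that custom messages stay in the agent-level context and are only hidden from the provider.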

---

## Summary

The minimal unit of agent work is the **turn**: one LLM call and the tool executions it triggers. `runLoop` iterates turns until no tool calls remain and no queued messages are waiting. `runAgentLoop` starts a turn sequence by appending new prompt messages and emitting events for them; `runAgentLoopContinue` enters the same loop assuming the context already ends in a user or tool-result message. Tool execution mode is decided per-batch: a single `executionMode: "sequential"` tool overrides the parallel default for the entire batch. Every observable state change — streaming chunks, tool dispatch, message finalization — is expressed as a typed `AgentEvent` that the `Agent` class reduces into mutable state and forwards to application subscribers, with `agent_end` guaranteed as the final event of any run.

Sources: [packages/agent/src/agent-loop.ts:1-268]()

---

## 08. AgentSession: What State Must Survive a Model Switch or Session Resume?

> AgentSession (core/agent-session.ts) is the shared abstraction across interactive, print, and RPC modes. It owns session persistence, model/thinking-level management, bash execution, and auto-compaction triggers. This page asks: what is serialized to disk, which events drive session persistence, how session branching works, and why AgentSession is deliberately mode-agnostic.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/08-agentsession-what-state-must-survive-a-model-switch-or-session-resume.md
- Generated: 2026-05-22T23:29:53.519Z

### Source Files

- `packages/coding-agent/src/core/agent-session.ts`
- `packages/coding-agent/src/core/session-manager.ts`
- `packages/coding-agent/src/core/agent-session-services.ts`
- `packages/coding-agent/src/core/agent-session-runtime.ts`
- `packages/coding-agent/test/suite/agent-session-runtime.test.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/coding-agent/src/core/agent-session.ts](packages/coding-agent/src/core/agent-session.ts)
- [packages/coding-agent/src/core/session-manager.ts](packages/coding-agent/src/core/session-manager.ts)
- [packages/coding-agent/src/core/agent-session-runtime.ts](packages/coding-agent/src/core/agent-session-runtime.ts)
- [packages/coding-agent/src/core/agent-session-services.ts](packages/coding-agent/src/core/agent-session-services.ts)
- [packages/coding-agent/test/suite/agent-session-runtime.test.ts](packages/coding-agent/test/suite/agent-session-runtime.test.ts)
</details>

# AgentSession: What State Must Survive a Model Switch or Session Resume?

`AgentSession` is the shared core abstraction that all run modes — interactive TUI, print (non-interactive), and RPC — build on top of. It owns the coupling between the in-memory agent state (streaming turns, tool calls, message queue) and the durable session log on disk. When a user switches models, resumes a previous session, or forks a conversation branch, `AgentSession` is responsible for ensuring that the right state — and only the right state — moves across that boundary.

This page walks from first principles through what exactly is serialized, which events trigger writes, how branching works mechanically, and why the abstraction is deliberately agnostic to the I/O layer above it.

---

## What Is the Simplest Version?

The simplest possible agent persistence is: write every message to a file. That works until a user switches models partway through a session, compacts context, or resumes a conversation in a different directory. Each of those cases requires persisting *more* than raw messages — you need configuration snapshots, tombstone markers for history replacement, and a way to reconstitute the exact LLM context for a given branch of the conversation.

The actual design therefore stores not just messages but a *typed entry log*, and reconstitutes runtime state from it at load time.

---

## The Session File: Append-Only JSONL with a Tree Structure

The fundamental persistence unit is a JSONL file where each line is a JSON object (a `FileEntry`). The first line is always a `SessionHeader`; every subsequent line is a `SessionEntry` with `id` and `parentId` fields that form a tree.

```text
SessionHeader  { type: "session", version: 3, id, timestamp, cwd, parentSession? }
SessionEntry   { type, id, parentId, timestamp, ...type-specific fields }
SessionEntry   ...
```

The `SessionManager` class owns all writes to this file.

Sources: [packages/coding-agent/src/core/session-manager.ts:30-37](), [packages/coding-agent/src/core/session-manager.ts:700-720]()
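
Reading such a file back is mostly a matter of splitting on newlines and treating the first record specially. The sketch below is a simplified reader, not the `SessionManager` loader: `parseSessionFile` and the `Sketch*` types are hypothetical, and the string timestamps in the sample are an assumption.

```typescript
interface SketchHeader {
  type: "session";
  version: number;
  id: string;
  timestamp: string;
  cwd: string;
  parentSession?: string;
}

interface SketchEntry {
  type: string;
  id: string;
  parentId: string | null;
  timestamp: string;
  [key: string]: unknown; // type-specific fields
}

function parseSessionFile(text: string): { header: SketchHeader; entries: SketchEntry[] } {
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  const first = lines[0];
  if (first === undefined) throw new Error("empty session file");
  const header = JSON.parse(first) as SketchHeader;
  if (header.type !== "session") throw new Error("first line must be the session header");
  const entries = lines.slice(1).map((line) => JSON.parse(line) as SketchEntry);
  return { header, entries };
}

const sampleFile =
  [
    { type: "session", version: 3, id: "s1", timestamp: "2026-01-01T00:00:00.000Z", cwd: "/tmp/project" },
    { type: "message", id: "e1", parentId: null, timestamp: "2026-01-01T00:00:01.000Z" },
    { type: "message", id: "e2", parentId: "e1", timestamp: "2026-01-01T00:00:02.000Z" },
  ]
    .map((record) => JSON.stringify(record))
    .join("\n") + "\n";

const parsedSession = parseSessionFile(sampleFile);
```

Because every entry carries its own `parentId`, the order of lines in the file does not define the conversation; the tree does.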

### Entry Types and What They Encode

| Entry type | What it encodes | Participates in LLM context? |
|---|---|---|
| `message` | A single `AgentMessage` (user, assistant, toolResult, custom) | Yes |
| `model_change` | Provider + model ID snapshot | Yes — replayed to restore model on resume |
| `thinking_level_change` | ThinkingLevel string | Yes — replayed to restore thinking level |
| `compaction` | Summary text, `firstKeptEntryId`, token count before compaction, optional extension details | Yes — summary injected as first message |
| `branch_summary` | Summary of a diverged branch, `fromId` pointer | Yes — injected as synthetic message |
| `custom_message` | Extension-injected user message (shown in TUI) | Yes |
| `custom` | Extension-specific opaque data | No — for extension state only |
| `label` | User-defined bookmark on any entry | No |
| `session_info` | Display name for the session | No |

Sources: [packages/coding-agent/src/core/session-manager.ts:44-147]()

**Key distinction:** `custom` (opaque data bag for extension state reconstruction) vs `custom_message` (actually becomes a user-role message in the LLM context). Both survive a session resume; only `custom_message` changes what the model sees.

---

## When Does a Write Happen?

### Message Persistence: `message_end` Event

`AgentSession` subscribes to every `AgentEvent` from the underlying `Agent` core. On `message_end`, it calls the appropriate `SessionManager.append*` method:

```ts
// packages/coding-agent/src/core/agent-session.ts:498-517
if (event.type === "message_end") {
    if (event.message.role === "custom") {
        this.sessionManager.appendCustomMessageEntry(...)
    } else if (
        event.message.role === "user" ||
        event.message.role === "assistant" ||
        event.message.role === "toolResult"
    ) {
        this.sessionManager.appendMessage(event.message);
    }
}
```

This means persistence is event-driven and synchronous with the streaming pipeline. The session file grows one entry at a time as the agent turn progresses, not in a batch at the end.

Sources: [packages/coding-agent/src/core/agent-session.ts:498-517]()

### Configuration Change Persistence

Model and thinking-level changes are written immediately when they are applied, before the next turn:

- `setModel()` calls `sessionManager.appendModelChange(provider, id)` — Sources: [packages/coding-agent/src/core/agent-session.ts:1417-1431]()
- `setThinkingLevel()` calls `sessionManager.appendThinkingLevelChange(level)` only when the level actually changes — Sources: [packages/coding-agent/src/core/agent-session.ts:1519-1531]()

This is essential: when the session is resumed, `buildSessionContext()` replays entries in branch order, and it extracts the *last* `model_change` and `thinking_level_change` entries it encounters to set `model` and `thinkingLevel` on the reconstituted `SessionContext`. Without persisting these as log entries, a model switch made mid-session would be invisible on resume.

### Compaction Persistence

Compaction appends a `CompactionEntry` with:
- `summary`: the LLM-generated summary text
- `firstKeptEntryId`: which entry is the oldest one still included verbatim in context
- `tokensBefore`: count for display/analytics
- `details`: optional extension-specific opaque data

After appending, `buildSessionContext()` is called to rebuild in-memory messages from the new log state — Sources: [packages/coding-agent/src/core/agent-session.ts:1692-1695]()

### The Deferred Write Optimization

`SessionManager._persist()` has a deliberate optimization: it does not write anything to disk until the first assistant message arrives. Up to that point, a conversation consists only of user messages, which have no value without a response. Once the first assistant message is received, the entire accumulated entry list is flushed in one batch, then subsequent entries are appended one-by-one.

```ts
// packages/coding-agent/src/core/session-manager.ts:843-861
_persist(entry: SessionEntry): void {
    if (!this.persist || !this.sessionFile) return;

    const hasAssistant = this.fileEntries.some(
        (e) => e.type === "message" && e.message.role === "assistant"
    );
    if (!hasAssistant) {
        this.flushed = false;
        return;
    }

    if (!this.flushed) {
        for (const e of this.fileEntries) {
            appendFileSync(this.sessionFile, `${JSON.stringify(e)}\n`);
        }
        this.flushed = true;
    } else {
        appendFileSync(this.sessionFile, `${JSON.stringify(entry)}\n`);
    }
}
```

This means a session file does not appear on disk (or grow in size) until at least one assistant response has been generated. Prompt-only runs that the user aborts immediately leave no file behind.

Sources: [packages/coding-agent/src/core/session-manager.ts:843-861]()

---

## Reconstituting State on Resume: `buildSessionContext()`

The counterpart to the write path is `buildSessionContext()`, a pure function that accepts the flat entry list and a `leafId` and walks the tree from leaf to root:

```text
leaf → parent → ... → root
```

The walk collects the following, with the entry closest to the leaf taking precedence:
- The last `thinking_level_change` → `thinkingLevel`
- The last `model_change` or last assistant `message` (which embeds provider/model) → `model`
- The last `compaction` entry → compaction boundary

Then it rebuilds the message array:

1. If a compaction boundary exists: emit the summary message first, then the entries from `firstKeptEntryId` up to the compaction entry (the part of the pre-compaction history that was kept verbatim), then all entries after the compaction entry.
2. If no compaction: emit all messages in path order.

Sources: [packages/coding-agent/src/core/session-manager.ts:315-421]()
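
The walk and the compaction rule can be sketched over a reduced entry type. Everything here is a simplification under stated assumptions: `rebuildContext`, `LogEntry`, and the sample ids are hypothetical, messages are reduced to strings, only the last compaction on the path is considered, and the model fallback to the last assistant message is omitted.

```typescript
type Base = { id: string; parentId: string | null };
type CompactionEntrySketch = Base & { type: "compaction"; summary: string; firstKeptEntryId: string };
type LogEntry =
  | (Base & { type: "message"; text: string })
  | (Base & { type: "model_change"; provider: string; modelId: string })
  | (Base & { type: "thinking_level_change"; level: string })
  | CompactionEntrySketch;

interface RebuiltContext {
  messages: string[];
  model: string | undefined;
  thinkingLevel: string | undefined;
}

function rebuildContext(entries: LogEntry[], leafId: string): RebuiltContext {
  const byId = new Map(entries.map((entry) => [entry.id, entry] as const));
  // Walk leaf -> root, then reverse so the path reads root -> leaf.
  const path: LogEntry[] = [];
  for (let cur = byId.get(leafId); cur; cur = cur.parentId ? byId.get(cur.parentId) : undefined) {
    path.push(cur);
  }
  path.reverse();

  const context: RebuiltContext = { messages: [], model: undefined, thinkingLevel: undefined };
  let compaction: CompactionEntrySketch | undefined;
  for (const entry of path) {
    // Later entries overwrite earlier ones, so the value nearest the leaf wins.
    if (entry.type === "model_change") context.model = `${entry.provider}/${entry.modelId}`;
    else if (entry.type === "thinking_level_change") context.thinkingLevel = entry.level;
    else if (entry.type === "compaction") compaction = entry;
  }

  if (compaction) context.messages.push(`[summary] ${compaction.summary}`);
  let keeping = compaction === undefined;
  for (const entry of path) {
    if (compaction && entry.id === compaction.firstKeptEntryId) keeping = true;
    if (keeping && entry.type === "message") context.messages.push(entry.text);
  }
  return context;
}

const sampleLog: LogEntry[] = [
  { type: "message", id: "e1", parentId: null, text: "old question" },
  { type: "message", id: "e2", parentId: "e1", text: "old answer" },
  { type: "model_change", id: "e3", parentId: "e2", provider: "provider-x", modelId: "model-a" },
  { type: "message", id: "e4", parentId: "e3", text: "recent question" },
  { type: "compaction", id: "e5", parentId: "e4", summary: "earlier work", firstKeptEntryId: "e4" },
  { type: "message", id: "e6", parentId: "e5", text: "next question" },
  { type: "message", id: "e7", parentId: "e2", text: "abandoned branch" }, // not on the path to e6
];

const rebuilt = rebuildContext(sampleLog, "e6");
```

Entry `e7` is in the file but never reaches the rebuilt context, because it is not an ancestor of the chosen leaf.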

The rebuilt `SessionContext` is then applied directly to the live agent state:

```ts
// packages/coding-agent/src/core/agent-session.ts:1974-1975
const sessionContext = this.sessionManager.buildSessionContext();
this.agent.state.messages = sessionContext.messages;
```

This means the in-memory agent state is always derivable from the session's entry log. The agent keeps no separate message cache that can diverge from that log: the log *is* the source of truth.

---

## What Survives a Model Switch?

When `setModel()` is called:

1. `agent.state.model` is updated immediately.
2. A new `ModelChangeEntry` is appended to the session log.
3. `settingsManager.setDefaultModelAndProvider()` records the choice as the new default for future sessions.
4. The thinking level is re-clamped to the new model's capabilities (different models support different thinking level sets).

```ts
// packages/coding-agent/src/core/agent-session.ts:1417-1431
async setModel(model: Model<any>): Promise<void> {
    ...
    this.agent.state.model = model;
    this.sessionManager.appendModelChange(model.provider, model.id);
    this.settingsManager.setDefaultModelAndProvider(model.provider, model.id);
    this.setThinkingLevel(thinkingLevel);
    ...
}
```

The conversation history is **not affected** — all prior messages remain in context. The change only affects which provider and model handle the *next* LLM call. On session resume, `buildSessionContext()` replays the `model_change` entry so the correct model is restored without user intervention.

Sources: [packages/coding-agent/src/core/agent-session.ts:1417-1473]()

### Thinking Level Clamping

A model that does not support reasoning cannot honor a thinking level above `"off"`, whatever the last `thinking_level_change` entry says. `clampThinkingLevel()` from `@earendil-works/pi-ai` is applied every time the model changes, ensuring the persisted level is always a value the current model can actually use. This is important when a session is resumed on a machine where a previously used model is not available.

Sources: [packages/coding-agent/src/core/agent-session.ts:1566-1578]()
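
One plausible clamping rule is "pick the highest supported level that does not exceed the requested one". The sketch below implements that rule only as an illustration: `clampLevelSketch` is hypothetical, the level names other than `"off"` are assumptions, and the real `clampThinkingLevel()` signature and behavior may differ.

```typescript
// Assumed ordering of levels, lowest to highest (only "off" appears in the source).
const SKETCH_LEVELS = ["off", "low", "medium", "high"] as const;
type SketchLevel = (typeof SKETCH_LEVELS)[number];

function clampLevelSketch(requested: SketchLevel, supported: readonly SketchLevel[]): SketchLevel {
  // Scan downward from the requested level to the nearest supported one.
  for (let i = SKETCH_LEVELS.indexOf(requested); i >= 0; i--) {
    const candidate = SKETCH_LEVELS[i];
    if (candidate !== undefined && supported.includes(candidate)) return candidate;
  }
  return "off"; // fallback when nothing at or below the request is supported
}

const onNonReasoningModel = clampLevelSketch("high", ["off"]);
const onLimitedModel = clampLevelSketch("high", ["off", "low", "medium"]);
const alreadySupported = clampLevelSketch("low", ["off", "low", "medium", "high"]);
```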

---

## Session Branching: How Forks Work

The tree structure (`id`/`parentId`) exists specifically to support branching. Forking creates a new session file that is a copy of the original up to a chosen entry, then diverges:

```text
Original session (copied):
  header → e1 → e2 → e3(user) → e4(assistant) → leaf

Fork "before" e3:
  New session starts with parentSession = original file path
  leafId in new session = e3.parentId (= e2)
  The forked session's new entries are children of e2
```

`SessionManager.createBranchedSession()` physically copies the current JSONL file and sets `leafId` to the target entry's parent (for a "before" fork) or the entry itself (for an "at" fork). The copy captures the full history; the new leaf pointer determines what context the agent sees on the next turn.

Sources: [packages/coding-agent/src/core/agent-session-runtime.ts:246-330]()

The `SessionHeader.parentSession` field stores the original file path, creating an audit trail of which session a fork originated from.
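
The leaf selection for the two fork positions reduces to a one-line choice. The sketch below is illustrative only: `forkLeafId`, `ForkPosition`, and the entry ids are hypothetical, and the file copy performed by `createBranchedSession()` is not modeled.

```typescript
type ForkPosition = "before" | "at";
type TreeEntry = { id: string; parentId: string | null };

// "before": new entries attach to the target's parent, so the target is excluded.
// "at": new entries attach to the target itself, so the target stays in context.
function forkLeafId(target: TreeEntry, position: ForkPosition): string | null {
  return position === "before" ? target.parentId : target.id;
}

const e3: TreeEntry = { id: "e3", parentId: "e2" };
const leafBefore = forkLeafId(e3, "before");
const leafAt = forkLeafId(e3, "at");
```

Forking "before" a user message is what allows that message to be rewritten: the new branch starts from the state just prior to it.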

### Branch Summaries

When a session is forked, the diverged branch can be summarized into a `BranchSummaryEntry`. On resume of the original session, `buildSessionContext()` encounters this entry and injects a synthetic user message explaining what happened in the diverged branch. This lets the LLM retain awareness of exploratory forks without including the full fork context verbatim.

Sources: [packages/coding-agent/src/core/session-manager.ts:78-86](), [packages/coding-agent/src/core/session-manager.ts:385-387]()

---

## The Session Lifecycle Event Model

`AgentSessionRuntime` wraps `AgentSession` and owns the higher-level session replacement flows (new, resume, fork, import). Every replacement follows a strict sequence of extension events:

```text
session_before_switch / session_before_fork  (cancellable)
        ↓
session_shutdown  (teardown of old session)
        ↓
[new AgentSession created, old one disposed]
        ↓
session_start  (startup of new session)
```

Sources: [packages/coding-agent/test/suite/agent-session-runtime.test.ts:164-207]()

The test confirms the exact event ordering:

```ts
expect(events).toEqual([
    { type: "session_before_switch", reason: "new", targetSessionFile: undefined },
    { type: "session_shutdown", reason: "new", targetSessionFile: secondSessionFile },
    { type: "session_start", reason: "new", previousSessionFile: originalSessionFile },
]);
```

This sequence matters for extensions that maintain their own state (e.g., an artifact index stored in `custom` entries). The `session_shutdown` event gives them a chance to flush; the `session_start` event gives the new context a chance to scan existing entries and reconstruct their state.

---

## Why AgentSession Is Mode-Agnostic

The class header is explicit:

```ts
/**
 * This class is shared between all run modes (interactive, print, rpc).
 * Modes use this class and add their own I/O layer on top.
 */
```

Sources: [packages/coding-agent/src/core/agent-session.ts:1-14]()

All event emission goes through the `_emit()` method, which broadcasts to a list of `AgentSessionEventListener` callbacks. Interactive mode, print mode, and RPC mode each add their own listener via `subscribe()`. Session persistence happens in `_handleAgentEvent`, which runs *before* listeners are notified, so each entry has already been handed to `SessionManager` (and written to disk, once the deferred-write condition described earlier is met) before any UI reacts.

This design means:
- The interactive TUI can subscribe to events and render streaming output.
- RPC mode can subscribe to the same events and serialize them over a protocol.
- Neither mode needs to know that the other exists, and neither is responsible for persistence.
- The `AgentSessionRuntime` layer, which manages session switching and forking, operates on `AgentSession` instances without knowing which I/O layer is attached.

The only mode-specific coupling is the `ExtensionUIContext` and `ExtensionCommandContextActions` passed via `bindExtensions()`, which give extensions access to UI primitives. These are injected by the host (interactive mode wires its TUI here) but have no effect on persistence.

Sources: [packages/coding-agent/src/core/agent-session.ts:668-683](), [packages/coding-agent/src/core/agent-session.ts:2041-2060]()
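
The "persist first, then notify" ordering can be shown with a minimal event hub. This is a sketch of the pattern rather than the `AgentSession` code: `SketchSession`, its method names, and the string events are hypothetical.

```typescript
type SketchListener = (event: string) => void;

class SketchSession {
  private listeners: SketchListener[] = [];
  private persist: (event: string) => void;

  constructor(persist: (event: string) => void) {
    this.persist = persist;
  }

  subscribe(listener: SketchListener): void {
    this.listeners.push(listener);
  }

  // Persistence runs unconditionally and before any I/O layer sees the event.
  handleAgentEvent(event: string): void {
    this.persist(event);
    for (const listener of this.listeners) listener(event);
  }
}

const callOrder: string[] = [];
const hub = new SketchSession((event) => callOrder.push(`persist:${event}`));
hub.subscribe((event) => callOrder.push(`tui:${event}`));
hub.subscribe((event) => callOrder.push(`rpc:${event}`));
hub.handleAgentEvent("message_end");
```

Neither listener knows about the other, and removing both would not change what gets persisted, which is the property that keeps the session layer independent of the run mode.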

---

## State Diagram: Session Persistence Lifecycle

```text
┌──────────────────────────────────────────────────────┐
│ AgentSession                                          │
│                                                       │
│  agent.subscribe(_handleAgentEvent)                   │
│         │                                             │
│         ▼                                             │
│  message_end ──► sessionManager.appendMessage()      │
│  model change ──► sessionManager.appendModelChange() │
│  thinking chg ──► sessionManager.appendThinkingLevelChange() │
│  compaction ──► sessionManager.appendCompaction()    │
│         │                                             │
│         ▼                                             │
│  _emit(event) ──► listeners (TUI, RPC, print)        │
└──────────────────────────────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────────────────────┐
│ SessionManager (JSONL file on disk)                   │
│                                                       │
│  [header]                                             │
│  [message: user, parentId=null]                       │
│  [message: assistant, parentId=e1]                    │
│  [model_change, parentId=e2]                          │
│  [compaction, firstKeptEntryId=e1, parentId=e3]       │
│  [message: user, parentId=e4]                         │
│  ...                                                  │
│                                                       │
│  leafId = last entry id                               │
└──────────────────────────────────────────────────────┘
         │
         ▼ buildSessionContext(leafId)
┌──────────────────────────────────────────────────────┐
│ SessionContext { messages[], thinkingLevel, model }   │
│  (what the LLM sees on next turn)                     │
└──────────────────────────────────────────────────────┘
```

---

## Summary

`AgentSession` ensures durable state by treating every important runtime change as an appended log entry: messages arrive via `message_end` events, model and thinking-level transitions are appended immediately on change, and compaction writes a boundary marker with a summary that replaces the preceding history. The JSONL file's tree structure (`id`/`parentId`) enables branching and non-destructive history editing. On resume, `buildSessionContext()` walks from the current leaf to the root, replaying configuration changes and respecting compaction boundaries to rebuild exactly the right in-memory context. The abstraction is mode-agnostic by design: all I/O layers attach as event listeners on top of `AgentSession`, while persistence runs unconditionally in the internal `_handleAgentEvent` handler before any listener is notified.

Sources: [packages/coding-agent/src/core/agent-session.ts:334-337](), [packages/coding-agent/src/core/session-manager.ts:1087-1093]()

---

## 09. Compaction: When the Context Window Is the Enemy, What Gets Thrown Away?

> As conversations grow, token counts approach the context window limit. The compaction subsystem (core/compaction/) answers: when to compact, which messages to summarize, how branch-level summaries differ from turn-level summaries, and how the agent resumes coherently after a compaction round. The tests agent-session-compaction.test.ts and harness/compaction.test.ts reveal the boundary conditions.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/09-compaction-when-the-context-window-is-the-enemy-what-gets-thrown-away.md
- Generated: 2026-05-22T23:28:41.792Z

### Source Files

- `packages/coding-agent/src/core/compaction/compaction.ts`
- `packages/coding-agent/src/core/compaction/branch-summarization.ts`
- `packages/coding-agent/src/core/compaction/utils.ts`
- `packages/coding-agent/test/suite/agent-session-compaction.test.ts`
- `packages/agent/test/harness/compaction.test.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/coding-agent/src/core/compaction/compaction.ts](packages/coding-agent/src/core/compaction/compaction.ts)
- [packages/coding-agent/src/core/compaction/branch-summarization.ts](packages/coding-agent/src/core/compaction/branch-summarization.ts)
- [packages/coding-agent/src/core/compaction/utils.ts](packages/coding-agent/src/core/compaction/utils.ts)
- [packages/coding-agent/src/core/agent-session.ts](packages/coding-agent/src/core/agent-session.ts)
- [packages/coding-agent/test/suite/agent-session-compaction.test.ts](packages/coding-agent/test/suite/agent-session-compaction.test.ts)
- [packages/agent/test/harness/compaction.test.ts](packages/agent/test/harness/compaction.test.ts)
</details>

# Compaction: When the Context Window Is the Enemy, What Gets Thrown Away?

Every conversation with a language model is a race against a fixed budget. Token counts accumulate (user prompts, assistant replies, tool calls, tool results, images, bash output) and eventually the window fills. When that happens, the request fails with an overflow error or, worse, the model silently degrades. The compaction subsystem addresses this by deciding, precisely and reversibly, what history to discard and what to keep, and by generating a structured summary that lets the agent resume as if it remembered everything.

This page traces the full lifecycle: how the system detects that a compaction is needed, where the cut point falls, which messages are summarized versus kept, how branch-level summarization differs from turn-level summarization, and how the agent reconstructs coherent state after a compaction round.

---

## Why the Problem Is Harder Than It Looks

The naive answer — "drop the oldest messages" — breaks tool integrity. A `toolResult` message must follow its parent `toolCall`; cutting between them corrupts the conversation structure. Equally, slicing in the middle of a multi-step assistant turn (one user prompt generating several tool calls and partial replies) leaves the kept suffix without sufficient context. The compaction system must reason about conversational boundaries, not raw token counts.

---

## When Does Compaction Trigger?

The session checks for compaction in two places, both via the private `_checkCompaction` method in `AgentSession`:

1. **After every assistant response** — normal housekeeping.
2. **Before resending a prompt**, if the last assistant message was aborted.

Two trigger modes exist:

| Trigger | Condition | `willRetry` |
|---------|-----------|-------------|
| **Overflow** | `isContextOverflow(assistantMessage)` — the LLM returned a "prompt is too long" error | `true` — agent retries immediately after compaction |
| **Threshold** | `contextTokens > contextWindow - reserveTokens` | `false` (nothing to retry; the next user turn starts from the compacted context) |

The threshold check uses `calculateContextTokens` from actual `usage` data when available, falling back to `estimateContextTokens` for error messages (which carry no usage). For error messages, the system walks backwards to find the last *successful* assistant response's usage, then estimates trailing tokens from messages after that point.

```typescript
// packages/coding-agent/src/core/compaction/compaction.ts:219-222
export function shouldCompact(contextTokens: number, contextWindow: number, settings: CompactionSettings): boolean {
  if (!settings.enabled) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}
```

Default settings: `reserveTokens: 16384`, `keepRecentTokens: 20000`, `enabled: true`.
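
As a worked instance of the threshold rule, the sketch below re-declares `shouldCompact` locally and applies the defaults quoted above. The 200,000-token context window is an assumed figure chosen for illustration, not a value taken from the repository.

```typescript
// Local re-declaration for illustration; field names follow the excerpt above.
interface CompactionSettings {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

const DEFAULTS: CompactionSettings = { enabled: true, reserveTokens: 16384, keepRecentTokens: 20000 };

function shouldCompact(contextTokens: number, contextWindow: number, settings: CompactionSettings): boolean {
  if (!settings.enabled) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}

// Assumed context window of 200,000 tokens: the threshold is 200000 - 16384 = 183616.
const contextWindow = 200_000;
const threshold = contextWindow - DEFAULTS.reserveTokens;

console.log(threshold); // 183616
console.log(shouldCompact(183_616, contextWindow, DEFAULTS)); // false: the comparison is strict
console.log(shouldCompact(183_617, contextWindow, DEFAULTS)); // true
```

Because the comparison is strict, a context that sits exactly on the threshold does not trigger compaction; one more token does.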

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:115-125](), [packages/coding-agent/src/core/agent-session.ts:1768-1845]()

---

## The Cut Point: Where to Slice History

Once a compaction is decided, `prepareCompaction` computes exactly which session entries get summarized and which get kept.

### The Budget Walk

`findCutPoint` walks the entry list **backwards from newest to oldest**, accumulating estimated token sizes. When the running total reaches `keepRecentTokens`, it stops and resolves that position to a valid cut point, as defined in the next subsection.

```typescript
// packages/coding-agent/src/core/compaction/compaction.ts:386-448
export function findCutPoint(entries, startIndex, endIndex, keepRecentTokens): CutPointResult {
  // Walks backwards, accumulates tokens, stops at keepRecentTokens
  // Returns firstKeptEntryIndex, turnStartIndex, isSplitTurn
}
```

### Valid Cut Points

Not all entries are eligible cut points. `findValidCutPoints` excludes `toolResult` messages entirely — they cannot stand alone without their preceding tool call. Valid positions are:

- `user` messages
- `assistant` messages (when cut here, subsequent tool results come with them)
- `bashExecution` messages (treated as user-role context)
- `branch_summary` and `custom_message` entries (they represent user-initiated context)

Non-message entries like `model_change`, `thinking_level_change`, `label`, and `compaction` markers are never cut points.
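
The budget walk and the validity rule can be combined in a small sketch. The `Entry` shape, the `sketchFindCutIndex` name, and the choice to snap forward to the nearest valid entry are assumptions made for illustration; the real `findCutPoint` also reports `turnStartIndex` and `isSplitTurn`, which are omitted here.

```typescript
// Simplified stand-in for session entries; the real entry types are richer.
type EntryKind = "user" | "assistant" | "toolResult" | "bashExecution" | "model_change";

interface Entry {
  kind: EntryKind;
  tokens: number; // pre-computed estimate; 0 for non-message entries
}

// Entry kinds at which history may be cut (toolResult and metadata are excluded).
const CUTTABLE: ReadonlySet<EntryKind> = new Set<EntryKind>(["user", "assistant", "bashExecution"]);

function sketchFindCutIndex(entries: Entry[], keepRecentTokens: number): number {
  let accumulated = 0;
  // Walk from newest to oldest until the keep budget is reached.
  for (let i = entries.length - 1; i >= 0; i--) {
    accumulated += entries[i].tokens;
    if (accumulated >= keepRecentTokens) {
      // Assumption: snap forward to the nearest valid cut point at or after i,
      // so a toolResult is never kept without its tool call.
      for (let j = i; j < entries.length; j++) {
        if (CUTTABLE.has(entries[j].kind)) return j;
      }
      return 0; // Assumption: with no valid cut point, keep everything.
    }
  }
  return 0; // The whole history fits inside the budget.
}

const history: Entry[] = [
  { kind: "user", tokens: 5000 },
  { kind: "assistant", tokens: 3000 }, // issues a tool call
  { kind: "toolResult", tokens: 9000 },
  { kind: "assistant", tokens: 4000 },
  { kind: "user", tokens: 2000 },
  { kind: "assistant", tokens: 6000 },
];

// The budget is reached at the toolResult (index 2), so the cut moves to index 3.
console.log(sketchFindCutIndex(history, 20000)); // 3
```

In this run the tool call at index 1 and its result at index 2 both fall on the summarized side, so the pair is never split.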

### Split Turn Detection

A "split turn" occurs when the cut falls inside a turn — for example, when a user's request spawned a very long assistant response whose prefix must be dropped but whose suffix must be kept. In this case:

- `isSplitTurn: true`
- `turnStartIndex` points to the user message that opened the turn
- `turnPrefixMessages` holds the discarded prefix of that turn
- `firstKeptEntryIndex` is inside the turn, not at a turn boundary

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:299-448]()

---

## Token Estimation

Because token counts cannot always be read from the LLM's usage field (especially for messages that haven't been sent yet), the system uses a conservative `chars / 4` heuristic:

```typescript
// packages/coding-agent/src/core/compaction/compaction.ts:232-290
export function estimateTokens(message: AgentMessage): number {
  // user/assistant/custom/toolResult/bashExecution/branchSummary/compactionSummary
  // images: estimated at 4800 chars (≈1200 tokens)
  return Math.ceil(chars / 4);
}
```

Images are hard-coded at 4,800 char-equivalents (1,200 tokens), a deliberate overestimate to be conservative.

`estimateContextTokens` combines real usage data with estimated trailing tokens:

```
total = lastAssistantUsageTokens + estimatedTokensAfterLastAssistant
```
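
A minimal sketch of the two estimators, assuming a simplified content model in which a message is a list of text and image blocks. The `Block` type and the `sketch*` names are hypothetical; only the `chars / 4` rule and the 4,800-character image figure come from the text above.

```typescript
// Simplified content model: a message is a list of text and image blocks.
type Block = { type: "text"; text: string } | { type: "image" };

const IMAGE_CHAR_EQUIVALENT = 4800; // figure quoted above (1200 tokens)

function sketchEstimateTokens(blocks: Block[]): number {
  let chars = 0;
  for (const block of blocks) {
    chars += block.type === "text" ? block.text.length : IMAGE_CHAR_EQUIVALENT;
  }
  return Math.ceil(chars / 4);
}

// total = usage of the last successful assistant response + estimate of every message after it
function sketchEstimateContextTokens(lastAssistantUsageTokens: number, trailing: Block[][]): number {
  const trailingTokens = trailing.reduce((sum, message) => sum + sketchEstimateTokens(message), 0);
  return lastAssistantUsageTokens + trailingTokens;
}

const tenChars: Block[] = [{ type: "text", text: "a".repeat(10) }];
const oneImage: Block[] = [{ type: "image" }];

console.log(sketchEstimateTokens(tenChars)); // 3 (10 / 4 rounded up)
console.log(sketchEstimateTokens(oneImage)); // 1200
console.log(sketchEstimateContextTokens(50000, [tenChars, oneImage])); // 51203
```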

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:186-214]()

---

## What Gets Summarized vs. Kept

```text
Session entries (chronological):
  ┌──────────────────────────────────────────────────┐
  │  [prior compaction marker]                        │
  │  [u1] user msg                                    │  ← boundaryStart
  │  [a1] assistant msg                               │
  │  [u2] user msg                                    │  ← historyEnd (to summarize)
  │  [a2] assistant msg (start of last kept turn)     │  ← firstKeptEntryIndex (kept)
  │  [u3] user msg                                    │
  │  [a3] assistant msg (most recent)                 │  ← boundaryEnd
  └──────────────────────────────────────────────────┘

  messagesToSummarize = [u1..u2] → discarded, replaced by summary
  kept = [a2..a3] → passed to LLM as context
```

For a **split turn**:

```text
  ┌──────────────────────────────────────────────────┐
  │  [u1] user msg (very long turn starts)            │  ← turnStartIndex
  │  [a1] assistant reply part 1                      │  ← turnPrefixMessages (summarized separately)
  │  [a2] assistant reply part 2 (suffix, kept)       │  ← firstKeptEntryIndex
  └──────────────────────────────────────────────────┘
```

In the split-turn case, two LLM summarization calls run **in parallel**:

1. History summary (prior turns) — uses `SUMMARIZATION_PROMPT` or `UPDATE_SUMMARIZATION_PROMPT`
2. Turn-prefix summary — uses `TURN_PREFIX_SUMMARIZATION_PROMPT` (smaller budget: 50% of `reserveTokens`)

The final compaction summary is their concatenation, separated by a horizontal rule and a `**Turn Context (split turn):**` header.
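
The parallel calls and the final concatenation might look roughly like the sketch below. The `Summarize` callback, the function names, the rounding, and the exact whitespace around the rule are assumptions; the 50% figure is the turn-prefix budget stated above, and the 0.8 factor is the cap listed in the boundary-condition table later on this page.

```typescript
// Hypothetical callback standing in for the LLM summarization call.
type Summarize = (prompt: string, maxTokens: number) => Promise<string>;

function summaryBudgets(reserveTokens: number): { history: number; turnPrefix: number } {
  // Rounding down is an assumption of this sketch.
  return {
    history: Math.floor(0.8 * reserveTokens),
    turnPrefix: Math.floor(0.5 * reserveTokens),
  };
}

function joinSummaries(history: string, turnPrefix: string): string {
  // Horizontal rule plus header, as described above; the blank lines are assumed.
  return `${history}\n\n---\n\n**Turn Context (split turn):**\n\n${turnPrefix}`;
}

async function summarizeSplitTurn(summarize: Summarize, reserveTokens: number): Promise<string> {
  const budgets = summaryBudgets(reserveTokens);
  // Both calls are started before either is awaited, so they run concurrently.
  const [history, turnPrefix] = await Promise.all([
    summarize("history", budgets.history),
    summarize("turn-prefix", budgets.turnPrefix),
  ]);
  return joinSummaries(history, turnPrefix);
}

console.log(summaryBudgets(16384)); // { history: 13107, turnPrefix: 8192 }
```

With the default `reserveTokens` of 16384, the history summary gets at most 13,107 output tokens and the turn-prefix summary 8,192, before any model output limit is applied.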

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:644-830]()

---

## The Summary Format

The LLM is asked to produce a structured Markdown checkpoint following a rigid template:

```
## Goal
## Constraints & Preferences
## Progress
  ### Done
  ### In Progress
  ### Blocked
## Key Decisions
## Next Steps
## Critical Context
```

For repeat compactions, an `UPDATE_SUMMARIZATION_PROMPT` is used instead, instructing the model to merge new activity into the `previousSummary` from the prior compaction — preserving history across multiple compaction rounds without re-processing everything from scratch.

The system prompt for the summarization LLM call explicitly prevents conversation continuation:

```typescript
// packages/coding-agent/src/core/compaction/utils.ts:168-170
export const SUMMARIZATION_SYSTEM_PROMPT = `You are a context summarization assistant...
Do NOT continue the conversation. ONLY output the structured summary.`;
```

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:454-524](), [packages/coding-agent/src/core/compaction/utils.ts:168-171]()

---

## File Operation Tracking

Both compaction and branch summarization track which files the agent read or modified during the discarded conversation. This is appended to the summary as XML tags:

```xml
<read-files>
src/index.ts
</read-files>

<modified-files>
src/core/session-manager.ts
</modified-files>
```

File operations are collected from:
1. Tool calls in the messages being discarded (`read`, `write`, `edit` tool names)
2. The `details` field of the previous compaction entry (cumulative carry-forward)

`modifiedFiles` = union of `write` and `edit` operations. `readFiles` = files read but never modified in the same session window.

```typescript
// packages/coding-agent/src/core/compaction/utils.ts:62-66
export function computeFileLists(fileOps: FileOperations): { readFiles: string[]; modifiedFiles: string[] } {
  const modified = new Set([...fileOps.edited, ...fileOps.written]);
  const readOnly = [...fileOps.read].filter((f) => !modified.has(f)).sort();
  ...
}
```
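
To make the read-versus-modified rule concrete, here is a self-contained sketch. The `FileOperations` shape mirrors the three field names used in the excerpt; the sorted `modifiedFiles` list and the `sketchFormatFileLists` helper are assumptions added to show how the XML tags could be produced.

```typescript
interface FileOperations {
  read: Set<string>;
  written: Set<string>;
  edited: Set<string>;
}

function sketchComputeFileLists(fileOps: FileOperations): { readFiles: string[]; modifiedFiles: string[] } {
  const modified = new Set([...fileOps.edited, ...fileOps.written]);
  // A file that was read and later modified is reported only as modified.
  const readFiles = [...fileOps.read].filter((f) => !modified.has(f)).sort();
  const modifiedFiles = [...modified].sort();
  return { readFiles, modifiedFiles };
}

// Hypothetical formatter producing the tag layout shown earlier in this section.
function sketchFormatFileLists(lists: { readFiles: string[]; modifiedFiles: string[] }): string {
  const sections: string[] = [];
  if (lists.readFiles.length > 0) {
    sections.push(`<read-files>\n${lists.readFiles.join("\n")}\n</read-files>`);
  }
  if (lists.modifiedFiles.length > 0) {
    sections.push(`<modified-files>\n${lists.modifiedFiles.join("\n")}\n</modified-files>`);
  }
  return sections.join("\n\n");
}

const ops: FileOperations = {
  read: new Set(["a.ts", "b.ts"]),
  written: new Set(["c.ts"]),
  edited: new Set(["b.ts"]),
};
const lists = sketchComputeFileLists(ops);
console.log(lists.readFiles); // [ 'a.ts' ]
console.log(lists.modifiedFiles); // [ 'b.ts', 'c.ts' ]
```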

Sources: [packages/coding-agent/src/core/compaction/utils.ts:29-82]()

---

## Branch Summarization: A Different Kind of Summary

Turn-level compaction discards old history when the window fills. Branch summarization is a different concept: when the user navigates away from one conversation branch (in a tree-structured session), the branch being abandoned is summarized so that its context is not lost when the user returns.

### Key Differences

| Aspect | Turn Compaction | Branch Summarization |
|--------|----------------|---------------------|
| Trigger | Context window pressure | Branch navigation |
| Cut point | `keepRecentTokens` budget | No cut — whole branch |
| Token budget | `keepRecentTokens` for kept portion | `contextWindow - reserveTokens` |
| Compaction entry? | Yes, written to session | No — creates `branch_summary` entry |
| Previous summary? | Yes, iterative update | No |
| Summary format | Full structured checkpoint | Branch-focused summary |
| Preamble | None | "The user explored a different conversation branch before returning here." |

Branch summarization walks the session tree from the old leaf position back to the **common ancestor** with the target position, collects those entries in chronological order, and summarizes them. The walk does **not** stop at compaction boundaries inside the branch: existing compaction summaries are included as context.

```typescript
// packages/coding-agent/src/core/compaction/branch-summarization.ts:98-136
export function collectEntriesForBranchSummary(session, oldLeafId, targetId): CollectEntriesResult {
  // Find common ancestor via set intersection of branch paths
  // Walk from old leaf to ancestor, collecting entries
}
```
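
The common-ancestor walk can be sketched over a bare `id`/`parentId` tree. `TreeEntry`, `pathToRoot`, and `sketchCollectBranch` are hypothetical names; the real function operates on session entries and returns a richer result.

```typescript
interface TreeEntry {
  id: string;
  parentId: string | null;
}

function pathToRoot(byId: Map<string, TreeEntry>, leafId: string): string[] {
  const path: string[] = [];
  let current = byId.get(leafId);
  while (current) {
    path.push(current.id);
    current = current.parentId === null ? undefined : byId.get(current.parentId);
  }
  return path; // leaf first, root last
}

function sketchCollectBranch(
  entries: TreeEntry[],
  oldLeafId: string,
  targetId: string,
): { commonAncestorId: string | null; entryIds: string[] } {
  const byId = new Map(entries.map((e) => [e.id, e] as const));
  const targetPath = new Set(pathToRoot(byId, targetId));
  const collected: string[] = [];
  let commonAncestorId: string | null = null;
  // Walk up from the old leaf until the path meets the target's path.
  for (const id of pathToRoot(byId, oldLeafId)) {
    if (targetPath.has(id)) {
      commonAncestorId = id;
      break;
    }
    collected.push(id);
  }
  return { commonAncestorId, entryIds: collected.reverse() }; // chronological order
}

// root -> a -> b -> c (old leaf), and a -> d -> e (target)
const tree: TreeEntry[] = [
  { id: "root", parentId: null },
  { id: "a", parentId: "root" },
  { id: "b", parentId: "a" },
  { id: "c", parentId: "b" },
  { id: "d", parentId: "a" },
  { id: "e", parentId: "d" },
];
const branch = sketchCollectBranch(tree, "c", "e");
console.log(branch.commonAncestorId, branch.entryIds); // a [ 'b', 'c' ]
```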

When a branch is very long, `prepareBranchEntries` uses a **newest-first** walk with a token budget, keeping the most recent context if the branch cannot fit in the window. Unlike turn compaction, summaries and compaction entries can be squeezed in past the soft budget limit (up to 90% consumed) because they carry high-value context.

Sources: [packages/coding-agent/src/core/compaction/branch-summarization.ts:86-237]()

---

## The Compaction Lifecycle (Session Layer)

```mermaid
stateDiagram-v2
    [*] --> Running: agent prompt
    Running --> CheckCompaction: assistant response received
    CheckCompaction --> Running: below threshold
    CheckCompaction --> AutoCompaction: threshold exceeded
    CheckCompaction --> OverflowCompaction: LLM overflow error
    AutoCompaction --> ExtensionHook: session_before_compact event
    OverflowCompaction --> ExtensionHook: session_before_compact event
    ExtensionHook --> Summarizing: proceed (or extension provides summary)
    ExtensionHook --> Cancelled: extension cancels
    Summarizing --> SessionReload: summary saved to session file
    SessionReload --> Running: agent state rebuilt from new context
    SessionReload --> Running: overflow retry (willRetry=true), agent.continue()
    Cancelled --> Running: no-op
```

The session emits `compaction_start` and `compaction_end` events for observability. The `compaction_end` event carries the result, the reason (`"manual"`, `"threshold"`, or `"overflow"`), and whether a retry is pending.

### Overflow Recovery Guard

A guard prevents infinite overflow loops: if overflow recovery has already been attempted once in the current turn, subsequent overflow errors are reported as a fatal event instead of triggering another compaction:

```typescript
// packages/coding-agent/src/core/agent-session.ts:1796
if (this._overflowRecoveryAttempted) {
  this._emit({ type: "compaction_end", reason: "overflow", result: undefined, ... });
  // message: "Context overflow recovery failed after one compact-and-retry attempt."
}
```

Sources: [packages/coding-agent/src/core/agent-session.ts:1768-2020]()

---

## Extension Hook: Custom Compaction

Extensions can intercept compaction via `session_before_compact`. The event carries a `CompactionPreparation` object with `firstKeptEntryId`, `messagesToSummarize`, `tokensBefore`, and `fileOps`. An extension can either:

- Return `{ compaction: { summary, firstKeptEntryId, tokensBefore, details } }` to supply its own summary (skipping the LLM call)
- Return `{ cancel: true }` to cancel the compaction
- Return nothing to let the default summarization run

This is how structured artifact indices (e.g., ArtifactIndex) are wired into compaction: the extension generates richer metadata and stores it in `details`, which is carried forward in the `CompactionEntry`.
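
The three possible outcomes can be sketched as a plain function. The types below are local, hypothetical mirrors of the fields named above, and the policy (cancel on empty input, defer when more than two messages are pending) is invented purely to exercise each branch; the real extension API and registration call are not shown.

```typescript
// Hypothetical local types; the real extension API types may differ.
interface CompactionPreparation {
  firstKeptEntryId: string;
  messagesToSummarize: { role: string; text: string }[];
  tokensBefore: number;
}

type BeforeCompactResult =
  | { compaction: { summary: string; firstKeptEntryId: string; tokensBefore: number; details?: unknown } }
  | { cancel: true }
  | undefined;

function onBeforeCompact(prep: CompactionPreparation): BeforeCompactResult {
  // Nothing to summarize: cancel the compaction.
  if (prep.messagesToSummarize.length === 0) return { cancel: true };
  // Illustrative policy: leave larger histories to the default LLM summarization.
  if (prep.messagesToSummarize.length > 2) return undefined;
  // Otherwise supply a hand-built summary and skip the LLM call.
  const summary = prep.messagesToSummarize.map((m) => `- ${m.role}: ${m.text}`).join("\n");
  return {
    compaction: {
      summary,
      firstKeptEntryId: prep.firstKeptEntryId,
      tokensBefore: prep.tokensBefore,
      details: { source: "sketch" }, // carried forward in the compaction entry
    },
  };
}

const custom = onBeforeCompact({
  firstKeptEntryId: "e7",
  messagesToSummarize: [{ role: "user", text: "fix the build" }],
  tokensBefore: 120000,
});
console.log(custom);
```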

Sources: [packages/coding-agent/test/suite/agent-session-compaction.test.ts:97-123]()

---

## Resuming After Compaction: How the Agent Reconstructs State

After a compaction, the session is reloaded. `buildSessionContext` walks the session entries from root to leaf and, when it encounters a `compaction` entry, emits a `compactionSummary` role message — a synthetic message type that the LLM sees instead of all the discarded turns:

```typescript
// packages/agent/test/harness/compaction.test.ts:316-327
it("builds session context with a compaction entry", () => {
  const loaded = buildSessionContext([u1, a1, u2, a2, compaction, u3, a3]);
  expect(loaded.messages).toHaveLength(5);
  expect(loaded.messages[0]?.role).toBe("compactionSummary");
});
```

The agent resumes from this reconstructed context. The LLM receives: one `compactionSummary` message containing the structured checkpoint, followed by all kept messages. From the model's perspective, history before the compaction is a single coherent narrative rather than a raw message dump.
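
A reduced model of that reconstruction, assuming a flat root-to-leaf path and a `firstKeptEntryId` that points at `u2`. The entry and message shapes are simplified stand-ins, and `sketchBuildContext` is a hypothetical name rather than the harness function itself.

```typescript
type SessionEntry =
  | { type: "message"; id: string; role: "user" | "assistant"; text: string }
  | { type: "compaction"; id: string; summary: string; firstKeptEntryId: string };

interface ContextMessage {
  role: "user" | "assistant" | "compactionSummary";
  text: string;
}

function sketchBuildContext(path: SessionEntry[]): ContextMessage[] {
  // Locate the most recent compaction entry on the path, if any.
  let compactionIndex = -1;
  path.forEach((entry, i) => {
    if (entry.type === "compaction") compactionIndex = i;
  });

  const messages: ContextMessage[] = [];
  let firstKeptIndex = 0;
  if (compactionIndex >= 0) {
    const compaction = path[compactionIndex];
    if (compaction.type === "compaction") {
      // The summary stands in for everything before the first kept entry.
      messages.push({ role: "compactionSummary", text: compaction.summary });
      const found = path.findIndex((e) => e.id === compaction.firstKeptEntryId);
      firstKeptIndex = found >= 0 ? found : compactionIndex;
    }
  }
  for (let i = firstKeptIndex; i < path.length; i++) {
    const entry = path[i];
    if (entry.type === "message") messages.push({ role: entry.role, text: entry.text });
  }
  return messages;
}

const msg = (id: string, role: "user" | "assistant"): SessionEntry => ({ type: "message", id, role, text: id });
const sessionPath: SessionEntry[] = [
  msg("u1", "user"), msg("a1", "assistant"), msg("u2", "user"), msg("a2", "assistant"),
  { type: "compaction", id: "c1", summary: "## Goal ...", firstKeptEntryId: "u2" },
  msg("u3", "user"), msg("a3", "assistant"),
];
console.log(sketchBuildContext(sessionPath).map((m) => m.role));
```

Under these assumptions the result has five messages: the summary, then `u2`, `a2`, `u3`, and `a3`.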

---

## Boundary Conditions (from Tests)

The test suites encode the edge cases that the implementation must handle correctly:

| Scenario | Behavior |
|----------|----------|
| Last session entry is a `compaction` marker | `prepareCompaction` returns `undefined` — nothing to compact |
| Error message with no prior usage | No threshold compaction; cannot estimate context size |
| Error message with prior successful usage | Uses last successful usage + estimated trailing tokens |
| Stale pre-compaction usage kept across boundary | Ignored — timestamp check prevents false threshold trigger |
| `toolResult` at cut-point search | Never a valid cut point; `findCutPoint` skips it |
| All entries are `thinking_level_change` / `model_change` | `findCutPoint` falls back to `firstKeptEntryIndex: 0`, `isSplitTurn: false` |
| `branch_summary` / `custom_message` at entry | Valid cut point and valid turn-start marker |
| Compaction entry between user and assistant | `findCutPoint` stops backward scan at compaction boundary |
| Overflow recovery already attempted | Fatal error emitted, no second retry |
| `maxTokens` capped by model output limit | `Math.min(0.8 * reserveTokens, model.maxTokens)` |

Sources: [packages/agent/test/harness/compaction.test.ts:150-670](), [packages/coding-agent/test/suite/agent-session-compaction.test.ts:86-407]()

---

## Summary

The compaction subsystem is a surgical trimmer, not a blunt truncator. It preserves the most recent `keepRecentTokens` of conversation history by walking backwards from the newest message, finds a structurally valid cut point (never inside a tool call/result pair), and replaces everything older with a structured LLM-generated summary that updates iteratively across multiple compaction rounds. Branch navigation triggers a separate but parallel mechanism that summarizes abandoned session tree branches rather than overflow history. After any compaction, the agent resumes from a reconstructed context where the entire discarded history is visible to the model as a single `compactionSummary` role message — keeping coherence without keeping tokens.

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:644-831]()

---

## 10. Built-In Tools: What Can the Agent Actually Do to a Filesystem?

> The coding agent ships seven built-in tools: Read, Write, Edit, Bash, Grep, Find, and Ls. This page asks how each tool definition wraps the underlying operation (tool-definition-wrapper.ts), what file-mutation-queue.ts serializes to prevent concurrent edits, how bash.ts sandboxes commands, and what output-accumulator.ts does to keep large tool results from overflowing the context. The tools/ directory is the system's ground-level action surface.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/10-built-in-tools-what-can-the-agent-actually-do-to-a-filesystem.md
- Generated: 2026-05-22T23:28:57.679Z

### Source Files

- `packages/coding-agent/src/core/tools/tool-definition-wrapper.ts`
- `packages/coding-agent/src/core/tools/file-mutation-queue.ts`
- `packages/coding-agent/src/core/tools/bash.ts`
- `packages/coding-agent/src/core/tools/edit.ts`
- `packages/coding-agent/src/core/tools/output-accumulator.ts`
- `packages/coding-agent/test/tools.test.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/coding-agent/src/core/tools/tool-definition-wrapper.ts](packages/coding-agent/src/core/tools/tool-definition-wrapper.ts)
- [packages/coding-agent/src/core/tools/file-mutation-queue.ts](packages/coding-agent/src/core/tools/file-mutation-queue.ts)
- [packages/coding-agent/src/core/tools/bash.ts](packages/coding-agent/src/core/tools/bash.ts)
- [packages/coding-agent/src/core/tools/edit.ts](packages/coding-agent/src/core/tools/edit.ts)
- [packages/coding-agent/src/core/tools/output-accumulator.ts](packages/coding-agent/src/core/tools/output-accumulator.ts)
- [packages/coding-agent/src/core/tools/read.ts](packages/coding-agent/src/core/tools/read.ts)
- [packages/coding-agent/src/core/tools/write.ts](packages/coding-agent/src/core/tools/write.ts)
- [packages/coding-agent/src/core/tools/grep.ts](packages/coding-agent/src/core/tools/grep.ts)
- [packages/coding-agent/src/core/tools/find.ts](packages/coding-agent/src/core/tools/find.ts)
- [packages/coding-agent/src/core/tools/ls.ts](packages/coding-agent/src/core/tools/ls.ts)
- [packages/coding-agent/src/core/tools/truncate.ts](packages/coding-agent/src/core/tools/truncate.ts)
- [packages/coding-agent/test/tools.test.ts](packages/coding-agent/test/tools.test.ts)
</details>

# Built-In Tools: What Can the Agent Actually Do to a Filesystem?

The coding agent ships seven built-in tools that form its complete ground-level action surface on a filesystem: **Read**, **Write**, **Edit**, **Bash**, **Grep**, **Find**, and **Ls**. Each tool is defined via a `ToolDefinition` object and bridged into the core runtime through a thin wrapper. Beneath that surface lie two shared services: a per-file serialization queue that prevents torn writes under concurrent tool calls, and a streaming output accumulator that bounds memory consumption for long-running commands. Understanding these layers explains both what the agent *can* do and what it is *prevented* from doing accidentally.

This page traces each tool from its public interface down to the operating-system call it ultimately makes, examines the concurrency and truncation mechanisms that guard against data loss and context overflow, and surfaces the design assumptions encoded in each layer.

---

## The `ToolDefinition` → `AgentTool` Bridge

### What problem does a wrapper solve?

The agent runtime (`@earendil-works/pi-agent-core`) speaks `AgentTool`. The tools directory prefers a richer `ToolDefinition` that additionally carries prompt metadata, TUI render callbacks, and a `prepareArguments` hook for argument normalization. The wrapper erases those extras when handing control to the runtime.

### What the wrapper actually does

```ts
// packages/coding-agent/src/core/tools/tool-definition-wrapper.ts:5-18
export function wrapToolDefinition<TDetails = unknown>(
  definition: ToolDefinition<any, TDetails>,
  ctxFactory?: () => ExtensionContext,
): AgentTool<any, TDetails> {
  return {
    name: definition.name,
    label: definition.label,
    description: definition.description,
    parameters: definition.parameters,
    prepareArguments: definition.prepareArguments,
    executionMode: definition.executionMode,
    execute: (toolCallId, params, signal, onUpdate) =>
      definition.execute(toolCallId, params, signal, onUpdate, ctxFactory?.() as ExtensionContext),
  };
}
```

The bridge is deliberately minimal: it copies the five descriptive fields that `AgentTool` needs (`name`, `label`, `description`, `parameters`, `executionMode`), keeps `prepareArguments` for argument normalization, and injects an `ExtensionContext` when one is available. The render callbacks (`renderCall`, `renderResult`) stay inside `ToolDefinition` and are consumed only by the TUI layer.

An inverse path also exists: `createToolDefinitionFromAgentTool` synthesizes a minimal `ToolDefinition` from a plain `AgentTool`, keeping the registry definition-first even for caller-supplied overrides.

Sources: [packages/coding-agent/src/core/tools/tool-definition-wrapper.ts:5-45]()

---

## The `FileMutationQueue`: Serializing Concurrent Writes

### What would break without it?

Two simultaneous `edit` or `write` calls targeting the same file would race at the OS level: read-then-write sequences could interleave, leaving the file in a corrupt or partially updated state. The model can and does issue multiple tool calls in a single turn.

### How the queue works

```ts
// packages/coding-agent/src/core/tools/file-mutation-queue.ts:4-39
const fileMutationQueues = new Map<string, Promise<void>>();

export async function withFileMutationQueue<T>(filePath: string, fn: () => Promise<T>): Promise<T> {
  const key = getMutationQueueKey(filePath);
  const currentQueue = fileMutationQueues.get(key) ?? Promise.resolve();

  let releaseNext!: () => void;
  const nextQueue = new Promise<void>((resolveQueue) => { releaseNext = resolveQueue; });
  const chainedQueue = currentQueue.then(() => nextQueue);
  fileMutationQueues.set(key, chainedQueue);

  await currentQueue;
  try {
    return await fn();
  } finally {
    releaseNext();
    if (fileMutationQueues.get(key) === chainedQueue) {
      fileMutationQueues.delete(key);
    }
  }
}
```

The key insight is the promise-chaining pattern: each caller appends its own `nextQueue` promise to the tail of the existing chain and then awaits the *previous* tail before executing. When it finishes, it calls `releaseNext()`, unblocking the next waiter. The map entry is deleted only if no later caller has replaced it, which prevents unbounded growth.

The queue key is resolved via `realpathSync.native` to canonicalize symlinks—two paths that point to the same inode share one queue.

Operations on *different* files are fully parallel; only same-file mutations are serialized.
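
A small demonstration of that behaviour, using a local copy of the queue keyed by the raw path string. Skipping the `realpathSync.native` canonicalization is a simplification of this sketch, and `withQueue`, `sleep`, and `demo` are hypothetical names.

```typescript
const queues = new Map<string, Promise<void>>();

async function withQueue<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const current = queues.get(key) ?? Promise.resolve();
  let release!: () => void;
  const next = new Promise<void>((resolve) => { release = resolve; });
  const chained = current.then(() => next);
  queues.set(key, chained);

  await current; // wait for every earlier mutation of this key
  try {
    return await fn();
  } finally {
    release();
    if (queues.get(key) === chained) queues.delete(key);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demo(): Promise<string[]> {
  const log: string[] = [];
  const task = (name: string, ms: number) => async () => {
    log.push(`${name}:start`);
    await sleep(ms);
    log.push(`${name}:end`);
  };
  await Promise.all([
    withQueue("a.txt", task("first", 20)),
    withQueue("a.txt", task("second", 5)), // same key: waits for "first" to finish
    withQueue("b.txt", task("other", 1)), // different key: does not wait
  ]);
  return log;
}
```

Calling `demo()` should yield a log in which `first:end` precedes `second:start`, while `other:start` appears before `first:end`.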

Sources: [packages/coding-agent/src/core/tools/file-mutation-queue.ts:1-39]()

---

## Read

### What it does

`createReadTool` reads a file, detects whether it is an image or text, and returns structured content blocks. For images (detected by magic bytes via `detectSupportedImageMimeTypeFromFile`, not by extension), it optionally resizes to 2000×2000 and returns a base64-encoded `image` block alongside a text note. For text, it applies `truncateHead` and returns a single `text` block with an actionable continuation notice.

### Key parameters

| Parameter | Default | Effect |
|-----------|---------|--------|
| `path`    | required | Resolved against `cwd` |
| `offset`  | 1 (line) | 1-indexed; error if beyond EOF |
| `limit`   | auto    | User-specified max lines; remainder noted in output |

Text output is capped at **2000 lines or 50 KB** (whichever hits first). When truncated, the model receives a message like:

```
[Showing lines 1-2000 of 2500. Use offset=2001 to continue.]
```

The `Read` tool does **not** use `withFileMutationQueue`—reads are not serialized because no state is mutated.
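
The head truncation and its continuation notice can be approximated as follows. This is a sketch, not the `truncateHead` implementation: the result shape, the treatment of 50 KB as 50 × 1024 bytes, and the helper names are assumptions.

```typescript
interface HeadTruncation {
  content: string;
  shownLines: number;
  totalLines: number;
  truncated: boolean;
}

function sketchTruncateHead(text: string, maxLines = 2000, maxBytes = 50 * 1024): HeadTruncation {
  const encoder = new TextEncoder();
  const lines = text.split("\n");
  const kept: string[] = [];
  let bytes = 0;
  for (const line of lines) {
    // Count the joining newline for every line after the first.
    const lineBytes = encoder.encode(line).length + (kept.length > 0 ? 1 : 0);
    if (kept.length >= maxLines || bytes + lineBytes > maxBytes) break;
    kept.push(line);
    bytes += lineBytes;
  }
  return {
    content: kept.join("\n"),
    shownLines: kept.length,
    totalLines: lines.length,
    truncated: kept.length < lines.length,
  };
}

function continuationNotice(offset: number, t: HeadTruncation): string {
  const last = offset + t.shownLines - 1;
  return `[Showing lines ${offset}-${last} of ${t.totalLines}. Use offset=${last + 1} to continue.]`;
}

const longFile = Array.from({ length: 2500 }, (_, i) => `line ${i + 1}`).join("\n");
const head = sketchTruncateHead(longFile);
console.log(continuationNotice(1, head));
// [Showing lines 1-2000 of 2500. Use offset=2001 to continue.]
```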

Sources: [packages/coding-agent/src/core/tools/read.ts:206-358](), [packages/coding-agent/src/core/tools/truncate.ts:78-160]()

---

## Write

### What it does

`createWriteTool` creates or fully overwrites a file. It first calls `ops.mkdir` with `{ recursive: true }` to create any missing parent directories, then writes the content as UTF-8.

Critically, it wraps the entire mkdir-plus-write sequence in `withFileMutationQueue`:

```ts
// packages/coding-agent/src/core/tools/write.ts:201-238
return withFileMutationQueue(absolutePath, () =>
  new Promise<...>((resolve, reject) => {
    ...
    await ops.mkdir(dir);
    await ops.writeFile(absolutePath, content);
    resolve({ content: [{ type: "text", text: `Successfully wrote ${content.length} bytes to ${path}` }], ... });
  })
);
```

There is no partial-write detection or rollback: if the process dies mid-write, the file may be left truncated or corrupt. The tool is intended for new files or complete rewrites, and the `promptGuidelines` field in the definition explicitly states this.

Sources: [packages/coding-agent/src/core/tools/write.ts:181-280]()

---

## Edit

### What it does

`createEditTool` performs targeted in-place text replacement. Its schema accepts a `path` and an `edits` array, where each entry specifies an `oldText`/`newText` pair. All edits are matched against the *original* file in a single pass—not incrementally—so the agent cannot express overlapping edits.

The execution sequence (the read-modify-write portion runs inside `withFileMutationQueue`, so it is serialized per file):
1. Check access (`R_OK | W_OK`), surface `ENOENT`/`EACCES` immediately
2. Read the file into a buffer
3. Strip BOM, normalize line endings to LF
4. Call `applyEditsToNormalizedContent` (from `edit-diff.ts`) to match and replace all blocks atomically
5. Restore original line endings (CRLF preserved if the file used CRLF)
6. Re-add any BOM
7. Write the result back to disk

If any one edit fails (text not found, or found more than once), **no** edits are applied—the file is not touched. This all-or-nothing guarantee is enforced in `edit-diff.ts` before the write ever happens.

```ts
// packages/coding-agent/src/core/tools/edit.ts:314-421 (simplified)
return withFileMutationQueue(absolutePath, () => new Promise((resolve, reject) => {
  ...
  const buffer = await ops.readFile(absolutePath);
  const rawContent = buffer.toString("utf-8"); // decode step implied by the simplified excerpt
  const { bom, text: content } = stripBom(rawContent);
  const originalEnding = detectLineEnding(content);
  const normalizedContent = normalizeToLF(content);
  const { baseContent, newContent } = applyEditsToNormalizedContent(normalizedContent, edits, path);
  const finalContent = bom + restoreLineEndings(newContent, originalEnding);
  await ops.writeFile(absolutePath, finalContent);
  resolve({ ..., details: { diff, patch, firstChangedLine } });
}));
```
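
The all-or-nothing rule can be illustrated with a two-phase sketch: validate every edit against the original text first, then splice. This is not the `edit-diff.ts` implementation; `Edit`, `countOccurrences`, and `sketchApplyEdits` are hypothetical names, and line-ending and BOM handling are left out.

```typescript
interface Edit {
  oldText: string;
  newText: string;
}

// Counts matches, including overlapping ones, so ambiguity is detected strictly.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count++;
    index = haystack.indexOf(needle, index + 1);
  }
  return count;
}

function sketchApplyEdits(original: string, edits: Edit[]): string {
  // Phase 1: validate every edit against the original; throw before changing anything.
  const spans = edits.map((edit, i) => {
    const matches = countOccurrences(original, edit.oldText);
    if (matches !== 1) throw new Error(`edit ${i}: expected exactly one match, found ${matches}`);
    const start = original.indexOf(edit.oldText);
    return { start, end: start + edit.oldText.length, newText: edit.newText };
  });
  spans.sort((a, b) => a.start - b.start);
  for (let i = 1; i < spans.length; i++) {
    if (spans[i].start < spans[i - 1].end) throw new Error("overlapping edits");
  }
  // Phase 2: splice from the end so earlier offsets stay valid.
  let result = original;
  for (const span of [...spans].reverse()) {
    result = result.slice(0, span.start) + span.newText + result.slice(span.end);
  }
  return result;
}

const source = "const a = 1;\nconst b = 2;\n";
console.log(sketchApplyEdits(source, [
  { oldText: "a = 1", newText: "a = 10" },
  { oldText: "b = 2", newText: "b = 20" },
]));
```

Because validation finishes before the first splice, a single bad edit leaves the input untouched, which mirrors the guarantee described above.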

**Fuzzy matching**: The `prepareArguments` hook normalizes some LLM artifacts before matching—smart quotes mapped to ASCII, Unicode dashes to hyphens, fullwidth punctuation to halfwidth, non-breaking spaces to regular spaces, trailing whitespace stripped from lines, and Unicode compatibility normalization. Exact matches take priority over fuzzy matches.
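
One way to express the normalizations listed above is shown below. The function names, the character ranges, and the order of operations are assumptions of this sketch and may not match the repository's implementation.

```typescript
function sketchNormalizeForFuzzyMatch(text: string): string {
  return text
    .normalize("NFKC") // compatibility normalization; also folds fullwidth forms and non-breaking spaces
    .replace(/[\u2018\u2019\u201A\u201B]/g, "'") // curly single quotes
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"') // curly double quotes
    .replace(/[\u2010-\u2015\u2212]/g, "-") // Unicode hyphens, dashes, and minus sign
    .replace(/\u00A0/g, " ") // any remaining non-breaking space
    .split("\n")
    .map((line) => line.trimEnd()) // trailing whitespace per line
    .join("\n");
}

// Exact matches win; the fuzzy comparison is only a fallback.
function sketchFindMatch(content: string, oldText: string): "exact" | "fuzzy" | "none" {
  if (content.includes(oldText)) return "exact";
  const fuzzyHit = sketchNormalizeForFuzzyMatch(content).includes(sketchNormalizeForFuzzyMatch(oldText));
  return fuzzyHit ? "fuzzy" : "none";
}

console.log(sketchFindMatch("say \u201Chi\u201D", 'say "hi"')); // fuzzy
console.log(sketchFindMatch("say hi", "say hi")); // exact
```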

Sources: [packages/coding-agent/src/core/tools/edit.ts:291-493](), [packages/coding-agent/test/tools.test.ts:227-436]()

---

## Bash

### What it does

`createBashTool` spawns a real shell subprocess and streams its combined stdout+stderr output. The shell is discovered via `getShellConfig` (respecting an optional `shellPath` override). The process is spawned `detached` on Unix so its entire process tree can be killed as a unit.

### Execution model

```ts
// packages/coding-agent/src/core/tools/bash.ts:66-128 (createLocalBashOperations)
const child = spawn(shell, [...args, command], {
  cwd,
  detached: process.platform !== "win32",
  env: env ?? getShellEnv(),
  stdio: ["ignore", "pipe", "pipe"],
  windowsHide: true,
});
```

stdin is closed (`"ignore"`), preventing interactive prompts from hanging indefinitely. stdout and stderr are merged into a single stream via the `onData` callback.

### Timeout and abort

- **Timeout**: If `timeout` is provided (seconds), a `setTimeout` kills the entire process tree via `killProcessTree(child.pid)`.
- **Abort**: An `AbortSignal` listener calls `killProcessTree` immediately on signal.
- **Exit code**: Any non-zero exit code is surfaced as a thrown `Error`; the model sees partial output plus `Command exited with code N`.

### `BashSpawnHook` and `commandPrefix`

Two extension points modify what actually runs:
- `commandPrefix` prepends shell setup lines (e.g. environment initialization) before every command.
- `spawnHook?: (context: BashSpawnContext) => BashSpawnContext` lets an extension rewrite the command, `cwd`, or environment before spawn—used for SSH delegation.

### Output streaming and throttling

Data flows into an `OutputAccumulator` on every `onData` callback. The UI is updated via a throttle: updates are batched with a minimum 100 ms gap (`BASH_UPDATE_THROTTLE_MS`), preventing chatty commands from flooding the TUI.
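
The throttling idea can be sketched with an injectable clock so that it is easy to test. `createThrottle` and its `notify`/`flush` methods are hypothetical; the real tool may schedule trailing updates with a timer instead of an explicit flush.

```typescript
const BASH_UPDATE_THROTTLE_MS = 100; // value quoted above

function createThrottle(minGapMs: number, emit: () => void, now: () => number = Date.now) {
  let lastEmit = -Infinity;
  let pending = false;
  return {
    // Called on every data chunk; emits at most once per minGapMs.
    notify(): void {
      if (now() - lastEmit >= minGapMs) {
        lastEmit = now();
        pending = false;
        emit();
      } else {
        pending = true; // remember that an update was suppressed
      }
    },
    // Called when the command ends so the final state is always shown.
    flush(): void {
      if (pending) {
        pending = false;
        lastEmit = now();
        emit();
      }
    },
  };
}

// Usage with a fake clock: four chunks within 120 ms produce two updates, plus one on flush.
let fakeTime = 0;
let updates = 0;
const throttle = createThrottle(BASH_UPDATE_THROTTLE_MS, () => { updates++; }, () => fakeTime);
throttle.notify(); // t=0: emitted
fakeTime = 50; throttle.notify(); // suppressed
fakeTime = 100; throttle.notify(); // emitted
fakeTime = 120; throttle.notify(); // suppressed
throttle.flush(); // emits the suppressed update
console.log(updates); // 3
```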

Sources: [packages/coding-agent/src/core/tools/bash.ts:64-447]()

---

## `OutputAccumulator`: Bounded Memory for Streaming Output

### What problem does it solve?

A bash command might emit gigabytes of output. The model context can hold far less. Without a bound, the agent would exhaust memory before it even reached the truncation decision.

### Architecture

`OutputAccumulator` maintains a *rolling tail* in memory and an optional temp file on disk:

```
Raw data (Buffer)
      │
      ├─► tailText (in-memory rolling window, 2× maxBytes)
      │         trimTail() fires when tailText > 2×maxBytes
      │
      └─► tempFileStream (if totalRawBytes > maxBytes)
                WriteStream to tmpdir/pi-bash-<hex>.log
```

- **In-memory tail**: UTF-8 decoded incrementally via a streaming `TextDecoder`. When `tailBytes` exceeds `2 × maxRollingBytes`, `trimTail()` slices from a UTF-8 character boundary, preserving multi-byte characters.
- **Temp file**: Opened lazily the first time output exceeds the limits. All raw `Buffer` chunks are written; the in-memory chunks buffered before the threshold are flushed first.
- **Snapshot**: `snapshot({ persistIfTruncated: true })` applies `truncateTail` to the in-memory tail and returns both the display-safe content and the `fullOutputPath` for the model to reference.

```ts
// packages/coding-agent/src/core/tools/output-accumulator.ts:91-119
snapshot(options: { persistIfTruncated?: boolean } = {}): OutputSnapshot {
  const tailTruncation = truncateTail(this.getSnapshotText(), { maxLines: this.maxLines, maxBytes: this.maxBytes });
  const truncated = this.totalLines > this.maxLines || this.totalDecodedBytes > this.maxBytes;
  ...
  if (options.persistIfTruncated && truncation.truncated) {
    this.ensureTempFile();
  }
  return { content: truncation.content, truncation, fullOutputPath: this.tempFilePath };
}
```

Defaults are **2000 lines or 50 KB** (from `truncate.ts`), matching the Read tool. Bash uses `truncateTail` (keep last N), while Read uses `truncateHead` (keep first N)—because for command output the recent end is most useful, while for file reads the beginning is.
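
The character-boundary detail in the rolling tail is easy to get wrong, so a focused sketch may help. `sketchTrimTail` is a hypothetical helper that works on raw bytes; it shows only the boundary adjustment, not the accumulator's bookkeeping or temp-file handling.

```typescript
// Returns the last `maxBytes` bytes of `data`, advanced to the next UTF-8 character boundary.
function sketchTrimTail(data: Uint8Array, maxBytes: number): Uint8Array {
  if (data.length <= maxBytes) return data;
  let start = data.length - maxBytes;
  // Continuation bytes look like 10xxxxxx; skip them so the slice starts on a lead byte.
  while (start < data.length && (data[start] & 0xc0) === 0x80) start++;
  return data.subarray(start);
}

const encoded = new TextEncoder().encode("a\u00E9\u20ACb"); // 1 + 2 + 3 + 1 = 7 bytes
const decoder = new TextDecoder();

// A 3-byte window would start inside the euro sign, so the slice advances to "b".
console.log(decoder.decode(sketchTrimTail(encoded, 3))); // b
// A 4-byte window starts exactly on the euro sign's lead byte.
console.log(decoder.decode(sketchTrimTail(encoded, 4))); // €b
```

Advancing past continuation bytes means the kept tail can be slightly shorter than the limit, but it never begins with a broken character.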

Sources: [packages/coding-agent/src/core/tools/output-accumulator.ts:1-222](), [packages/coding-agent/src/core/tools/truncate.ts:163-241]()

---

## Grep

### What it does

`createGrepTool` delegates pattern matching to **ripgrep** (`rg`), spawned as a subprocess with `--json` output for structured parsing. The tool resolves `rg` via `ensureTool` (which can auto-download it if absent).

```ts
// packages/coding-agent/src/core/tools/grep.ts:214-219
const args: string[] = ["--json", "--line-number", "--color=never", "--hidden"];
if (ignoreCase) args.push("--ignore-case");
if (literal) args.push("--fixed-strings");
if (glob) args.push("--glob", glob);
args.push("--", pattern, searchPath);
```

The `--` separator before the pattern is a deliberate injection guard—flag-like patterns such as `--pre=script.sh` are treated as literal search text, not ripgrep flags. The test suite verifies this:

```ts
// packages/coding-agent/test/tools.test.ts:722-737
const result = await grepTool.execute("test-call-grep-injection", {
  pattern: `--pre=${payload}`,
  path: testDir,
});
expect(getTextOutput(result)).toContain("No matches found");
expect(existsSync(marker)).toBe(false);
```

Output is capped at **100 matches** (default) or **50 KB**, whichever is reached first. Once the match limit is hit, the `rg` child process is killed rather than left to run to completion. Individual lines are truncated to **500 characters** so that a single long line cannot consume the budget.
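The caps above can be illustrated against ripgrep's `--json` event stream, where each output line is a JSON object with a `type` field and `match` events carry `data.path.text`, `data.lines.text`, and `data.line_number`. The sketch below is an assumption about how such a stream might be consumed; `collectMatches` and `RgMatch` are hypothetical names, not identifiers from `grep.ts`.

```typescript
// Hypothetical sketch of consuming `rg --json` output with the caps described above.
interface RgMatch {
  file: string;
  line: number;
  text: string;
}

const MAX_MATCHES = 100; // default match cap
const MAX_LINE_LENGTH = 500; // per-line character cap

function collectMatches(
  jsonLines: string[],
  limit: number = MAX_MATCHES,
): { matches: RgMatch[]; limitReached: boolean } {
  const matches: RgMatch[] = [];
  for (const raw of jsonLines) {
    if (raw.trim() === "") continue;
    const event = JSON.parse(raw);
    if (event.type !== "match") continue; // ignore begin/end/summary events
    if (matches.length >= limit) {
      // A real caller would kill the rg child process at this point.
      return { matches, limitReached: true };
    }
    const text = String(event.data.lines.text ?? "").replace(/\r?\n$/, "");
    matches.push({
      file: event.data.path.text,
      line: event.data.line_number,
      text: text.length > MAX_LINE_LENGTH ? `${text.slice(0, MAX_LINE_LENGTH)}...` : text,
    });
  }
  return { matches, limitReached: false };
}
```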

Sources: [packages/coding-agent/src/core/tools/grep.ts:122-384]()

---

## Find

### What it does

`createFindTool` delegates glob-based file discovery to **fd** (`fd-find`), also resolved via `ensureTool`. The tool handles both simple basename patterns (matched by fd's default `--glob` mode) and path-containing patterns:

```ts
// packages/coding-agent/src/core/tools/find.ts:241-249
if (pattern.includes("/")) {
  args.push("--full-path");
  if (!pattern.startsWith("/") && !pattern.startsWith("**/") && pattern !== "**") {
    effectivePattern = `**/${pattern}`;
  }
}
args.push("--", effectivePattern, searchPath);
```

The `--` separator prevents flag-injection attacks on the pattern argument. `--no-require-git` makes `.gitignore` semantics apply even outside git repositories without leaking sibling-directory rules. Hidden files are included (`--hidden`).

Results are relativized against `searchPath` and capped at **1000 results** (default) or **50 KB**.

Sources: [packages/coding-agent/src/core/tools/find.ts:112-370]()

---

## Ls

### What it does

`createLsTool` reads a directory directly via Node.js `readdirSync`, sorts entries case-insensitively, appends a `/` suffix for subdirectories, and returns the sorted list as a text block. Unlike `find`, it does not recurse and does not respect `.gitignore`. It does include dotfiles by default.

Entries are capped at **500** (default `limit`). The byte cap from `truncateHead` is applied to the assembled string as a secondary guard.
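The listing behavior described above (non-recursive, dotfiles included, case-insensitive sort, `/` suffix, entry cap) can be sketched as follows. `listDirectory` is a hypothetical name and the byte cap is omitted; the sketch only shows the ordering and the limit.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch, not the implementation in ls.ts.
function listDirectory(
  dir: string,
  limit: number = 500,
): { entries: string[]; limitReached: boolean } {
  const names = readdirSync(dir, { withFileTypes: true })
    // Case-insensitive ordering on the bare name.
    .sort((a, b) => a.name.toLowerCase().localeCompare(b.name.toLowerCase()))
    // Mark subdirectories with a trailing slash.
    .map((entry) => (entry.isDirectory() ? `${entry.name}/` : entry.name));
  return { entries: names.slice(0, limit), limitReached: names.length > limit };
}

// Usage: build a throwaway directory and list it.
const demoDir = mkdtempSync(join(tmpdir(), "ls-sketch-"));
mkdirSync(join(demoDir, "Alpha"));
writeFileSync(join(demoDir, "b.txt"), "");
writeFileSync(join(demoDir, ".env"), "");
const listing = listDirectory(demoDir);
console.log(listing.entries.join("\n"));
```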

Sources: [packages/coding-agent/src/core/tools/ls.ts:99-228]()

---

## Truncation Architecture: `truncate.ts`

All seven tools share the same two-limit truncation policy defined in one file:

| Constant | Value | Used by |
|---|---|---|
| `DEFAULT_MAX_LINES` | 2000 | Read, OutputAccumulator |
| `DEFAULT_MAX_BYTES` | 50 KB | All tools |
| `GREP_MAX_LINE_LENGTH` | 500 chars | Grep (per-line) |

Two strategies exist, chosen by tool semantics:

- **`truncateHead`** (keep beginning): Read, Grep, Find, Ls. Correct for file inspection—the model wants the top of the file or the first matching results.
- **`truncateTail`** (keep end): OutputAccumulator / Bash. Correct for command output—the model wants the final state, which contains errors, prompts, and results.

Both strategies return a `TruncationResult` that includes `totalLines`, `outputLines`, and `truncatedBy` (`"lines"` or `"bytes"`). For Bash, the `OutputSnapshot` returned by `snapshot()` additionally carries `fullOutputPath` alongside the truncation result. These fields drive the actionable continuation notices shown in tool output.
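The difference between the two strategies is easiest to see side by side. The following is a minimal sketch assuming whole-line truncation under both limits; `truncateLines` and `TruncationSketch` are hypothetical, and the real `truncateHead` / `truncateTail` in `truncate.ts` may treat partial lines and edge cases differently.

```typescript
// Hypothetical sketch of head vs. tail truncation under a two-limit policy.
interface TruncationSketch {
  content: string;
  truncated: boolean;
  truncatedBy: "lines" | "bytes" | null;
  totalLines: number;
  outputLines: number;
}

function truncateLines(
  text: string,
  maxLines: number,
  maxBytes: number,
  keep: "head" | "tail",
): TruncationSketch {
  const lines = text.split("\n");
  // Walk from the end we want to keep.
  const ordered = keep === "head" ? lines : lines.slice().reverse();
  const encoder = new TextEncoder();
  const kept: string[] = [];
  let bytes = 0;
  let truncatedBy: "lines" | "bytes" | null = null;
  for (const line of ordered) {
    if (kept.length >= maxLines) {
      truncatedBy = "lines";
      break;
    }
    // +1 accounts for the newline that joins this line to the previous one.
    const lineBytes = encoder.encode(line).length + (kept.length > 0 ? 1 : 0);
    if (bytes + lineBytes > maxBytes) {
      truncatedBy = "bytes";
      break;
    }
    kept.push(line);
    bytes += lineBytes;
  }
  if (keep === "tail") kept.reverse();
  return {
    content: kept.join("\n"),
    truncated: truncatedBy !== null,
    truncatedBy,
    totalLines: lines.length,
    outputLines: kept.length,
  };
}
```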

Sources: [packages/coding-agent/src/core/tools/truncate.ts:1-276]()

---

## Pluggable Operations Pattern

Every tool exposes an `*Operations` interface that replaces its I/O backend:

```text
ReadOperations    { readFile, access, detectImageMimeType? }
WriteOperations   { writeFile, mkdir }
EditOperations    { readFile, writeFile, access }
BashOperations    { exec }
GrepOperations    { isDirectory, readFile }
FindOperations    { exists, glob }
LsOperations      { exists, stat, readdir }
```

This means the entire tool surface can be redirected to SSH, a container, a mock, or a remote filesystem by swapping a single options object—without touching the truncation, queueing, or rendering logic. The Bash tool's `BashSpawnHook` extends this further by allowing arbitrary command, `cwd`, or environment rewriting before spawn.
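As an illustration of the pattern, an in-memory backend might look like the sketch below. Only the member names `readFile` and `access` come from the listing above; the signatures, `ReadOperationsSketch`, `createInMemoryReadOperations`, and `readFirstLine` are assumptions made for the example.

```typescript
// Assumed shape: member names from the listing above, signatures are guesses.
interface ReadOperationsSketch {
  readFile(path: string): Promise<Uint8Array>;
  access(path: string): Promise<void>;
}

// Hypothetical in-memory backend, e.g. for tests or a virtual filesystem.
function createInMemoryReadOperations(files: Record<string, string>): ReadOperationsSketch {
  const lookup = (path: string): string => {
    const content = files[path];
    if (content === undefined) throw new Error(`ENOENT: ${path}`);
    return content;
  };
  return {
    async readFile(path: string): Promise<Uint8Array> {
      return new TextEncoder().encode(lookup(path));
    },
    async access(path: string): Promise<void> {
      lookup(path);
    },
  };
}

// Stand-in for tool logic: it only talks to the operations object,
// so swapping the backend does not change this function.
async function readFirstLine(path: string, ops: ReadOperationsSketch): Promise<string> {
  await ops.access(path);
  const text = new TextDecoder().decode(await ops.readFile(path));
  return text.split("\n")[0] ?? "";
}
```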

---

## Concurrency and Safety Summary

```text
Two concurrent edit calls targeting the same file:

 Call A                        Call B
   │                             │
   ├─ withFileMutationQueue ──┐  ├─ withFileMutationQueue ──┐
   │   key = realpath(file)   │  │   key = realpath(file)   │
   │   appends to chain       │  │   appends to chain       │
   │   awaits currentQueue ◄──┘  │   awaits A's nextQueue ◄─┘
   │                             │
   ├─ reads file                 │ (blocked)
   ├─ applies edits              │
   ├─ writes file                │
   └─ calls releaseNext() ──────►│ unblocked
                                 ├─ reads file (sees A's result)
                                 ├─ applies edits
                                 └─ writes file
```

Write and Edit both route through `withFileMutationQueue`. Read does not: reads are never serialized. Bash has no file-level lock because it runs arbitrary commands; any atomicity there is whatever the commands themselves and the OS provide.
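The chaining in the diagram can be sketched as a per-key promise queue. This is an illustration of the technique, not the code in `file-mutation-queue.ts`: `withFileMutationQueueSketch` is a hypothetical name, and the caller passes the key directly instead of resolving it with `realpath`.

```typescript
// Hypothetical sketch of a per-key serialization queue.
const queues = new Map<string, Promise<void>>();

async function withFileMutationQueueSketch<T>(
  key: string,
  task: () => Promise<T>,
): Promise<T> {
  const previous = queues.get(key) ?? Promise.resolve();
  let release: () => void = () => {};
  const next = new Promise<void>((resolve) => {
    release = resolve;
  });
  // Later callers wait on this link, which settles only after we release.
  const chained = previous.then(() => next);
  queues.set(key, chained);
  await previous;
  try {
    return await task();
  } finally {
    release();
    // Drop the entry if nobody queued behind us.
    if (queues.get(key) === chained) queues.delete(key);
  }
}
```

Because each caller awaits only its predecessor's link, tasks for different keys never block each other.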

Sources: [packages/coding-agent/src/core/tools/file-mutation-queue.ts:19-39](), [packages/coding-agent/src/core/tools/edit.ts:316](), [packages/coding-agent/src/core/tools/write.ts:203]()

---

## Summary

The seven built-in tools cover all of the agent's filesystem surface area: Read and Ls for observation, Write and Edit for mutation, Bash for arbitrary shell execution, and Grep and Find for search. Every mutation path flows through `withFileMutationQueue` to prevent concurrent torn writes, and every output path flows through the shared truncation constants in `truncate.ts` to prevent context overflow. The `OutputAccumulator` adds a second layer specifically for streaming bash output, maintaining an in-memory rolling tail and spilling to a temp file on disk when output exceeds limits. All seven tools expose a pluggable `*Operations` interface, keeping the scheduling, truncation, and rendering logic decoupled from the I/O backend and making them portable to remote or virtualized environments without changing any of the safety machinery.

Sources: [packages/coding-agent/src/core/tools/truncate.ts:1-13]()

---

## 11. pi-tui: Why Build a Terminal UI Library from Scratch?

> packages/tui implements its own terminal rendering engine with differential output, an undo stack, a kill ring, Emacs-style key bindings, fuzzy search, and inline image display (Kitty/Sixel). This page asks what constraints made off-the-shelf libraries insufficient, how the virtual terminal model in terminal.ts avoids screen-flicker, and what stdin-buffer.ts does to handle raw key events. The regression tests expose the edge cases that forced custom code.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/11-pi-tui-why-build-a-terminal-ui-library-from-scratch.md
- Generated: 2026-05-22T23:31:17.297Z

### Source Files

- `packages/tui/src/tui.ts`
- `packages/tui/src/terminal.ts`
- `packages/tui/src/stdin-buffer.ts`
- `packages/tui/src/kill-ring.ts`
- `packages/tui/src/terminal-image.ts`
- `packages/tui/test/tui-render.test.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/tui/src/tui.ts](packages/tui/src/tui.ts)
- [packages/tui/src/terminal.ts](packages/tui/src/terminal.ts)
- [packages/tui/src/stdin-buffer.ts](packages/tui/src/stdin-buffer.ts)
- [packages/tui/src/kill-ring.ts](packages/tui/src/kill-ring.ts)
- [packages/tui/src/terminal-image.ts](packages/tui/src/terminal-image.ts)
- [packages/tui/src/undo-stack.ts](packages/tui/src/undo-stack.ts)
- [packages/tui/src/editor-component.ts](packages/tui/src/editor-component.ts)
- [packages/tui/src/fuzzy.ts](packages/tui/src/fuzzy.ts)
- [packages/tui/src/keybindings.ts](packages/tui/src/keybindings.ts)
- [packages/tui/test/tui-render.test.ts](packages/tui/test/tui-render.test.ts)
- [packages/tui/test/virtual-terminal.ts](packages/tui/test/virtual-terminal.ts)
</details>

# pi-tui: Why Build a Terminal UI Library from Scratch?

`packages/tui` is a self-contained terminal UI engine written in TypeScript for Node.js. Rather than wrapping an existing library like `blessed`, `ink`, or `xterm.js`, the package provides its own virtual line buffer, differential renderer, raw input parser, kill ring, undo stack, fuzzy search, and inline image support. This page examines the constraints that drove each custom piece, how the rendering model avoids screen flicker, and how the stdin buffer turns a stream of raw bytes into discrete, typed key events.

Understanding the design is useful when extending the TUI (adding components, integrating image protocols, or writing editor widgets), when debugging rendering artifacts on unusual terminals, or when evaluating whether to reuse this package in a separate application.

---

## Why not use an off-the-shelf library?

The first question to ask is: what would an existing library have needed to do that it could not?

**Differential output at line granularity.** Most high-level TUI libraries redraw entire regions on every frame. This package tracks `previousLines: string[]` per render cycle and only rewrites the lines that actually changed, wrapping the minimal set of cursor moves and line clears in a single synchronized-output block (`\x1b[?2026h` / `\x1b[?2026l`). A spinner animation that changes one middle line in a 20-line layout only writes that one line.

**Kitty and iTerm2 inline image protocols.** Bitmap images rendered via the Kitty APC graphics protocol or the iTerm2 `ESC]1337` protocol cannot be treated as text: they occupy physical cell rows that the terminal controls, not the application. Off-the-shelf libraries generally ignore these protocols entirely. This package tracks `previousKittyImageIds: Set<number>` across frames and issues explicit delete sequences (`Ga=d,d=I,i=<id>`) before redrawing positions that overlap an existing image placement. Tests verify the ordering is correct: delete must precede redraw.

**Emacs-style keybindings + kill ring.** The `Keybindings` interface in `keybindings.ts` declares 30+ named actions — `tui.editor.yank`, `tui.editor.yankPop`, `tui.editor.deleteToLineStart`, `tui.editor.undo`, and so on — that map to multiple physical key chords. This is a richer binding model than most TUI libraries expose. The kill ring itself (`kill-ring.ts`) supports consecutive-kill accumulation (both forward and backward), which requires the `accumulate`/`prepend` flag on `push()`. No off-the-shelf kill ring with those semantics was available as an isolated module.

**Bracketed paste and Kitty keyboard protocol negotiation.** Input from `stdin` in raw mode is not a stream of "key events"; it is a byte stream where `\x1b[<35;20;5m` (an SGR mouse event) might arrive split across three separate `data` callbacks. The Kitty keyboard protocol further adds key-release events that most components must filter out, and WezTerm emits a double-ESC concatenation (`\x1b\x1b[27;...u`) that needs careful disambiguation. None of this is handled correctly by most general-purpose input parsers.

Sources: [packages/tui/src/tui.ts:241-261](), [packages/tui/src/kill-ring.ts:1-46](), [packages/tui/src/keybindings.ts:1-42](), [packages/tui/src/stdin-buffer.ts:1-18]()

---

## The virtual terminal model in `terminal.ts`

### What is the simplest abstraction needed?

The minimal contract between the render engine and the physical terminal is: write bytes out, read dimensions, handle resize events. The `Terminal` interface in `terminal.ts` captures exactly this:

```typescript
// packages/tui/src/terminal.ts:17-58
export interface Terminal {
  start(onInput: (data: string) => void, onResize: () => void): void;
  stop(): void;
  drainInput(maxMs?: number, idleMs?: number): Promise<void>;
  write(data: string): void;
  get columns(): number;
  get rows(): number;
  get kittyProtocolActive(): boolean;
  moveBy(lines: number): void;
  hideCursor(): void;
  showCursor(): void;
  clearLine(): void;
  clearFromCursor(): void;
  clearScreen(): void;
  setTitle(title: string): void;
  setProgress(active: boolean): void;
}
```

This interface separates concerns: `ProcessTerminal` wires up `process.stdin`/`process.stdout`, while `VirtualTerminal` in the test suite wraps `@xterm/headless` for deterministic test assertions. Tests never need a real TTY.

### Where does complexity become necessary?

The `ProcessTerminal.start()` method does more than enable raw mode:

1. **Bracketed paste mode** is enabled immediately: `\x1b[?2004h`. This causes the terminal to wrap paste content in sentinel sequences, allowing the input parser to reassemble multi-line pastes without treating each newline as a submit event.

2. **Kitty protocol negotiation** runs asynchronously. A query (`\x1b[?u`) is sent; if the terminal responds with `\x1b[?<flags>u`, the Kitty protocol is activated with flags 1+2+4 (disambiguate escape codes, report event types, report alternate keys). If no response arrives within 150 ms, xterm's `modifyOtherKeys` mode 2 is activated as a fallback, which is needed for tmux forwarding.

3. **Windows VT input** requires loading a native `.node` module that calls `SetConsoleMode` to add `ENABLE_VIRTUAL_TERMINAL_INPUT (0x0200)` to the console handle. Without this, Shift+Tab arrives as plain `\t` on Windows.

4. **Termux height-change suppression**: a resize that changes only height triggers a full redraw everywhere except Termux, where the software keyboard raising and lowering causes constant height thrash. The environment variable `TERMUX_VERSION` gates this path.
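The negotiation in step 2 can be sketched as a pure decision function. The escape sequences are the standard ones (`CSI > flags u` pushes Kitty keyboard flags, `CSI > 4 ; 2 m` selects xterm `modifyOtherKeys` mode 2), but whether `ProcessTerminal` pushes or sets the flags is an assumption, and `parseKittyQueryResponse` and `chooseKeyboardMode` are hypothetical names.

```typescript
// Hypothetical sketch of the keyboard-protocol decision described above.
const ENABLE_BRACKETED_PASTE = "\x1b[?2004h";
const KITTY_QUERY = "\x1b[?u";
const KITTY_FLAGS = 1 | 2 | 4; // disambiguate, event types, alternate keys

// Returns the flags currently reported by the terminal, or null if the
// data does not contain a Kitty query response.
function parseKittyQueryResponse(data: string): number | null {
  const match = /\x1b\[\?(\d+)u/.exec(data);
  return match ? Number(match[1]) : null;
}

// `response === null` models the 150 ms timeout with no reply.
function chooseKeyboardMode(response: string | null): string {
  if (response !== null && parseKittyQueryResponse(response) !== null) {
    return `\x1b[>${KITTY_FLAGS}u`; // push Kitty flags 7
  }
  return "\x1b[>4;2m"; // fallback: xterm modifyOtherKeys mode 2
}

// What a start() routine might write first, before the reply arrives.
const startupSequence = ENABLE_BRACKETED_PASTE + KITTY_QUERY;
```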

Sources: [packages/tui/src/terminal.ts:92-239]()

### The synchronized output idiom

Every write that touches more than one line is wrapped in `\x1b[?2026h` / `\x1b[?2026l` — the "synchronized output" mode supported by Kitty, WezTerm, and newer xterm builds. The terminal holds the frame until the closing marker arrives, preventing the user from seeing intermediate cursor-movement states. This is the primary mechanism by which the differential renderer avoids visible flicker.

Sources: [packages/tui/src/tui.ts:985](), [packages/tui/src/tui.ts:1144-1145](), [packages/tui/src/tui.ts:1230]()

---

## Differential rendering in `tui.ts`

### How does the virtual line buffer work?

`TUI` extends `Container`, which recursively calls `render(width: number): string[]` on all children. The result is an array of strings, each representing one terminal row. After overlays are composited, the current array is compared against `this.previousLines` line by line:

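```ts
// Simplified from the comparison loop in tui.ts; maxLines is assumed
// to be the longer of the two line arrays.
let firstChanged = -1;
let lastChanged = -1;
const maxLines = Math.max(previousLines.length, newLines.length);
for (let i = 0; i < maxLines; i++) {
    if (previousLines[i] !== newLines[i]) {
        if (firstChanged === -1) firstChanged = i; // first differing row
        lastChanged = i;                           // last differing row so far
    }
}
```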

Only the range `[firstChanged, lastChanged]` is rewritten. The cursor is moved from its tracked position (`hardwareCursorRow`) to `firstChanged`, then each changed line is cleared with `\x1b[2K` and the new content written. Lines outside the changed range are untouched.

```text
Frame N:   ["Header", "Working |", "Footer"]
Frame N+1: ["Header", "Working /", "Footer"]

Changed range: [1, 1]
Actions:  move to row 1 → clear line → write "Working /"
Untouched: rows 0 and 2
```
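The write phase that follows the comparison can be sketched end to end. This is an assumption-laden illustration rather than the logic in `tui.ts`: `renderDiff` is a hypothetical name, it rewrites every row in the changed range, and it ignores viewport scrolling and the full-redraw guards.

```typescript
// Hypothetical sketch: diff two frames and emit one synchronized write.
function renderDiff(
  previous: string[],
  next: string[],
  cursorRow: number,
): { output: string; cursorRow: number } {
  const maxLines = Math.max(previous.length, next.length);
  let first = -1;
  let last = -1;
  for (let i = 0; i < maxLines; i++) {
    if (previous[i] !== next[i]) {
      if (first === -1) first = i;
      last = i;
    }
  }
  if (first === -1) return { output: "", cursorRow }; // nothing changed

  let out = "\x1b[?2026h"; // begin synchronized output
  const delta = first - cursorRow;
  if (delta > 0) out += `\x1b[${delta}B`; // cursor down
  if (delta < 0) out += `\x1b[${-delta}A`; // cursor up
  out += "\r";
  for (let i = first; i <= last; i++) {
    if (i > first) out += "\r\n";
    out += `\x1b[2K${next[i] ?? ""}`; // clear the row, then write new content
  }
  out += "\x1b[?2026l"; // end synchronized output
  return { output: out, cursorRow: last };
}
```

For the spinner frames above with the cursor parked on row 2, the output is a single move up, one line clear, and the text `Working /`.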

Sources: [packages/tui/src/tui.ts:953-1087](), [packages/tui/src/tui.ts:1171-1209]()

### When does the engine fall back to a full redraw?

The differential path has several guards that escalate to `fullRender(clear: true)`:

| Trigger | Reason |
|---|---|
| Width changed | Text wrapping changes; all line lengths are invalid |
| Height changed (non-Termux) | Visible viewport alignment changes |
| `firstChanged < prevViewportTop` | Changed lines are above the currently visible scroll window |
| Deleted lines shift the viewport up | Cursor tracking breaks when content moves out of the viewport |
| Content shrinks past `maxLinesRendered` (opt-in) | Stale lines below new content must be erased |

The test `"full re-renders when deleted lines move the viewport upward"` pins the specific scenario where 12 lines shrink to 7 in a 5-row terminal: the viewport anchor changes, so a full redraw is required.

Sources: [packages/tui/src/tui.ts:1028-1141](), [packages/tui/test/tui-render.test.ts:484-503]()

### Hardware cursor positioning for IME

When a focused component emits `CURSOR_MARKER` (`\x1b_pi:c\x07` — an APC sequence that terminals ignore) at its cursor position, `TUI` scans the bottom `height` rendered lines for this marker, calculates the visual column from the text before it using `visibleWidth()`, strips the marker from the line, then moves the hardware cursor to that exact cell. This ensures the IME candidate window appears at the text insertion point even though the TUI normally hides the hardware cursor.
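The marker scan can be sketched as below. `extractCursor` is a hypothetical name, `approxVisibleWidth` is a simplified stand-in for `visibleWidth()` that strips CSI sequences and counts code points (it does not handle wide or combining characters), and the sketch scans all lines rather than only the bottom `height` rows.

```typescript
// Hypothetical sketch of locating and stripping the cursor marker.
const CURSOR_MARKER = "\x1b_pi:c\x07";

// Simplified stand-in for visibleWidth(): CSI sequences removed,
// one cell per code point.
function approxVisibleWidth(text: string): number {
  const withoutCsi = text.replace(/\x1b\[[0-?]*[ -\/]*[@-~]/g, "");
  return Array.from(withoutCsi).length;
}

function extractCursor(
  lines: string[],
): { lines: string[]; row: number; col: number } | null {
  // Scan from the bottom, since the focused input usually sits last.
  for (let row = lines.length - 1; row >= 0; row--) {
    const line = lines[row] ?? "";
    const index = line.indexOf(CURSOR_MARKER);
    if (index === -1) continue;
    const cleaned = lines.slice();
    cleaned[row] = line.slice(0, index) + line.slice(index + CURSOR_MARKER.length);
    return { lines: cleaned, row, col: approxVisibleWidth(line.slice(0, index)) };
  }
  return null;
}
```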

Sources: [packages/tui/src/tui.ts:88-90](), [packages/tui/src/tui.ts:933-951](), [packages/tui/src/tui.ts:1287-1318]()

### Kitty image lifecycle

Images add a lifecycle dimension that pure text does not have. The Kitty protocol places a bitmap into the terminal's image store identified by a 32-bit `imageId`. If the same ID is placed again at a different position (e.g., because the TUI content reflows), the old placement remains until explicitly deleted. The TUI therefore:

1. Collects all `imageId` values from `previousLines` and `newLines`.
2. Before writing changed lines, issues `deleteKittyImage(id)` for every ID that appears in the changed range of `previousLines`.
3. After a full redraw, issues delete sequences for all IDs from `previousKittyImageIds`.

Test `"deletes changed image ids before drawing moved placements"` verifies the ordering: `deleteIndex < drawIndex` in the raw write stream.
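The delete-before-draw ordering can be sketched as a small planning step. The delete sequence uses the Kitty graphics keys quoted earlier (`a=d,d=I,i=<id>`) inside an APC envelope; `deleteKittyImageSketch`, `collectImageIds`, and `planWrites` are hypothetical names, and the ID extraction assumes each placement carries an `i=` key in its control data.

```typescript
// Hypothetical sketch: emit deletes for old placements before new rows.
function deleteKittyImageSketch(id: number): string {
  return `\x1b_Ga=d,d=I,i=${id}\x1b\\`;
}

function collectImageIds(lines: string[]): Set<number> {
  const ids = new Set<number>();
  for (const line of lines) {
    // Control data sits between "ESC _ G" and the first ";".
    for (const match of Array.from(line.matchAll(/\x1b_G([^;\x1b]*)/g))) {
      const id = /(?:^|,)i=(\d+)/.exec(match[1] ?? "");
      if (id) ids.add(Number(id[1]));
    }
  }
  return ids;
}

function planWrites(previousChanged: string[], nextChanged: string[]): string[] {
  const deletes = Array.from(collectImageIds(previousChanged)).map(deleteKittyImageSketch);
  return deletes.concat(nextChanged); // deletes always precede redraws
}
```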

Sources: [packages/tui/src/tui.ts:832-872](), [packages/tui/test/tui-render.test.ts:68-145]()

---

## `stdin-buffer.ts`: parsing raw key events

### What problem does it solve?

Node.js delivers stdin in `data` events that can split any escape sequence across multiple callbacks. The SGR mouse sequence `\x1b[<35;20;5m` might arrive as three separate events. If each chunk were forwarded directly to a key handler, a lone `\x1b` would look like the Escape key and the remaining bytes of the sequence would be misread as ordinary typed characters.

`StdinBuffer` accumulates incoming bytes in a string buffer and runs `extractCompleteSequences()` to identify sequence boundaries before forwarding events.

### How does sequence detection work?

The state machine in `isCompleteSequence()` classifies the sequence type from the byte that immediately follows ESC:

```text
ESC [  → CSI sequence — ends at byte in 0x40–0x7E range
ESC ]  → OSC sequence — ends at ESC \ or BEL (\x07)
ESC P  → DCS sequence — ends at ESC \
ESC _  → APC sequence — ends at ESC \
ESC O  → SS3 — ESC O + one byte
ESC <single char> → Meta key — complete
```

SGR mouse sequences (`<digits;digits;digits[Mm]`) require a more specific pattern match because the final byte `M` or `m` alone is not a reliable terminator — the parser waits for the full three-field numeric pattern.

```typescript
// packages/tui/src/stdin-buffer.ts:104-119
if (payload.startsWith("<")) {
    const mouseMatch = /^<\d+;\d+;\d+[Mm]$/.test(payload);
    if (mouseMatch) return "complete";
    // partial SGR mouse: wait for more bytes
    if (lastChar === "M" || lastChar === "m") { ... }
    return "incomplete";
}
```
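The classification table above can be turned into a compact sketch. `classifySequence` is a hypothetical name, it assumes the input is exactly one candidate sequence, and it simplifies the SGR mouse branch relative to the excerpt from `stdin-buffer.ts`.

```typescript
// Hypothetical sketch of sequence-completeness detection.
type SequenceStatus = "complete" | "incomplete" | "not-escape";

function classifySequence(data: string): SequenceStatus {
  if (!data.startsWith("\x1b")) return "not-escape";
  if (data.length === 1) return "incomplete"; // lone ESC: wait for more bytes
  const kind = data[1];
  const payload = data.slice(2);
  switch (kind) {
    case "[": {
      // SGR mouse needs the full three-field pattern, not just a final byte.
      if (payload.startsWith("<")) {
        return /^<\d+;\d+;\d+[Mm]$/.test(payload) ? "complete" : "incomplete";
      }
      if (payload.length === 0) return "incomplete";
      const final = payload.charCodeAt(payload.length - 1);
      return final >= 0x40 && final <= 0x7e ? "complete" : "incomplete";
    }
    case "]": // OSC: terminated by ST (ESC \) or BEL
      return payload.endsWith("\x1b\\") || payload.endsWith("\x07") ? "complete" : "incomplete";
    case "P": // DCS
    case "_": // APC
      return payload.endsWith("\x1b\\") ? "complete" : "incomplete";
    case "O": // SS3: ESC O plus one byte
      return payload.length >= 1 ? "complete" : "incomplete";
    default: // ESC + single character (meta key)
      return "complete";
  }
}
```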

Sources: [packages/tui/src/stdin-buffer.ts:29-126]()

### The WezTerm double-ESC edge case

WezTerm with `enable_kitty_keyboard` sends the Escape key press as a raw `\x1b` byte (simple text path) and the key release as a full CSI-u sequence, concatenated: `\x1b\x1b[27;...u`. The naive parser sees `\x1b\x1b` as a complete meta-key sequence (ESC + ESC) and leaves `[27;...u` as stray text. The buffer handles this explicitly:

```typescript
// packages/tui/src/stdin-buffer.ts:216-229
if (candidate === "\x1b\x1b") {
    const nextChar = remaining[seqEnd];
    if (nextChar === "[" || nextChar === "]" || nextChar === "O"
        || nextChar === "P" || nextChar === "_") {
        sequences.push(ESC);  // emit first ESC alone
        pos += 1;
        break;  // restart from second ESC
    }
}
```

Sources: [packages/tui/src/stdin-buffer.ts:207-232]()

### Bracketed paste reassembly

Paste content wrapped in `\x1b[200~`…`\x1b[201~` is extracted from the byte stream and re-emitted as a `paste` event rather than individual keystrokes. `ProcessTerminal` re-wraps the content in the bracketed markers before forwarding to the input handler, ensuring existing editor handling that recognizes those sentinels continues to work.
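Extraction of a bracketed paste from a buffered string can be sketched like this. `extractPaste` and `PasteScan` are hypothetical names; the sketch assumes an unterminated paste is kept in the buffer until its end marker arrives.

```typescript
// Hypothetical sketch of bracketed-paste extraction from a string buffer.
const PASTE_START = "\x1b[200~";
const PASTE_END = "\x1b[201~";

interface PasteScan {
  before: string; // input that preceded the paste
  paste: string | null; // paste body, or null if none is complete yet
  rest: string; // bytes to keep buffered or process next
}

function extractPaste(buffer: string): PasteScan {
  const start = buffer.indexOf(PASTE_START);
  if (start === -1) return { before: buffer, paste: null, rest: "" };
  const bodyStart = start + PASTE_START.length;
  const end = buffer.indexOf(PASTE_END, bodyStart);
  if (end === -1) {
    // Paste has started but not finished: hold it back until more data arrives.
    return { before: buffer.slice(0, start), paste: null, rest: buffer.slice(start) };
  }
  return {
    before: buffer.slice(0, start),
    paste: buffer.slice(bodyStart, end),
    rest: buffer.slice(end + PASTE_END.length),
  };
}
```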

Sources: [packages/tui/src/stdin-buffer.ts:315-369](), [packages/tui/src/terminal.ts:167-170]()

### Kitty printable codepoint deduplication

The Kitty protocol reports a printable character as both a raw Unicode codepoint (e.g., `a`) and a CSI-u sequence (`\x1b[97u`). The buffer detects the CSI-u form, extracts its codepoint via `parseUnmodifiedKittyPrintableCodepoint()`, and suppresses the immediately following raw codepoint if it matches. This prevents every keystroke from being delivered twice.

Sources: [packages/tui/src/stdin-buffer.ts:183-397]()

---

## Undo stack and kill ring

### `UndoStack<S>`

The undo stack uses `structuredClone` to snapshot arbitrary state objects. The caller pushes the entire editor state before each mutation; undo pops the most recent snapshot and restores it. Because `structuredClone` is used at push time, pops return already-detached objects with no additional copying.

```typescript
// packages/tui/src/undo-stack.ts:11-13
push(state: S): void {
    this.stack.push(structuredClone(state));
}
```

Sources: [packages/tui/src/undo-stack.ts:1-28]()

### `KillRing`

The kill ring stores deleted text strings in a ring array and supports two behaviors that Emacs users expect:

- **Consecutive kill accumulation**: if `opts.accumulate` is true on successive kills, the new text is merged into the existing top entry rather than pushed as a new entry. `opts.prepend` controls direction — backward deletion (`Ctrl+Backspace`) prepends; forward deletion (`Ctrl+Delete`) appends.
- **Yank-pop cycling**: `rotate()` moves the last entry to the front, allowing repeated `yank-pop` (typically `Alt+Y`) to cycle through earlier kills.
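The two behaviors above can be sketched in a few lines. `KillRingSketch` and `peek` are hypothetical; only `push` with `accumulate` / `prepend` and `rotate` are named in the text, and their exact signatures in `kill-ring.ts` may differ.

```typescript
// Hypothetical sketch of a kill ring with directional accumulation.
class KillRingSketch {
  private ring: string[] = [];

  push(text: string, opts: { accumulate?: boolean; prepend?: boolean } = {}): void {
    if (text === "") return;
    const top = this.ring.pop();
    if (opts.accumulate && top !== undefined) {
      // Merge into the most recent entry; direction depends on the kill.
      this.ring.push(opts.prepend ? text + top : top + text);
      return;
    }
    if (top !== undefined) this.ring.push(top);
    this.ring.push(text);
  }

  // Most recent entry, i.e. what a yank would insert.
  peek(): string | undefined {
    return this.ring[this.ring.length - 1];
  }

  // Move the last entry to the front so the previous kill becomes current.
  rotate(): void {
    const last = this.ring.pop();
    if (last !== undefined) this.ring.unshift(last);
  }
}
```

Two consecutive backward kills of `world` and then `hello ` therefore yank back as `hello world`, in reading order.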

Sources: [packages/tui/src/kill-ring.ts:1-46]()

---

## Inline image display

`terminal-image.ts` provides capability detection, encoding, and cell-size calculation for two image protocols:

| Protocol | Detection | Sequence prefix |
|---|---|---|
| Kitty graphics | `KITTY_WINDOW_ID`, `TERM_PROGRAM=kitty`, `ghostty`, `wezterm` | `\x1b_G` (APC) |
| iTerm2 inline | `ITERM_SESSION_ID`, `TERM_PROGRAM=iterm.app` | `\x1b]1337;File=` |
| None (fallback) | tmux/screen, vscode, alacritty, unknown | — |

tmux and screen always receive `images: null` because, by default, they do not pass these graphics escape sequences through to the outer terminal.

Cell size for aspect-ratio-correct scaling is obtained at runtime by sending `CSI 16 t` and parsing the response `CSI 6 ; height ; width t`. Until the response arrives, a default of 9×18 pixels per cell is used. When the response arrives, all components are invalidated and a re-render is triggered.

Large images are chunked into 4096-byte base64 segments with Kitty's `m=1` (more chunks follow) / `m=0` (final chunk) framing.
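The chunked framing can be sketched as follows, assuming the Kitty convention that the first chunk carries the control keys and later chunks carry only the `m` key. `encodeKittyChunks` is a hypothetical name and the control string in the usage note is illustrative.

```typescript
// Hypothetical sketch of Kitty graphics chunking with m=1 / m=0 framing.
function encodeKittyChunks(
  base64: string,
  control: string,
  chunkSize: number = 4096,
): string[] {
  const chunks: string[] = [];
  for (let offset = 0; offset < base64.length; offset += chunkSize) {
    const piece = base64.slice(offset, offset + chunkSize);
    const more = offset + chunkSize < base64.length ? 1 : 0;
    // Only the first chunk repeats the control keys.
    const header = offset === 0 ? `${control},m=${more}` : `m=${more}`;
    chunks.push(`\x1b_G${header};${piece}\x1b\\`);
  }
  return chunks;
}
```

A 10,000-character payload with control data such as `a=T,f=100,i=1` yields three chunks: two with `m=1` and a final one with `m=0`.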

Sources: [packages/tui/src/terminal-image.ts:42-86](), [packages/tui/src/terminal-image.ts:126-170](), [packages/tui/src/tui.ts:463-471]()

---

## The test harness

The test suite avoids real TTYs by implementing `VirtualTerminal` backed by `@xterm/headless`. This gives tests a real terminal emulator with accurate cursor tracking and escape-sequence interpretation, so assertions can query cell contents (`getViewport()`, `getCursorPosition()`, `isItalic()`) rather than parsing raw escape sequences.

`waitForRender()` accounts for the TUI's throttled render pipeline (minimum 16 ms between frames, `process.nextTick` scheduling):

```typescript
// packages/tui/test/virtual-terminal.ts:213-217
async waitForRender(): Promise<void> {
    await new Promise<void>((resolve) => process.nextTick(resolve));
    await new Promise<void>((resolve) => setTimeout(resolve, 20));
    await this.flush();
}
```

The regression tests expose specific edge cases that motivated custom code:

- **Style isolation** (`"resets styles after each rendered line"`): an italic span in line 0 must not bleed into line 1. The render path appends `\x1b[0m\x1b]8;;\x07` after every non-image line.
- **Stale content from transient components**: a selector overlay that temporarily inflates `maxLinesRendered` must be fully cleared when it disappears, even if the component below it did not change.
- **Termux height-change suppression**: height resize must not trigger `\x1b[2J` (screen clear) on Termux because the software keyboard hides and shows constantly.
- **Image delete-before-draw ordering**: a Kitty image placement that moves between frames must have its old placement deleted before the new one is written, or two copies appear on screen.

Sources: [packages/tui/test/tui-render.test.ts:366-590](), [packages/tui/test/virtual-terminal.ts:11-218]()

---

## Summary

`packages/tui` is not a "nice to have" abstraction over ncurses. It exists because the application required a specific combination that no existing library delivered: differential line-level rendering with synchronized output, Kitty/iTerm2 inline image lifecycle management, raw Kitty keyboard protocol negotiation with WezTerm-specific quirks, bracketed paste reassembly from split `stdin` chunks, and an Emacs kill ring with directional accumulation. The virtual terminal interface (`Terminal`) keeps every non-trivial behavior testable against an `@xterm/headless` emulator without a real TTY. The regression suite in `tui-render.test.ts` is the clearest record of which edge cases were hard enough to require custom code rather than a general-purpose solution.

Sources: [packages/tui/src/tui.ts:239-280](), [packages/tui/src/stdin-buffer.ts:274-435](), [packages/tui/test/tui-render.test.ts:535-590]()

---

## 12. Three Modes, One AgentSession: What Changes Between Interactive, Print, and RPC?

> The coding agent runs in three surface modes: interactive (full TUI), print (stdout-only for scripting), and RPC (JSONL protocol for IDE integration). All three share AgentSession; each adds its own I/O adapter. This page examines rpc-mode.ts and rpc-types.ts to understand the JSONL protocol, contrasts it with interactive-mode.ts component wiring, and asks what the RPC mode reveals about the true API surface of the agent.

- Page Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/pages/12-three-modes-one-agentsession-what-changes-between-interactive-print-and-rpc.md
- Generated: 2026-05-22T23:31:33.944Z

### Source Files

- `packages/coding-agent/src/modes/rpc/rpc-mode.ts`
- `packages/coding-agent/src/modes/rpc/rpc-types.ts`
- `packages/coding-agent/src/modes/rpc/jsonl.ts`
- `packages/coding-agent/src/modes/interactive/interactive-mode.ts`
- `packages/coding-agent/src/modes/print-mode.ts`
- `packages/coding-agent/src/modes/index.ts`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/coding-agent/src/modes/rpc/rpc-mode.ts](packages/coding-agent/src/modes/rpc/rpc-mode.ts)
- [packages/coding-agent/src/modes/rpc/rpc-types.ts](packages/coding-agent/src/modes/rpc/rpc-types.ts)
- [packages/coding-agent/src/modes/rpc/jsonl.ts](packages/coding-agent/src/modes/rpc/jsonl.ts)
- [packages/coding-agent/src/modes/rpc/rpc-client.ts](packages/coding-agent/src/modes/rpc/rpc-client.ts)
- [packages/coding-agent/src/modes/print-mode.ts](packages/coding-agent/src/modes/print-mode.ts)
- [packages/coding-agent/src/modes/index.ts](packages/coding-agent/src/modes/index.ts)
- [packages/coding-agent/src/core/output-guard.ts](packages/coding-agent/src/core/output-guard.ts)
- [packages/coding-agent/src/core/agent-session-runtime.ts](packages/coding-agent/src/core/agent-session-runtime.ts)
- [packages/coding-agent/src/core/extensions/types.ts](packages/coding-agent/src/core/extensions/types.ts)
</details>

# Three Modes, One AgentSession: What Changes Between Interactive, Print, and RPC?

The coding agent exposes three surface modes — interactive (full TUI), print (stdout-only for scripting), and RPC (JSONL protocol for IDE integration) — yet all three operate the same `AgentSession` core. The mode boundary is thin: each surface provides a different I/O adapter and a different implementation of `ExtensionUIContext`, but the session's business logic — prompting, compaction, model selection, session forking, bash execution — never changes. This page uses RPC mode as a lens to reveal what that shared API surface actually is, how the JSONL wire protocol works, and what interactive mode adds that RPC deliberately omits.

Understanding this boundary matters if you are embedding the agent in an IDE extension, a CI pipeline, or any headless host: the RPC surface is effectively the machine-readable contract for the agent's capabilities.

---

## What Is the Simplest Version?

Imagine stripping the agent to its minimum: a function that accepts a text prompt, forwards it to a language model, and writes the assistant's reply to stdout. That is print mode.

```ts
// packages/coding-agent/src/modes/print-mode.ts:32-46
export async function runPrintMode(
  runtimeHost: AgentSessionRuntime,
  options: PrintModeOptions,
): Promise<number> {
  const { mode, messages = [], initialMessage, initialImages } = options;
  let session = runtimeHost.session;
```

Print mode runs once: it fires `session.prompt(...)` for each message, then writes the final assistant text (in `"text"` mode) or every `AgentSessionEvent` as a JSON line (in `"json"` mode), then exits. It subscribes to session events but uses no UI context at all — `bindExtensions` receives only `commandContextActions` and `onError`, with no `uiContext`.

Sources: [packages/coding-agent/src/modes/print-mode.ts:71-108]()

---

## Where Complexity Becomes Necessary

Print mode cannot serve an IDE host that needs to:
- remain alive across multiple user turns,
- react to agent events in real time,
- request interactive input from the user (select, confirm, text input, editor), and
- receive typed, structured data rather than raw text.

All of these require a long-lived, bidirectional channel with a defined protocol. That is RPC mode.

---

## The JSONL Wire Protocol

### Framing

The transport is newline-delimited JSON (JSONL) on `stdin`/`stdout`. `jsonl.ts` implements this with a deliberate choice: it does **not** use Node's `readline`, because `readline` splits on additional Unicode line separators (U+2028, U+2029) that are legal inside JSON strings and would corrupt payloads.

```ts
// packages/coding-agent/src/modes/rpc/jsonl.ts:10-12
export function serializeJsonLine(value: unknown): string {
  return `${JSON.stringify(value)}\n`;
}
```

```ts
// packages/coding-agent/src/modes/rpc/jsonl.ts:21-42
export function attachJsonlLineReader(
  stream: Readable,
  onLine: (line: string) => void,
): () => void {
  const decoder = new StringDecoder("utf8");
  let buffer = "";
  // Splits only on \n, not on U+2028/U+2029
  ...
}
```

Framing rule: records are separated by LF (`\n`) only. CRLF is handled by stripping a trailing `\r` before the `\n` boundary.
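The framing rule can be sketched as a push-based splitter. This is not the body of `attachJsonlLineReader`, which is elided above; `createJsonlSplitter` is a hypothetical name, and the sketch assumes a final unterminated record is flushed when the stream ends.

```typescript
import { StringDecoder } from "node:string_decoder";

// Hypothetical sketch: split on LF only, tolerate CRLF, never on U+2028/U+2029.
function createJsonlSplitter(onLine: (line: string) => void): {
  push(chunk: Buffer): void;
  end(): void;
} {
  const decoder = new StringDecoder("utf8"); // keeps split multi-byte chars intact
  let buffer = "";

  const drain = (): void => {
    let index = buffer.indexOf("\n");
    while (index !== -1) {
      let line = buffer.slice(0, index);
      if (line.endsWith("\r")) line = line.slice(0, -1); // CRLF input
      buffer = buffer.slice(index + 1);
      onLine(line);
      index = buffer.indexOf("\n");
    }
  };

  return {
    push(chunk: Buffer): void {
      buffer += decoder.write(chunk);
      drain();
    },
    end(): void {
      buffer += decoder.end();
      if (buffer.length > 0) onLine(buffer);
      buffer = "";
    },
  };
}
```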

Sources: [packages/coding-agent/src/modes/rpc/jsonl.ts:1-58]()
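
The framing rule can be illustrated with a small, self-contained sketch. This is not the repo's `attachJsonlLineReader`: `createLfSplitter` is a hypothetical helper that is fed chunks directly (rather than attached to a stream) so it can be exercised in isolation. It assumes only what the text above states: split on `\n`, strip one trailing `\r`, and decode UTF-8 incrementally so a multi-byte character split across chunks survives.

```typescript
import { StringDecoder } from "node:string_decoder";

// Hypothetical helper for illustration; the real reader attaches to a Readable.
export function createLfSplitter(onLine: (line: string) => void) {
  const decoder = new StringDecoder("utf8");
  let buffer = "";

  // Emit every complete record currently in the buffer.
  const drain = (): void => {
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      let line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      // Tolerate CRLF by dropping a single trailing \r.
      if (line.endsWith("\r")) line = line.slice(0, -1);
      onLine(line);
      newline = buffer.indexOf("\n");
    }
  };

  return {
    push(chunk: Buffer): void {
      // StringDecoder holds back incomplete multi-byte sequences.
      buffer += decoder.write(chunk);
      drain();
    },
    end(): void {
      buffer += decoder.end();
      drain();
      // Flush a final record that had no terminating newline.
      if (buffer.length > 0) {
        onLine(buffer.endsWith("\r") ? buffer.slice(0, -1) : buffer);
        buffer = "";
      }
    },
  };
}

// Demo: one record containing U+2028, sent as CRLF, split mid-character.
const lines: string[] = [];
const splitter = createLfSplitter((line) => {
  lines.push(line);
});
const bytes = Buffer.from(JSON.stringify({ text: "a\u2028b" }) + "\r\n", "utf8");
splitter.push(bytes.subarray(0, 11)); // cuts inside the 3-byte U+2028 sequence
splitter.push(bytes.subarray(11));
splitter.end();
```

`JSON.stringify` leaves U+2028 unescaped, so a splitter that treated it as a line break would cut this record in two; splitting on LF alone keeps it whole.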

### The Three Message Kinds

Messages on the wire fall into three core kinds (commands, responses, and session events), plus a request/response pair that carries extension UI dialogs:

| Direction | Type field | Purpose |
|-----------|-----------|---------|
| stdin → agent | `RpcCommand["type"]` values | Commands: `prompt`, `abort`, `get_state`, `set_model`, `bash`, etc. |
| agent → stdout | `"response"` | Replies to commands, keyed by command type and optional `id` |
| agent → stdout | `AgentSessionEvent` | Streaming events from the session (token chunks, tool calls, etc.) |
| agent → stdout | `"extension_ui_request"` | Agent asks host for user input |
| stdin → agent | `"extension_ui_response"` | Host replies to a UI request with `id` correlation |

The `id` field on commands is optional but enables correlation: a client can tag a command with `id: "req_1"` and the matching response will carry the same `id`.

Sources: [packages/coding-agent/src/modes/rpc/rpc-types.ts:19-69](), [packages/coding-agent/src/modes/rpc/rpc-types.ts:111-206]()
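
As a rough illustration of how a host might sort agent output by the `type` field, here is a minimal sketch. The `OutboundLine` shape and `classifyAgentLine` are hypothetical simplifications; the real unions are defined in `rpc-types.ts` and carry more fields. The only assumptions taken from the table above are the two literal type values `"response"` and `"extension_ui_request"`, with everything else treated as a session event.

```typescript
// Hypothetical, simplified shapes for illustration only.
type OutboundLine =
  | { kind: "response"; command: string; id: string | undefined; success: boolean }
  | { kind: "ui_request"; id: string; method: string }
  | { kind: "event"; eventType: string };

export function classifyAgentLine(line: string): OutboundLine {
  const data = JSON.parse(line) as Record<string, unknown>;
  if (data.type === "response") {
    return {
      kind: "response",
      command: String(data.command),
      id: typeof data.id === "string" ? data.id : undefined,
      success: data.success === true,
    };
  }
  if (data.type === "extension_ui_request") {
    return { kind: "ui_request", id: String(data.id), method: String(data.method) };
  }
  // Anything else on stdout is treated as a streamed session event.
  return { kind: "event", eventType: String(data.type) };
}

const response = classifyAgentLine('{"type":"response","command":"prompt","success":true,"id":"req_1"}');
const uiRequest = classifyAgentLine('{"type":"extension_ui_request","id":"ui-1","method":"select"}');
const sessionEvent = classifyAgentLine('{"type":"agent_end"}');
```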

### Command Groups

`RpcCommand` is a tagged union of 27 command variants grouped by concern:

| Group | Commands |
|-------|----------|
| Prompting | `prompt`, `steer`, `follow_up`, `abort`, `new_session` |
| State | `get_state` |
| Model | `set_model`, `cycle_model`, `get_available_models` |
| Thinking | `set_thinking_level`, `cycle_thinking_level` |
| Queue modes | `set_steering_mode`, `set_follow_up_mode` |
| Compaction | `compact`, `set_auto_compaction` |
| Retry | `set_auto_retry`, `abort_retry` |
| Bash | `bash`, `abort_bash` |
| Session | `get_session_stats`, `export_html`, `switch_session`, `fork`, `clone`, `get_fork_messages`, `get_last_assistant_text`, `set_session_name` |
| Messages | `get_messages` |
| Commands | `get_commands` |

Sources: [packages/coding-agent/src/modes/rpc/rpc-types.ts:19-69]()

---

## How RPC Mode Bootstraps

### stdout takeover

The first act of `runRpcMode` is `takeOverStdout()`. This patches `process.stdout.write` so that any code that naively calls `console.log` or writes to stdout gets silently redirected to stderr. Only calls through `writeRawStdout(serializeJsonLine(obj))` reach real stdout. This ensures the JSONL stream is never polluted by debug output from extensions or library code.

```ts
// packages/coding-agent/src/modes/rpc/rpc-mode.ts:48-55
export async function runRpcMode(runtimeHost: AgentSessionRuntime): Promise<never> {
  takeOverStdout();
  let session = runtimeHost.session;
  ...
  const output = (obj: RpcResponse | RpcExtensionUIRequest | object) => {
    writeRawStdout(serializeJsonLine(obj));
  };
```

Sources: [packages/coding-agent/src/core/output-guard.ts:9-34](), [packages/coding-agent/src/modes/rpc/rpc-mode.ts:48-55]()
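
The takeover idea can be sketched without touching the real `process.stdout`. The following is an assumption-laden illustration, not the contents of `output-guard.ts`: `guardStdout` and `WritableLike` are hypothetical names, and the function takes stream-like objects as parameters so the behavior can be checked with fakes.

```typescript
// Hypothetical minimal stream shape; real Node streams accept more argument forms.
interface WritableLike {
  write(chunk: string): boolean;
}

// Patches `out.write` to divert to `err`, and returns the original writer
// so that only protocol code can still reach the real output.
export function guardStdout(out: WritableLike, err: WritableLike): (chunk: string) => boolean {
  const rawWrite = out.write.bind(out);
  out.write = (chunk: string) => err.write(chunk);
  return rawWrite;
}

// Demo with in-memory fakes standing in for stdout and stderr.
const outChunks: string[] = [];
const errChunks: string[] = [];
const fakeOut: WritableLike = {
  write: (chunk) => {
    outChunks.push(chunk);
    return true;
  },
};
const fakeErr: WritableLike = {
  write: (chunk) => {
    errChunks.push(chunk);
    return true;
  },
};

const writeRaw = guardStdout(fakeOut, fakeErr);
fakeOut.write("debug noise from a library\n"); // diverted to the error stream
writeRaw(JSON.stringify({ type: "response" }) + "\n"); // reaches real output
```

Capturing the original writer before patching is the essential step: once `write` is replaced, there is no other path back to the underlying stream.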

### Session subscription

After binding extensions, RPC mode calls `session.subscribe(event => output(event))`. Every `AgentSessionEvent` (token chunks, tool-call starts and ends, errors, idle signals) is forwarded verbatim as a JSONL line. The host does not need to poll; events arrive in real time.

```ts
// packages/coding-agent/src/modes/rpc/rpc-mode.ts:346-348
unsubscribe?.();
unsubscribe = session.subscribe((event) => {
  output(event);
});
```

Sources: [packages/coding-agent/src/modes/rpc/rpc-mode.ts:344-349]()

### Infinite loop

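The function returns `Promise<never>`: the promise it returns never settles, so a caller awaiting it never continues past the call. A pending promise does not by itself keep a Node process alive; the open stdin reader does that, and the process keeps accepting commands until shutdown.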

```ts
// packages/coding-agent/src/modes/rpc/rpc-mode.ts:752-753
// Keep process alive forever
return new Promise(() => {});
```

Sources: [packages/coding-agent/src/modes/rpc/rpc-mode.ts:752-753]()

---

## The Extension UI Adapter: What RPC Can and Cannot Do

All three modes call `session.bindExtensions({ uiContext, ... })`. Print mode passes no `uiContext` at all. Interactive mode wires `uiContext` to concrete TUI components: overlays, selectors, the Monaco-style editor. RPC mode creates a synthetic `ExtensionUIContext` that translates each UI method into either an `extension_ui_request` JSONL line or a no-op.

```ts
// packages/coding-agent/src/modes/rpc/rpc-mode.ts:129-133
const createExtensionUIContext = (): ExtensionUIContext => ({
  select: (title, options, opts) =>
    createDialogPromise(opts, undefined, { method: "select", title, options, timeout: opts?.timeout }, (r) =>
      "cancelled" in r && r.cancelled ? undefined : "value" in r ? r.value : undefined,
    ),
```

The `createDialogPromise` helper emits an `extension_ui_request` to stdout, registers a pending entry keyed by a UUID, then suspends until the host sends back an `extension_ui_response` with the same `id`. Timeout and abort-signal support are built in.
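
The pending-entry mechanism can be sketched as follows. This is a simplified stand-in under stated assumptions, not the repo's `createDialogPromise`: `createDialogBroker`, `UiResponse`, and the payload fields are hypothetical, abort-signal handling is omitted, and a timeout or cancellation is assumed to resolve to `undefined`.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical response shape for illustration.
type UiResponse = { id: string; value?: string; cancelled?: boolean };

export function createDialogBroker(emit: (request: Record<string, unknown>) => void) {
  const pending = new Map<string, (response: UiResponse) => void>();

  function request(
    method: string,
    payload: Record<string, unknown>,
    timeoutMs?: number,
  ): Promise<string | undefined> {
    return new Promise((resolve) => {
      const id = randomUUID();
      // Optional timeout: give up and resolve to undefined.
      const timer =
        timeoutMs === undefined
          ? undefined
          : setTimeout(() => {
              pending.delete(id);
              resolve(undefined);
            }, timeoutMs);
      pending.set(id, (response) => {
        if (timer !== undefined) clearTimeout(timer);
        pending.delete(id);
        resolve(response.cancelled ? undefined : response.value);
      });
      emit({ ...payload, type: "extension_ui_request", id, method });
    });
  }

  // Called when the host sends an extension_ui_response line.
  function handleResponse(response: UiResponse): boolean {
    const settle = pending.get(response.id);
    if (!settle) return false; // unknown or already-settled id
    settle(response);
    return true;
  }

  return { request, handleResponse, pendingCount: () => pending.size };
}

// Demo: emit a select request, then feed back the host's answer.
const emitted: Array<Record<string, unknown>> = [];
const broker = createDialogBroker((req) => {
  emitted.push(req);
});
const answer = broker.request("select", { title: "Pick one", options: ["A", "B"] });
const requestId = String(emitted[0]?.id ?? "");
const handled = broker.handleResponse({ id: requestId, value: "A" });
const handledUnknown = broker.handleResponse({ id: "no-such-id", value: "B" });
void answer; // resolves to "A" on a later microtask
```

Because the promise executor runs synchronously, the request line is emitted and the pending entry registered before `request()` returns, so a fast host reply cannot arrive ahead of the registration.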

### RPC vs Interactive UI capability matrix

| Capability | Interactive | Print | RPC |
|------------|-------------|-------|-----|
| `select`, `confirm`, `input` | TUI overlay | — | `extension_ui_request` / response roundtrip |
| `editor` | Full TUI editor overlay | — | `extension_ui_request` / response roundtrip |
| `notify` | TUI notification | — | Fire-and-forget `extension_ui_request` |
| `setStatus` | Footer status line | — | Fire-and-forget `extension_ui_request` |
| `setWidget` | TUI widget panel | — | String arrays only via `extension_ui_request`; component factories are silently dropped |
| `setWorkingMessage` / `setWorkingVisible` | TUI spinner | — | No-op (requires TUI loader) |
| `setHiddenThinkingLabel` | TUI label | — | No-op (requires TUI renderer) |
| `setFooter`, `setHeader` | TUI custom components | — | No-op |
| `setTheme` | Live theme switch | — | Returns `{ success: false }` |
| `getToolsExpanded` / `setToolsExpanded` | TUI state | — | Always false / no-op |
| `addAutocompleteProvider` | Autocomplete integration | — | No-op |
| `pasteToEditor` | Clipboard → editor | — | Redirects to `setEditorText` |
| `getEditorText` | Returns live text | — | Always returns `""` |

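The pattern is: anything requiring access to TUI component state becomes a no-op or returns a fixed fallback value in RPC mode (for example, `setTheme` reports `{ success: false }` and `getEditorText` returns `""`); anything that can be represented as a structured JSON message is forwarded to the host.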

Sources: [packages/coding-agent/src/modes/rpc/rpc-mode.ts:129-304]()
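
A few rows of the matrix can be expressed as a minimal adapter sketch. `MiniUiContext` is a hypothetical subset of the real `ExtensionUIContext`; the method signatures and the payload fields of the emitted lines are assumptions made for illustration, and only the behaviors listed in the table (fire-and-forget, no-op, fixed return value) are taken from the text.

```typescript
// Hypothetical subset of the UI surface; the real interface has more methods
// and may use different parameter lists.
interface MiniUiContext {
  notify(message: string): void;
  setStatus(key: string, text: string | undefined): void;
  setFooter(factory: unknown): void;
  setTheme(name: string): { success: boolean };
  getEditorText(): string;
}

export function createRpcUiContext(emit: (line: Record<string, unknown>) => void): MiniUiContext {
  return {
    // Representable as JSON: forwarded to the host, no reply awaited.
    notify: (message) => emit({ type: "extension_ui_request", method: "notify", message }),
    setStatus: (key, text) => emit({ type: "extension_ui_request", method: "setStatus", key, text }),
    // Needs direct renderer access: dropped.
    setFooter: () => {},
    // Cannot be honored: reports failure rather than throwing.
    setTheme: () => ({ success: false }),
    // No live editor on this side of the pipe.
    getEditorText: () => "",
  };
}

const uiLines: Array<Record<string, unknown>> = [];
const ui = createRpcUiContext((line) => {
  uiLines.push(line);
});
ui.notify("Build finished");
ui.setStatus("git", "main");
ui.setFooter(() => "custom footer");
const themeResult = ui.setTheme("dark");
const editorText = ui.getEditorText();
```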

---

## The `prompt` Command: Asynchronous by Design

The `prompt` command is special. When the agent receives it, the session starts processing immediately but the acknowledgment response is not emitted until a "preflight" check inside the session has passed. This prevents the host from treating a queued or immediately-handled prompt as a failure.

```ts
// packages/coding-agent/src/modes/rpc/rpc-mode.ts:379-400
case "prompt": {
  let preflightSucceeded = false;
  void session
    .prompt(command.message, {
      ...
      preflightResult: (didSucceed) => {
        if (didSucceed) {
          preflightSucceeded = true;
          output(success(id, "prompt"));
        }
      },
    })
    .catch((e) => {
      if (!preflightSucceeded) {
        output(error(id, "prompt", e.message));
      }
    });
  return undefined; // response emitted later via preflightResult callback
}
```

After the `{ type: "response", command: "prompt", success: true }` line, the session streams `AgentSessionEvent` objects — token chunks, tool calls, idle signals — which the host observes via its `onEvent` listener. The host must not assume the agent is idle after the `prompt` response; it must wait for an `agent_end` event.

Sources: [packages/coding-agent/src/modes/rpc/rpc-mode.ts:379-401]()
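
From the host's point of view, the rule "acknowledged does not mean idle" can be modeled as a tiny state machine. The sketch below is hypothetical (`TurnState` and `nextTurnState` are not part of the repo) and assumes only the two facts stated above: a `prompt` response with `success: true` means the prompt was accepted, and `agent_end` marks the end of the run. The intermediate event name is a placeholder.

```typescript
export type TurnState = "idle" | "awaiting_ack" | "running";

type WireLine = { type: string; command?: string; success?: boolean };

// Pure reducer: given the current state and one line from the agent,
// return the next state.
export function nextTurnState(state: TurnState, line: WireLine): TurnState {
  if (line.type === "response" && line.command === "prompt") {
    if (state !== "awaiting_ack") return state; // stray acknowledgment
    return line.success ? "running" : "idle";
  }
  if (line.type === "agent_end") return "idle";
  return state; // other events do not change the turn state
}

// Demo: the host has just written a prompt command to stdin.
const transcript: WireLine[] = [
  { type: "response", command: "prompt", success: true },
  { type: "some_streaming_event" }, // placeholder event name
  { type: "agent_end" },
];
const states: TurnState[] = [];
let current: TurnState = "awaiting_ack";
for (const line of transcript) {
  current = nextTurnState(current, line);
  states.push(current);
}

// A rejected prompt returns straight to idle.
const rejected = nextTurnState("awaiting_ack", { type: "response", command: "prompt", success: false });
```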

---

## The RpcClient: Protocol from the Other Side

`rpc-client.ts` provides a typed TypeScript wrapper that spawns the agent with `--mode rpc` and drives the protocol from the host side. It demonstrates the intended usage pattern clearly.

```ts
// packages/coding-agent/src/modes/rpc/rpc-client.ts:77-93
const args = ["--mode", "rpc"];
...
this.process = spawn("node", [cliPath, ...args], {
  stdio: ["pipe", "pipe", "pipe"],
});
this.stopReadingStdout = attachJsonlLineReader(this.process.stdout!, (line) => {
  this.handleLine(line);
});
```

Internally, `RpcClient.send()` assigns a sequential `req_N` id to every command, stores a resolve/reject in `pendingRequests`, writes the serialized command to stdin, and resolves when the matching `{ type: "response", id: "req_N" }` arrives. Events without a matching id fall through to `eventListeners`.

```ts
// packages/coding-agent/src/modes/rpc/rpc-client.ts:456-475
private handleLine(line: string): void {
  const data = JSON.parse(line);
  if (data.type === "response" && data.id && this.pendingRequests.has(data.id)) {
    const pending = this.pendingRequests.get(data.id)!;
    this.pendingRequests.delete(data.id);
    pending.resolve(data as RpcResponse);
    return;
  }
  for (const listener of this.eventListeners) {
    listener(data as AgentEvent);
  }
}
```

`RpcClient` also provides `waitForIdle()` (resolves on `agent_end` event), `collectEvents()` (accumulates all events until idle), and the convenience `promptAndWait()` that races collection and prompt delivery.

Sources: [packages/coding-agent/src/modes/rpc/rpc-client.ts:477-505](), [packages/coding-agent/src/modes/rpc/rpc-client.ts:404-450]()

---

## Session Rebinding Across Mode Changes

All three modes implement a `rebindSession` callback registered with the runtime via `runtimeHost.setRebindSession(...)`. When the user forks, clones, or switches sessions — which replaces the underlying `AgentSession` — the runtime calls this callback, prompting the mode to re-subscribe to the new session's event stream and re-bind extensions.

```ts
// packages/coding-agent/src/modes/rpc/rpc-mode.ts:306-349
runtimeHost.setRebindSession(async () => {
  await rebindSession();
});

const rebindSession = async (): Promise<void> => {
  session = runtimeHost.session;
  await session.bindExtensions({ uiContext: createExtensionUIContext(), ... });
  unsubscribe?.();
  unsubscribe = session.subscribe((event) => { output(event); });
};
```

Print mode follows the same pattern. Interactive mode handles this within its class lifecycle. The rebind contract is part of the shared runtime interface, not mode-specific.

Sources: [packages/coding-agent/src/modes/rpc/rpc-mode.ts:306-349](), [packages/coding-agent/src/modes/print-mode.ts:67-108]()
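
The unsubscribe-then-resubscribe step is the part of rebinding that is easy to get wrong, so here is a minimal sketch of it in isolation. `FakeSession` and `createRebinder` are hypothetical stand-ins, and extension binding is left out; the sketch assumes only that `subscribe` returns an unsubscribe function, as the excerpt above implies.

```typescript
type SessionEvent = { type: string };
type Listener = (event: SessionEvent) => void;

// Hypothetical stand-in for AgentSession's event stream.
class FakeSession {
  private listeners = new Set<Listener>();

  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  emit(type: string): void {
    for (const listener of this.listeners) listener({ type });
  }
}

// Returns a rebind function that always follows the current session.
export function createRebinder(getSession: () => FakeSession, output: Listener): () => void {
  let unsubscribe: (() => void) | undefined;
  return () => {
    unsubscribe?.(); // detach from the replaced session first
    unsubscribe = getSession().subscribe(output);
  };
}

// Demo: a fork replaces the active session.
const original = new FakeSession();
const forked = new FakeSession();
let active = original;
const forwarded: string[] = [];
const rebind = createRebinder(
  () => active,
  (event) => {
    forwarded.push(event.type);
  },
);

rebind();
original.emit("from_original");
active = forked;
rebind();
original.emit("stale_event"); // no longer forwarded
forked.emit("from_fork");
```

Skipping the `unsubscribe?.()` call would leave the old session attached, and its late events would be interleaved into the stream for the new one.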

---

## Protocol Flow Diagram

```
Host process                            Agent process (--mode rpc)
────────────────                        ──────────────────────────
                    stdin (JSONL)
{"type":"prompt","message":"...","id":"req_1"}  ──────────────────▶  handleInputLine()
                                                                           │
                                                                     session.prompt()
                                                                     preflightResult()
{"type":"response","command":"prompt","success":true,"id":"req_1"} ◀──── output()
                                                                           │ (streaming)
{"type":"token","text":"Hello"}                                    ◀──── output(event)
{"type":"token","text":" world"}                                   ◀──── output(event)
{"type":"agent_end"}                                               ◀──── output(event)

    (extension needs user pick)
{"type":"extension_ui_request","id":"ui-uuid","method":"select",...} ◀── output()
{"type":"extension_ui_response","id":"ui-uuid","value":"option-A"} ──────▶ handleInputLine()
                                                                           pending.resolve()
```

---

## What RPC Mode Reveals About the True API Surface

The RPC command set is the most honest inventory of `AgentSession`'s public capabilities. Every command maps directly to a session or runtime method:

- `session.prompt()`, `session.steer()`, `session.followUp()`, `session.abort()`
- `session.setModel()`, `session.cycleModel()`, `session.modelRegistry.getAvailable()`
- `session.setThinkingLevel()`, `session.cycleThinkingLevel()`
- `session.compact()`, `session.setAutoCompactionEnabled()`
- `session.executeBash()`, `session.abortBash()`
- `session.getSessionStats()`, `session.exportToHtml()`
- `runtimeHost.newSession()`, `runtimeHost.fork()`, `runtimeHost.switchSession()`
- `session.messages`, `session.extensionRunner.getRegisteredCommands()`, `session.promptTemplates`, `session.resourceLoader.getSkills()`

Interactive mode adds TUI rendering, keyboard shortcuts, OAuth login flows, clipboard integration, theming, and extension autocomplete — none of which changes how the session runs. Print mode removes persistence and stays single-shot. RPC mode proves that all of those are presentation concerns: the underlying API fits in 27 typed command variants.

The implication for extension authors: everything an extension can do in interactive mode that does not require direct TUI component access is available to RPC hosts via the `extension_ui_request` / `extension_ui_response` roundtrip, because `ExtensionUIContext` is the only abstraction between extension code and the surface mode.

Sources: [packages/coding-agent/src/modes/rpc/rpc-types.ts:19-69](), [packages/coding-agent/src/modes/rpc/rpc-mode.ts:370-660]()

---

## Summary

All three modes share the same `AgentSession` and `AgentSessionRuntime`. Interactive mode wraps the session in a full TUI class with dozens of imported components; print mode is a thin single-shot wrapper; RPC mode is a long-lived JSONL gateway. The key implementation work in each mode is the `ExtensionUIContext` adapter. In RPC mode, dialog methods (`select`, `confirm`, `input`, `editor`) become a structured `extension_ui_request` line and suspend until the host responds, while `notify` and `setStatus` are sent fire-and-forget. TUI-specific methods (spinners, custom components) are no-ops because they require direct renderer access that the subprocess boundary cannot provide, and `setTheme` reports `{ success: false }`. The `jsonl.ts` framing layer deliberately avoids `readline` so that U+2028/U+2029 inside JSON strings never split a record, a subtle invariant that any re-implementation must preserve.

Sources: [packages/coding-agent/src/modes/rpc/jsonl.ts:14-19]()

---