# Compaction: When the Context Window Is the Enemy, What Gets Thrown Away?

> As conversations grow, token counts approach the context window limit. The compaction subsystem (core/compaction/) answers: when to compact, which messages to summarize, how branch-level summaries differ from turn-level summaries, and how the agent resumes coherently after a compaction round. The tests agent-session-compaction.test.ts and harness/compaction.test.ts reveal the boundary conditions.

- Repository: earendil-works/pi
- GitHub: https://github.com/earendil-works/pi
- Human wiki: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234
- Complete Markdown: https://grok-wiki.com/public/wiki/earendil-works-pi-8b87608fc234/llms-full.txt

## Source Files

- `packages/coding-agent/src/core/compaction/compaction.ts`
- `packages/coding-agent/src/core/compaction/branch-summarization.ts`
- `packages/coding-agent/src/core/compaction/utils.ts`
- `packages/coding-agent/test/suite/agent-session-compaction.test.ts`
- `packages/agent/test/harness/compaction.test.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [packages/coding-agent/src/core/compaction/compaction.ts](packages/coding-agent/src/core/compaction/compaction.ts)
- [packages/coding-agent/src/core/compaction/branch-summarization.ts](packages/coding-agent/src/core/compaction/branch-summarization.ts)
- [packages/coding-agent/src/core/compaction/utils.ts](packages/coding-agent/src/core/compaction/utils.ts)
- [packages/coding-agent/src/core/agent-session.ts](packages/coding-agent/src/core/agent-session.ts)
- [packages/coding-agent/test/suite/agent-session-compaction.test.ts](packages/coding-agent/test/suite/agent-session-compaction.test.ts)
- [packages/agent/test/harness/compaction.test.ts](packages/agent/test/harness/compaction.test.ts)
</details>

Every conversation with a language model is a race against a fixed budget. Token counts accumulate (user prompts, assistant replies, tool calls, tool results, images, bash output), and eventually the window fills. When that happens, the provider rejects the request with a context-overflow error or, worse, response quality silently degrades. The compaction subsystem addresses this by deciding, precisely and reversibly, what history to discard and what to keep. It then generates a structured summary that lets the agent resume as if it remembered everything.

This page traces the full lifecycle: how the system detects that a compaction is needed, where the cut point falls, which messages are summarized versus kept, how branch-level summarization differs from turn-level summarization, and how the agent reconstructs coherent state after a compaction round.

---

## Why the Problem Is Harder Than It Looks

The naive answer — "drop the oldest messages" — breaks tool integrity. A `toolResult` message must follow its parent `toolCall`; cutting between them corrupts the conversation structure. Equally, slicing in the middle of a multi-step assistant turn (one user prompt generating several tool calls and partial replies) leaves the kept suffix without sufficient context. The compaction system must reason about conversational boundaries, not raw token counts.

---

## When Does Compaction Trigger?

The session checks for compaction in two places, both via the private `_checkCompaction` method in `AgentSession`:

1. **After every assistant response** — normal housekeeping.
2. **Before resending a prompt**, if the last assistant message was aborted.

Two trigger modes exist:

| Trigger | Condition | `willRetry` |
|---------|-----------|-------------|
| **Overflow** | `isContextOverflow(assistantMessage)`: the LLM returned a "prompt is too long" error | `true`: the agent retries immediately after compaction |
| **Threshold** | `contextTokens > contextWindow - reserveTokens` | `false`: no automatic retry; the conversation continues on the next user prompt |

The threshold check uses `calculateContextTokens` from actual `usage` data when available, falling back to `estimateContextTokens` for error messages (which carry no usage). For error messages, the system walks backwards to find the last *successful* assistant response's usage, then estimates trailing tokens from messages after that point.

```typescript
// packages/coding-agent/src/core/compaction/compaction.ts:219-222
export function shouldCompact(contextTokens: number, contextWindow: number, settings: CompactionSettings): boolean {
  if (!settings.enabled) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}
```

Default settings: `reserveTokens: 16384`, `keepRecentTokens: 20000`, `enabled: true`.
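A quick worked example makes the threshold concrete. The sketch below restates `shouldCompact` alongside a settings object that holds the defaults quoted above. The `CompactionSettings` shape, limited here to the three documented fields, and the 200,000-token window are assumptions for illustration. They are not values read from the repository.

```typescript
// Assumed shape: only the three fields documented on this page.
// The real CompactionSettings type in compaction.ts may carry more.
interface CompactionSettings {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

// Defaults as quoted above.
const DEFAULT_SETTINGS: CompactionSettings = {
  enabled: true,
  reserveTokens: 16384,
  keepRecentTokens: 20000,
};

// Same comparison as the excerpt: compact once usage eats into the reserve.
function shouldCompact(contextTokens: number, contextWindow: number, settings: CompactionSettings): boolean {
  if (!settings.enabled) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}

// Hypothetical 200,000-token window: the trigger point is 200000 - 16384 = 183616.
const contextWindow = 200_000;
console.log(shouldCompact(183_616, contextWindow, DEFAULT_SETTINGS)); // false: exactly at the threshold
console.log(shouldCompact(183_617, contextWindow, DEFAULT_SETTINGS)); // true
console.log(shouldCompact(190_000, contextWindow, { ...DEFAULT_SETTINGS, enabled: false })); // false: disabled
```

The comparison is strict, so a context that sits exactly on the threshold does not trigger compaction.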

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:115-125](), [packages/coding-agent/src/core/agent-session.ts:1768-1845]()

---

## The Cut Point: Where to Slice History

Once a compaction is decided, `prepareCompaction` computes exactly which session entries get summarized and which get kept.

### The Budget Walk

`findCutPoint` walks the entry list **backwards from newest to oldest**, accumulating estimated token sizes. When the running total reaches `keepRecentTokens`, it stops and marks that position as the cut point.

```typescript
// packages/coding-agent/src/core/compaction/compaction.ts:386-448
export function findCutPoint(entries, startIndex, endIndex, keepRecentTokens): CutPointResult {
  // Walks backwards, accumulates tokens, stops at keepRecentTokens
  // Returns firstKeptEntryIndex, turnStartIndex, isSplitTurn
}
```

### Valid Cut Points

Not all entries are eligible cut points. `findValidCutPoints` excludes `toolResult` messages entirely — they cannot stand alone without their preceding tool call. Valid positions are:

- `user` messages
- `assistant` messages (when cut here, subsequent tool results come with them)
- `bashExecution` messages (treated as user-role context)
- `branch_summary` and `custom_message` entries (they represent user-initiated context)

Non-message entries like `model_change`, `thinking_level_change`, `label`, and `compaction` markers are never cut points.
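The budget walk and the cut-point rules can be combined in a small model. This sketch is not the repository's `findCutPoint`. The `Entry` type, the per-entry `tokens` field, and the helper `findFirstKeptIndex` are hypothetical. The snap-forward-to-the-next-valid-cut-point policy is also this sketch's assumption about how the budget position is reconciled with structural validity.

```typescript
// Simplified entry model (assumption): a kind plus a precomputed token estimate.
type EntryKind =
  | "user" | "assistant" | "toolResult" | "bashExecution"
  | "branch_summary" | "custom_message"
  | "model_change" | "thinking_level_change" | "label" | "compaction";

interface Entry {
  kind: EntryKind;
  tokens: number;
}

// Follows the list above: toolResult and non-message entries are never cut points.
const CUT_POINT_KINDS = new Set<EntryKind>([
  "user", "assistant", "bashExecution", "branch_summary", "custom_message",
]);

// Hypothetical helper: returns the index of the first entry to keep.
function findFirstKeptIndex(entries: Entry[], keepRecentTokens: number): number {
  const valid: number[] = [];
  entries.forEach((e, i) => {
    if (CUT_POINT_KINDS.has(e.kind)) valid.push(i);
  });
  if (valid.length === 0) return 0; // nothing eligible: fall back to index 0

  let accumulated = 0;
  for (let i = entries.length - 1; i >= 0; i--) {
    accumulated += entries[i].tokens;
    if (accumulated >= keepRecentTokens) {
      // Snap forward to the nearest valid cut point (or back to the last one
      // if none follows) so the kept suffix never starts with an orphaned tool result.
      const snapped = valid.find((v) => v >= i);
      return snapped ?? valid[valid.length - 1];
    }
  }
  // Budget never reached: keep everything from the first valid cut point.
  return valid[0];
}

const history: Entry[] = [
  { kind: "user", tokens: 5_000 },
  { kind: "assistant", tokens: 8_000 },
  { kind: "toolResult", tokens: 9_000 },
  { kind: "user", tokens: 3_000 },
  { kind: "assistant", tokens: 10_000 },
];
// The walk crosses 20k at the toolResult (index 2), then snaps to the user message at index 3.
console.log(findFirstKeptIndex(history, 20_000)); // 3
```

Snapping forward trades a few kept tokens (13,000 instead of 20,000 here) for a suffix that starts on a clean boundary. The tool result is summarized together with the assistant message that requested it.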

### Split Turn Detection

A "split turn" occurs when the cut falls inside a turn — for example, when a user's request spawned a very long assistant response whose prefix must be dropped but whose suffix must be kept. In this case:

- `isSplitTurn: true`
- `turnStartIndex` points to the user message that opened the turn
- `turnPrefixMessages` holds the discarded prefix of that turn
- `firstKeptEntryIndex` is inside the turn, not at a turn boundary
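Split-turn detection comes down to asking whether the chosen cut lands on a turn start. The following is a minimal sketch. It assumes that `user`, `bashExecution`, `branch_summary`, and `custom_message` entries open turns, and that the turn prefix runs from the turn start up to, but not including, the first kept entry. `detectSplitTurn` and its types are hypothetical.

```typescript
// Reduced entry kinds for this sketch.
type Kind = "user" | "assistant" | "toolResult" | "bashExecution" | "branch_summary" | "custom_message";

// Assumption: these kinds open a new turn (user-role context).
const TURN_START_KINDS = new Set<Kind>(["user", "bashExecution", "branch_summary", "custom_message"]);

interface SplitTurnInfo {
  isSplitTurn: boolean;
  turnStartIndex: number; // -1 if no turn start precedes the cut
  turnPrefixIndices: number[]; // entries of the split turn that fall before the cut
}

// Hypothetical helper: classify a cut at firstKeptIndex.
function detectSplitTurn(kinds: Kind[], firstKeptIndex: number): SplitTurnInfo {
  if (TURN_START_KINDS.has(kinds[firstKeptIndex])) {
    // The cut sits on a turn boundary, so no turn is split.
    return { isSplitTurn: false, turnStartIndex: firstKeptIndex, turnPrefixIndices: [] };
  }
  // Otherwise walk back to the entry that opened the current turn.
  for (let i = firstKeptIndex - 1; i >= 0; i--) {
    if (TURN_START_KINDS.has(kinds[i])) {
      const prefix: number[] = [];
      for (let j = i; j < firstKeptIndex; j++) prefix.push(j);
      return { isSplitTurn: true, turnStartIndex: i, turnPrefixIndices: prefix };
    }
  }
  return { isSplitTurn: false, turnStartIndex: -1, turnPrefixIndices: [] };
}

// A long turn: the cut lands on the second assistant reply.
const turn: Kind[] = ["user", "assistant", "toolResult", "assistant"];
const info = detectSplitTurn(turn, 3);
console.log(info.isSplitTurn, info.turnStartIndex, info.turnPrefixIndices.join(",")); // true 0 0,1,2
```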

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:299-448]()

---

## Token Estimation

Because token counts cannot always be read from the LLM's usage field (especially for messages that haven't been sent yet), the system uses a conservative `chars / 4` heuristic:

```typescript
// packages/coding-agent/src/core/compaction/compaction.ts:232-290
export function estimateTokens(message: AgentMessage): number {
  // user/assistant/custom/toolResult/bashExecution/branchSummary/compactionSummary
  // images: estimated at 4800 chars (≈1200 tokens)
  return Math.ceil(chars / 4);
}
```

Images are hard-coded at 4,800 char-equivalents (1,200 tokens), a deliberate overestimate to be conservative.
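The heuristic is easy to reproduce. The sketch below uses a deliberately reduced message model with only text and image parts. The `Part`/`Msg` types and `estimateTokensSketch` are hypothetical. The real `estimateTokens` also covers the other roles listed in the excerpt.

```typescript
// Reduced message model (assumption): content is a list of text or image parts.
type Part = { type: "text"; text: string } | { type: "image"; data: string };

interface Msg {
  role: string;
  content: Part[];
}

// Per the page: each image counts as 4,800 chars (about 1,200 tokens).
const IMAGE_CHAR_EQUIVALENT = 4800;

// Hypothetical helper mirroring the chars / 4 heuristic.
function estimateTokensSketch(message: Msg): number {
  let chars = 0;
  for (const part of message.content) {
    // Images are charged a flat amount regardless of their encoded size.
    chars += part.type === "text" ? part.text.length : IMAGE_CHAR_EQUIVALENT;
  }
  // Round up so a partial token is never under-counted.
  return Math.ceil(chars / 4);
}

const msg: Msg = {
  role: "user",
  content: [
    { type: "text", text: "Please review this screenshot." }, // 30 chars
    { type: "image", data: "<base64 omitted>" },
  ],
};
console.log(estimateTokensSketch(msg)); // 1208 = ceil((30 + 4800) / 4)
```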

`estimateContextTokens` combines real usage data with estimated trailing tokens:

```
total = lastAssistantUsageTokens + estimatedTokensAfterLastAssistant
```
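The combination of real and estimated counts can be sketched as a backward search for the last assistant message that actually reported usage. The message shape is an assumption: an optional `usageTokens` on successful assistant replies and a precomputed `estimatedTokens` on every message. So is the choice to return `undefined` when nothing reported usage, which mirrors the "no prior usage" boundary case listed later on this page. `estimateContextTokensSketch` is hypothetical.

```typescript
// Assumed message shape: successful assistant replies carry a usage total;
// error replies and other roles do not.
interface SessionMsg {
  role: "user" | "assistant" | "toolResult";
  usageTokens?: number;
  estimatedTokens: number; // e.g. the chars / 4 estimate
}

// Hypothetical helper. Returns undefined when no message reported usage.
function estimateContextTokensSketch(messages: SessionMsg[]): number | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const usage = messages[i].usageTokens;
    if (usage === undefined) continue; // skip error replies and non-assistant messages
    // Real usage up to this reply plus estimates for everything after it.
    let trailing = 0;
    for (let j = i + 1; j < messages.length; j++) trailing += messages[j].estimatedTokens;
    return usage + trailing;
  }
  return undefined;
}

const msgs: SessionMsg[] = [
  { role: "user", estimatedTokens: 50 },
  { role: "assistant", usageTokens: 12_000, estimatedTokens: 400 },
  { role: "user", estimatedTokens: 80 },
  { role: "assistant", estimatedTokens: 20 }, // an error reply: no usage
];
console.log(estimateContextTokensSketch(msgs)); // 12100
```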

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:186-214]()

---

## What Gets Summarized vs. Kept

```text
Session entries (chronological):
  ┌──────────────────────────────────────────────────┐
  │  [prior compaction marker]                        │
  │  [u1] user msg                                    │  ← boundaryStart
  │  [a1] assistant msg                               │  ← last summarized entry
  │  [u2] user msg (start of oldest kept turn)        │  ← firstKeptEntryIndex (kept)
  │  [a2] assistant msg                               │
  │  [u3] user msg                                    │
  │  [a3] assistant msg (most recent)                 │  ← boundaryEnd
  └──────────────────────────────────────────────────┘

  messagesToSummarize = [u1..a1] → discarded, replaced by summary
  kept = [u2..a3] → passed to LLM as context
```

For a **split turn**:

```text
  ┌──────────────────────────────────────────────────┐
  │  [u1] user msg (very long turn starts)            │  ← turnStartIndex
  │  [a1] assistant reply part 1                      │  ← turnPrefixMessages (summarized separately)
  │  [a2] assistant reply part 2 (suffix, kept)       │  ← firstKeptEntryIndex
  └──────────────────────────────────────────────────┘
```

In the split-turn case, two LLM summarization calls run **in parallel**:

1. History summary (prior turns) — uses `SUMMARIZATION_PROMPT` or `UPDATE_SUMMARIZATION_PROMPT`
2. Turn-prefix summary — uses `TURN_PREFIX_SUMMARIZATION_PROMPT` (smaller budget: 50% of `reserveTokens`)

The final compaction summary is their concatenation, separated by a horizontal rule and a `**Turn Context (split turn):**` header.
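The split-turn flow can be sketched with `Promise.all`. `Summarize`, `Cut`, and `buildCompactionSummary` are hypothetical, and the prompt names are passed as plain strings. The 80% history budget is borrowed from the `maxTokens` row of the boundary-condition table (without the model cap). The 50% prefix budget comes from the list above. The separator layout only approximates the described header.

```typescript
// Hypothetical stand-in for the LLM call: (prompt id, messages, max output tokens) -> summary.
type Summarize = (promptId: string, messages: string[], maxTokens: number) => Promise<string>;

interface Cut {
  firstKeptIndex: number;
  isSplitTurn: boolean;
  turnStartIndex: number;
}

async function buildCompactionSummary(
  entries: string[],
  cut: Cut,
  reserveTokens: number,
  previousSummary: string | undefined,
  summarize: Summarize,
): Promise<string> {
  // History ends where the split turn starts, or at the first kept entry otherwise.
  const historyEnd = cut.isSplitTurn ? cut.turnStartIndex : cut.firstKeptIndex;
  const history = entries.slice(0, historyEnd);
  // Repeat compactions use the update prompt (passing previousSummary along is omitted here).
  const historyPrompt = previousSummary ? "UPDATE_SUMMARIZATION_PROMPT" : "SUMMARIZATION_PROMPT";
  // Assumed budgets: 80% of the reserve for history, 50% for the turn prefix.
  const historyBudget = Math.floor(0.8 * reserveTokens);
  const prefixBudget = Math.floor(0.5 * reserveTokens);

  if (!cut.isSplitTurn) return summarize(historyPrompt, history, historyBudget);

  const turnPrefix = entries.slice(cut.turnStartIndex, cut.firstKeptIndex);
  // The two calls are independent, so they run concurrently.
  const [historySummary, prefixSummary] = await Promise.all([
    summarize(historyPrompt, history, historyBudget),
    summarize("TURN_PREFIX_SUMMARIZATION_PROMPT", turnPrefix, prefixBudget),
  ]);
  return `${historySummary}\n\n---\n\n**Turn Context (split turn):**\n\n${prefixSummary}`;
}

// Usage with a fake summarizer that echoes its inputs.
const fake: Summarize = async (id, messages, max) => `${id}:${messages.length}:${max}`;
buildCompactionSummary(["u1", "a1", "u2", "a2", "a3"], { firstKeptIndex: 4, isSplitTurn: true, turnStartIndex: 2 }, 16384, undefined, fake)
  .then((s) => console.log(s));
```

Neither call depends on the other's output, so running them concurrently costs one round of LLM latency instead of two.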

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:644-830]()

---

## The Summary Format

The LLM is asked to produce a structured Markdown checkpoint following a rigid template:

```
## Goal
## Constraints & Preferences
## Progress
  ### Done
  ### In Progress
  ### Blocked
## Key Decisions
## Next Steps
## Critical Context
```

For repeat compactions, an `UPDATE_SUMMARIZATION_PROMPT` is used instead, instructing the model to merge new activity into the `previousSummary` from the prior compaction — preserving history across multiple compaction rounds without re-processing everything from scratch.

The system prompt for the summarization LLM call explicitly prevents conversation continuation:

```typescript
// packages/coding-agent/src/core/compaction/utils.ts:168-170
export const SUMMARIZATION_SYSTEM_PROMPT = `You are a context summarization assistant...
Do NOT continue the conversation. ONLY output the structured summary.`;
```

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:454-524](), [packages/coding-agent/src/core/compaction/utils.ts:168-171]()

---

## File Operation Tracking

Both compaction and branch summarization track which files the agent read or modified during the discarded conversation. The resulting file lists are appended to the summary as XML-style tags:

```xml
<read-files>
src/index.ts
</read-files>

<modified-files>
src/core/session-manager.ts
</modified-files>
```

File operations are collected from:
1. Tool calls in the messages being discarded (`read`, `write`, `edit` tool names)
2. The `details` field of the previous compaction entry (cumulative carry-forward)

`modifiedFiles` = union of `write` and `edit` operations. `readFiles` = files read but never modified in the same session window.

```typescript
// packages/coding-agent/src/core/compaction/utils.ts:62-66
export function computeFileLists(fileOps: FileOperations): { readFiles: string[]; modifiedFiles: string[] } {
  const modified = new Set([...fileOps.edited, ...fileOps.written]);
  const readOnly = [...fileOps.read].filter((f) => !modified.has(f)).sort();
  ...
}
```
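File tracking can be sketched end to end as follows. The `ToolCall` shape and its `path` argument are assumptions, as are the helper names `collectFileOps`, `computeFileListsSketch`, and `formatFileTags`. The `read`/`edited`/`written` split and the read-minus-modified rule follow the excerpt above, and the output reuses the tags shown earlier. Sorting the modified list is this sketch's choice.

```typescript
// Assumed shape: tool calls carry a name and a path argument.
interface ToolCall {
  name: string;
  args: { path: string };
}

interface FileOperations {
  read: Set<string>;
  edited: Set<string>;
  written: Set<string>;
}

// Fold tool calls from discarded messages into (possibly carried-forward) file operations.
function collectFileOps(calls: ToolCall[], carried?: FileOperations): FileOperations {
  const ops: FileOperations = {
    read: new Set<string>(carried?.read),
    edited: new Set<string>(carried?.edited),
    written: new Set<string>(carried?.written),
  };
  for (const call of calls) {
    if (call.name === "read") ops.read.add(call.args.path);
    else if (call.name === "edit") ops.edited.add(call.args.path);
    else if (call.name === "write") ops.written.add(call.args.path);
  }
  return ops;
}

// Modified = edited ∪ written; read-only = read minus modified.
function computeFileListsSketch(ops: FileOperations): { readFiles: string[]; modifiedFiles: string[] } {
  const modified = new Set([...ops.edited, ...ops.written]);
  const readFiles = [...ops.read].filter((f) => !modified.has(f)).sort();
  return { readFiles, modifiedFiles: [...modified].sort() };
}

// Render the non-empty lists as the tags appended to the summary.
function formatFileTags(lists: { readFiles: string[]; modifiedFiles: string[] }): string {
  const parts: string[] = [];
  if (lists.readFiles.length > 0) parts.push(`<read-files>\n${lists.readFiles.join("\n")}\n</read-files>`);
  if (lists.modifiedFiles.length > 0) parts.push(`<modified-files>\n${lists.modifiedFiles.join("\n")}\n</modified-files>`);
  return parts.join("\n\n");
}

const ops = collectFileOps([
  { name: "read", args: { path: "src/index.ts" } },
  { name: "read", args: { path: "src/core/session-manager.ts" } },
  { name: "edit", args: { path: "src/core/session-manager.ts" } },
]);
console.log(formatFileTags(computeFileListsSketch(ops)));
```

Because `session-manager.ts` was both read and edited, it appears only under `<modified-files>`.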

Sources: [packages/coding-agent/src/core/compaction/utils.ts:29-82]()

---

## Branch Summarization: A Different Kind of Summary

Turn-level compaction discards old history when the window fills. Branch summarization is a different concept: when the user navigates away from one conversation branch (in a tree-structured session), the branch being abandoned is summarized so that its context is not lost when the user returns.

### Key Differences

| Aspect | Turn Compaction | Branch Summarization |
|--------|----------------|---------------------|
| Trigger | Context window pressure | Branch navigation |
| Cut point | `keepRecentTokens` budget | No cut — whole branch |
| Token budget | `keepRecentTokens` for kept portion | `contextWindow - reserveTokens` |
| Compaction entry? | Yes, written to session | No — creates `branch_summary` entry |
| Previous summary? | Yes, iterative update | No |
| Summary format | Full structured checkpoint | Branch-focused summary |
| Preamble | None | "The user explored a different conversation branch before returning here." |

Branch summarization walks the session tree from the old leaf position back to the **common ancestor** with the target position, collects those entries in chronological order, and summarizes them. The walk does **not** stop at compaction boundaries inside the branch; existing compaction summaries are included as context.

```typescript
// packages/coding-agent/src/core/compaction/branch-summarization.ts:98-136
export function collectEntriesForBranchSummary(session, oldLeafId, targetId): CollectEntriesResult {
  // Find common ancestor via set intersection of branch paths
  // Walk from old leaf to ancestor, collecting entries
}
```
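The common-ancestor walk can be modeled with parent pointers. In this sketch, entries are assumed to carry `id` and `parentId` (null at the root). `collectBranchSketch` and `pathToRoot` are hypothetical helpers, not the repository's `collectEntriesForBranchSummary`. As in the excerpt, the target path is turned into a set, and the deepest entry on the old path that is also in that set is the common ancestor.

```typescript
// Assumed shape: every session entry points at its parent (null at the root).
interface TreeEntry {
  id: string;
  parentId: string | null;
}

// Ids from the root down to `id`, following parent pointers.
function pathToRoot(byId: Map<string, TreeEntry>, id: string): string[] {
  const path: string[] = [];
  let current: string | null = id;
  while (current !== null) {
    path.push(current);
    const entry: TreeEntry | undefined = byId.get(current);
    current = entry ? entry.parentId : null;
  }
  return path.reverse();
}

// Hypothetical helper: ids strictly below the common ancestor on the old branch, oldest first.
function collectBranchSketch(entries: TreeEntry[], oldLeafId: string, targetId: string): string[] {
  const byId = new Map(entries.map((e) => [e.id, e] as const));
  const onTargetPath = new Set(pathToRoot(byId, targetId));
  const oldPath = pathToRoot(byId, oldLeafId);
  // The common ancestor is the deepest old-path entry that is also on the target path.
  let ancestor = -1;
  for (let i = oldPath.length - 1; i >= 0; i--) {
    if (onTargetPath.has(oldPath[i])) {
      ancestor = i;
      break;
    }
  }
  return oldPath.slice(ancestor + 1);
}

// a1 has two children: the abandoned u2-old branch and the new u2-new.
const tree: TreeEntry[] = [
  { id: "root", parentId: null },
  { id: "u1", parentId: "root" },
  { id: "a1", parentId: "u1" },
  { id: "u2-old", parentId: "a1" },
  { id: "a2-old", parentId: "u2-old" },
  { id: "u2-new", parentId: "a1" },
];
console.log(collectBranchSketch(tree, "a2-old", "u2-new").join(", ")); // u2-old, a2-old
```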

When a branch is very long, `prepareBranchEntries` uses a **newest-first** walk with a token budget, keeping the most recent context if the branch cannot fit in the window. Unlike turn compaction, summaries and compaction entries can be squeezed in past the soft budget limit (up to 90% consumed) because they carry high-value context.
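One reading of this budget rule, written as a sketch: walk newest first and stop when the next entry would overflow the budget. A `branch_summary` or `compaction` entry is still admitted at that point if less than 90% of the budget has been used. The entry shape, the helper name `selectBranchEntries`, and the exact stopping rule are assumptions. `prepareBranchEntries` is the authoritative behavior.

```typescript
// Assumed entry shape for this sketch.
interface BranchEntry {
  kind: "message" | "branch_summary" | "compaction";
  tokens: number;
}

// Hypothetical helper: newest-first selection under a token budget, returned oldest first.
function selectBranchEntries(entries: BranchEntry[], tokenBudget: number): BranchEntry[] {
  const selected: BranchEntry[] = [];
  let total = 0;
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (total + entry.tokens > tokenBudget) {
      // Over budget: a summary is high-value, so squeeze it in if under 90% is used.
      const isSummary = entry.kind === "branch_summary" || entry.kind === "compaction";
      if (isSummary && total < 0.9 * tokenBudget) {
        selected.unshift(entry);
        total += entry.tokens;
      }
      break;
    }
    selected.unshift(entry);
    total += entry.tokens;
  }
  return selected;
}

const branch: BranchEntry[] = [
  { kind: "compaction", tokens: 300 },
  { kind: "message", tokens: 200 },
  { kind: "message", tokens: 600 },
];
// 800 of 1,000 tokens are used when the compaction entry is reached, so it is squeezed in.
console.log(selectBranchEntries(branch, 1_000).length); // 3
```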

Sources: [packages/coding-agent/src/core/compaction/branch-summarization.ts:86-237]()

---

## The Compaction Lifecycle (Session Layer)

```mermaid
stateDiagram-v2
    [*] --> Running: agent prompt
    Running --> CheckCompaction: assistant response received
    CheckCompaction --> Running: below threshold
    CheckCompaction --> AutoCompaction: threshold exceeded
    CheckCompaction --> OverflowCompaction: LLM overflow error
    AutoCompaction --> ExtensionHook: session_before_compact event
    OverflowCompaction --> ExtensionHook: session_before_compact event
    ExtensionHook --> Summarizing: proceed (or extension provides summary)
    ExtensionHook --> Cancelled: extension cancels
    Summarizing --> SessionReload: summary saved to session file
    SessionReload --> Running: agent state rebuilt from new context
    SessionReload --> Running: overflow case only, willRetry=true, agent.continue()
    Cancelled --> Running: no-op
```

The session emits `compaction_start` and `compaction_end` events for observability. The `compaction_end` event carries the result, the reason (`"manual"`, `"threshold"`, or `"overflow"`), and whether a retry is pending.

### Overflow Recovery Guard

A guard prevents infinite overflow loops: if overflow recovery has already been attempted once in the current turn, subsequent overflow errors are reported as a fatal event instead of triggering another compaction:

```typescript
// packages/coding-agent/src/core/agent-session.ts:1796
if (this._overflowRecoveryAttempted) {
  this._emit({ type: "compaction_end", reason: "overflow", result: undefined, ... });
  // message: "Context overflow recovery failed after one compact-and-retry attempt."
}
```
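The guard reduces to a per-turn flag. This sketch models only the decision. `CompactionGuard`, `decide`, and the reset-on-new-prompt policy are hypothetical. Only the one-attempt rule and the failure message come from the excerpt above.

```typescript
type CompactionDecision =
  | { action: "none" }
  | { action: "compact"; reason: "threshold" | "overflow"; willRetry: boolean }
  | { action: "fail"; message: string };

// Hypothetical model of the overflow-recovery guard.
class CompactionGuard {
  private overflowRecoveryAttempted = false;

  // Assumption: the flag is cleared when a new user prompt starts a turn.
  resetForNewTurn(): void {
    this.overflowRecoveryAttempted = false;
  }

  decide(isOverflow: boolean, contextTokens: number, contextWindow: number, reserveTokens: number): CompactionDecision {
    if (isOverflow) {
      if (this.overflowRecoveryAttempted) {
        // Second overflow in the same turn: give up instead of looping.
        return { action: "fail", message: "Context overflow recovery failed after one compact-and-retry attempt." };
      }
      this.overflowRecoveryAttempted = true;
      return { action: "compact", reason: "overflow", willRetry: true };
    }
    if (contextTokens > contextWindow - reserveTokens) {
      return { action: "compact", reason: "threshold", willRetry: false };
    }
    return { action: "none" };
  }
}

const guard = new CompactionGuard();
console.log(guard.decide(true, 0, 200_000, 16_384).action); // compact
console.log(guard.decide(true, 0, 200_000, 16_384).action); // fail
```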

Sources: [packages/coding-agent/src/core/agent-session.ts:1768-2020]()

---

## Extension Hook: Custom Compaction

Extensions can intercept compaction via `session_before_compact`. The event carries a `CompactionPreparation` object with `firstKeptEntryId`, `messagesToSummarize`, `tokensBefore`, and `fileOps`. An extension can respond in one of three ways:

- Return `{ compaction: { summary, firstKeptEntryId, tokensBefore, details } }` to supply its own summary (skipping the LLM call)
- Return `{ cancel: true }` to cancel the compaction
- Return nothing to let the default summarization run

This is how structured artifact indices (e.g., ArtifactIndex) are wired into compaction: the extension generates richer metadata and stores it in `details`, which is carried forward in the `CompactionEntry`.
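The three possible outcomes can be modeled as a small resolver. The result types follow the shapes listed above, but `resolveCompaction` and the `runDefault` callback are hypothetical. This is a model of the decision, not the extension API itself.

```typescript
interface CompactionResult {
  summary: string;
  firstKeptEntryId: string;
  tokensBefore: number;
  details?: unknown;
}

// Shapes an extension handler may return, per the list above.
type HookResult = { compaction: CompactionResult } | { cancel: true } | undefined;

// Hypothetical resolver. Returns undefined when the compaction was cancelled.
async function resolveCompaction(
  hookResult: HookResult,
  runDefault: () => Promise<CompactionResult>,
): Promise<CompactionResult | undefined> {
  if (hookResult === undefined) return runDefault(); // default LLM summarization
  if ("cancel" in hookResult) return undefined; // extension vetoed the compaction
  return hookResult.compaction; // extension-supplied summary, no LLM call
}

resolveCompaction({ cancel: true }, async () => ({ summary: "", firstKeptEntryId: "", tokensBefore: 0 }))
  .then((r) => console.log(r === undefined)); // true
```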

Sources: [packages/coding-agent/test/suite/agent-session-compaction.test.ts:97-123]()

---

## Resuming After Compaction: How the Agent Reconstructs State

After a compaction, the session is reloaded. `buildSessionContext` walks the session entries from root to leaf and, when it encounters a `compaction` entry, emits a `compactionSummary` role message — a synthetic message type that the LLM sees instead of all the discarded turns:

```typescript
// packages/agent/test/harness/compaction.test.ts:316-327
it("builds session context with a compaction entry", () => {
  const loaded = buildSessionContext([u1, a1, u2, a2, compaction, u3, a3]);
  expect(loaded.messages).toHaveLength(5);
  expect(loaded.messages[0]?.role).toBe("compactionSummary");
});
```

The agent resumes from this reconstructed context. The LLM receives: one `compactionSummary` message containing the structured checkpoint, followed by all kept messages. From the model's perspective, history before the compaction is a single coherent narrative rather than a raw message dump.
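The reconstruction step can be sketched under a few assumptions. Entries form a flat root-to-leaf list, and each compaction entry records `firstKeptEntryId`. The context is built from the latest compaction's summary, then the entries from `firstKeptEntryId` up to the compaction marker, then everything after it. `buildContextSketch` and its types are hypothetical. With the first kept entry set to `u2`, this sketch yields five messages, matching the count in the test above.

```typescript
type SessionEntry =
  | { type: "message"; id: string; role: "user" | "assistant"; text: string }
  | { type: "compaction"; id: string; summary: string; firstKeptEntryId: string };

interface ContextMessage {
  role: "user" | "assistant" | "compactionSummary";
  text: string;
}

// Hypothetical rebuild of the LLM-visible context from session entries.
function buildContextSketch(entries: SessionEntry[]): ContextMessage[] {
  // Only the most recent compaction matters; earlier rounds are merged into its summary.
  let compactionIndex = -1;
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].type === "compaction") {
      compactionIndex = i;
      break;
    }
  }
  const toMessage = (e: SessionEntry): ContextMessage[] =>
    e.type === "message" ? [{ role: e.role, text: e.text }] : [];

  if (compactionIndex === -1) return entries.flatMap(toMessage);

  const compaction = entries[compactionIndex] as Extract<SessionEntry, { type: "compaction" }>;
  const firstKept = entries.findIndex((e) => e.id === compaction.firstKeptEntryId);
  const keptBefore = firstKept >= 0 ? entries.slice(firstKept, compactionIndex) : [];
  return [
    { role: "compactionSummary", text: compaction.summary },
    ...keptBefore.flatMap(toMessage),
    ...entries.slice(compactionIndex + 1).flatMap(toMessage),
  ];
}

const m = (id: string, role: "user" | "assistant"): SessionEntry => ({ type: "message", id, role, text: id });
const ctx = buildContextSketch([
  m("u1", "user"), m("a1", "assistant"), m("u2", "user"), m("a2", "assistant"),
  { type: "compaction", id: "c1", summary: "## Goal ...", firstKeptEntryId: "u2" },
  m("u3", "user"), m("a3", "assistant"),
]);
console.log(ctx.map((x) => x.role).join(", ")); // compactionSummary, user, assistant, user, assistant
```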

---

## Boundary Conditions (from Tests)

The test suites encode the edge cases that the implementation must handle correctly:

| Scenario | Behavior |
|----------|----------|
| Last session entry is a `compaction` marker | `prepareCompaction` returns `undefined` — nothing to compact |
| Error message with no prior usage | No threshold compaction; cannot estimate context size |
| Error message with prior successful usage | Uses last successful usage + estimated trailing tokens |
| Stale pre-compaction usage kept across boundary | Ignored — timestamp check prevents false threshold trigger |
| `toolResult` at cut-point search | Never a valid cut point; `findCutPoint` skips it |
| All entries are `thinking_level_change` / `model_change` | `findCutPoint` falls back to `firstKeptEntryIndex: 0`, `isSplitTurn: false` |
| `branch_summary` / `custom_message` entry | Valid cut point and valid turn-start marker |
| Compaction entry between user and assistant | `findCutPoint` stops backward scan at compaction boundary |
| Overflow recovery already attempted | Fatal error emitted, no second retry |
| `maxTokens` capped by model output limit | `Math.min(0.8 * reserveTokens, model.maxTokens)` |

Sources: [packages/agent/test/harness/compaction.test.ts:150-670](), [packages/coding-agent/test/suite/agent-session-compaction.test.ts:86-407]()

---

## Summary

The compaction subsystem is a surgical trimmer, not a blunt truncator. It preserves the most recent `keepRecentTokens` of conversation history by walking backwards from the newest message, finds a structurally valid cut point (never inside a tool call/result pair), and replaces everything older with a structured LLM-generated summary that updates iteratively across multiple compaction rounds. Branch navigation triggers a separate but parallel mechanism that summarizes abandoned session tree branches rather than overflow history. After any compaction, the agent resumes from a reconstructed context where the entire discarded history is visible to the model as a single `compactionSummary` role message — keeping coherence without keeping tokens.

Sources: [packages/coding-agent/src/core/compaction/compaction.ts:644-831]()