# The Agent Loop: 2,000 Lines of Tool Orchestration

> How agent_loop.py, agent_tools.py, and the tool_* modules turn a chat turn into a sequenced run of shell, file, web, memory, and MCP tool calls — including tool security, parsing, and the per-turn context compactor.

- Repository: pewdiepie-archdaemon/odysseus
- GitHub: https://github.com/pewdiepie-archdaemon/odysseus
- Human wiki: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124
- Complete Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/llms-full.txt

## Source Files

- `src/agent_loop.py`
- `src/agent_tools.py`
- `src/tool_execution.py`
- `src/tool_schemas.py`
- `src/tool_security.py`
- `src/context_compactor.py`
- `routes/chat_routes.py`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/agent_loop.py](src/agent_loop.py)
- [src/agent_tools.py](src/agent_tools.py)
- [src/tool_execution.py](src/tool_execution.py)
- [src/tool_schemas.py](src/tool_schemas.py)
- [src/tool_parsing.py](src/tool_parsing.py)
- [src/tool_security.py](src/tool_security.py)
- [src/context_compactor.py](src/context_compactor.py)
- [routes/chat_routes.py](routes/chat_routes.py)
</details>

# The Agent Loop: 2,000 Lines of Tool Orchestration

Odysseus's agent mode is one fat generator. `stream_agent_loop()` is ~900 lines of Python that takes a chat turn and walks a model through up to 20 rounds of tool execution, streaming SSE events the whole way. Around it sits a parser that has to handle five competing tool-call dialects, a dispatcher that fans out to MCP servers and forty-odd in-process implementations, a security policy that blocks shell/email/admin tools from non-admin owners, and a per-turn context compactor. Together that's ~2,100 lines anchoring everything an Odysseus chat can *do* — and it shows how much glue a serious tools-runtime actually needs.

This page walks through the moving parts: how a chat turn enters the loop, how the LLM's output is parsed into tool blocks, how tools are dispatched and budgeted, what the security gate refuses, and how the compactor keeps context from blowing up over long sessions.

## Why agent mode lives in one big generator

`stream_agent_loop` is an `AsyncGenerator[str, None]` that produces SSE-formatted `data: …` events. The HTTP layer (`routes/chat_routes.py:792-829`) just iterates it and forwards relevant event types to the browser. Everything stateful — round counter, native-vs-fenced tool parsing, stall detection, doc streaming, metrics, verifier — lives inside that one generator's locals.

```text
HTTP route ── SSE chunks ──► stream_agent_loop ── messages[] ──► stream_llm_with_fallback ──► provider
                                  │                                       │
                                  │◄────── tool_calls / deltas ──────────┘
                                  ▼
                          parse_tool_blocks  ──► execute_tool_block ──► MCP / native impl
                                  │                       │
                                  └──── format_tool_result ◄────────────┘
```

Sources: [src/agent_loop.py:1218-1246](), [routes/chat_routes.py:785-829]()

## The chat turn lifecycle

A single chat turn entering agent mode goes through six prep steps before any tool runs.

| Step | What happens | Where |
|---|---|---|
| 1. Admin & ownership check | `blocked_tools_for_owner()` adds shell/email/admin tools to `disabled_tools` for non-admins; MCP schemas are hidden | `agent_loop.py:1250-1257`, `tool_security.py:14-74` |
| 2. RAG tool selection | `get_tool_index().get_tools_for_query()` retrieves ~8 likely-relevant tool names; keyword hints fill in if the vector index is down | `agent_loop.py:1267-1325` |
| 3. Endpoint capability probe | DB lookup for `supports_tools` flag, then a model-name heuristic decides if function schemas are sent | `agent_loop.py:1336-1368` |
| 4. Prompt assembly | `_build_system_prompt` injects preamble, rules, per-tool sections, skill index, integrations, MCP descriptions | `agent_loop.py:1369-1376`, `agent_loop.py:912-998` |
| 5. Soft context trim | If estimated tokens exceed `agent_input_token_budget`, `trim_for_context()` drops older system messages first, then old turns | `agent_loop.py:1378-1404`, `context_compactor.py:98-172` |
| 6. Drop internal `_protected` markers | Strips Odysseus-only metadata before serializing to the LLM API | `agent_loop.py:1407` |

Each step's wall-clock time is captured into `prep_timings` and emitted as an `agent_prep` SSE event before the first model call. Sources: [src/agent_loop.py:1258-1409]()

## Five tool-call dialects, one parser

Different models emit tool calls in different shapes. `parse_tool_blocks()` in [src/tool_parsing.py:321-385]() tries five formats in priority order:

1. Fenced code blocks: ```` ```bash\n…\n``` ```` (the canonical format taught in the system prompt).
2. `[TOOL_CALL] {…} [/TOOL_CALL]` blocks.
3. XML `<tool_call><invoke name="…"><parameter name="…">…</parameter></invoke></tool_call>`.
4. MiniMax-style `<tool_code>{tool => 'name', args => …}</tool_code>`.
5. DeepSeek DSML markup with fullwidth-pipe delimiters (`<｜｜DSML｜｜tool_calls>`), normalized into form 3 by `_normalize_dsml()`.

The fenced-block regex is built from `TOOL_TAGS`, a 60-entry set declared in `agent_tools.py:29-61` that lists every legal tag — including the cookbook tools added because *"without these entries, native function calls to e.g. list_served_models are rejected as 'Unknown function call' before reaching the dispatcher"* ([src/agent_tools.py:46-50]()).

When the model uses native function calling instead, `function_call_to_tool_block()` ([src/tool_schemas.py:1016-1171]()) does the inverse: parse JSON arguments and rebuild the text content each tool expects. Email tools get auto-routed to the `mcp__email__…` namespace there, and an unknown name falls through to `mcp__` if it starts with that prefix.

Sources: [src/tool_parsing.py:22-83](), [src/tool_parsing.py:321-385](), [src/tool_schemas.py:1016-1171]()

## Inside one round

Each iteration of the `for round_num in range(1, max_rounds + 1)` loop does this:

```mermaid
flowchart TD
    A[Stream LLM round] -->|deltas + tool_calls| B[_resolve_tool_blocks]
    B -->|native or parsed| C{tool_blocks empty?}
    C -->|yes| D[Verifier? Done.]
    C -->|no| E[Loop-breaker check]
    E -->|stuck or runaway| F[Force-answer next round]
    E -->|ok| G[For each block]
    G --> H[Budget check]
    H --> I[execute_tool_block]
    I --> J[format_tool_result]
    J --> K[Append to messages]
    K --> A
```

Sources: [src/agent_loop.py:1444-2074]()

**Streaming.** The loop forwards raw `delta` chunks to the frontend as they arrive, and intercepts three other event types it gets from `stream_llm`: `tool_call_delta` (incremental native-call JSON, used to live-stream `create_document` content into the editor panel), `tool_calls` (the final list of native calls), and `usage` (real token counts when the provider reports them) ([src/agent_loop.py:1525-1632]()).

**Loop-breaker.** A `deque` of recent call signatures and a `Counter` of per-tool calls detect two stall modes: repeating the same call without writing prose for 4 rounds, or firing one tool ≥15 times. When either trips, `_force_answer = True` and the next round is run with `tools=None` plus a system note telling the model to write the answer or declare blocked. If the model still emits no prose, a "grace synthesis" non-streaming call is made over the same message history to salvage an answer ([src/agent_loop.py:1772-1825], [src/agent_loop.py:1661-1690]()).

**Per-tool progress streaming.** Long bash/python jobs run inside `_run_subprocess_streaming()` which keeps a 12-line tail ring buffer and pushes `{elapsed_s, tail}` payloads every 2 s through an `asyncio.Queue` the loop drains while awaiting the tool task — the UI shows live elapsed-time without the loop blocking on the subprocess ([src/tool_execution.py:59-167](), [src/agent_loop.py:1887-1913]()).

**Completion verifier (opt-in).** If `_effectful_used` (a write_file/bash/python/document tool ran) and `agent_verifier_subagent` is on, a fresh-context model call reads only the user request + an actions snapshot and decides `VERIFICATION: SUCCESS|FAIL`. The setting is off by default because *"on weak local models the verifier can't judge from the action-snapshot … and false-rejects … forces a costly extra round every effectful turn"* ([src/agent_loop.py:1742-1769]()).

## Tool dispatcher: MCP, native, or admin-blocked

`execute_tool_block()` in [src/tool_execution.py:477-731]() is a big elif chain. Before dispatch it runs three gates:

1. **Misformatted JSON detection** — a `{…}` JSON object inside a `python`/`json`/`xml` fence triggers a teaching error explaining the correct fence tag.
2. **User-disabled tools** — anything in `disabled_tools` returns `{"error": "…disabled by user."}`.
3. **Admin gate** — `_ADMIN_TOOLS` plus `is_public_blocked_tool()` reject anything sensitive when the owner isn't admin (or auth isn't configured at all, which is treated as single-user).

The bash background marker (`#!bg` as first line) is handled before any normal dispatch: the command is launched detached via `bg_jobs.launch()` and the agent gets back a job id immediately, with monitoring re-invoking the agent when the job exits ([src/tool_execution.py:566-584]()).

After gates, dispatch splits by category:

| Category | Path | Notes |
|---|---|---|
| MCP-extracted (bash, python, web_search, read_file, write_file, generate_image, manage_memory) | `_call_mcp_tool` then `_direct_fallback` | Fallback runs the work in-process when MCP server isn't connected |
| Document tools | `do_create_document`, `do_update_document`, `do_edit_document`, `do_suggest_document` | Frontend gets a `doc_update` SSE event with the real doc id |
| AI/session dispatcher | `dispatch_ai_tool` | chat_with_model, create_session, list_sessions, send_to_session, pipeline, manage_session, manage_memory, list_models, ui_control, ask_teacher |
| Admin/management | `do_manage_*` | Tasks, skills, endpoints, MCP servers, webhooks, tokens, documents, settings, notes, calendar |
| Cookbook LLM serving | `do_download_model`, `do_serve_model`, `do_list_*`, `do_serve_preset`, `do_adopt_served_model` | Backed by tmux + a cookbook state file the UI watches |
| Generic | `do_app_api` (loopback to any UI-button HTTP endpoint), `mcp__*` (raw MCP call) | |

Sources: [src/tool_execution.py:477-731](), [src/agent_tools.py:29-61]()

## What the `bash` block really does

The fenced `bash` path is worth a closer look because it's where most of the security exposure lives. After the dialect parser hands `execute_tool_block` a `ToolBlock("bash", "<cmd>")`, the runtime:

1. Adds `disabled_tools`-check and the public-policy check; non-admins always see `bash` in `NON_ADMIN_BLOCKED_TOOLS` so the call is refused before any subprocess starts ([src/tool_security.py:14-46]()).
2. If the first line is `#!bg`, the command is handed to `bg_jobs.launch()` and detached.
3. Otherwise, `_direct_fallback` calls `asyncio.create_subprocess_shell()` with a hardened env: real `os.environ`, but `TERM=xterm-256color`, `COLUMNS=120`, `LINES=40` so commands that probe terminfo don't fail ([src/tool_execution.py:317-322]()).
4. `_run_subprocess_streaming` watches stdout/stderr line-by-line, kills the process on `asyncio.CancelledError` (chat stop button), and enforces a `DEFAULT_BASH_TIMEOUT` of 1 hour with a TERM→SIGKILL ladder.

Output is truncated to `MAX_OUTPUT_CHARS = 10_000` and stderr is appended onto stdout as `STDERR: …`. Sources: [src/tool_execution.py:33-44](), [src/tool_execution.py:293-344]()

## Security: what gets refused

`tool_security.py` declares the entire non-admin denylist as one set:

```python
# src/tool_security.py:14-46
NON_ADMIN_BLOCKED_TOOLS = {
    "bash", "python", "read_file", "write_file",
    "search_chats", "manage_memory", "manage_skills",
    "manage_tasks", "manage_endpoints", "manage_mcp",
    "manage_webhooks", "manage_tokens", "manage_documents",
    "manage_settings", "api_call", "app_api",
    "send_email", "reply_to_email", "list_emails", "read_email",
    "resolve_contact", "manage_contact", "manage_calendar",
    "vault_search", "vault_get", "vault_unlock",
    "download_model", "serve_model", "stop_served_model",
    "cancel_download", "adopt_served_model",
}
```

`is_public_blocked_tool()` returns True for any name in that set or any name starting with `mcp__`. `owner_is_admin_or_single_user()` short-circuits to True when `AuthManager().is_configured` is False — a deliberate choice that keeps the dev/single-user setup wide open while the multi-tenant deployment is locked down.

The check happens in two places that both matter:

- **Schema scrubbing** in `stream_agent_loop` so blocked tools never even reach the LLM's tool list ([src/agent_loop.py:1250-1257]()).
- **Execution refusal** in `execute_tool_block` so a model that imagines a tool name out of training data still can't run it ([src/tool_execution.py:550-560]()).

The two-layer design matters: schema scrubbing makes the model less likely to try, but doesn't trust the model to actually obey.

Sources: [src/tool_security.py:1-74](), [src/agent_loop.py:1250-1257](), [src/tool_execution.py:544-560]()

## Context compaction: trim now, summarize later

Two independent mechanisms keep the message list from outgrowing the model's window.

**Per-turn soft trim** (`trim_for_context`, [src/context_compactor.py:98-172]()) runs *every* round of `stream_agent_loop` before sending. It walks a priority ladder:

1. Drop extra system messages (RAG context, memory) but keep the first system prompt.
2. Add some back if budget allows.
3. Truncate the kept system prompt to 2 000 chars with a "[System prompt truncated…]" marker.
4. Drop oldest non-system turns, but protect the last 10.

It also runs `_sanitize_tool_messages` ([src/context_compactor.py:52-95]()) which drops orphan `role: "tool"` messages whose parent assistant `tool_calls` got trimmed away — a real-world OpenAI-API constraint: *"messages with role 'tool' must be a response to a preceding message with 'tool_calls'"*. Without that pass, front-trimming can produce a request the API just rejects.

**Cross-turn compaction** (`maybe_compact`, [src/context_compactor.py:175-272]()) is the heavier one, triggered when token usage crosses 85% of the context window. It splits the conversation in half, summarizes the older half with a Cursor-style structured prompt (User Goal / What Was Done / Current State / Pending / Key Context), and replaces those messages with one system message that prefixes the summary with `[Conversation summary …]`. The compaction model is routed through `resolve_endpoint("utility")` so a small/cheap model can do the busy-work without dragging in the main one.

Sources: [src/context_compactor.py:18-49](), [src/context_compactor.py:52-95](), [src/context_compactor.py:175-272]()

## Surprising details

A few things in here aren't what you'd guess from reading the README:

- **The auto-document fallback** ([src/agent_loop.py:1691-1720]()) watches chat output for any unrequested code block longer than 30 lines and synthesizes a `create_document` tool block on the model's behalf — so a model that ignores the "never paste long code in chat" rule still produces an artifact in the editor panel.
- **DSML normalization** ([src/tool_parsing.py:70-83]()) exists solely because DeepSeek sometimes emits raw markup with fullwidth-pipe delimiters when its provider didn't parse tool schemas correctly. Odysseus rewrites that into the standard XML shape before parsing *and* stripping, so the garbage never reaches the user.
- **Per-endpoint `supports_tools`** ([src/agent_loop.py:1336-1368]()) is stored in the DB and set by the cookbook serve command. A vLLM run with `--enable-auto-tool-choice` flips it on at registration time — that means the agent's tool-schema choice is partly a function of how the model was launched.
- **`reasoning_content` echo-back** ([src/agent_loop.py:1040-1080]()) — DeepSeek's API rejects follow-up requests in thinking mode that don't include the prior reasoning, so the loop accumulates reasoning deltas separately and attaches them to the assistant message on the next round. Other vendors ignore the extra field.
- **Force-answer round** ([src/agent_loop.py:1461-1465]()) is the only place tools are sent as `None`. The loop-breaker uses it as a hard escape hatch — even if the model wants to call another tool, no schema means no call to make.
- **The `bash`/`python` timeout is one hour**, not the 60 s the README-era comment in `agent_tools.py` still says. The author moved it to a per-tool constant in `tool_execution.py:33-34` after seeing the agent *"go silent because it had nothing to report"* when 60 s killed real installs.

## What builders should notice

If you're putting together your own tool-using agent runtime, the shape here is worth borrowing:

- **Two-layer tool security** — scrub schemas *and* refuse at the dispatcher. The model is not your access control.
- **Multi-dialect parsing as a hard requirement** — once you support more than one model family, you're going to ship a regex jungle. Build it on day one.
- **Stall detection separate from rounds** — round budget alone doesn't stop a stuck model; signature-based loop-breakers do, with a clean handoff to a "force-answer" round.
- **Compaction is two problems, not one** — there's the per-turn trim that needs to be cheap and synchronous, and the cross-turn summary that needs an LLM call. Don't fuse them.
- **Progress events from inside subprocess tools** — the difference between "the chat looks dead" and "the user can see what's happening" is one async queue.

The whole thing isn't elegant — `stream_agent_loop` is the kind of function code-review checklists were invented to complain about. But the messiness is doing real work: every loop-breaker, every fallback, every normalize pass exists because some specific model misbehaved in a specific way against a specific provider, and the author chose to absorb the variance in glue rather than push it back onto users.

Sources: [src/agent_loop.py:1218-2106](), [src/tool_execution.py:477-731](), [src/tool_security.py:14-74](), [src/context_compactor.py:98-272]()