Agent-readable wiki
Odysseus Tech Reader Brief
A source-grounded breakdown of Odysseus, a self-hosted, batteries-included AI workspace that bundles a chat UI, an MCP-powered agent loop, hardware-aware model recommendations, multi-step deep research, and durable memory into a single FastAPI monolith.
Pages
- Why This Repo Matters: A Self-Hosted ChatGPT With Batteries IncludedThe hook: Odysseus packs chat, an agent, a hardware-aware model recommender, deep research, memory, email, calendar, and notes into one FastAPI app you can run on your own box — what is special, and what to notice before opening the code.
- From Clone to Localhost: The Bundled StackDocker Compose, env defaults, and bundled sidecars (ChromaDB, SearXNG, ntfy) — the non-obvious pieces of the install path, including the auto-generated admin password and the SSH key flow for Cookbook remote servers.
- The FastAPI Monolith: app.py and the 45-Router MountHow a single ~950-line app.py wires authentication, middleware, and ~45 route modules into one server — and what the routes/ directory reveals about the surface area (chat, sessions, documents, memory, MCP, webhooks, vaults, and more).
- The Agent Loop: 2,000 Lines of Tool OrchestrationHow agent_loop.py, agent_tools.py, and the tool_* modules turn a chat turn into a sequenced run of shell, file, web, memory, and MCP tool calls — including tool security, parsing, and the per-turn context compactor.
- Local MCP Servers: Email, Images, Memory, RAG as First-Class ToolsThe mcp_servers/ tree ships Odysseus-native Model Context Protocol servers so the agent can call email, image generation, memory, and RAG through the same MCP interface used for third-party tools — wired through mcp_manager and builtin_mcp.
- Cookbook: Hardware-Aware Model Recommendations via hwfitHow services/hwfit/ scans GPU/CPU/RAM, fits GGUF/FP8/AWQ candidates against the box, and proposes a download-and-serve plan — the practical heart of the "click to install a local LLM" pitch, adapted from the llmfit library.
- Deep Research: A Multi-Step Synthesis Engine in 800 LinesThe deep_research module adapts Alibaba's Tongyi DeepResearch pattern into a self-contained loop that plans, searches, reads, and produces a visual report — and how SearXNG, the search service, and the visual_report renderer fit together.
- Memory & Skills: ChromaDB + Skill Extraction That EvolvesHow the memory subsystem combines ChromaDB vector storage, fastembed ONNX embeddings, keyword fallback, and a skill_extractor that distills recurring patterns into reusable skills — the mechanism behind the "your agent gets better over time" claim.
- Email, Calendar, Notes: The Personal-Productivity SideIMAP/SMTP triage with AI auto-tagging, CalDAV sync to Radicale / Nextcloud / Apple / Fastmail, ntfy-channel reminders, and cron-style tasks the agent can act on — features that lean on background pollers and per-account routing.
- Builder Takeaways: What's Surprising, What's Hard, What to WatchClosing synthesis: the monolith-with-sidecars architecture, the breadth of admin-only surface area, the security implications called out in SECURITY.md, and the roadmap items the maintainers explicitly want help on.
Complete Markdown
# Odysseus Tech Reader Brief
> A source-grounded breakdown of Odysseus, a self-hosted, batteries-included AI workspace that bundles a chat UI, an MCP-powered agent loop, hardware-aware model recommendations, multi-step deep research, and durable memory into a single FastAPI monolith.
## Context Links
- [Agent index](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/llms.txt)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124)
- [GitHub repository](https://github.com/pewdiepie-archdaemon/odysseus)
## Repository Metadata
- Repository: pewdiepie-archdaemon/odysseus
- Generated: 2026-05-31T19:54:15.859Z
- Updated: 2026-05-31T19:57:17.833Z
- Runtime: Claude Code
- Format: Tech Reader Brief
- Pages: 10
## Page Index
- 01. [Why This Repo Matters: A Self-Hosted ChatGPT With Batteries Included](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/01-why-this-repo-matters-a-self-hosted-chatgpt-with-batteries-included.md) - The hook: Odysseus packs chat, an agent, a hardware-aware model recommender, deep research, memory, email, calendar, and notes into one FastAPI app you can run on your own box — what is special, and what to notice before opening the code.
- 02. [From Clone to Localhost: The Bundled Stack](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/02-from-clone-to-localhost-the-bundled-stack.md) - Docker Compose, env defaults, and bundled sidecars (ChromaDB, SearXNG, ntfy) — the non-obvious pieces of the install path, including the auto-generated admin password and the SSH key flow for Cookbook remote servers.
- 03. [The FastAPI Monolith: app.py and the 45-Router Mount](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/03-the-fastapi-monolith-app.py-and-the-45-router-mount.md) - How a single ~950-line app.py wires authentication, middleware, and ~45 route modules into one server — and what the routes/ directory reveals about the surface area (chat, sessions, documents, memory, MCP, webhooks, vaults, and more).
- 04. [The Agent Loop: 2,000 Lines of Tool Orchestration](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/04-the-agent-loop-2-000-lines-of-tool-orchestration.md) - How agent_loop.py, agent_tools.py, and the tool_* modules turn a chat turn into a sequenced run of shell, file, web, memory, and MCP tool calls — including tool security, parsing, and the per-turn context compactor.
- 05. [Local MCP Servers: Email, Images, Memory, RAG as First-Class Tools](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/05-local-mcp-servers-email-images-memory-rag-as-first-class-tools.md) - The mcp_servers/ tree ships Odysseus-native Model Context Protocol servers so the agent can call email, image generation, memory, and RAG through the same MCP interface used for third-party tools — wired through mcp_manager and builtin_mcp.
- 06. [Cookbook: Hardware-Aware Model Recommendations via hwfit](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/06-cookbook-hardware-aware-model-recommendations-via-hwfit.md) - How services/hwfit/ scans GPU/CPU/RAM, fits GGUF/FP8/AWQ candidates against the box, and proposes a download-and-serve plan — the practical heart of the "click to install a local LLM" pitch, adapted from the llmfit library.
- 07. [Deep Research: A Multi-Step Synthesis Engine in 800 Lines](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/07-deep-research-a-multi-step-synthesis-engine-in-800-lines.md) - The deep_research module adapts Alibaba's Tongyi DeepResearch pattern into a self-contained loop that plans, searches, reads, and produces a visual report — and how SearXNG, the search service, and the visual_report renderer fit together.
- 08. [Memory & Skills: ChromaDB + Skill Extraction That Evolves](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/08-memory-skills-chromadb-skill-extraction-that-evolves.md) - How the memory subsystem combines ChromaDB vector storage, fastembed ONNX embeddings, keyword fallback, and a skill_extractor that distills recurring patterns into reusable skills — the mechanism behind the "your agent gets better over time" claim.
- 09. [Email, Calendar, Notes: The Personal-Productivity Side](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/09-email-calendar-notes-the-personal-productivity-side.md) - IMAP/SMTP triage with AI auto-tagging, CalDAV sync to Radicale / Nextcloud / Apple / Fastmail, ntfy-channel reminders, and cron-style tasks the agent can act on — features that lean on background pollers and per-account routing.
- 10. [Builder Takeaways: What's Surprising, What's Hard, What to Watch](https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/10-builder-takeaways-what-s-surprising-what-s-hard-what-to-watch.md) - Closing synthesis: the monolith-with-sidecars architecture, the breadth of admin-only surface area, the security implications called out in SECURITY.md, and the roadmap items the maintainers explicitly want help on.
## Source File Index
- `.env.example`
- `ACKNOWLEDGMENTS.md`
- `app.py`
- `core/auth.py`
- `core/database.py`
- `core/middleware.py`
- `core/session_manager.py`
- `docker-compose.yml`
- `Dockerfile`
- `install-service.sh`
- `mcp_servers/email_server.py`
- `mcp_servers/image_gen_server.py`
- `mcp_servers/memory_server.py`
- `mcp_servers/rag_server.py`
- `odysseus-ui.service`
- `pyproject.toml`
- `README.md`
- `requirements.txt`
- `ROADMAP.md`
- `routes/calendar_routes.py`
- `routes/chat_routes.py`
- `routes/cookbook_helpers.py`
- `routes/cookbook_routes.py`
- `routes/email_helpers.py`
- `routes/email_pollers.py`
- `routes/email_routes.py`
- `routes/hwfit_routes.py`
- `routes/mcp_routes.py`
- `routes/note_routes.py`
- `routes/research_routes.py`
- `routes/task_routes.py`
- `SECURITY.md`
- `services/hwfit/fit.py`
- `services/hwfit/hardware.py`
- `services/hwfit/image_models.py`
- `services/hwfit/models.py`
- `services/memory/memory_vector.py`
- `services/memory/memory.py`
- `services/memory/skill_extractor.py`
- `services/memory/skills.py`
- `services/research/service.py`
- `services/search/core.py`
- `setup.py`
- `src/agent_loop.py`
- `src/agent_tools.py`
- `src/app_initializer.py`
- `src/builtin_mcp.py`
- `src/caldav_sync.py`
- `src/chroma_client.py`
- `src/context_compactor.py`
- `src/deep_research.py`
- `src/embeddings.py`
- `src/mcp_manager.py`
- `src/memory_vector.py`
- `src/research_handler.py`
- `src/research_utils.py`
- `src/task_scheduler.py`
- `src/tool_execution.py`
- `src/tool_schemas.py`
- `src/tool_security.py`
- `src/visual_report.py`
---
## 01. Why This Repo Matters: A Self-Hosted ChatGPT With Batteries Included
> The hook: Odysseus packs chat, an agent, a hardware-aware model recommender, deep research, memory, email, calendar, and notes into one FastAPI app you can run on your own box — what is special, and what to notice before opening the code.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/01-why-this-repo-matters-a-self-hosted-chatgpt-with-batteries-included.md
- Generated: 2026-05-31T19:47:30.573Z
### Source Files
- `README.md`
- `pyproject.toml`
- `requirements.txt`
- `ACKNOWLEDGMENTS.md`
- `ROADMAP.md`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md)
- [ROADMAP.md](ROADMAP.md)
- [requirements.txt](requirements.txt)
- [docker-compose.yml](docker-compose.yml)
- [app.py](app.py)
- [services/hwfit/fit.py](services/hwfit/fit.py)
- [src/agent_loop.py](src/agent_loop.py)
- [src/deep_research.py](src/deep_research.py)
- [services/memory/memory_vector.py](services/memory/memory_vector.py)
</details>
# Why This Repo Matters: A Self-Hosted ChatGPT With Batteries Included
Odysseus is a single FastAPI app that boots a ChatGPT/Claude-style workspace on your own box and then keeps going: an agent loop, a hardware-aware model recommender, a multi-step deep-research engine, vector memory, IMAP/SMTP email triage, CalDAV calendar sync, notes, and a scheduler — all bundled with three companion services (ChromaDB, SearXNG, ntfy) in one `docker compose up`. There are plenty of "local ChatGPT UI" projects; what makes this one worth a closer look is how much *integrated* surface area lives behind one login. This page is a tour of what is special, what to notice before reading the code, and where the seams are.
## The hook: one repo, one process, ten surfaces
Read the README's feature list as a load-bearing claim, not marketing: the same Python process exposes chat, an agent, a model "Cookbook," deep research, model compare, a documents editor, memory/skills, email, notes/tasks, calendar, and assorted extras (image editor, theme editor, vision uploads, web search, presets, sessions, 2FA). Sources: [README.md:10-22]()
That breadth is the design. `app.py` is described in its own header as a "slim orchestrator" — the work is split into `core/` (auth, DB, middleware, constants), `routes/` (one router file per feature: chat, document, memory, model, calendar, email, notes, vault, webhooks, MCP, …), `services/` (docs, memory, search, hwfit, research, …), and `src/` (LLM core, agent loop, agent tools, chat processor). The `routes/` directory alone holds 47 router modules. Sources: [README.md:184-198](), [app.py:1-48]()
## Why it matters: an admin console, not a chat box
The README and `SECURITY.md` are unusually blunt about this: Odysseus has shell access, file uploads, model downloads, email/calendar integrations, web research, and API tokens — "treat it like an admin console." That sentence is the right mental model for the whole codebase. Sources: [README.md:125-136]()
You can see it in `app.py` itself. There is a hard-timeout middleware that aborts requests after `REQUEST_HARD_TIMEOUT` seconds with a 504, *except* on an explicit allowlist of long-running paths:
```python
# app.py
_TIMEOUT_EXEMPT_PREFIXES = (
"/api/chat", # streaming
"/api/shell/stream", # SSE
"/api/research", # multi-minute jobs
"/api/model/download", # tmux setup may run pip installs
"/api/model/probe", # SSE; iterates models ...
"/api/model-endpoints", # /probe sub-route ...
"/api/cookbook/setup", # remote pacman/apt installs
"/api/upload", # large files
"/api/image", # diffusion proxies (inpaint/harmonize/upscale/etc.)
)
```
Sources: [app.py:64-99]()
Auth is a layered middleware: cookie sessions for the UI, bearer tokens (`ody_…`) for integrations cached in-process with bcrypt verification on miss, a loopback-only `X-Odysseus-Internal-Tool` header so the agent can call admin-gated routes during a turn, and an opt-in `LOCALHOST_BYPASS` for dev. Bearer-token requests are pinned to `current_user = "api"` so they cannot ride normal cookie routes. Sources: [app.py:104-271]()
## The mechanism, one layer at a time
### Provider-neutral LLM plumbing
Chat is wired around `stream_llm()` / `stream_llm_with_fallback()` in `src/llm_core`, with model endpoints stored in the DB and auto-detected at startup. The README lists vLLM, llama.cpp, Ollama, OpenRouter, and OpenAI as adapters; a startup task probes `http://localhost:11434/v1/models` and, if Ollama is running, silently inserts a `ModelEndpoint` row called "Ollama (local)". Sources: [README.md:11](), [app.py:899-930](), [src/agent_loop.py:17]()
There is no hard-coded provider in the agent loop. The system prompt is provider-agnostic; tools are surfaced through `agent_tools` and an MCP manager that's loaded at startup with a 20-second timeout so a flaky MCP server cannot block the UI. Sources: [app.py:712-725](), [src/agent_loop.py:22-35]()
### The agent loop: fenced blocks + MCP
`src/agent_loop.py` (≈2,100 lines) wraps streaming completions in a multi-round tool-execution loop. The model emits tool calls as fenced code blocks; the loop parses, executes, formats, and feeds results back. The same loop integrates a built-in MCP client, with per-server "disabled tools" loaded from the `McpServer` table — so an operator can ban specific tools without uninstalling the server. Sources: [src/agent_loop.py:1-56]()
### Cookbook: a hardware-aware model recommender
The "what model fits on my GPU?" feature is more concrete than most readmes admit. `services/hwfit/fit.py` ships a hard-coded GPU memory-bandwidth table (RTX 50/40/30/20/16-series, H100/H200, A100, L40S, Radeon 7000/6000, MI300/250/210/100, AMD 9070) and a per-use-case scoring weight vector for `general / coding / reasoning / chat / multimodal / embedding / tts / stt`. Speed is estimated from `bandwidth / model_size` with a 0.55 efficiency factor, then adjusted for MoE (active params only) and CPU offload. Sources: [services/hwfit/fit.py:9-86]()
That table is why the Cookbook can recommend a quant and serve mode (vLLM vs llama.cpp) instead of just listing model names. It is also why operators on the wrong card get realistic numbers and not vendor-PR throughput.
### Deep Research: an iterative LLM-in-the-loop
`src/deep_research.py` describes itself as an "IterResearch-style" Think → Search → Extract → Synthesize loop, adapted from Alibaba's Tongyi DeepResearch (Apache-2.0). Round zero generates a research plan (sub-questions, key topics, success criteria) as JSON; subsequent rounds generate fresh queries against "what we know so far" and a round counter. The synthesis output flows into `src/visual_report.py`, which is why `markdown` is a hard core dependency, not optional. Sources: [src/deep_research.py:1-60](), [ACKNOWLEDGMENTS.md:33-39](), [requirements.txt:21-23]()
### Memory: ChromaDB + ONNX embeddings, with a fallback
Semantic memory lives in `services/memory/memory_vector.py` on a Chroma collection literally named `odysseus_memories`, with cosine HNSW and pre-computed embeddings from `EmbeddingClient`. `chromadb-client` and `fastembed` are in `requirements.txt` with an explicit comment that they're "installed by default — the app still degrades to keyword fallback if they're ever missing." That graceful-degrade pattern shows up again in the roadmap's "Better degraded-state reporting" item. Sources: [services/memory/memory_vector.py:1-50](), [requirements.txt:13-19](), [ROADMAP.md:19]()
### Email, calendar, notes, tasks
Email is IMAP/SMTP with per-account routing and AI triage (urgency, auto-tag, auto-summary, auto-reply, auto-spam) — see the polling/handler split in `routes/email_pollers.py` and `routes/email_helpers.py`. Calendar uses `icalendar` for `.ics` import/export and the `caldav` library for PROPFIND/REPORT sync against Radicale, Nextcloud, Apple, and Fastmail; `ACKNOWLEDGMENTS.md` notes that `caldav` is dual-licensed and used under Apache-2.0 specifically to keep the core permissive. Notes/tasks ship with a `croniter`-based scheduler and ntfy/browser/email channels. Sources: [requirements.txt:24-35](), [ACKNOWLEDGMENTS.md:111-115](), [ACKNOWLEDGMENTS.md:139-155]()
## Surprising details a README reader will miss
- **It's not just MIT throughout — license hygiene is explicit.** `pypdf` (BSD) replaces `chardet` (LGPL) for PDF text, `charset-normalizer` (MIT) replaces chardet, and PyMuPDF (AGPL) is quarantined to `src/pdf_forms.py` and the optional requirements file so the MIT core can run without it. AGPL only "activates" if you install the optional form-filling feature. Sources: [ACKNOWLEDGMENTS.md:139-155]()
- **Three startup warm-ups happen in parallel before the first request.** A background-job monitor (re-invokes the agent when a `#!bg` shell job completes), an MCP `register_builtin_servers` + `connect_all_enabled` with a 20-second budget, a RAG tool-index warmup that pre-loads embeddings + opens Chroma + indexes built-in tools, and an LLM endpoint warmup ping (looped every 60s). This is the kind of work an "MVP" usually skips. Sources: [app.py:700-773]()
- **Static assets are served with `Cache-Control: no-cache` for `.js/.css/.html`.** A custom `_RevalidatingStatic` mount forces conditional revalidation because the app ships raw ES modules with no build step or versioned URLs — without it, deploys would not show up until a manual hard refresh. Sources: [app.py:277-293]()
- **Bearer tokens are cached in-process with explicit invalidation.** `_token_cache` maps prefix → list of `(id, hash, owner, scopes)`. The DB rebuild is triggered by `app.state._token_cache_dirty`, flipped by API-token routes on create/revoke. `last_used_at` is updated fire-and-forget on a `to_thread` task so bcrypt + DB write doesn't sit on the request. Sources: [app.py:136-250]()
- **Nightly "skill audit" job.** At ~02:00 local, the scheduler tests + judges the least-recently-checked entries in the skill library and auto-fixes/escalates weak ones (never deletes). Gated by `skill_audit_nightly`, hour by `skill_audit_hour`, batch by `skill_audit_batch`. Sources: [app.py:875-898]()
- **Disk-backed skill files get repaired on boot.** Orphaned/`test-owner` `SKILL.md` files get reassigned to the primary admin so a fresh deploy doesn't see an empty library. A periodic null-owner sweep does the same for anything created while auth was disabled or localhost-bypassed. Sources: [app.py:822-868]()
## Tradeoffs the README and ROADMAP openly admit
ROADMAP.md is unusually honest about where the seams are: "SQUASH BUGS" is item 1, Cookbook reliability "across different machines, GPUs, drivers, shells, and Python environments" is called out as the area most likely to break, popup/dropdown placement inside transformed modals is a known brute-force fix, and `static/style.css` is "basically Calypso's island atm." The author also notes that "most of Odysseus's code was written *with* AI models, not just by a human." That context matters when reading the code — expect lots of inline comments explaining *why*, lots of route files, and the occasional load-bearing comment a maintainer left for themselves. Sources: [ROADMAP.md:8-30](), [ACKNOWLEDGMENTS.md:158-168]()
The security posture is the other big tradeoff. The README repeats it three times in one section: keep `AUTH_ENABLED=true`, don't expose to the public internet without HTTPS + a trusted reverse proxy, and review per-user privileges before exposing a deployment. The non-admin default is sensible (no shell/Python/file read-write, admin-gated MCP/API-token/webhook/cookbook/backup/settings), but the surface area is real. Sources: [README.md:125-149]()
## How the pieces fit (one diagram)
```mermaid
flowchart LR
subgraph Client["Browser PWA"]
UI["static/index.html + js/"]
end
subgraph Process["FastAPI process (app.py)"]
MW["Middleware:\nSecurityHeaders + Timeout + Auth\n(cookie / Bearer ody_… / loopback)"]
subgraph Routes["routes/ (47 router files)"]
Chat["chat_routes"]
Doc["document_routes"]
Mem["memory_routes"]
Cal["calendar_routes"]
Mail["email_routes"]
Cook["cookbook_routes / hwfit_routes"]
Res["research_routes"]
end
subgraph SrcCore["src/ engine"]
Loop["agent_loop.py\n(stream + tool rounds)"]
Tools["agent_tools / tool_index\n+ MCP manager"]
LLM["llm_core (provider-neutral)"]
DR["deep_research.py\n(IterResearch loop)"]
end
subgraph Svc["services/"]
Hw["hwfit/ fit + hardware\n(GPU bandwidth table)"]
MemSvc["memory/ memory_vector\n(ChromaDB)"]
Search["search/ (SearXNG client)"]
end
end
subgraph Companions["Compose-bundled"]
Chroma["ChromaDB :8000"]
Sx["SearXNG :8080"]
Ntfy["ntfy :80"]
end
subgraph External["External / BYOK"]
Provs["LLM endpoints\nvLLM / llama.cpp / Ollama /\nOpenAI / OpenRouter / Anthropic / Gemini"]
IMAP["IMAP / SMTP"]
DAV["CalDAV (Radicale / Nextcloud / Apple / Fastmail)"]
end
UI --> MW --> Routes
Routes --> SrcCore
SrcCore --> Svc
MemSvc --> Chroma
Search --> Sx
Routes --> Ntfy
LLM --> Provs
Mail --> IMAP
Cal --> DAV
```
Sources: [README.md:184-198](), [app.py:104-271](), [docker-compose.yml:1-79](), [src/agent_loop.py:17-35](), [services/memory/memory_vector.py:14-50]()
## What builders should notice
If you are evaluating Odysseus to fork, extend, or just borrow ideas:
| What | Where | Why it's worth a closer look |
|---|---|---|
| Provider-neutral LLM core | `src/llm_core.py`, `src/model_discovery.py`, `app.py:899-930` | Same code path serves vLLM, llama.cpp, Ollama, and OpenAI-compatible HTTP. No vendor lock. |
| Fenced-block agent loop with MCP | `src/agent_loop.py` | Multi-round tool execution that works with any model that can write code fences, plus an MCP manager for external tools. |
| Hardware-aware recommender | `services/hwfit/fit.py`, `services/hwfit/hardware.py` | Concrete bandwidth table + quant-aware speed/quality model. Reusable outside the UI via `scripts/odysseus-cookbook`. |
| Iterative research engine | `src/deep_research.py`, `services/research/` | A real Plan → Search → Extract → Synthesize loop; reuses the same LLM/search infra. |
| ChromaDB + fastembed memory with keyword fallback | `services/memory/memory_vector.py`, `requirements.txt:13-19` | Graceful degrade is a design principle here, not an afterthought. |
| Auth + token caching pattern | `app.py:136-271` | Worth reading even if you don't use the rest — it's a clean example of bcrypt-cache + invalidation + loopback bypass for in-process tools. |
| Bundled-but-detachable companion services | `docker-compose.yml`, `ACKNOWLEDGMENTS.md:42-53` | ChromaDB, SearXNG, ntfy are pulled as upstream images, not vendored. You can swap or remove them. |
| Honest roadmap | `ROADMAP.md` | The author flags where the code is brittle. Take it at face value. |
The closing summary: Odysseus is interesting less because any one of its features is novel and more because the integration cost — auth, scheduling, memory, search, tools, providers, calendar, email — has actually been paid in one repo, under a permissive core license, on a stack you can run on a laptop or a home server. If you're considering self-hosted AI workspaces, it's worth at least a `docker compose up` before you build your own.
---
## 02. From Clone to Localhost: The Bundled Stack
> Docker Compose, env defaults, and bundled sidecars (ChromaDB, SearXNG, ntfy) — the non-obvious pieces of the install path, including the auto-generated admin password and the SSH key flow for Cookbook remote servers.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/02-from-clone-to-localhost-the-bundled-stack.md
- Generated: 2026-05-31T19:47:51.023Z
### Source Files
- `docker-compose.yml`
- `Dockerfile`
- `.env.example`
- `setup.py`
- `install-service.sh`
- `odysseus-ui.service`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [docker-compose.yml](docker-compose.yml)
- [Dockerfile](Dockerfile)
- [docker/entrypoint.sh](docker/entrypoint.sh)
- [.env.example](.env.example)
- [setup.py](setup.py)
- [install-service.sh](install-service.sh)
- [odysseus-ui.service](odysseus-ui.service)
- [routes/cookbook_routes.py](routes/cookbook_routes.py)
- [config/searxng/settings.yml](config/searxng/settings.yml)
- [README.md](README.md)
</details>
# From Clone to Localhost: The Bundled Stack
Most "self-hosted AI workspace" repos punt on the rough edges: the vector database, the web search backend, the notification fan-out, the SSH key for talking to a remote GPU box. Odysseus bundles all of that into a single `docker compose up` and tries to make the rest invisible. This page walks through what actually starts when you bring the stack up, where the defaults come from, and the two install-path details that surprise people: the auto-generated admin password and the SSH key that Odysseus minted for itself inside the container.
The whole install path is unusually compressed — four services, one Dockerfile, one `.env.example`, one setup script, and one systemd unit — but each piece carries non-obvious load. If you skim them, the system looks generic; if you read them carefully, you see why the defaults are the way they are.
## What Compose Actually Starts
`docker compose up -d --build` brings up four containers in one network. Three of them are sidecars the app talks to over the Compose-internal DNS; the fourth is Odysseus itself.
| Service | Image | Container port | Host port | Purpose |
|---|---|---|---|---|
| `odysseus` | built from `./Dockerfile` | 7000 | `7000` | FastAPI app (uvicorn) |
| `chromadb` | `chromadb/chroma:latest` | 8000 | `8100` | Vector store for semantic memory |
| `searxng` | `searxng/searxng:latest` | 8080 | `127.0.0.1:8080` | Meta-search backend for web search |
| `ntfy` | `binwiederhier/ntfy` | 80 | `8091` | Local push-notification server |
Two routing details are worth knowing. SearXNG is intentionally bound to `127.0.0.1:8080` on the host, so it is not reachable from the LAN even though it's exposed on a port — only Odysseus inside Compose talks to it. ChromaDB is exposed on `8100` (not `8000`) on the host because port 8000 is a common collision with locally hosted model servers; inside the Compose network, Odysseus connects to `chromadb:8000` directly.
The Odysseus container itself depends on `searxng: service_healthy` and `chromadb: service_started`, where SearXNG ships its own HTTP probe via `urllib.request.urlopen` against `/` with a five-second timeout, retried up to 20 times. That gate matters: without it, the app warms up its search subsystem against a SearXNG that's still booting and logs spurious "DEGRADED" lines.
```yaml
healthcheck:
test: ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8080/', timeout=5).read(1)\""]
interval: 5s
timeout: 6s
retries: 20
start_period: 10s
```
Sources: [docker-compose.yml:1-79](), [README.md:174-180]()
## Env Defaults and the In-Network Overrides
`.env.example` documents the user-facing knobs; `docker-compose.yml` quietly overrides three of them so the container network "just works" without the user editing anything.
| Variable | `.env.example` default | Compose override |
|---|---|---|
| `SEARXNG_INSTANCE` | `http://localhost:8080` | `http://searxng:8080` |
| `CHROMADB_HOST` | `localhost` | `chromadb` |
| `CHROMADB_PORT` | `8100` | `8000` |
The host-side defaults are tuned for a manual install (uvicorn run directly on macOS or Linux, with `docker run -p 8100:8000 chromadb/chroma`). In Compose the in-network hostnames are correct because containers resolve each other by service name on the default user-defined bridge.
Anything else in `.env.example` is genuinely optional: `LLM_HOST` defaults to `localhost`, `LOCALHOST_BYPASS` defaults to `false` (the loopback auth bypass is dev-only), `AUTH_ENABLED` defaults to `true`, embeddings fall back to a local `fastembed` ONNX model if no HTTP endpoint is configured, and the in-process schedulers can be turned off with `ODYSSEUS_INPROCESS_POLLERS=0` / `ODYSSEUS_INPROCESS_TASKS=0` when an external cron drives them. The README's "defaults work out of the box" claim is load-bearing: you can `cp .env.example .env` and run.
Sources: [.env.example:1-103](), [docker-compose.yml:17-30](), [README.md:160-180]()
## The PUID/PGID Footgun, Solved at Entrypoint
The most subtle piece of the install path is the Docker entrypoint. The Dockerfile installs `gosu` alongside the usual system deps (`build-essential`, `cmake`, `git`, `tmux`, `openssh-client`, `nodejs`, `npm`) and then `COPY`s an entrypoint shell script that fixes a classic self-host bug: a container running as root writes root-owned files into bind-mounted host directories, and the host user (or any non-root caller) then can't update them. The result is silent EPERM failures on skill extraction, prefs saves, and mail attachments.
The entrypoint resolves `PUID`/`PGID` (defaulting to `1000:1000`, the typical first-user UID on Linux), reuses an existing matching user/group if one already exists in `/etc/passwd`, and otherwise creates `odysseus` with the right ids. Then it walks `/app`, `/app/data`, and `/app/logs` with `find ... -not -uid "$PUID"` and chowns only files that need fixing — keeping startup `O(touched-files)` rather than `O(everything)`, so terabyte-sized maildirs don't slow startup. Finally it `exec`s the actual command under `gosu` so signals from `docker stop` reach uvicorn directly without an extra shell layer.
```sh
for dir in /app /app/data /app/logs; do
if [ -d "$dir" ]; then
find "$dir" -not -uid "$PUID" -print0 2>/dev/null \
| xargs -0 -r chown "$PUID:$PGID" 2>/dev/null || true
fi
done
exec gosu "$PUID:$PGID" "$@"
```
If your host user isn't UID 1000, you override `PUID=...`/`PGID=...` in `.env` (the compose file forwards them with `${PUID:-1000}` defaults). The trick is that the chown sweep covers not just the bind mounts but also paths the app writes to inside the image's source tree at runtime — `services/cache/{search,content}/*`, `services/search_analytics.json`, the TTS cache — which were created as root during `docker build`.
Sources: [Dockerfile:1-47](), [docker/entrypoint.sh:1-52](), [docker-compose.yml:21-30]()
## The Auto-Generated Admin Password
There is no preset password. There is also no manual setup step on the manual install path — `python setup.py` does it for you, and if `ODYSSEUS_ADMIN_PASSWORD` isn't set in the environment, it generates an 18-byte URL-safe random password with `secrets.token_urlsafe(18)`, bcrypt-hashes it, writes `data/auth.json`, and prints the plaintext to stdout exactly once:
```python
username = os.getenv("ODYSSEUS_ADMIN_USER", "admin").strip() or "admin"
password = os.getenv("ODYSSEUS_ADMIN_PASSWORD") or __import__("secrets").token_urlsafe(18)
hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
auth_data = {
"users": {
username: {
"password_hash": hashed,
"is_admin": True,
}
}
}
```
`setup.py` is idempotent — if `data/auth.json` already exists, it prints `[skip] auth.json already exists` and does nothing. The Docker path takes a different route: the `Dockerfile`'s `CMD` is `uvicorn app:app --host 0.0.0.0 --port 7000`, so `setup.py` doesn't run inside the container. Instead, `core/auth.py` boots with `self._config = {}` and logs `"No auth config found — first-run setup required"`; the UI then routes you to `/api/auth/setup`, which calls `auth_manager.setup(username, password)` only while `is_configured` is false (rate-limited at 3 requests per 5 minutes per host).
So the two install paths produce two different first-login experiences:
```text
Manual (python setup.py) Docker (compose up)
------------------------- ----------------------------
generates random password no auth.json on disk
prints once on stdout UI shows first-run setup page
writes data/auth.json POST /api/auth/setup creates admin
log in with printed pwd log in with the password you typed
```
`ODYSSEUS_ADMIN_PASSWORD` in `.env` is a third option that pre-seeds the password on either path. The README is explicit that you'd only do that if you don't want the generated one shown on the manual install.
Sources: [setup.py:46-75](), [core/auth.py:60-76](), [routes/auth_routes.py:78-90](), [README.md:38-46]()
## The Cookbook SSH Key Flow
Cookbook is Odysseus's hardware-aware model fitter and runner. When you point it at a remote GPU server, it talks to that box over SSH — and because the app runs inside Docker, "Cookbook's home directory" isn't yours. The compose file mounts `./data/ssh` from the host to `/app/.ssh` inside the container, and the Cookbook routes manage a dedicated `ed25519` key inside that directory.
The key is generated lazily by an admin-only POST to `/api/cookbook/ssh-key`:
```python
def _cookbook_ssh_dir() -> Path:
app_ssh = Path("/app/.ssh")
if Path("/app").exists():
return app_ssh
return Path.home() / ".ssh"
def _cookbook_ssh_key_path() -> Path:
return _cookbook_ssh_dir() / "id_ed25519"
```
```python
proc = await asyncio.create_subprocess_exec(
"ssh-keygen", "-t", "ed25519", "-N", "", "-C", "odysseus-cookbook", "-f", str(key_path),
...
)
```
Three details matter for operators. First, the key has no passphrase (`-N ""`) and a recognisable comment (`-C "odysseus-cookbook"`), so when you see it in a remote `authorized_keys` file you know exactly who it is. Second, after generation the route locks down perms with `chmod 700` on the directory, `600` on the private key, and `644` on the public key — necessary because the bind mount inherits whatever the host gave you, which OpenSSH typically rejects. Third, the route auto-detects whether it's running in Docker: if `/app` exists, the key lives under `/app/.ssh` (the bind mount); otherwise it falls back to `~/.ssh` on the manual-install path.
The README also documents installing the key from the host without going through the UI:
```bash
ssh-copy-id -i data/ssh/id_ed25519.pub user@server
```
The local Hugging Face cache uses the same bind-mount trick: `./data/huggingface` on the host maps to `/app/.cache/huggingface` in the container, so Cookbook's "local" downloads survive image rebuilds.
Sources: [routes/cookbook_routes.py:206-255](), [docker-compose.yml:6-14](), [README.md:55-65]()
## SearXNG: The Smallest Config That Works
The bundled SearXNG instance carries the minimum config necessary to make it useful as a programmatic backend:
```yaml
use_default_settings: true
server:
secret_key: "odysseus-local-searxng-json-2026-05-30"
search:
formats:
- html
- json
```
The `formats: [html, json]` line is the operative one — without it SearXNG returns HTML only and Odysseus can't parse results. The secret key is hard-coded because the instance is bound to `127.0.0.1` and only reached from inside the Compose network; rotating it doesn't buy security in that topology. If you front Odysseus with a public reverse proxy, this is still fine since SearXNG never gets exposed.
Sources: [config/searxng/settings.yml:1-9](), [docker-compose.yml:48-63]()
## The Systemd Path (Option Three)
For non-Docker installs the repo ships `odysseus-ui.service` and a tiny installer. The unit file is a template — `User`, `WorkingDirectory`, and `ExecStart` all carry literal `YOURUSER` placeholders you must edit:
```ini
User=YOURUSER
WorkingDirectory=/home/YOURUSER/odysseus-ui
ExecStart=/home/YOURUSER/odysseus-ui/venv/bin/uvicorn app:app --port 8000 --host 0.0.0.0
Restart=always
RestartSec=3
EnvironmentFile=-/home/YOURUSER/odysseus-ui/.env
```
`install-service.sh` is a five-command wrapper: copy the unit into `/etc/systemd/system/`, `daemon-reload`, `enable`, `start`, `status`. Two oddities worth flagging: the unit defaults to port `8000` (not `7000` like the Docker path), and `EnvironmentFile` is prefixed with `-` so a missing `.env` won't fail the unit. There is no port-binding restriction, so you should pair it with a reverse proxy if you bind `0.0.0.0` — the SECURITY notes in `README.md:138-149` lean on Caddy with auto-renewed Let's Encrypt certs.
Sources: [install-service.sh:1-20](), [odysseus-ui.service:1-19]()
## What Builders Should Notice
Three patterns in this install path are reusable:
- **The PUID/PGID entrypoint** is the cleanest fix for the bind-mount footgun I've seen in a small project — `gosu` plus a one-pass chown that only touches files with the wrong uid. It is a much better default than running as root or shipping a UID-baked image.
- **The dual-default trick** — `.env.example` is tuned for the manual install path, and Compose overrides the in-network names — gives both audiences a working config without conditionals or templating.
- **The auto-detecting SSH-key location** in `routes/cookbook_routes.py` is a small but important nicety: the same code path generates `~/.ssh/id_ed25519` on bare-metal installs and `/app/.ssh/id_ed25519` in Docker, with no env var to set. It is the kind of thing that costs ten lines and saves a thousand support tickets.
What's still on the user: writing the public key to your remote GPU box (Cookbook prints it in the UI, but doesn't push it for you), and overriding `PUID`/`PGID` if your host UID isn't 1000. Everything else — the vector store, the search backend, the notification server, the admin password, the SSH key — is generated, mounted, and chown'd for you before uvicorn binds the port.
Sources: [docker/entrypoint.sh:1-52](), [routes/cookbook_routes.py:206-255](), [setup.py:46-75]()
---
## 03. The FastAPI Monolith: app.py and the 45-Router Mount
> How a single ~950-line app.py wires authentication, middleware, and ~45 route modules into one server — and what the routes/ directory reveals about the surface area (chat, sessions, documents, memory, MCP, webhooks, vaults, and more).
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/03-the-fastapi-monolith-app.py-and-the-45-router-mount.md
- Generated: 2026-05-31T19:47:25.315Z
### Source Files
- `app.py`
- `core/middleware.py`
- `core/auth.py`
- `core/database.py`
- `core/session_manager.py`
- `src/app_initializer.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [app.py](app.py)
- [core/middleware.py](core/middleware.py)
- [core/auth.py](core/auth.py)
- [core/database.py](core/database.py)
- [core/session_manager.py](core/session_manager.py)
- [src/app_initializer.py](src/app_initializer.py)
- [routes/chat_routes.py](routes/chat_routes.py)
</details>
# The FastAPI Monolith: app.py and the 45-Router Mount
Odysseus is a self-hosted, multi-modal AI workstation, and almost every HTTP entry point flows through a single `app.py` file. At ~957 lines it is unapologetically a monolith: one `FastAPI` instance, three custom middlewares stacked above the framework's CORS handler, a single auth gate that understands both browser session cookies and `Bearer ody_` API tokens, and roughly 45 `setup_*_routes(...)` factories pulled from the `routes/` directory and mounted with `app.include_router(...)`. If you want to know what this product *does*, you read `app.py` top-to-bottom and the answer is right there.
This page walks the seam between `app.py` and `routes/` — how the orchestrator wires authentication, security headers, request timeouts, and per-feature routers into one server, what the route filenames tell you about the product surface, and the small but pointed engineering details (a hand-rolled API-token cache, an internal-tool loopback escape hatch, a per-request CSP nonce, two named ghost sessions purged at startup) that turn an otherwise vanilla FastAPI app into something specific to this codebase.
## The shape of `app.py`
`app.py` calls itself a "slim orchestrator" in its first comment, which is a generous reading: it is slim only in the sense that the heavy logic lives elsewhere. The file is laid out as a single linear script — no factory, no `create_app()` — punctuated by `# =========` banner comments that double as a table of contents:
```
# ========= LOGGING =========
# ========= APP =========
# ========= CORS =========
# ========= SECURITY HEADERS MIDDLEWARE =========
# ========= REQUEST TIMEOUT (FALLBACK FOR HUNG HANDLERS) =========
# ========= AUTH =========
# ========= STATIC FILES =========
# ========= GENERATED IMAGES =========
# ========= YOUTUBE INIT =========
# ========= RAG (vector document RAG — DISABLED) =========
# ========= IMPORT CONFIG =========
# ========= COMPONENT INITIALIZATION =========
# ========= EXCEPTION HANDLERS =========
# ========= WEBHOOK MANAGER =========
# ========= INCLUDE ROUTERS =========
# ========= ROUTES (kept in app.py) =========
# ========= LIFECYCLE =========
```
The mental model is: configure the app, plug in the middleware stack, build the dependency graph in one shot via `initialize_managers(...)`, then `include_router` every feature. The `@app.on_event("startup")` handler at the bottom owns everything that has to happen *after* the server is accepting traffic — MCP server discovery, background-job monitoring, scheduled task runner, a keep-alive loop for upstream LLM endpoints, an Ollama auto-detector, and a nightly skill-audit loop.
Sources: [app.py:1-48](app.py), [app.py:599-672](app.py), [app.py:675-931](app.py)
## The middleware sandwich
Three custom middlewares sit on top of FastAPI's `CORSMiddleware`, in the order they are added (which in Starlette means the *reverse* order of execution — the last added wraps first):
```
request
└─► AuthMiddleware (cookie/bearer gate, exempt list)
└─► _RequestTimeoutMiddleware (45s default, path-exempt for streams)
└─► SecurityHeadersMiddleware (per-request CSP nonce)
└─► CORSMiddleware
└─► route handler
```
`SecurityHeadersMiddleware` generates a fresh `secrets.token_hex(16)` nonce on every request, attaches it to `request.state.csp_nonce`, and templates it into the page's `script-src` directive. The `_serve_html_with_nonce` helper in `app.py` substitutes the nonce into the literal token `{{CSP_NONCE}}` inside `static/index.html` before sending it, so inline scripts the SPA needs can run while arbitrary injected scripts cannot. Tool-render iframes (`/api/tools/.../render`) and self-contained research reports (`/api/research/report/...`) get different CSPs entirely — the iframes lean on `sandbox="allow-scripts"` for isolation and the reports relax `script-src` to `'unsafe-inline'` because they're standalone HTML artifacts.
`_RequestTimeoutMiddleware` is the most pragmatic piece in the file. The comment explains why it exists: *"Without this, a single hung `subprocess.run` or missing-timeout `httpx` call locks up the entire server for everyone."* The default cap is 45 seconds (`REQUEST_HARD_TIMEOUT`), and there is a hard-coded prefix exemption list for endpoints that legitimately stream or run for minutes — `/api/chat`, `/api/shell/stream`, `/api/research`, `/api/model/download`, `/api/cookbook/setup`, `/api/upload`, image diffusion proxies, and the model-probe SSE. Everything else gets `asyncio.wait_for` wrapped around it and returns a 504 on timeout.
Sources: [app.py:51-102](app.py), [app.py:601-617](app.py), [core/middleware.py:47-100](core/middleware.py)
## Auth: cookies, bearer tokens, and a loopback bypass
The `AuthMiddleware` defined inline in `app.py:163-266` is the single gate every request passes through (it's only added when `AUTH_ENABLED` is true, and it can be wholly disabled with the env var). It handles three credential types and an explicit dev/test exemption:
| Path | What it does | Code |
|---|---|---|
| Exact-match exemption | `/login`, `/api/health`, `/api/version`, `/api/auth/*` setup/login/status, etc. pass through unauthenticated. | `AUTH_EXEMPT_EXACT` / `_is_auth_exempt` |
| Loopback internal-tool token | If the request comes from `127.0.0.1`/`::1` *and* carries `X-Odysseus-Internal-Token` matching the per-process secret, it is admitted as `internal-tool` — or impersonates the user from `X-Odysseus-Owner`. | `app.py:172-188`, `core/middleware.py:16-17` |
| `LOCALHOST_BYPASS=true` | Lets unauthenticated localhost requests through. Off by default; meant to be off when fronted by a reverse proxy or Tailscale Funnel. | `app.py:110, 191-194` |
| `Bearer ody_...` API token | Bcrypt-checks against an in-memory prefix cache, attributes the request to the token's owner. | `app.py:202-254` |
| Session cookie | The default browser path — `auth_manager.validate_token(cookie)` decides. | `app.py:257-266` |
The API-token path is where there's real engineering. Tokens are 51 bytes (`ody_` + 43 base64 chars), and naively each request would do a full DB scan plus a bcrypt verify per row. Instead, `_token_cache` is a `dict[prefix → list[(id, hash, owner, scopes)]]` that's lazily refreshed when a `_token_cache_dirty` flag is set. The flag is bumped by `app.state.invalidate_token_cache`, which `routes/api_token_routes` calls on create/revoke. Inside the request handler, bcrypt is still run, but only against candidates whose 8-char prefix matches the incoming token — so it's O(collisions-in-prefix), effectively O(1). `last_used_at` is updated fire-and-forget via `asyncio.create_task(_touch_last_used(...))` so the request doesn't wait on the extra commit.
The "internal-tool" loopback deserves a callout: when the in-app agent makes tool calls that need admin-gated routes (e.g. creating a note in another user's account), it doesn't have a session cookie, so it loops back via HTTP with the per-process `INTERNAL_TOOL_TOKEN` from `core/middleware.py`. The token is regenerated on every process start via `secrets.token_hex(32)` if `ODYSSEUS_INTERNAL_TOKEN` isn't set, and `require_admin()` accepts either the header directly or a request whose middleware already stamped `request.state.current_user = "internal-tool"`. The impersonation header (`X-Odysseus-Owner`) was added so agent-created records land with the *user's* ownership instead of being orphaned to a generic `internal-tool` owner.
Sources: [app.py:104-271](app.py), [core/middleware.py:12-44](core/middleware.py)
## The dependency graph in one call
Component construction is centralized in `src/app_initializer.py:initialize_managers(...)`. `app.py` calls it once, unpacks the returned dict into module globals, and from then on every router factory receives explicit dependencies. There is no FastAPI `Depends()` graph here — it's plain constructor injection, with each `setup_*_routes(...)` taking whatever managers it needs and returning an `APIRouter`.
```
┌────────────────────────────────────────┐
│ src/app_initializer.initialize_managers │
└────────────────────────────────────────┘
│
┌────────────┬───────────┼───────────┬────────────────┐
▼ ▼ ▼ ▼ ▼
MemoryManager SkillsManager SessionManager UploadHandler PresetManager
│ │ │ │ │
└────────┬───┴───────────┼───────────┴──────┬─────────┘
▼ ▼ ▼
ChatProcessor ChatHandler ModelDiscovery
│ │
└───── plus MemoryVectorStore (Chroma, optional) ─┘
```
Two intentional fragilities are worth noting. `rag_manager` is hard-wired to `None` (with `rag_available = False`) in `app.py:354-356` because the ChromaDB vector-document RAG was never indexed in practice and its 1.4.1 client cost ~30s of startup time. Personal-doc routes still receive the `None` and degrade cleanly because every consumer guards on `rag_available`. The `MemoryVectorStore` (also Chroma-backed, but for memory rather than documents) is initialized in `app_initializer.py:55-74` and is kept — when healthy, it rebuilds its index from existing memories on first start.
Sources: [app.py:344-385](app.py), [src/app_initializer.py:32-114](src/app_initializer.py)
## The 45-router mount: what the surface area reveals
`app.py:412-597` calls `app.include_router(...)` exactly 40 times (with some files exposing helpers rather than routers, the `routes/` directory totals 47 Python files — ~45 of them are real route modules). The pattern is uniform: every module exposes a `setup_<feature>_routes(...)` factory that returns an `APIRouter`, instantiated with whatever managers the feature needs:
```python
# routes/chat_routes.py:90-101
def setup_chat_routes(
session_manager,
chat_handler,
chat_processor,
...
) -> APIRouter:
router = APIRouter(tags=["chat"])
```
This is the entire abstraction. There is no plugin registry, no automatic discovery, no decorator-based registration — just a literal list of imports and `include_router` calls in `app.py`. Adding a feature is an explicit, single-file diff to the orchestrator, which is why a fresh reader can know the system's full HTTP surface by scrolling one screen.
Grouping the routers by responsibility shows what kind of product this actually is:
| Group | Routers | What it covers |
|---|---|---|
| Auth & identity | `auth_routes`, `api_token_routes`, `prefs_routes` | First-run setup, login/logout, OAuth-like API tokens (`ody_...`), per-user preferences. |
| Chat core | `chat_routes`, `chat_helpers`, `session_routes`, `history_routes`, `compare_routes` | Streaming chat, sessions, message history, A/B model comparison. |
| Knowledge & memory | `memory_routes`, `skills_routes`, `personal_routes`, `embedding_routes`, `note_routes` | Long-term memory, skill packs (Markdown SKILL.md files), personal documents, notes/todos. |
| Documents & media | `document_routes`, `upload_routes`, `gallery_routes`, `signature_routes`, `editor_draft_routes`, `emoji_routes`, `font_routes` | Canvas/artifact documents, file uploads, image gallery + drafts, sig stamps, Twemoji proxy, custom fonts. |
| Voice & vision | `tts_routes`, `stt_routes` | Text-to-speech, speech-to-text. |
| Models & infra | `model_routes`, `cookbook_routes`, `hwfit_routes`, `diagnostics_routes`, `cleanup_routes`, `backup_routes`, `admin_wipe_routes` | Endpoint discovery, model download/serve via the "cookbook", hardware fit calculator, diagnostics, danger-zone wipes, export/import. |
| Integrations | `mcp_routes`, `webhook_routes`, `email_routes`, `email_pollers`, `calendar_routes`, `contacts_routes`, `vault_routes`, `search_routes`, `research_routes` | Model Context Protocol servers, webhooks, IMAP email + pollers, CalDAV, CardDAV, password vault, web search, deep research. |
| Agent & automation | `assistant_routes`, `task_routes`, `shell_routes`, `preset_routes` | Personal assistant, scheduled tasks, user-facing shell exec, preset prompts. |
The mix tells the story: this isn't a chat wrapper, it's a personal AI workstation that has accreted features into one server — notes, calendar, contacts, password vault, image gallery, shell, scheduled tasks, MCP, webhooks. Every one of those gets a router, and every router is mounted explicitly in `app.py`.
Sources: [app.py:412-597](app.py), [routes/chat_routes.py:90-101](routes/chat_routes.py)
## Routes that live in `app.py` itself
A small handful of routes are *not* factored out to the `routes/` directory:
- `/` and a dozen tool deep-links (`/notes`, `/calendar`, `/cookbook`, `/email`, `/memory`, `/gallery`, `/tasks`, `/library`) all serve the same `static/index.html` SPA — the client-side router reads `window.location.pathname` and auto-opens the matching modal. The deep-link comment explains the bookmark-UX motivation: each route can pin a unique favicon and title.
- `/backgrounds` and `/login` serve standalone HTML files.
- `/api/health` and `/api/version` are kept inline because they are tiny and must be exempt from auth anyway.
- `/api/generated-image/{filename}` is a single ownership-checked file server: filenames must match a content-hash regex, ownership is verified against the `GalleryImage` table (treating an empty row as "not yet imported, allow"), and the response sets `Cache-Control: public, max-age=31536000, immutable` because the bytes for a content-hash filename never change.
Sources: [app.py:296-342](app.py), [app.py:601-671](app.py)
## The startup choreography
`startup_event` runs after the socket is open, which is deliberate — anything slow or network-bound goes here as a background task, so the UI starts serving immediately. A list under `app.state._startup_tasks` holds strong references so the GC doesn't reap fire-and-forget tasks before they finish. The notable startup work:
| Stage | Why | Source |
|---|---|---|
| Purge `Nobody` and `Incognito` sessions | They're ephemeral by design and must not survive a restart. | `app.py:682-696` |
| `start_bg_monitor()` | Always-on monitor that re-invokes the agent when `#!bg` shell jobs finish. | `app.py:706-709` |
| `register_builtin_servers` + `mcp_manager.connect_all_enabled()` (20s cap) | MCP server discovery is wrapped in `wait_for(..., 20)` because local tooling can block it. | `app.py:712-725` |
| Tool index pre-warm | Loads the local embedding model + ChromaDB and runs a dummy query so the *first* user message doesn't pay the 1-3s cost. | `app.py:732-743` |
| LLM endpoint warmup + 60s keep-alive | Pings each endpoint's `/models` to prime connections and prevent cold starts. | `app.py:745-773` |
| `_ensure_default_tasks` | Reconciles built-in scheduled tasks per user; the comment notes this also sweeps stale demo/deleted-user rows that would otherwise fire forever. | `app.py:775-820` |
| Skill owner backfill | Disk-backed Markdown skills aren't covered by the DB legacy-owner sweep, so ownerless `SKILL.md` files are assigned to the primary admin. | `app.py:825-842` |
| `task_scheduler.start()` (gated) | Skipped when `ODYSSEUS_INPROCESS_TASKS=0` so an external cron worker can drive task firing instead. | `app.py:847-854` |
| Hourly null-owner sweep | Re-runs the legacy-owner assignment so data created while auth was disabled gets claimed by the admin instead of staying world-visible. | `app.py:858-868` |
| Nightly skill audit | At ~02:00 local, the LLM judges and auto-fixes a batch of stale skills. | `app.py:875-898` |
| Ollama auto-detect | If the `ollama` binary is on PATH and `localhost:11434` responds, an endpoint row is auto-added. | `app.py:900-930` |
`shutdown_event` is comparatively boring — cancel the upload-cleanup task, stop the scheduler, close the webhook manager, disconnect MCP servers — but everything is wrapped in try/except so a single misbehaving subsystem can't block a clean process exit.
Sources: [app.py:675-957](app.py)
## What builders should take away
The interesting choice in this codebase isn't any individual feature — it's the deliberate refusal to hide the wiring. There is no `create_app()` factory, no plugin manifest, no auto-discovery of routes; just an explicit list of `include_router` calls. The cost is that adding a feature touches `app.py`. The benefit is that the entire HTTP surface, the full middleware stack, the exemption rules for auth, and every startup task are all readable in one file. For a personal-server project where the operator is also the developer, that's a defensible trade.
The places where it stops being naive are exactly where the operator would feel pain otherwise: a request timeout middleware so one hung subprocess doesn't freeze the whole event loop; a bcrypt-cached API-token path so external integrations don't pay a linear DB scan per request; a per-request CSP nonce templated into the SPA so XSS surface stays small; an in-process loopback token so the agent can call its own admin routes without a cookie; a startup task list with strong references so warmups don't get garbage-collected. Each one reads like a "we got bit by this once" fix, and the inline comments mostly say so.
Sources: [app.py:64-99](app.py), [app.py:130-254](app.py), [core/middleware.py:47-100](core/middleware.py)
---
## 04. The Agent Loop: 2,000 Lines of Tool Orchestration
> How agent_loop.py, agent_tools.py, and the tool_* modules turn a chat turn into a sequenced run of shell, file, web, memory, and MCP tool calls — including tool security, parsing, and the per-turn context compactor.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/04-the-agent-loop-2-000-lines-of-tool-orchestration.md
- Generated: 2026-05-31T19:48:15.499Z
### Source Files
- `src/agent_loop.py`
- `src/agent_tools.py`
- `src/tool_execution.py`
- `src/tool_schemas.py`
- `src/tool_security.py`
- `src/context_compactor.py`
- `routes/chat_routes.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/agent_loop.py](src/agent_loop.py)
- [src/agent_tools.py](src/agent_tools.py)
- [src/tool_execution.py](src/tool_execution.py)
- [src/tool_schemas.py](src/tool_schemas.py)
- [src/tool_parsing.py](src/tool_parsing.py)
- [src/tool_security.py](src/tool_security.py)
- [src/context_compactor.py](src/context_compactor.py)
- [routes/chat_routes.py](routes/chat_routes.py)
</details>
# The Agent Loop: 2,000 Lines of Tool Orchestration
Odysseus's agent mode is one fat generator. `stream_agent_loop()` is ~900 lines of Python that takes a chat turn and walks a model through up to 20 rounds of tool execution, streaming SSE events the whole way. Around it sits a parser that has to handle five competing tool-call dialects, a dispatcher that fans out to MCP servers and forty-odd in-process implementations, a security policy that blocks shell/email/admin tools from non-admin owners, and a per-turn context compactor. Together that's ~2,100 lines anchoring everything an Odysseus chat can *do* — and it shows how much glue a serious tools-runtime actually needs.
This page walks through the moving parts: how a chat turn enters the loop, how the LLM's output is parsed into tool blocks, how tools are dispatched and budgeted, what the security gate refuses, and how the compactor keeps context from blowing up over long sessions.
## Why agent mode lives in one big generator
`stream_agent_loop` is an `AsyncGenerator[str, None]` that produces SSE-formatted `data: …` events. The HTTP layer (`routes/chat_routes.py:792-829`) just iterates it and forwards relevant event types to the browser. Everything stateful — round counter, native-vs-fenced tool parsing, stall detection, doc streaming, metrics, verifier — lives inside that one generator's locals.
```text
HTTP route ── SSE chunks ──► stream_agent_loop ── messages[] ──► stream_llm_with_fallback ──► provider
│ │
│◄────── tool_calls / deltas ──────────┘
▼
parse_tool_blocks ──► execute_tool_block ──► MCP / native impl
│ │
└──── format_tool_result ◄────────────┘
```
Sources: [src/agent_loop.py:1218-1246](), [routes/chat_routes.py:785-829]()
## The chat turn lifecycle
A single chat turn entering agent mode goes through six prep steps before any tool runs.
| Step | What happens | Where |
|---|---|---|
| 1. Admin & ownership check | `blocked_tools_for_owner()` adds shell/email/admin tools to `disabled_tools` for non-admins; MCP schemas are hidden | `agent_loop.py:1250-1257`, `tool_security.py:14-74` |
| 2. RAG tool selection | `get_tool_index().get_tools_for_query()` retrieves ~8 likely-relevant tool names; keyword hints fill in if the vector index is down | `agent_loop.py:1267-1325` |
| 3. Endpoint capability probe | DB lookup for `supports_tools` flag, then a model-name heuristic decides if function schemas are sent | `agent_loop.py:1336-1368` |
| 4. Prompt assembly | `_build_system_prompt` injects preamble, rules, per-tool sections, skill index, integrations, MCP descriptions | `agent_loop.py:1369-1376`, `agent_loop.py:912-998` |
| 5. Soft context trim | If estimated tokens exceed `agent_input_token_budget`, `trim_for_context()` drops older system messages first, then old turns | `agent_loop.py:1378-1404`, `context_compactor.py:98-172` |
| 6. Drop internal `_protected` markers | Strips Odysseus-only metadata before serializing to the LLM API | `agent_loop.py:1407` |
Each step's wall-clock time is captured into `prep_timings` and emitted as an `agent_prep` SSE event before the first model call. Sources: [src/agent_loop.py:1258-1409]()
## Five tool-call dialects, one parser
Different models emit tool calls in different shapes. `parse_tool_blocks()` in [src/tool_parsing.py:321-385]() tries five formats in priority order:
1. Fenced code blocks: ```` ```bash\n…\n``` ```` (the canonical format taught in the system prompt).
2. `[TOOL_CALL] {…} [/TOOL_CALL]` blocks.
3. XML `<tool_call><invoke name="…"><parameter name="…">…</parameter></invoke></tool_call>`.
4. MiniMax-style `<tool_code>{tool => 'name', args => …}</tool_code>`.
5. DeepSeek DSML markup with fullwidth-pipe delimiters (`<||DSML||tool_calls>`), normalized into form 3 by `_normalize_dsml()`.
The fenced-block regex is built from `TOOL_TAGS`, a 60-entry set declared in `agent_tools.py:29-61` that lists every legal tag — including the cookbook tools added because *"without these entries, native function calls to e.g. list_served_models are rejected as 'Unknown function call' before reaching the dispatcher"* ([src/agent_tools.py:46-50]()).
When the model uses native function calling instead, `function_call_to_tool_block()` ([src/tool_schemas.py:1016-1171]()) does the inverse: parse JSON arguments and rebuild the text content each tool expects. Email tools get auto-routed to the `mcp__email__…` namespace there, and an unknown name falls through to `mcp__` if it starts with that prefix.
Sources: [src/tool_parsing.py:22-83](), [src/tool_parsing.py:321-385](), [src/tool_schemas.py:1016-1171]()
## Inside one round
Each iteration of the `for round_num in range(1, max_rounds + 1)` loop does this:
```mermaid
flowchart TD
A[Stream LLM round] -->|deltas + tool_calls| B[_resolve_tool_blocks]
B -->|native or parsed| C{tool_blocks empty?}
C -->|yes| D[Verifier? Done.]
C -->|no| E[Loop-breaker check]
E -->|stuck or runaway| F[Force-answer next round]
E -->|ok| G[For each block]
G --> H[Budget check]
H --> I[execute_tool_block]
I --> J[format_tool_result]
J --> K[Append to messages]
K --> A
```
Sources: [src/agent_loop.py:1444-2074]()
**Streaming.** The loop forwards raw `delta` chunks to the frontend as they arrive, and intercepts three other event types it gets from `stream_llm`: `tool_call_delta` (incremental native-call JSON, used to live-stream `create_document` content into the editor panel), `tool_calls` (the final list of native calls), and `usage` (real token counts when the provider reports them) ([src/agent_loop.py:1525-1632]()).
**Loop-breaker.** A `deque` of recent call signatures and a `Counter` of per-tool calls detect two stall modes: repeating the same call without writing prose for 4 rounds, or firing one tool ≥15 times. When either trips, `_force_answer = True` and the next round is run with `tools=None` plus a system note telling the model to write the answer or declare blocked. If the model still emits no prose, a "grace synthesis" non-streaming call is made over the same message history to salvage an answer ([src/agent_loop.py:1772-1825], [src/agent_loop.py:1661-1690]()).
**Per-tool progress streaming.** Long bash/python jobs run inside `_run_subprocess_streaming()` which keeps a 12-line tail ring buffer and pushes `{elapsed_s, tail}` payloads every 2 s through an `asyncio.Queue` the loop drains while awaiting the tool task — the UI shows live elapsed-time without the loop blocking on the subprocess ([src/tool_execution.py:59-167](), [src/agent_loop.py:1887-1913]()).
**Completion verifier (opt-in).** If `_effectful_used` (a write_file/bash/python/document tool ran) and `agent_verifier_subagent` is on, a fresh-context model call reads only the user request + an actions snapshot and decides `VERIFICATION: SUCCESS|FAIL`. The setting is off by default because *"on weak local models the verifier can't judge from the action-snapshot … and false-rejects … forces a costly extra round every effectful turn"* ([src/agent_loop.py:1742-1769]()).
## Tool dispatcher: MCP, native, or admin-blocked
`execute_tool_block()` in [src/tool_execution.py:477-731]() is a big elif chain. Before dispatch it runs three gates:
1. **Misformatted JSON detection** — a `{…}` JSON object inside a `python`/`json`/`xml` fence triggers a teaching error explaining the correct fence tag.
2. **User-disabled tools** — anything in `disabled_tools` returns `{"error": "…disabled by user."}`.
3. **Admin gate** — `_ADMIN_TOOLS` plus `is_public_blocked_tool()` reject anything sensitive when the owner isn't admin (or auth isn't configured at all, which is treated as single-user).
The bash background marker (`#!bg` as first line) is handled before any normal dispatch: the command is launched detached via `bg_jobs.launch()` and the agent gets back a job id immediately, with monitoring re-invoking the agent when the job exits ([src/tool_execution.py:566-584]()).
After gates, dispatch splits by category:
| Category | Path | Notes |
|---|---|---|
| MCP-extracted (bash, python, web_search, read_file, write_file, generate_image, manage_memory) | `_call_mcp_tool` then `_direct_fallback` | Fallback runs the work in-process when MCP server isn't connected |
| Document tools | `do_create_document`, `do_update_document`, `do_edit_document`, `do_suggest_document` | Frontend gets a `doc_update` SSE event with the real doc id |
| AI/session dispatcher | `dispatch_ai_tool` | chat_with_model, create_session, list_sessions, send_to_session, pipeline, manage_session, manage_memory, list_models, ui_control, ask_teacher |
| Admin/management | `do_manage_*` | Tasks, skills, endpoints, MCP servers, webhooks, tokens, documents, settings, notes, calendar |
| Cookbook LLM serving | `do_download_model`, `do_serve_model`, `do_list_*`, `do_serve_preset`, `do_adopt_served_model` | Backed by tmux + a cookbook state file the UI watches |
| Generic | `do_app_api` (loopback to any UI-button HTTP endpoint), `mcp__*` (raw MCP call) | |
Sources: [src/tool_execution.py:477-731](), [src/agent_tools.py:29-61]()
## What the `bash` block really does
The fenced `bash` path is worth a closer look because it's where most of the security exposure lives. After the dialect parser hands `execute_tool_block` a `ToolBlock("bash", "<cmd>")`, the runtime:
1. Adds `disabled_tools`-check and the public-policy check; non-admins always see `bash` in `NON_ADMIN_BLOCKED_TOOLS` so the call is refused before any subprocess starts ([src/tool_security.py:14-46]()).
2. If the first line is `#!bg`, the command is handed to `bg_jobs.launch()` and detached.
3. Otherwise, `_direct_fallback` calls `asyncio.create_subprocess_shell()` with a hardened env: real `os.environ`, but `TERM=xterm-256color`, `COLUMNS=120`, `LINES=40` so commands that probe terminfo don't fail ([src/tool_execution.py:317-322]()).
4. `_run_subprocess_streaming` watches stdout/stderr line-by-line, kills the process on `asyncio.CancelledError` (chat stop button), and enforces a `DEFAULT_BASH_TIMEOUT` of 1 hour with a TERM→SIGKILL ladder.
Output is truncated to `MAX_OUTPUT_CHARS = 10_000` and stderr is appended onto stdout as `STDERR: …`. Sources: [src/tool_execution.py:33-44](), [src/tool_execution.py:293-344]()
## Security: what gets refused
`tool_security.py` declares the entire non-admin denylist as one set:
```python
# src/tool_security.py:14-46
NON_ADMIN_BLOCKED_TOOLS = {
"bash", "python", "read_file", "write_file",
"search_chats", "manage_memory", "manage_skills",
"manage_tasks", "manage_endpoints", "manage_mcp",
"manage_webhooks", "manage_tokens", "manage_documents",
"manage_settings", "api_call", "app_api",
"send_email", "reply_to_email", "list_emails", "read_email",
"resolve_contact", "manage_contact", "manage_calendar",
"vault_search", "vault_get", "vault_unlock",
"download_model", "serve_model", "stop_served_model",
"cancel_download", "adopt_served_model",
}
```
`is_public_blocked_tool()` returns True for any name in that set or any name starting with `mcp__`. `owner_is_admin_or_single_user()` short-circuits to True when `AuthManager().is_configured` is False — a deliberate choice that keeps the dev/single-user setup wide open while the multi-tenant deployment is locked down.
The check happens in two places that both matter:
- **Schema scrubbing** in `stream_agent_loop` so blocked tools never even reach the LLM's tool list ([src/agent_loop.py:1250-1257]()).
- **Execution refusal** in `execute_tool_block` so a model that imagines a tool name out of training data still can't run it ([src/tool_execution.py:550-560]()).
The two-layer design matters: schema scrubbing makes the model less likely to try, but doesn't trust the model to actually obey.
Sources: [src/tool_security.py:1-74](), [src/agent_loop.py:1250-1257](), [src/tool_execution.py:544-560]()
## Context compaction: trim now, summarize later
Two independent mechanisms keep the message list from outgrowing the model's window.
**Per-turn soft trim** (`trim_for_context`, [src/context_compactor.py:98-172]()) runs *every* round of `stream_agent_loop` before sending. It walks a priority ladder:
1. Drop extra system messages (RAG context, memory) but keep the first system prompt.
2. Add some back if budget allows.
3. Truncate the kept system prompt to 2 000 chars with a "[System prompt truncated…]" marker.
4. Drop oldest non-system turns, but protect the last 10.
It also runs `_sanitize_tool_messages` ([src/context_compactor.py:52-95]()) which drops orphan `role: "tool"` messages whose parent assistant `tool_calls` got trimmed away — a real-world OpenAI-API constraint: *"messages with role 'tool' must be a response to a preceding message with 'tool_calls'"*. Without that pass, front-trimming can produce a request the API just rejects.
**Cross-turn compaction** (`maybe_compact`, [src/context_compactor.py:175-272]()) is the heavier one, triggered when token usage crosses 85% of the context window. It splits the conversation in half, summarizes the older half with a Cursor-style structured prompt (User Goal / What Was Done / Current State / Pending / Key Context), and replaces those messages with one system message that prefixes the summary with `[Conversation summary …]`. The compaction model is routed through `resolve_endpoint("utility")` so a small/cheap model can do the busy-work without dragging in the main one.
Sources: [src/context_compactor.py:18-49](), [src/context_compactor.py:52-95](), [src/context_compactor.py:175-272]()
## Surprising details
A few things in here aren't what you'd guess from reading the README:
- **The auto-document fallback** ([src/agent_loop.py:1691-1720]()) watches chat output for any unrequested code block longer than 30 lines and synthesizes a `create_document` tool block on the model's behalf — so a model that ignores the "never paste long code in chat" rule still produces an artifact in the editor panel.
- **DSML normalization** ([src/tool_parsing.py:70-83]()) exists solely because DeepSeek sometimes emits raw markup with fullwidth-pipe delimiters when its provider didn't parse tool schemas correctly. Odysseus rewrites that into the standard XML shape before parsing *and* stripping, so the garbage never reaches the user.
- **Per-endpoint `supports_tools`** ([src/agent_loop.py:1336-1368]()) is stored in the DB and set by the cookbook serve command. A vLLM run with `--enable-auto-tool-choice` flips it on at registration time — that means the agent's tool-schema choice is partly a function of how the model was launched.
- **`reasoning_content` echo-back** ([src/agent_loop.py:1040-1080]()) — DeepSeek's API rejects follow-up requests in thinking mode that don't include the prior reasoning, so the loop accumulates reasoning deltas separately and attaches them to the assistant message on the next round. Other vendors ignore the extra field.
- **Force-answer round** ([src/agent_loop.py:1461-1465]()) is the only place tools are sent as `None`. The loop-breaker uses it as a hard escape hatch — even if the model wants to call another tool, no schema means no call to make.
- **The `bash`/`python` timeout is one hour**, not the 60 s the README-era comment in `agent_tools.py` still says. The author moved it to a per-tool constant in `tool_execution.py:33-34` after seeing the agent *"go silent because it had nothing to report"* when 60 s killed real installs.
## What builders should notice
If you're putting together your own tool-using agent runtime, the shape here is worth borrowing:
- **Two-layer tool security** — scrub schemas *and* refuse at the dispatcher. The model is not your access control.
- **Multi-dialect parsing as a hard requirement** — once you support more than one model family, you're going to ship a regex jungle. Build it on day one.
- **Stall detection separate from rounds** — round budget alone doesn't stop a stuck model; signature-based loop-breakers do, with a clean handoff to a "force-answer" round.
- **Compaction is two problems, not one** — there's the per-turn trim that needs to be cheap and synchronous, and the cross-turn summary that needs an LLM call. Don't fuse them.
- **Progress events from inside subprocess tools** — the difference between "the chat looks dead" and "the user can see what's happening" is one async queue.
The whole thing isn't elegant — `stream_agent_loop` is the kind of function code-review checklists were invented to complain about. But the messiness is doing real work: every loop-breaker, every fallback, every normalize pass exists because some specific model misbehaved in a specific way against a specific provider, and the author chose to absorb the variance in glue rather than push it back onto users.
Sources: [src/agent_loop.py:1218-2106](), [src/tool_execution.py:477-731](), [src/tool_security.py:14-74](), [src/context_compactor.py:98-272]()
---
## 05. Local MCP Servers: Email, Images, Memory, RAG as First-Class Tools
> The mcp_servers/ tree ships Odysseus-native Model Context Protocol servers so the agent can call email, image generation, memory, and RAG through the same MCP interface used for third-party tools — wired through mcp_manager and builtin_mcp.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/05-local-mcp-servers-email-images-memory-rag-as-first-class-tools.md
- Generated: 2026-05-31T19:49:53.885Z
### Source Files
- `mcp_servers/email_server.py`
- `mcp_servers/image_gen_server.py`
- `mcp_servers/memory_server.py`
- `mcp_servers/rag_server.py`
- `src/mcp_manager.py`
- `src/builtin_mcp.py`
- `routes/mcp_routes.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [mcp_servers/email_server.py](mcp_servers/email_server.py)
- [mcp_servers/image_gen_server.py](mcp_servers/image_gen_server.py)
- [mcp_servers/memory_server.py](mcp_servers/memory_server.py)
- [mcp_servers/rag_server.py](mcp_servers/rag_server.py)
- [mcp_servers/_common.py](mcp_servers/_common.py)
- [src/mcp_manager.py](src/mcp_manager.py)
- [src/builtin_mcp.py](src/builtin_mcp.py)
- [routes/mcp_routes.py](routes/mcp_routes.py)
</details>
# Local MCP Servers: Email, Images, Memory, RAG as First-Class Tools
Most agent frameworks treat the Model Context Protocol (MCP) as a way to consume *other people's* tool servers — a Slack adapter here, a Notion server there. Odysseus inverts that: its own core capabilities — IMAP/SMTP, image generation, the memory store, and the RAG index — are shipped as four Python MCP servers under `mcp_servers/`, started as stdio subprocesses on boot, and reachable through the exact same `McpManager` plumbing that handles a third-party Playwright server. The agent does not see a "built-in tool" path and an "external tool" path; it sees one MCP tool surface where some names happen to start with `mcp__email__` or `mcp__rag__`.
This page covers how those servers are structured, how they are wired in through `builtin_mcp.py` and `mcp_manager.py`, and the design choices a builder should notice — including which servers stayed as MCP processes and which ones were deliberately pulled back into the host.
## What "first-class" actually means here
A built-in MCP server in Odysseus has the same contract as a third-party one. Each script in `mcp_servers/` instantiates `mcp.server.Server`, declares tools via `@server.list_tools()`, handles invocations via `@server.call_tool()`, and runs over `stdio_server()`. For example, `image_gen_server.py` is essentially:
```python
# mcp_servers/image_gen_server.py
server = Server("image_gen")
@server.list_tools()
async def list_tools() -> list[Tool]:
return [Tool(name="generate_image", description=..., inputSchema=...)]
@server.call_tool()
async def call_tool(name, arguments) -> list[TextContent]:
...
async def run():
async with stdio_server() as (read_stream, write_stream):
await server.run(read_stream, write_stream, server.create_initialization_options())
```
That is the exact MCP server shape an external vendor would write. The only Odysseus-specific bits are an `sys.path.insert(0, ...)` so the subprocess can import the host codebase, and a small `_common.py` of shared output limits and timeouts.
Sources: [mcp_servers/image_gen_server.py:13-39](mcp_servers/image_gen_server.py), [mcp_servers/image_gen_server.py:160-166](mcp_servers/image_gen_server.py), [mcp_servers/_common.py:1-19](mcp_servers/_common.py)
## The four servers and what each owns
| Server (id) | Script | Tools exposed | What it actually does |
|---|---|---|---|
| `email` | `email_server.py` | `list_email_accounts`, `list_emails`, `read_email`, `search_emails`, `send_email`, `reply_to_email`, `archive_email`, `delete_email`, `mark_email_read`, `bulk_email`, `download_attachment` | Multi-account IMAP/SMTP using `imaplib`/`smtplib`, reads accounts from `data/app.db :: email_accounts`, decrypts SMTP/IMAP passwords via `src.secret_storage`, pulls AI-summary text from `email_cache.db` |
| `image_gen` | `image_gen_server.py` | `generate_image` | Calls an OpenAI-compatible `/images/generations` endpoint (`gpt-image-1.5`, `gpt-image-1`, `dall-e-3` auto-detect), saves PNGs to `data/generated_images/`, records each in the `GalleryImage` table |
| `memory` | `memory_server.py` | `manage_memory` (list/add/edit/delete/search) | Wraps the host `MemoryManager`, optionally mirrors entries into a `MemoryVectorStore` for semantic search |
| `rag` | `rag_server.py` | `manage_rag` (list/add_directory/remove_directory) | Drives the RAG singleton and `PersonalDocsManager` — indexing and removing user document directories |
Each tool keeps its schema deliberately small. The email server, for instance, packs eleven distinct operations into one server but exposes a shared `ACCOUNT_PROP` so every call can target a specific mailbox by name, address, or id; the bulk operation accepts either an explicit `uids` list or `all_unread: true` to avoid one-RPC-per-message storms.
Sources: [mcp_servers/email_server.py:1036-1228](mcp_servers/email_server.py), [mcp_servers/memory_server.py:47-72](mcp_servers/memory_server.py), [mcp_servers/rag_server.py:46-65](mcp_servers/rag_server.py), [mcp_servers/image_gen_server.py:22-39](mcp_servers/image_gen_server.py)
## How they get wired in: `builtin_mcp.py` and `McpManager`
On startup, `register_builtin_servers` walks a static `_BUILTIN_SERVERS` dict and calls `mcp_manager.connect_server(...)` for each one, using the host's `sys.executable` as the command and the script path as the only arg. A second pass starts NPX-based servers (currently a Playwright "browser" server) after a 3 s delay so the Python ones grab their slots first.
```python
# src/builtin_mcp.py
_BUILTIN_SERVERS = {
"image_gen": ("mcp_servers/image_gen_server.py", "Built-in: Image Generation"),
"memory": ("mcp_servers/memory_server.py", "Built-in: Memory"),
"rag": ("mcp_servers/rag_server.py", "Built-in: RAG"),
"email": ("mcp_servers/email_server.py", "Built-in: Email"),
}
```
`McpManager._connect_stdio` then does what any MCP client does: spin up `stdio_client`, open a `ClientSession`, `initialize()`, then `list_tools()` to discover what the server actually exposes. The discovered tools are cached in `self._tools[server_id]`, and an `AsyncExitStack` per server holds the stdio pipes and session for clean teardown. There is no special path for built-ins at this layer — they are just connections whose `server_id` happens to be `email`, `image_gen`, etc.
```text
+-----------------+ stdio +----------------------------+
| McpManager | <--------------> | python mcp_servers/email_ |
| _sessions["email"] | server.py |
| _tools["email"] = [list_emails,...] | Server("email") |
+-----------------+ +----------------------------+
^ |
| call_tool("mcp__email__list_emails",...)|
| v
| imaplib -> data/app.db
| SMTP -> email_cache.db
```
The host marks these servers as built-ins purely by id membership in `is_builtin`, which is later used to enable auto-reconnect if the subprocess dies mid-call:
```python
# src/mcp_manager.py
def is_builtin(self, server_id: str) -> bool:
return server_id.startswith("builtin_") or server_id in {
"image_gen", "memory", "rag", "email",
}
```
When `call_tool` raises against a built-in, `_reconnect_builtin` looks the server up in `_BUILTIN_SERVERS`, tears down the dead exit stack, re-spawns the subprocess, and retries the call once. Third-party servers get no such second chance.
Sources: [src/builtin_mcp.py:48-103](src/builtin_mcp.py), [src/mcp_manager.py:53-105](src/mcp_manager.py), [src/mcp_manager.py:196-295](src/mcp_manager.py), [src/mcp_manager.py:349-356](src/mcp_manager.py)
## The qualified-name trick — and a deliberate asymmetry
Every MCP tool is exposed to the model under a namespaced name `mcp__{server_id}__{tool_name}`. `McpManager.call_tool` parses that back into `server_id` and `tool_name` and dispatches to the right session. This is what lets the agent loop treat email tools, the browser tool, and a third-party Slack tool identically.
But `get_all_openai_schemas` and `get_tool_descriptions_for_prompt` deliberately *skip* the Python built-ins when generating OpenAI function-calling schemas:
```python
# src/mcp_manager.py
if self.is_builtin(server_id) and server_id != "builtin_browser":
continue
```
The reason is that those four servers are also surfaced through Odysseus's own code-block tool format inside the agent prompt, so re-advertising them as JSON-schema function tools would double-list them. The NPX-based Playwright server — which is built-in in the sense of "ships with the product" but does not have hand-written agent-prompt entries — *does* go through function calling. So a builder reading the code finds two surfaces:
- For `email` / `image_gen` / `memory` / `rag`: the JSON-RPC plumbing is MCP, but the model sees a curated code-block tool description.
- For `builtin_browser` and every external server: the model sees a standard OpenAI function tool with the `mcp__...` qualified name.
That asymmetry is intentional and is the one thing easy to miss when porting another tool into this layout.
Sources: [src/mcp_manager.py:196-236](src/mcp_manager.py), [src/mcp_manager.py:297-330](src/mcp_manager.py), [src/mcp_manager.py:369-409](src/mcp_manager.py)
## What was *removed* tells you the design rule
A comment in `builtin_mcp.py` explains a recent refactor:
> `bash / python / filesystem / web_search` were folded into native in-process execution (`src/tool_execution.py:_direct_fallback`). Those trivial subprocess wrappers are gone.
>
> `image_gen / memory / rag / email` still run as stdio MCP servers — each carries hundreds of LOC of unique IMAP / HTTP / manager logic not worth duplicating into the native path right now.
The rule is roughly: if a tool is a thin shim over a stdlib call, keep it in-process; if it carries a body of unique logic that also benefits from process isolation (separate Python interpreter, can crash without taking the host down, can be hot-reconnected), keep it as an MCP server. Email is the clearest case — `email_server.py` is ~1,600 lines of IMAP folder resolution, multi-account routing, header decoding, MIME extraction, and SMTP send-with-Sent-folder-append.
Sources: [src/builtin_mcp.py:39-53](src/builtin_mcp.py), [mcp_servers/email_server.py:127-215](mcp_servers/email_server.py), [mcp_servers/email_server.py:771-833](mcp_servers/email_server.py)
## Subsystem-by-subsystem notes for builders
### Email server: multi-account by selector
`_load_config(account)` resolves an account selector by id, then exact-match against name/imap_user/from_address, then fuzzy match via `difflib.get_close_matches` at 0.72 cutoff. SMTP passwords are decrypted with `src.secret_storage.decrypt` before being handed to `smtplib` — falling back to raw ciphertext was a documented past bug. Port `993` is treated as implicit IMAP TLS; port `587` triggers `starttls()`; `465` uses `SMTP_SSL`. When no `account` is passed and 2+ accounts exist, `list_emails` *fans out across all accounts* and merges results sorted by `Date` header, prepending an `[EMAIL ACCOUNT CONTEXT: ...]` note so the model knows the result is merged.
Sources: [mcp_servers/email_server.py:88-215](mcp_servers/email_server.py), [mcp_servers/email_server.py:477-502](mcp_servers/email_server.py), [mcp_servers/email_server.py:1311-1384](mcp_servers/email_server.py)
### Image gen: provider-neutral by URL surgery
`image_gen_server.py` reads the *chat* model resolution from `src.ai_interaction._resolve_model`, then chops `/chat/completions` or `/v1/messages` off the URL and appends `/images/generations`. That keeps the server agnostic to which OpenAI-compatible backend a user configured. It enforces the per-model size whitelist (`gpt-image` accepts `1024x1024`, `1024x1536`, `1536x1024`, `auto`; `dall-e-3` accepts a different set), writes returned `b64_json` payloads under `data/generated_images/<uuid>.png`, and records a `GalleryImage` row so the host UI sees it. A 300 s read timeout is set explicitly because image models are slow.
Sources: [mcp_servers/image_gen_server.py:55-150](mcp_servers/image_gen_server.py)
### Memory: lazy init + dual-write to vector store
The memory server defers importing `MemoryManager` and `MemoryVectorStore` until the first `manage_memory` call (`_ensure_init`). When the vector store is healthy, every `add`/`edit`/`delete` is mirrored into it, but the mirror is best-effort — wrapped in `try/except` so a vector failure never blocks the JSON write. Search prefers `_memory_manager.get_relevant_memories(query, ..., threshold=0.05)` and falls back to substring filtering.
Sources: [mcp_servers/memory_server.py:27-44](mcp_servers/memory_server.py), [mcp_servers/memory_server.py:108-196](mcp_servers/memory_server.py)
### RAG: leans on host singletons
The RAG server is the thinnest of the four. It pulls `get_rag_manager()` and instantiates `PersonalDocsManager(PERSONAL_DIR, _rag_manager)` once, then delegates `list` / `add_directory` / `remove_directory` to those host objects. The only safeguards are `os.path.expanduser`, an `isdir` check before indexing, and a graceful "not available" path when either manager fails to load.
Sources: [mcp_servers/rag_server.py:25-44](mcp_servers/rag_server.py), [mcp_servers/rag_server.py:76-132](mcp_servers/rag_server.py)
## Lifecycle and admin surface
```mermaid
sequenceDiagram
participant App as app startup
participant Reg as register_builtin_servers
participant Mgr as McpManager
participant Proc as python mcp_servers/*.py
participant Agent as agent loop
App->>Reg: await register_builtin_servers(mcp_manager)
Reg->>Mgr: connect_server(id, name, stdio, python, [script])
Mgr->>Proc: spawn subprocess + open stdio pipes
Proc-->>Mgr: initialize() ack
Mgr->>Proc: list_tools()
Proc-->>Mgr: [Tool(name=...), ...]
Mgr-->>Reg: status=connected, tool_count=N
Agent->>Mgr: call_tool("mcp__email__list_emails", args)
Mgr->>Proc: session.call_tool("list_emails", args)
Proc-->>Mgr: TextContent(...)
Mgr-->>Agent: {stdout, exit_code}
Note over Mgr,Proc: on exception, if is_builtin(id):<br/>_reconnect_builtin → respawn → retry once
```
Both built-in and user-added servers show up in `routes/mcp_routes.py`'s `GET /api/mcp/servers`, which calls `mcp_manager.get_server_status(srv.id)` and reports `tool_count`, `disabled_tool_count`, and connection status. Adding a stdio server through `POST /api/mcp/servers` is gated by `require_admin(request)` with a blunt comment in the code: registering a stdio server "is equivalent to executing arbitrary binaries on the host." Per-server tool blocklists live in `McpServer.disabled_tools` and are merged in via `_load_disabled_map()` before tools are advertised to the model.
A global kill switch — `ODYSSEUS_DISABLE_MCP=1` — short-circuits `register_builtin_servers` entirely.
Sources: [src/builtin_mcp.py:65-103](src/builtin_mcp.py), [src/mcp_manager.py:358-365](src/mcp_manager.py), [routes/mcp_routes.py:22-79](routes/mcp_routes.py), [routes/mcp_routes.py:81-103](routes/mcp_routes.py)
## What builders should take away
Three things worth borrowing:
1. **Same protocol for first- and third-party tools.** By writing local features as MCP servers, Odysseus avoids forking the agent's tool-calling code between "internal" and "external" paths. New built-ins drop into `_BUILTIN_SERVERS` and inherit reconnection, status, and admin UI for free.
2. **Process isolation as a reliability tool.** The auto-reconnect path for `is_builtin` servers means an IMAP socket dying or an image API hanging can be recovered without restarting the host. That is cheap to get when each server is its own Python subprocess, and hard to get when the same logic is inlined into the agent runtime.
3. **Keep the cheap stuff native.** Bash, Python eval, filesystem reads, and web search were *removed* from `mcp_servers/` and folded into in-process execution because the subprocess overhead bought nothing for a `subprocess.run` shim. The rule the codebase implicitly follows is: pay for MCP isolation only when the wrapped logic is heavy enough to be worth isolating.
The asymmetry around `get_all_openai_schemas` skipping Python built-ins is the most easily missed footgun — a contributor adding a fifth Python server should expect to either add it to the curated code-block tool list in the agent prompt or change `is_builtin` so the new server gets exposed through function calling.
Sources: [src/builtin_mcp.py:39-53](src/builtin_mcp.py), [src/mcp_manager.py:266-330](src/mcp_manager.py)
---
## 06. Cookbook: Hardware-Aware Model Recommendations via hwfit
> How services/hwfit/ scans GPU/CPU/RAM, fits GGUF/FP8/AWQ candidates against the box, and proposes a download-and-serve plan — the practical heart of the "click to install a local LLM" pitch, adapted from the llmfit library.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/06-cookbook-hardware-aware-model-recommendations-via-hwfit.md
- Generated: 2026-05-31T19:50:32.997Z
### Source Files
- `services/hwfit/hardware.py`
- `services/hwfit/fit.py`
- `services/hwfit/models.py`
- `services/hwfit/image_models.py`
- `routes/cookbook_routes.py`
- `routes/cookbook_helpers.py`
- `routes/hwfit_routes.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [services/hwfit/hardware.py](services/hwfit/hardware.py)
- [services/hwfit/fit.py](services/hwfit/fit.py)
- [services/hwfit/models.py](services/hwfit/models.py)
- [services/hwfit/image_models.py](services/hwfit/image_models.py)
- [services/hwfit/data/hf_models.json](services/hwfit/data/hf_models.json)
- [routes/hwfit_routes.py](routes/hwfit_routes.py)
- [routes/cookbook_routes.py](routes/cookbook_routes.py)
- [licenses/llmfit-MIT-LICENSE.txt](licenses/llmfit-MIT-LICENSE.txt)
- [README.md](README.md)
</details>
# Cookbook: Hardware-Aware Model Recommendations via hwfit
The Cookbook tab is Odysseus's "click to install a local LLM" surface, and `services/hwfit/` is the brain behind it. It probes the box (NVIDIA, AMD, Apple, or remote Windows over SSH), reads a curated model catalog of ~hundreds of HuggingFace models, simulates how each one would fit in VRAM at various quantizations, scores them on quality/speed/fit/context, and hands the ranked list back to the UI so the user can click "Download" or "Serve" on a model that will actually run. The library is adapted from Alex Jones's open-source `llmfit` (MIT) — see [licenses/llmfit-MIT-LICENSE.txt:1-3](licenses/llmfit-MIT-LICENSE.txt) and [README.md:13](README.md) — and rebuilt inside Odysseus as a provider-neutral, BYOC sizer that targets local engines (vLLM, llama.cpp, Ollama) rather than any hosted API.
The interesting part is not the catalog — it's the arithmetic. The same model can fit, half-fit, or not fit depending on quant, KV-cache context, whether GPUs are identical, and whether vLLM can tensor-parallel across them. `hwfit` encodes those constraints in roughly 800 lines of pure Python, with no GPU code, so it runs on the web server and produces an answer in milliseconds.
## What the probe actually reads
`detect_system()` is the single entry point. It returns a flat dict of total/available RAM (GB), CPU cores, CPU model, GPU name, total VRAM, per-GPU detail, and a `backend` string drawn from `{cuda, rocm, cpu_x86, cpu_arm, mps}`. The detection is cached per host for 30 minutes (`CACHE_TTL = 1800`) because, as the comment notes, "hardware rarely changes; use the Rescan button to force a re-probe."
NVIDIA detection runs `nvidia-smi --query-gpu=memory.total,name --format=csv,noheader,nounits` and parses one row per device, using the row position as the CUDA index that later gets pinned via `CUDA_VISIBLE_DEVICES`. The remote-SSH path is hardened against the classic non-interactive PATH problem: if the first call comes back empty on a remote, it retries through `bash -lc` with `/usr/local/cuda/bin` added, and as a last resort tries `nvidia-smi` by absolute path in three common locations. When the binary is there but the driver isn't talking (the post-update / no-reboot case), it surfaces the NVML error string into `gpu_error` so the UI can say "GPU driver error" instead of the misleading "No GPU". See [services/hwfit/hardware.py:71-129](services/hwfit/hardware.py).
AMD detection walks `/sys/class/drm/card*/device/` looking for `vendor == 0x1002`, and is intentionally nuanced about APUs. Discrete cards report real VRAM in `mem_info_vram_total`; Strix Halo and similar unified-memory SoCs report a tiny `vram_total` with the real pool in `mem_info_vis_vram_total`, so the code takes the max of the two and only falls back to `mem_info_gtt_total` if both are zero. The `unified_memory` flag is set when `vis_vram >= vram`, and the code explicitly does not cap it at system RAM because BIOS-carved UMA is physically backed but invisible to `/proc/meminfo`. See [services/hwfit/hardware.py:132-204](services/hwfit/hardware.py).
Windows is detected differently: one giant PowerShell command bundles `Win32_OperatingSystem`, `Win32_Processor`, `nvidia-smi`, and `Win32_VideoController` into a single JSON blob, because round-tripping multiple commands over SSH to Windows is slow ([services/hwfit/hardware.py:286-360](services/hwfit/hardware.py)).
Sources: [services/hwfit/hardware.py:6-37](services/hwfit/hardware.py), [services/hwfit/hardware.py:286-457](services/hwfit/hardware.py)
## Tensor-parallel pools: why a mixed box gets split
vLLM only tensor-parallels across **identical** GPUs. A workstation with `1×4090 + 2×3090` cannot serve a model across all three — it has to pick a homogeneous subset. `_group_gpus()` does exactly that: it groups by `(name, round(vram_gb))`, carries each GPU's CUDA index, and sorts the resulting groups by total VRAM descending, so the largest single-tensor-parallel pool becomes the default serving target.
```text
detected: [4090(24), 3090(24), 3090(24)]
│
▼
groups: [{name:"3090", count:2, vram_each:24, indices:[1,2], vram_total:48},
{name:"4090", count:1, vram_each:24, indices:[0], vram_total:24}]
▲ ▲
largest pool → default serve target still selectable
```
The grouped list flows back to the UI as `system.gpu_groups`, and the `/api/hwfit/models` route lets the caller pick a pool via `gpu_group` and clamp the count via `gpu_count` ([routes/hwfit_routes.py:131-172](routes/hwfit_routes.py)). When `gpu_count` is set, `system.gpu_only = True` is flipped on, which is consequential: it tells the fit step to refuse offload-to-RAM fallbacks for that ranking ([services/hwfit/fit.py:226-230](services/hwfit/fit.py)). Without it, a 96 GB GPU would still list a 175 GB model because the model "fits" by spilling most of its layers to system RAM — the comment calls out that exact bug.
Sources: [services/hwfit/hardware.py:40-68](services/hwfit/hardware.py), [routes/hwfit_routes.py:131-175](routes/hwfit_routes.py)
## The catalog and its quant math
`get_models()` lazy-loads `services/hwfit/data/hf_models.json` once and caches it ([services/hwfit/models.py:162-173](services/hwfit/models.py)). Each entry carries provider, parameter count (raw or `"7B"`/`"355M"`), context length, native quantization, optional `gguf_sources` (a list of HF repos that ship GGUF variants), MoE metadata (`is_moe`, `active_parameters`), and so on. Parameter parsing has a deliberate trap-handler: a bare number ≥ 1,000,000 is treated as a raw parameter count, and `"355"` is treated as 0.355 B, because otherwise a 355M model would sort above every 70B model ([services/hwfit/models.py:52-83](services/hwfit/models.py)).
The fit math hinges on three tables, all keyed by quant label:
| Quant | Bytes/param (memory) | Speed multiplier | Quality penalty |
|---|---|---|---|
| F16 / BF16 | 2.0 | 0.60 | 0 |
| FP8 | 1.0 | 0.85 | 0 |
| Q8_0 | 1.0 | 0.80 | 0 |
| Q6_K | 0.75 | 0.95 | −1 |
| Q5_K_M | 0.625 | 1.00 | −2 |
| Q4_K_M / Q4_0 | 0.5 | 1.15 | −5 |
| Q3_K_M | 0.375 | 1.25 | −8 |
| Q2_K | 0.25 | 1.35 | −12 |
| AWQ-4bit / GPTQ-Int4 | 0.5 | 1.20 | −3 |
| AWQ-8bit / GPTQ-Int8 | 1.0 | 0.85 | 0 |
| mlx-4/6/8-bit | 0.5 / 0.75 / 1.0 | 1.15 / 1.0 / 0.85 | −4 / −1 / 0 |
Memory estimation is a one-liner: `pb * bpp + 0.000008 * active_params * ctx + 0.5`. The KV cache uses **active** params, not total, because for MoE only the active experts have KV state — total VRAM is dominated by weights, but speed and KV are dominated by what actually runs per token ([services/hwfit/models.py:86-101](services/hwfit/models.py)).
Sources: [services/hwfit/models.py:5-101](services/hwfit/models.py)
## How a single model gets ranked
`analyze_model()` is the per-model worker. The flow is more nuanced than a single budget check because of two subtleties — prequantized formats have fixed bit-widths (you can't try Q3 on an AWQ-4bit), and GGUF cannot be sharded across GPUs the way vLLM-served safetensors can.
```mermaid
flowchart TD
A[model + system] --> B{prequantized?<br/>AWQ/GPTQ/FP8/MLX}
B -- yes --> C[use native quant only]
B -- no --> D{target_quant set?}
D -- yes --> E[try target_quant]
D -- no --> F[default Q4_K_M]
C --> G[_try_quant_at]
E --> G
F --> G
G --> H{fits in VRAM?}
H -- yes --> I[run_mode=gpu]
H -- no --> J{fits in RAM + has GPU?}
J -- yes --> K[run_mode=cpu_offload]
J -- no --> L[halve ctx, retry until 1024]
L --> M{fit found?}
M -- no --> N[return too_tight badge]
M -- yes --> O[fit_level + speed + composite score]
I --> O
K --> O
```
The "shard or not" decision is the one most likely to surprise a reader of the README. `effective_vram` is `single_gpu_vram` for GGUF/dense builds (because llama.cpp can't shard), but full multi-GPU VRAM for prequantized formats served by vLLM — *even when the same model also lists a GGUF alternate download* ([services/hwfit/fit.py:236-247](services/hwfit/fit.py)). A `2×24GB` box ranks a 70B model as runnable in AWQ-4bit (~35 GB across both cards) but not in Q4_K_M GGUF (won't fit on one 24 GB card).
If nothing fits, the model isn't dropped — it's returned with `fit_level: "too_tight"` and `run_mode: "no_fit"` so the UI can render a red row. Without that, editing the manual-hardware sliders upward never revealed bigger models, because they were filtered out before the user could see what *would* fit ([services/hwfit/fit.py:278-303](services/hwfit/fit.py)).
When a fit *is* found, four sub-scores are computed and weighted by use case:
| Use case | quality | speed | fit | context |
|---|---|---|---|---|
| general | 0.45 | 0.30 | 0.15 | 0.10 |
| coding | 0.50 | 0.20 | 0.15 | 0.15 |
| reasoning | 0.55 | 0.15 | 0.15 | 0.15 |
| chat | 0.40 | 0.35 | 0.15 | 0.10 |
| multimodal | 0.50 | 0.20 | 0.15 | 0.15 |
| embedding | 0.30 | 0.40 | 0.20 | 0.10 |
`_quality_score` rewards larger param counts on a bucketed curve (30 → 95 from <1B to ≥40B), nudges scores for known family names (`+3` deepseek, `+2` qwen/llama, `+1` mistral/gemma), and applies the per-quant penalty. `_speed_score` estimates tok/s using a real bandwidth table (`GPU_BANDWIDTH` covers 60+ NVIDIA / AMD / datacenter cards from `5090` through `mi300x`), then divides by per-use-case targets — 40 tok/s is "good" for chat/coding, 25 for reasoning, 200 for embedding. `_fit_score` plateaus at 100 between 50% and 80% VRAM utilization and drops sharply above 90% (the "marginal" zone). `_context_score` rewards hitting the use case's context target (4096 chat, 8192 coding/reasoning, 512 embedding).
Sources: [services/hwfit/fit.py:9-49](services/hwfit/fit.py), [services/hwfit/fit.py:62-160](services/hwfit/fit.py), [services/hwfit/fit.py:212-356](services/hwfit/fit.py)
## Speed estimation: bandwidth, not benchmarks
The single most useful number in the response is `speed_tps`, and it comes from physics, not measurements. For a recognized GPU, `_estimate_speed` does:
```
model_gb = active_params_b * bytes_per_param
raw_tps = (gpu_bandwidth / model_gb) * 0.55
* (1.0 dense | 0.8 MoE | 0.5 cpu_offload)
```
The 0.55 efficiency factor is the realized fraction of peak memory bandwidth a typical transformer decode loop achieves. MoE gets 0.8× because routing overhead eats into the bandwidth win. CPU offload halves it because PCIe is the new bottleneck. If the GPU isn't in the lookup, the code falls back to a backend-keyed constant `k / pb * speed_mult`, with `k = 220` for CUDA, 180 for ROCm, 90 for ARM CPU, 70 for x86 CPU ([services/hwfit/fit.py:62-88](services/hwfit/fit.py)). The estimates are deliberately rough — the goal is to separate "60 tok/s, fine" from "3 tok/s, painful" at a glance.
## The image-model side path
`image_models.py` is a separate hard-coded registry of 15 diffusion models (FLUX, SDXL, SD 3.5, Qwen-Image, HunyuanImage, Tongyi Z-Image), each with `vram_bf16` / `vram_fp8` / `vram_q4` rows and a `quant_repos` map pointing to community FP8/Q4 weight repos. Ranking is dramatically simpler: try BF16, then FP8, then Q4, accept the first that fits under 90 % of GPU VRAM, and label the headroom as `perfect`/`good`/`tight`/`no_fit`. There is no per-quant memory formula; the VRAM numbers are precomputed because diffusion models have hand-tuned offload strategies that don't follow `params × bytes_per_param` ([services/hwfit/image_models.py:6-374](services/hwfit/image_models.py)). The `/api/hwfit/image-models` route also forces single-GPU VRAM because diffusion pipelines don't tensor-parallel ([routes/hwfit_routes.py:177-202](routes/hwfit_routes.py)).
## The "what if" simulator
`_apply_manual_hardware()` in `routes/hwfit_routes.py` is a deliberate redesign of the original additive behavior. The previous version added a fake "1× 400 GB" GPU to a detected `2× 70 GB` setup and then averaged: per-GPU cap went from 70 to 180 GB (= 540/3), so GGUF models larger than that still didn't surface — the "cap stuck at detected level" bug. The current code **replaces** the GPU configuration entirely, building a single homogeneous pool with the entered `vram_each` as the literal per-GPU cap. RAM-mode wipes GPUs and reroutes everything through `cpu_x86`. Two more switches — `ignore_detected_gpu` and `ignore_detected_ram` — let the UI strip the live box's contribution without entering manual values, which is what powers the "Suggest models for an RTX 5090 I don't own yet" workflow ([routes/hwfit_routes.py:9-83](routes/hwfit_routes.py), [routes/hwfit_routes.py:113-124](routes/hwfit_routes.py)).
A quieter detail: `rank_models()` sorts twice. First by composite score to pick the *visible set* of N, then by whatever the user clicked (params, vram, context). If it sorted once by the user's column, sorting by `params` would truncate to the biggest models that don't even fit, while sorting by `vram` would truncate to the smallest — the score-first prefilter keeps the visible cohort stable as the user re-sorts ([services/hwfit/fit.py:453-462](services/hwfit/fit.py)).
Sources: [routes/hwfit_routes.py:9-83](routes/hwfit_routes.py), [services/hwfit/fit.py:368-463](services/hwfit/fit.py)
## From recommendation to "click to install"
`hwfit` only computes; the Cookbook glues the ranked result to action. The route layer exposes two GETs (`/api/hwfit/system` and `/api/hwfit/models`) that return JSON the front-end renders as a model list. When the user clicks Download, the request lands at `POST /api/cookbook/api/model/download` in `routes/cookbook_routes.py`, which writes a generated bash (or PowerShell, for Windows remotes) runner script, `scp`s it to the target host if remote, and launches it inside a fresh `tmux` session named `cookbook-<hex>`. The runner installs `huggingface_hub` if missing, opportunistically pulls in `hf_transfer` for the parallel Rust downloader, and falls back to plain `snapshot_download(...)` when either binary or the Rust path is unavailable ([routes/cookbook_routes.py:307-535](routes/cookbook_routes.py)). The "Serve" button takes the same shape but writes a different runner — vLLM, llama.cpp, or Ollama depending on the engine — wired to the GPU subset the user picked ([routes/cookbook_routes.py:731-833](routes/cookbook_routes.py)).
```text
┌──────────────┐ GET /api/hwfit/system ┌────────────────────┐
│ Cookbook │ ───────────────────────► │ hwfit.hardware │
│ UI │ │ (probe + cache) │
│ │ GET /api/hwfit/models └────────────────────┘
│ │ ───────────────────────► ┌────────────────────┐
│ │ ◄── ranked JSON ────── │ hwfit.fit │
│ │ │ (rank_models) │
│ │ └────────────────────┘
│ click Down… │ ┌────────────────────┐
│ │ POST /api/cookbook/... │ cookbook_routes │
│ │ ───────────────────────► │ tmux + ssh + scp │
└──────────────┘ │ hf download / vllm│
└────────────────────┘
```
The separation is intentional: `services/hwfit/` is a pure-Python library with no FastAPI, no shell, no SSH, no auth — just hardware introspection and arithmetic. Everything stateful (sessions, tokens, tmux logs, admin-gating, validation) lives in the route layer.
## What builders should notice
A few things make this design portable to other projects:
- **The catalog is data, not code.** `hf_models.json` is a flat list of JSON records, the only logic baked into it is the field schema. `scripts/add_hwfit_models.py` exists for bulk-adding from HuggingFace search results, which means the recommender can grow without rebuilding the package.
- **Provider-neutral by construction.** Nothing in `services/hwfit/` calls an Anthropic, OpenAI, or HuggingFace inference API. It targets weights at rest (`gguf_sources`, `quant_repos`) and local engines (vLLM, llama.cpp, Ollama). Swap the catalog file or feed it a different `system` dict and it ranks against a different box, including hypothetical ones.
- **One sane fallback per branch.** Detection has three nvidia-smi paths, AMD has VRAM-vs-vis-VRAM-vs-GTT, Windows has nvidia-smi-then-WMI, downloads have hf-CLI-then-Python-then-pip-install. The pattern is consistent: try the fast/correct thing, then the slow/correct thing, then the "best we can do."
- **The "too tight" badge.** Surfacing what *doesn't* fit, with the exact GB it would need, is what makes the simulator usable. Without it, the recommender can only ever shrink the user's mental model of what's possible.
A wiki page can only point at the code; the math is in the source. The interesting takeaway is that "click to install a local LLM" is mostly a sizing problem, and sizing is a 200-line problem if you treat quantization, KV cache, and tensor-parallel pool homogeneity as first-class concepts instead of edge cases.
---
## 07. Deep Research: A Multi-Step Synthesis Engine in 800 Lines
> The deep_research module adapts Alibaba's Tongyi DeepResearch pattern into a self-contained loop that plans, searches, reads, and produces a visual report — and how SearXNG, the search service, and the visual_report renderer fit together.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/07-deep-research-a-multi-step-synthesis-engine-in-800-lines.md
- Generated: 2026-05-31T19:50:35.053Z
### Source Files
- `src/deep_research.py`
- `src/research_handler.py`
- `src/research_utils.py`
- `src/visual_report.py`
- `services/research/service.py`
- `services/search/core.py`
- `routes/research_routes.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/deep_research.py](src/deep_research.py)
- [src/research_handler.py](src/research_handler.py)
- [src/research_utils.py](src/research_utils.py)
- [src/visual_report.py](src/visual_report.py)
- [services/research/service.py](services/research/service.py)
- [services/search/core.py](services/search/core.py)
- [routes/research_routes.py](routes/research_routes.py)
- [src/goal_based_extractor.py](src/goal_based_extractor.py)
- [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md)
</details>
# Deep Research: A Multi-Step Synthesis Engine in 800 Lines
Odysseus ships a "Deep Research" mode that, on paper, sounds heavy — plan a topic, loop through web searches, read pages, decide when it's done, and hand back a magazine-style report with a hero image. In practice, the entire loop lives in a single ~820-line file (`src/deep_research.py`) backed by a handler, a search core, and a self-contained HTML renderer. The pattern is adapted from Alibaba-NLP's [Tongyi DeepResearch](https://github.com/Alibaba-NLP/DeepResearch) under Apache-2.0 (`ACKNOWLEDGMENTS.md:33-38`) and rewritten so the LLM — not hard-coded heuristics — drives every step.
Why this matters: most "deep research" agents either bolt onto a hosted API or carry a heavy framework. Odysseus's version is bring-your-own-LLM and bring-your-own-search: the same loop runs against a local Ollama model and SearXNG, or against a hosted OpenAI-compatible endpoint with Brave/Tavily as a fallback. The interesting engineering is in the cooperation between the four moving parts — `DeepResearcher`, `ResearchHandler`, the search provider chain, and `visual_report` — and the failure modes they were each built to survive.
## The Pattern: Plan → Think → Search → Extract → Synthesize → Decide
`DeepResearcher.research()` is the whole engine. After an optional one-shot category classification, it spins a `for round_num in range(1, max_rounds + 1)` loop where every step is a prompted LLM call (`src/deep_research.py:224-338`):
1. **Plan** (`_create_plan`) — a single call turns the question into `sub_questions`, `key_topics`, and a `success_criteria` sentence (`src/deep_research.py:25-46`, `:361-386`).
2. **Think** (`_generate_queries`) — produces 4 broad queries on round 1, then 3 targeted follow-ups per round, with the round-specific instruction switched in the prompt (`src/deep_research.py:422-462`). Already-used queries are deduped via `self.queries_used`.
3. **Search** (`_search`) — runs each query against the configured provider chain (see below) and only adds a provider to `self.providers_used` if it actually returned results, so the visual report can later credit the engines that carried the work (`src/deep_research.py:209-213`, `:511-544`).
4. **Extract** (`_fetch_and_extract`) — fetches each new URL with `src.search.fetch_webpage_content`, truncates to `max_content_chars` at a paragraph boundary, and asks the LLM to return JSON `{summary, evidence, rational}` using `EXTRACTOR_PROMPT` from `goal_based_extractor.py` (`src/deep_research.py:546-602`).
5. **Synthesize** (`_synthesize`) — folds the last `synthesis_window` (default 10) findings into the evolving report in a single prompt that says "remove redundancy, resolve contradictions, keep URLs as inline citations" (`src/deep_research.py:68-85`, `:607-632`).
6. **Decide** (`_should_stop`) — once `round_num >= min_rounds`, a tiny YES/NO prompt asks the LLM if the report is comprehensive enough. The parser tolerates `**YES**`, leading quotes, and thinking-block prefixes (`src/deep_research.py:87-106`, `:637-663`).
When the loop exits, `_final_report` rewrites the evolving notes into a long-form article using `FINAL_REPORT_PROMPT`. If the result is under 400 words, a follow-up message asks the model to expand it; the longer version is kept only if it actually grew (`src/deep_research.py:108-127`, `:668-714`).
```text
┌─────────── DeepResearcher.research() ──────────┐
│ plan ─▶ ┌──── for each round ────┐ ─▶ final │
│ │ think ─▶ search ─▶ │ │
│ │ extract ─▶ synth ─▶ │ │
│ │ decide (YES → break) │ │
│ └────────────────────────┘ │
└────────────────────────────────────────────────┘
```
## Category-Aware Output: Five Prompts, One Engine
A second LLM call (`_classify_category`) buckets the question into `product`, `comparison`, `howto`, `factcheck`, or `general`, and the matching `CATEGORY_PROMPTS` snippet is appended to the final-report prompt as an `IMPORTANT FORMAT OVERRIDE` block (`src/deep_research.py:129-160`, `:674-676`). The same evidence ends up structured very differently:
| Category | Output shape forced by the override prompt |
|-------------|---------------------------------------------------------------------|
| `product` | Ranked list with quick-compare table and **Verdict** section |
| `comparison`| Criteria-by-option markdown table + per-option strengths |
| `howto` | Quick Guide, Prerequisites, numbered detailed steps, Common Mistakes|
| `factcheck` | The Claim, Evidence For/Against, Verdict (Supported/Mixed/Unsupported)|
The classifier is defensive: weak local models often answer "the category is product" instead of `product`, so it falls back to a substring scan before defaulting to `None` (`src/deep_research.py:404-414`).
## SearXNG, With a Fallback Chain
The research loop never hard-codes a search engine. `_search` reads `research_search_provider` (or the global `search_provider`, default `searxng`) and then calls `_build_provider_chain` to assemble the order to try (`src/deep_research.py:511-544`, `services/search/core.py:91-105`).
```text
primary (searxng | brave | tavily | …) ─▶ user fallbacks ─▶ default [duckduckgo]
```
`_call_provider` dispatches to one of six provider modules — `searxng`, `brave`, `duckduckgo`, `google_pse`, `tavily`, `serper` (`services/search/core.py:71-85`). For non-research callers, `searxng_search_results` adds query-keyed disk caching with per-query expiry, a 2-attempt retry per provider, and post-call re-ranking (`services/search/core.py:111-189`). The research engine itself wraps the synchronous provider call in `asyncio.to_thread` and fans queries out in parallel (`src/deep_research.py:467-509`).
A subtle failure mode is handled: if two consecutive rounds return zero new findings (`max_empty_rounds`), the loop emits a `phase="error"` event with the captured `_last_search_error` and either returns a "Search unavailable" message (no findings at all) or stops gracefully and synthesizes what it has (`src/deep_research.py:295-308`).
## The Handler: Background Tasks, Hard Timeouts, Resumable Reports
`ResearchHandler` is the operational shell around `DeepResearcher`. Three things make it the part that survives production rather than just running once (`src/research_handler.py:25-270`):
- **Probe-before-commit.** Every research call first sends a one-token "hi" to the model. A 401 becomes "Model X requires an API key"; a connection error becomes "Cannot reach model X" — both raised before any expensive round starts (`src/research_handler.py:552-578`).
- **Hard wall-clock timeout.** The whole run sits inside `asyncio.wait_for(..., timeout=hard_timeout)`. If it fires, the handler checks the researcher's `evolving_report` and saves whatever was already synthesized, rather than discarding the run (`src/research_handler.py:208-259`).
- **Resumable + ownership-stamped persistence.** Each completed run is written to `data/deep_research/<session_id>.json` with `query`, `result`, `raw_report`, `sources`, `raw_findings`, `stats`, `category`, and `owner` (`src/research_handler.py:445-480`). The "spinoff" endpoint can then call `researcher.research(prior_report=..., prior_findings=..., prior_urls=...)` to continue a previous report instead of starting over (`src/deep_research.py:224-260`).
Two small but telling defenses live in `research_utils.py`: `strip_thinking` removes `<think>` blocks from reasoning models (without this, `_should_stop` would see `<THINK>...` and never stop), and `is_low_quality` filters extracted summaries containing markers like `"insufficient to"`, `"cookie"`, `"boilerplate"`, so the synthesis prompt isn't fed junk (`src/research_utils.py:12-56`, `src/deep_research.py:584-589`).
## The HTTP Surface: Start, Stream, Hide-an-Image
`routes/research_routes.py` exposes the engine over a small REST + Server-Sent-Events surface (`routes/research_routes.py:48-433`):
| Endpoint | Purpose |
|-------------------------------------------|---------------------------------------------------------|
| `POST /api/research/start` | Allocate `rp-<uuid>`, resolve endpoint, kick off task |
| `GET /api/research/stream/{sid}` | SSE feed of `progress` events at ~1.5s tick |
| `GET /api/research/status/{sid}` | One-shot status |
| `POST /api/research/cancel/{sid}` | Cooperative cancel via `researcher.cancel()` |
| `POST /api/research/result/{sid}` | Final markdown |
| `GET /api/research/report/{sid}` | Visual HTML report |
| `POST /api/research/{sid}/hide-image` | Persist a hidden image URL in the JSON |
| `POST /api/research/spinoff/{sid}` | Continue research from a prior report |
Endpoint resolution prefers a dedicated `research` endpoint, then falls back through `utility`, `default`, `chat`, and finally the first enabled `ModelEndpoint`. A small but important filter — `_first_chat_model` — skips known non-chat model names (`text-embedding`, `whisper`, `dall-e`, etc.) so research doesn't accidentally try to "complete" with `text-embedding-ada-002`, which was the actual bug the comment documents (`routes/research_routes.py:21-34`, `:351-388`).
Ownership is enforced at every read: `_owns_in_memory` checks the in-memory task's `owner` first, then falls back to the on-disk JSON's `owner` field for finished runs (`routes/research_routes.py:61-74`). Without this, refresh-after-completion would leak other users' reports.
A second consumer of the engine — `services/research/service.py` — exposes the same handler as a clean `async def research(topic, ...) -> ResearchResult` API, with a dataclass-shaped result (`services/research/service.py:30-117`). It's a thin façade over `ResearchHandler.call_research_service`, useful when the engine is embedded by other Python code rather than driven over HTTP.
## The Visual Report: One Big String of HTML
`src/visual_report.py` is bigger than the engine itself (~1,833 lines) but it's almost entirely a self-contained HTML template — no remote fonts, no external CSS, dark/light via `prefers-color-scheme`. The Python is the templating layer:
- `_md_to_html` autolinks bare URLs, runs `markdown` with `extra`/`codehilite`/`toc`/`tables`/`sane_lists`, and rewrites every external `<a href="https://...">` to add `target="_blank" rel="noopener noreferrer"` (`src/visual_report.py:33-63`).
- `_extract_headings` builds a TOC from `##`/`###` and, if the model emitted bold lines instead of headings, falls back to promoting `**Lead-in:**` lines to `h2` (`src/visual_report.py:66-89`, `:1642-1649`).
- `_inject_images` walks Open-Graph images collected from each source, picks index 0 as a hero, and inserts the rest as `<figure>` elements after every second `</h2>` boundary — with a guarded blocklist for icons/favicons/logos (`src/visual_report.py:1661-1707`).
- Every injected `<figure>` carries a reroll/hide button overlay (`_IMG_OVERLAY_BTNS`). The spare image pool is embedded in the page so the reroll button can swap an irrelevant image client-side without another server round-trip (`src/visual_report.py:95-138`, `:1703-1708`).
- Hidden images are persisted: `POST /api/research/{sid}/hide-image` appends to `hidden_images` in the JSON, and the next render filters them out (`src/research_handler.py:519-535`, `routes/research_routes.py:158-178`).
The stats bar at the top — Duration, Rounds, Queries, URLs Analyzed, Model, Search — is built directly from `DeepResearcher.get_stats()`, which is why the only providers shown are the ones that actually returned hits during the run (`src/deep_research.py:806-820`, `src/visual_report.py:1720-1727`).
## The Architecture, in One Picture
```mermaid
flowchart TB
subgraph UI["Browser"]
panel["Research panel"]
sse["SSE: /api/research/stream"]
report["Visual HTML report"]
end
subgraph API["FastAPI routes (routes/research_routes.py)"]
start["POST /api/research/start"]
stream["GET /api/research/stream/{sid}"]
htmlR["GET /api/research/report/{sid}"]
hide["POST /api/research/{sid}/hide-image"]
end
subgraph Handler["src/research_handler.py"]
rh["ResearchHandler<br/>_active_tasks, probe, hard timeout"]
disk[("data/deep_research/<br/>{sid}.json")]
end
subgraph Engine["src/deep_research.py"]
dr["DeepResearcher<br/>plan → loop → final"]
prompts["RESEARCH_PLAN_PROMPT<br/>QUERY_GEN_PROMPT<br/>SYNTHESIZE_PROMPT<br/>STOP_PROMPT<br/>FINAL_REPORT_PROMPT<br/>CATEGORY_PROMPTS"]
end
subgraph Search["services/search/core.py"]
chain["_build_provider_chain"]
providers["searxng | brave | duckduckgo<br/>google_pse | tavily | serper"]
end
subgraph Render["src/visual_report.py"]
md2html["_md_to_html"]
inject["_inject_images + TOC"]
tmpl["_TEMPLATE (HTML+CSS+JS)"]
end
llm[("BYO LLM<br/>OpenAI-compatible")]
panel --> start --> rh --> dr
dr -->|prompts| llm
dr --> chain --> providers
rh --> disk
panel --> sse --> stream --> rh
panel --> htmlR --> rh --> Render --> report
report -->|reroll/hide| hide --> rh --> disk
```
Sources: [src/deep_research.py:163-338](), [src/research_handler.py:25-270](), [services/search/core.py:71-105](), [routes/research_routes.py:48-433](), [src/visual_report.py:1621-1727]()
## What Builders Should Notice
A few details are worth pulling out for anyone building something similar:
- **The LLM owns every decision.** Query generation, extraction, synthesis, stop, and final formatting are all prompts — there's no rules engine or scoring heuristic deciding when to stop. The price is a `min_rounds`/`max_rounds` safety frame plus the empty-rounds counter; the reward is that a smarter local model produces a smarter loop without code changes.
- **Defensive JSON parsing.** `_parse_json_array` first tries `json.loads`, then a greedy `[…]` regex, then — if the array was truncated mid-string — recovers complete quoted items with a regex sweep so an aborted response still yields usable queries (`src/deep_research.py:741-773`).
- **Probe before you spend.** The 5-token probe in `_probe_endpoint` is cheap insurance against burning 5 minutes on a misconfigured endpoint, and the wrapped error messages turn raw HTTP failures into actionable settings advice (`src/research_handler.py:552-578`).
- **Provider neutrality is enforced by chains, not branches.** The same `_build_provider_chain` is used by deep research and ad-hoc search; swapping SearXNG for Brave or chaining `["brave","duckduckgo"]` is a setting, not a code path (`services/search/core.py:91-105`).
- **The HTML report is a single string.** Because `_TEMPLATE` is inlined CSS+JS with no remote assets, the rendered page is portable — saveable, shareable, and printable without losing styling. The reroll button works because the spare image pool is embedded in the page; the hide button works because the server persists the dismissal back into the run's JSON.
For an ~800-line implementation, this is a lot of moving parts wired with restraint: a single loop file, a single handler file, a single search-core file, and a single template file — each one owning one boundary, and each one written to fail gracefully when the LLM, the search engine, or the page fetch misbehaves.
Sources: [src/research_handler.py:208-259](), [src/research_handler.py:552-578](), [src/deep_research.py:741-773](), [services/search/core.py:91-189](), [src/visual_report.py:145-200]()
---
## 08. Memory & Skills: ChromaDB + Skill Extraction That Evolves
> How the memory subsystem combines ChromaDB vector storage, fastembed ONNX embeddings, keyword fallback, and a skill_extractor that distills recurring patterns into reusable skills — the mechanism behind the "your agent gets better over time" claim.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/08-memory-skills-chromadb-skill-extraction-that-evolves.md
- Generated: 2026-05-31T19:50:52.509Z
### Source Files
- `services/memory/memory.py`
- `services/memory/memory_vector.py`
- `services/memory/skill_extractor.py`
- `services/memory/skills.py`
- `src/memory_vector.py`
- `src/chroma_client.py`
- `src/embeddings.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [services/memory/memory.py](services/memory/memory.py)
- [services/memory/memory_vector.py](services/memory/memory_vector.py)
- [services/memory/skill_extractor.py](services/memory/skill_extractor.py)
- [services/memory/skills.py](services/memory/skills.py)
- [services/memory/skill_format.py](services/memory/skill_format.py)
- [services/memory/memory_extractor.py](services/memory/memory_extractor.py)
- [src/chroma_client.py](src/chroma_client.py)
- [src/embeddings.py](src/embeddings.py)
- [src/memory_vector.py](src/memory_vector.py)
</details>
# Memory & Skills: ChromaDB + Skill Extraction That Evolves
Most "agent memory" implementations are a vector store and a system prompt. Odysseus splits the problem into two: **memory** — the agent's working set of facts about you — and **skills** — short, replayable procedures the agent has *learned to do* from prior sessions. Both are stored locally, both index into the same ChromaDB collection family, and both degrade cleanly when the embedding backend disappears.
This page walks the mechanism: how a memory entry ends up in a Chroma `odysseus_memories` collection, how the embedding pipeline falls back from a remote HTTP server to a local fastembed ONNX model, how the keyword-only path keeps working when neither is available, and how `skill_extractor.py` distills "the agent took 2+ rounds and 2+ tool calls" into a SKILL.md file the next session can find.
## The Two-Track Architecture
Memory and skills are kept structurally separate on purpose: memories are facts (free text, owner-stamped, optionally vectorised), skills are typed procedures (frontmatter + structured markdown) stored as files on disk.
```mermaid
flowchart LR
subgraph Agent["Agent runtime"]
AL["agent_loop.py<br/>(rounds, tool calls)"]
SP["system prompt"]
end
subgraph Memory["Memory subsystem"]
MM["MemoryManager<br/>(memory.json)"]
MV["MemoryVectorStore<br/>collection: odysseus_memories"]
MX["memory_extractor.py<br/>(periodic LLM tidy)"]
end
subgraph Skills["Skills subsystem"]
SM["SkillsManager<br/>data/skills/<cat>/<name>/SKILL.md"]
SX["skill_extractor.py<br/>(post-run distillation)"]
SU["_usage.json sidecar"]
end
subgraph Embed["Embedding + storage"]
EC["get_embedding_client()<br/>HTTP → FastEmbed fallback"]
CC["get_chroma_client()<br/>chromadb.HttpClient"]
end
AL -->|"finish_round()"| SX
AL --> MM
SP <-->|"index_for()"| SM
SP <-->|"get_relevant_memories()"| MM
MM --> MV
MV --> EC
MV --> CC
MX --> MM
SX --> SM
SM --> SU
```
Sources: [services/memory/memory.py:35-360](), [services/memory/memory_vector.py:15-176](), [services/memory/skills.py:62-271](), [services/memory/skill_extractor.py:51-209]()
## ChromaDB: the Vector Store Layer
The vector store is intentionally thin. `MemoryVectorStore` opens a single Chroma collection called `odysseus_memories` configured with `hnsw:space=cosine` and stores **pre-computed** embeddings — Chroma does no embedding of its own. The same applies to RAG; only the collection name differs.
```python
# services/memory/memory_vector.py
self._collection = client.get_or_create_collection(
name=self.COLLECTION_NAME,
metadata={"hnsw:space": "cosine"},
)
```
Three operational details worth noting:
| Behaviour | Detail | Source |
|---|---|---|
| Health latch | `_healthy=False` on init failure; every op is a no-op when unhealthy, so a missing Chroma daemon doesn't crash agent runs. | [services/memory/memory_vector.py:23-49]() |
| Dedup on add | Before `add()`, the store does a `get(ids=[memory_id])` and skips if present — same memory id is idempotent. | [services/memory/memory_vector.py:65-79]() |
| Similarity convert | Chroma returns cosine *distance*. The code returns `round(1.0 - distance, 4)` so the rest of the app reasons in similarity space. | [services/memory/memory_vector.py:107-114]() |
| Rebuild path | `rebuild()` deletes and recreates the collection, then re-embeds in batches of 100 — used after bulk imports or schema migrations. | [services/memory/memory_vector.py:134-175]() |
| Soft-delete duplicates | `find_similar(text, threshold=0.92)` is the gate before adding new auto-extracted memories so the index doesn't bloat with near-duplicates. | [services/memory/memory_vector.py:116-132]() |
The Chroma client itself is a singleton over an HTTP transport — no in-process Chroma, no DuckDB — pointed at `CHROMADB_HOST:CHROMADB_PORT` (defaults `localhost:8100`). Importing `chromadb` is optional; if it isn't installed the factory raises a `RuntimeError` with the install hint rather than failing silently.
```python
# src/chroma_client.py
_client = chromadb.HttpClient(host=host, port=port)
_client.heartbeat() # immediate health check at startup
```
Sources: [src/chroma_client.py:16-48](), [services/memory/memory_vector.py:15-115]()
## fastembed: the Zero-Config Fallback
The embedding layer is where the "BYOC/BYOK friendly" claim shows up. `get_embedding_client()` tries an OpenAI-compatible HTTP endpoint first (Ollama, vLLM, llama.cpp, anything that speaks `POST /v1/embeddings`), and if that fails it loads a local fastembed ONNX model (default `sentence-transformers/all-MiniLM-L6-v2`, ~50 MB) cached under `data/fastembed_cache/`.
The fallback isn't reattempted on every call. Once an HTTP probe fails, a process-level latch `_http_embed_down = True` is tripped so subsequent RAG / memory / tool calls don't pay the ~3 s connect timeout every time:
```python
# src/embeddings.py
if not _http_embed_down:
try:
client = EmbeddingClient()
client.get_sentence_embedding_dimension() # health check
return client
except Exception:
_http_embed_down = True
# ...fall through to FastEmbedClient
```
`reset_http_embed_state()` is the explicit way to re-probe after the admin panel saves a new endpoint — without it, an Ollama instance that came up *after* startup would never be picked up.
Both clients expose the same minimal surface (`encode(texts, normalize_embeddings=True)`, `get_sentence_embedding_dimension()`), so `MemoryVectorStore` doesn't know which backend it's using. The HTTP path batches in 64s; fastembed embeds inline; both L2-normalise so the cosine collection's distances are well-behaved.
Sources: [src/embeddings.py:27-87](), [src/embeddings.py:90-143](), [src/embeddings.py:163-213]()
## Keyword Fallback: Memory Without Vectors
When Chroma is unreachable *and* fastembed isn't installed, the memory subsystem still works — it just falls back to a keyword scorer inside `MemoryManager`. `get_relevant_memories()` is essentially a hand-built BM25 lite: tokenize, Jaccard, then category-aware boosts.
The query is first classified into one of `identity / contact / preference / task / fact` by keyword presence, and matching memories get a multiplicative boost:
```python
# services/memory/memory.py
if query_type == "contact":
has_contact_info = any(word in memory_text for word in [
"@gmail.com", "@", ".com", "phone", "number",
"address", "http", "www", "tel:"])
if has_contact_info:
final_score *= 1.4 # 40% boost
```
The identity case is special: for an identity query, every memory that looks like a name (capitalised pair, `"i'm"`, `"my name"`, `"call me"`) is pre-seeded at score 0.9 *regardless* of token overlap. This is the bit that keeps "what's my name?" working even when the question shares no tokens with the stored fact.
This same module also recognises inline `remember: X` commands via a single regex (`remember | memorize | save | note | store`) before any LLM is involved — the cheapest possible path for explicit user saves.
Sources: [services/memory/memory.py:81-99](), [services/memory/memory.py:263-359]()
## Skill Extraction: How "It Gets Better"
The skill extractor is the piece behind the README's claim that the agent learns over time. It runs **after** an agent loop finishes, and the threshold is intentionally low:
```python
# services/memory/skill_extractor.py
if round_count < 2 and tool_count < 2:
return None # nothing complex enough to distill
```
A single-round, single-tool answer is treated as not worth turning into a skill. Two or more of either, and the extractor takes the last 12 messages of the session, hands them to the LLM with a tightly worded system prompt that is more about *refusal* than instruction, and asks for a JSON skill object — or the bare word `null`.
The prompt has explicit anti-patterns the model is told to reject:
- The real work happened **outside the computer** (user did it physically, in person, on another device).
- A one-off, personal, context-specific task that won't recur.
- A pure Q&A or explanation with no transferable method.
- The agent failed or gave up.
This conservative shape is the point. The library bloats fast if every agent run produces a skill; the prompt is engineered to bias toward `null`.
### Post-processing the LLM Response
Real models don't always cooperate. The extractor handles several known failure modes:
| Failure mode | Mitigation | Source |
|---|---|---|
| `null` or empty | Bail silently — debug log only | [services/memory/skill_extractor.py:120-125]() |
| `<think>...</think>` (R1-class models) | `strip_think(prose=True, prompt_echo=True)` removes reasoning preamble | [services/memory/skill_extractor.py:127-137]() |
| Markdown code fences around JSON | Strip first ` ``` ` line and trailing fence | [services/memory/skill_extractor.py:140-142]() |
| JSON embedded in surrounding prose | Slice from first `{` to last `}` | [services/memory/skill_extractor.py:143-149]() |
| Low-confidence "maybe" skills | Drop anything below `MIN_CONFIDENCE = 0.6` | [services/memory/skill_extractor.py:44-45](), [services/memory/skill_extractor.py:163-172]() |
| Duplicate titles | Case-insensitive title match against existing skills, drop | [services/memory/skill_extractor.py:174-179]() |
Successful extractions fire a `skill_added` event on the bus so the UI can show a toast without polling.
Sources: [services/memory/skill_extractor.py:15-41](), [services/memory/skill_extractor.py:51-209]()
## Skill Storage: SKILL.md on Disk, Not in a DB
Where memories live in `memory.json` and a vector index, skills live as actual files: `data/skills/<category>/<name>/SKILL.md`, each with YAML frontmatter and a structured body (`When to Use / Procedure / Pitfalls / Verification`). The format is deliberately inspired by Hermes' skills format — human-editable, git-friendly, and round-trippable.
Hot, churn-prone data is kept *out* of the file. Usage counters and audit verdicts live in a sidecar `data/skills/_usage.json` keyed by skill name:
> "Usage counters (`uses`, `last_used`) live in a sidecar so the SKILL.md content doesn't churn on every retrieval." — [services/memory/skill_format.py:43-44]()
`SkillsManager.add_skill()` does free dedup on every LLM-authored save: it tokenises name + description + when-to-use + procedure, runs Jaccard against every existing skill, and at `>= 0.82` overlap it bumps the existing skill's `uses` counter and returns it with `_deduped=True` instead of writing a new file. User-authored skills bypass the gate ("a human asked for it").
```python
# services/memory/skills.py
if _jaccard(cand, ex) >= 0.82:
self.record_use(s["name"])
return {**s, "_deduped": True, "_duplicate_of": s.get("name")}
```
### What the Agent Actually Sees
`index_for(owner, active_toolsets, platform)` is the function that produces the lightweight `[{name, description, category, status}]` list injected into the system prompt. The filtering rules are worth flagging because they aren't obvious:
- **Published** skills always included.
- **Drafts** are excluded — *except* drafts written by the teacher-escalation loop (`source == "teacher-escalation"`). The teacher loop's whole job is for the student to find a new procedure on the very next turn; gating it behind a manual publish click would defeat the loop.
- **`requires_toolsets`** hides a skill unless every required toolset is active.
- **`fallback_for_toolsets`** does the opposite — hides the skill *when* a named toolset is active, so a "manual scp" fallback disappears when the real SSH tool is loaded.
- **Platform gate** — `platforms: [linux, macos]` excludes the skill on Windows.
Sources: [services/memory/skills.py:276-365](), [services/memory/skills.py:494-545](), [services/memory/skill_format.py:7-44]()
## Audit & Tidy: The Self-Cleaning Loop
`memory_extractor.py` carries a complementary mechanism for memories. After each LLM turn, recent messages are sent to the model with an extraction prompt; the audit pass periodically rewrites vague entries, consolidates duplicates, and removes junk.
The audit is expensive (30–120 s LLM call) so it's gated by a fingerprint:
```python
# services/memory/memory_extractor.py
items = sorted(
(str(e.get("id", "")), e.get("text", ""), e.get("category", ""))
for e in entries
)
# sha256 over id+text+category — any add/edit/delete invalidates it
```
The fingerprint per owner is persisted in `memory_tidy_state.json` next to `memory.json`. If the current fingerprint matches the last successful audit, the LLM call is skipped — "running the LLM again on an already-clean list was wasting 30-120s per call and occasionally timing out on the second pass."
Sources: [services/memory/memory_extractor.py:1-58]()
## Tradeoffs and Surprising Details
A few things in this design that a README-only reader would miss:
- **Two parallel implementations.** `services/memory/memory_vector.py` and `src/memory_vector.py` are byte-identical at the time of writing. The `services/` tree is the newer modular layout; `src/` is the legacy package still imported by older callers. The same is true of `memory.py` in both trees.
- **Ownership is enforced by strict equality.** `SkillsManager.load(owner=…)` filters with `s.get("owner") == owner`. An earlier predicate also let skills with *no* owner through, which leaked legacy skills to every authenticated user — the inline comment calls this out as a fixed security regression ([services/memory/skills.py:265-270]()).
- **Skills can be `requires_toolsets` or `fallback_for_toolsets`.** This is a small but real mechanism for toolset-aware procedure routing — a skill describing "how to send mail by raw SMTP" can be marked as fallback for the Gmail toolset and only surface when Gmail isn't loaded.
- **No SentenceTransformer dependency.** `EmbeddingClient.encode` is "drop-in" for SentenceTransformer's interface but never imports it. fastembed is the only true local dependency and it ships ONNX, not PyTorch — keeping the install graph light.
- **Confidence floors at two levels.** The extractor drops anything below 0.6 ([services/memory/skill_extractor.py:44-45]()); the retrieval function multiplies score by `1.0 + confidence * 0.1` so higher-confidence skills win ties ([services/memory/skills.py:604]()).
- **The "skill index" the agent reads is sorted by `(category, name)`.** Skill ordering in the prompt is deterministic, which matters for prompt caching — the same skill set produces the same prefix bytes turn after turn.
## What Builders Should Notice
The pattern here is portable and provider-neutral by design. Three composable layers — Chroma + a pluggable embedding client + a keyword fallback — give the system three failure-graceful modes (full vector, vector with local ONNX, no vector at all). The skill extractor on top is essentially a *prompt-engineered classifier* whose first job is to say "this isn't worth saving" — the conservatism is what keeps the library small enough to inject into every system prompt.
The split between dynamic memory (JSON + Chroma) and procedural memory (markdown files on disk) is the load-bearing decision. Memories are short, owner-scoped, and easy to re-embed; skills are versioned, hand-editable, and intentionally durable. The fact that one of them is a vector store and the other is a directory of files is exactly the right mismatch.
Sources: [services/memory/memory_vector.py:90-114](), [services/memory/skills.py:494-545](), [services/memory/skill_extractor.py:15-45]()
---
## 09. Email, Calendar, Notes: The Personal-Productivity Side
> IMAP/SMTP triage with AI auto-tagging, CalDAV sync to Radicale / Nextcloud / Apple / Fastmail, ntfy-channel reminders, and cron-style tasks the agent can act on — features that lean on background pollers and per-account routing.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/09-email-calendar-notes-the-personal-productivity-side.md
- Generated: 2026-05-31T19:54:15.856Z
### Source Files
- `routes/email_routes.py`
- `routes/email_pollers.py`
- `routes/email_helpers.py`
- `routes/calendar_routes.py`
- `routes/task_routes.py`
- `routes/note_routes.py`
- `src/caldav_sync.py`
- `src/task_scheduler.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [routes/email_routes.py](routes/email_routes.py)
- [routes/email_pollers.py](routes/email_pollers.py)
- [routes/email_helpers.py](routes/email_helpers.py)
- [routes/calendar_routes.py](routes/calendar_routes.py)
- [routes/task_routes.py](routes/task_routes.py)
- [routes/note_routes.py](routes/note_routes.py)
- [src/caldav_sync.py](src/caldav_sync.py)
- [src/task_scheduler.py](src/task_scheduler.py)
- [src/builtin_actions.py](src/builtin_actions.py)
</details>
# Email, Calendar, Notes: The Personal-Productivity Side
Odysseus is a chat-first agent, but a surprising amount of its surface area is a personal-productivity OS welded to the same LLM stack. It talks IMAP/SMTP directly against multiple accounts, pulls CalDAV from Radicale / Nextcloud / Apple / Fastmail into local SQLite, fires reminders through ntfy or back through the user's own SMTP, and runs cron-style "built-in actions" the agent itself can compose with. None of it depends on a hosted service — the credentials live in the user's encrypted prefs, every poller is just an `asyncio.create_task` loop, and the LLM is whichever endpoint the user picked.
The interesting design choice is that each surface (mail, calendar, notes, tasks) is *both* a feature you can use directly in the UI and a tool the agent can call. The same `dispatch_reminder` that the note scanner fires is the same path a chat-tool reminder takes. The same `summarize_emails` action you can schedule via cron is what the AI auto-reply pass calls. That collapsing — UI feature == agent action == cron job — is what makes the productivity side feel coherent rather than four separate apps stapled together.
## What ships in the box
| Surface | Where it lives | Backed by | How the agent reaches it |
|---|---|---|---|
| IMAP inbox + AI triage | `routes/email_routes.py`, `routes/email_pollers.py` | `imaplib` against per-user `EmailAccount` rows | `summarize_emails`, `draft_email_replies`, `extract_email_events`, `check_email_urgency`, `mark_email_boundaries` built-in actions |
| SMTP send + scheduled send | `routes/email_routes.py` `/send`, `/schedule` | `scheduled_emails` SQLite + 30s poller | `_scheduled_poll_once` (also exposed as `odysseus-mail poll-scheduled` CLI) |
| Calendar | `routes/calendar_routes.py` | Local SQLite `CalendarEvent` + CalDAV one-way pull | `do_manage_calendar` tool, `extract_email_events` action |
| CalDAV sync | `src/caldav_sync.py` | `caldav` lib in a threadpool | Triggered on calendar open + periodic scheduler tick |
| Notes / todos / due-date reminders | `routes/note_routes.py` | `Note` rows + per-owner `data/note_pings_{slug}.json` cache | `action_ping_notes` scanner running every 60s |
| Scheduled tasks (cron / event / webhook) | `routes/task_routes.py`, `src/task_scheduler.py` | `ScheduledTask` rows | `TaskScheduler._loop` |
Sources: [src/builtin_actions.py:2138-2179](), [src/task_scheduler.py:186-198](), [routes/email_helpers.py:259-369]()
## Background pollers: three loops, all serial
Three asyncio loops run inside the FastAPI process by default, and all three are guarded by the same `ODYSSEUS_INPROCESS_POLLERS` switch so an operator running cron / systemd timers can opt out without a fork.
```text
┌─ _scheduled_email_poller (30s tick → SMTP deliver due rows)
routes/email_pollers.py │
_start_poller() ─────────────┤
└─ (legacy _auto_summarize_poller — now disabled;
scheduled Tasks own that work — see _launch())
src/task_scheduler.py
TaskScheduler.start() ───────┬─ _loop (≤60s, woken by next_run)
├─ _note_pings_loop (60s, owner-iterating)
└─ _event_pings_loop (10min — present but unwired:
calendar reminders are emitted
as Notes, so _note_pings_loop is
the single dispatch path)
```
The scheduler loop is one of the more careful pieces of the codebase. It snapshots `_executing` under a lock so a request handler dispatching a manual run cannot race the periodic sweep ([src/task_scheduler.py:493-511]()), it serializes LLM-using tasks behind `asyncio.Semaphore(1)` while letting pure-infra actions bypass that slot ([src/task_scheduler.py:223-224, 535-540]()), and on startup it marks any leftover `running`/`queued` rows as `aborted` with the reason "Server restarted…" so a crash doesn't leave phantom rows ([src/task_scheduler.py:278-295]()). It also sleeps only until the next due `next_run` capped at 60 s, which is why a `* * * * *` cron actually fires near the minute boundary instead of up to a minute late ([src/task_scheduler.py:470-486]()).
Sources: [routes/email_pollers.py:939-1006](), [src/task_scheduler.py:269-510]()
## The auto-triage pass: one loop, five jobs
`_auto_summarize_pass_single` is the centerpiece of the email side. One IMAP connection, a single `SINCE`-bounded fetch over the last `days_back` days, and then five independent LLM jobs gated by feature flags from `data/settings.json`:
| Flag | What the LLM is asked to do | Side-effect |
|---|---|---|
| `email_auto_summarize` | 1–3 bullet summary fenced by `<<<SUMMARY>>>` markers | Row in `email_summaries` |
| `email_auto_reply` | Draft reply fenced by `<<<REPLY>>>` markers, style-matched to user's writing | Row in `email_ai_replies` |
| `email_auto_tag` / `email_auto_spam` | Classify to a fixed 13-tag set + spam verdict | Row in `email_tags`; if spam and folder detected, IMAP `MOVE` to Junk/Spam |
| `email_auto_calendar` | Decide `create` / `update` / `cancel` / `noop` against next 60 days of events | `do_manage_calendar` invocations + `email_calendar_extractions` row |
| `auto_urgent` (currently force-disabled here; lives in `check_email_urgency` task) | Verdict `critical`/`high`/`medium`/`low`/`none` + send self-alert email | Row in `email_urgency_alerts` |
Three details worth noticing:
**Fenced output, not JSON.** The summary and reply prompts demand the model put its answer between `<<<SUMMARY>>>` / `<<<REPLY>>>` and `<<<END>>>`. `_extract_reply` strips everything outside. This sidesteps the eternal "the model thought out loud before its answer" problem more cleanly than think-tag stripping, and the same extractor doubles for replies and summaries ([routes/email_helpers.py:82-113]()).
**Calendar extraction reads the existing calendar first.** Before asking the LLM to decide what to do with an itinerary email, the pass injects the next 60 days of upcoming events as `EXISTING_EVENTS` JSON so the model can return `action=update` with the existing UID instead of creating a duplicate ([routes/email_pollers.py:365-441]()). It also scans the Sent folder when calendar extraction is on, so a confirmation reply the user wrote propagates ([routes/email_pollers.py:140-148]()).
**Regex-driven detail rescue.** Even when the LLM nails the high-level event, the pass runs a second pass of compiled regexes for meeting links (`teams.microsoft.com`, `zoom.us`, `meet.google.com`, `webex.com`, `meet.jit.si`), tracking URLs (Amazon, FedEx, UPS, DHL, Japan Post), and identifier patterns (meeting IDs, passcodes, PNRs, gates, seats, flight numbers — including Japanese-language variants like `便` and `予約`), then appends them to the description so they survive verbatim ([routes/email_pollers.py:502-549]()).
The whole pass caps itself at 10 processed messages per run, sleeps 1 second between messages, and runs LLM calls via `asyncio.to_thread` so the 240-second model timeouts don't block the event loop ([routes/email_pollers.py:204, 302-305, 802]()).
Sources: [routes/email_pollers.py:114-833](), [routes/email_helpers.py:85-113]()
## Per-account routing without leaking creds
The repo had to grow a multi-account story carefully because the original code stored a single mailbox in `data/settings.json`. The current shape is: `EmailAccount` rows in the database, with passwords run through `secret_storage.encrypt`/`decrypt`. `_get_email_config(account_id, owner)` resolves the right row with a fallback order — explicit ID, then default-flagged, then first-enabled — *always* scoped by `owner` when one is given ([routes/email_helpers.py:456-557]()).
The owner-scoping is the security-relevant part. The same file calls out previous incidents in comments: `_assert_owns_account` exists because a bare `id == account_id` filter let a multi-user deploy enumerate other users' accounts ([routes/email_helpers.py:172-186]()); the IMAP pool key is `(account_id, owner)` because two users both passing `account_id=None` were silently sharing a connection ([routes/email_routes.py:443-498]()); `email_tags` had its primary key promoted from `message_id` alone to `(message_id, owner)` because Message-IDs are globally shared (a newsletter has the same ID for every recipient) and a tag-write for user A's row was clobbering B's ([routes/email_helpers.py:303-361]()).
The same `_get_email_config` is what the note reminder dispatcher uses to find an SMTP route, with a settings-level `reminder_email_account_id` letting users pick *which* account reminders go through when they have several configured ([routes/note_routes.py:274-299]()).
Sources: [routes/email_helpers.py:154-186, 456-557](), [routes/email_routes.py:419-498]()
## CalDAV: one-way pull, idempotent on URL hash
CalDAV sync is deliberately scoped down. It's a one-way pull (remote → local SQLite), it uses the `caldav` Python lib through `asyncio.to_thread` so the FastAPI loop stays free, and the local row id is `caldav-<sha256(remote_url)[:24]>` so re-syncs always target the same row ([src/caldav_sync.py:40-44]()).
The discovery handshake tries PROPFIND → principal → calendars first, then falls back to treating the URL as a direct calendar reference — which is the difference between "user pasted the server root" and "user pasted a specific calendar collection" ([src/caldav_sync.py:75-95]()). The fetch window is 90 days back to 1 year forward, which keeps the REPORT cheap; far-future recurring events still render through frontend RRULE expansion ([src/caldav_sync.py:33-37, 97-98]()). VEVENT UIDs are the upsert key, and any locally-cached CalDAV-sourced event whose UID didn't appear in the latest pull within the window gets deleted so remote deletions propagate ([src/caldav_sync.py:185-225]()).
Datetime handling is the bit that bites most CalDAV clients: tz-aware datetimes get converted to UTC and stored naive with `is_utc=True` so the serializer adds the `Z` suffix and the frontend renders in the user's local timezone correctly; all-day `date` values get widened to `datetime` and stay flag-free ([src/caldav_sync.py:47-56, 168-174]()).
The `/api/calendar/test` route is also worth a note: it issues an actual PROPFIND with a real DAV XML body and translates the HTTP codes back into user-readable strings ("Auth failed — check username/password", "Forbidden — user can't access that URL"), accepting an un-saved body so the user can validate a configuration before storing the password ([routes/calendar_routes.py:438-493]()).
Sources: [src/caldav_sync.py:1-256](), [routes/calendar_routes.py:392-499]()
## The reminder fanout: browser, email, ntfy — and dedupe
```text
Note.due_date hits its ±90s window
│
▼
action_ping_notes(owner) ──┐
(src/builtin_actions.py) │ All three paths converge on
│ │ dispatch_reminder(...) and
▼ │ share data/note_pings_{slug}.json
dispatch_reminder(title, body, id) ──┘
│
reads settings["reminder_channel"]:
│
┌──────┴──────┬──────────────┐
▼ ▼ ▼
"browser" "email" "ntfy"
in-mem SMTP via POST to
queue + resolved {integrations.ntfy.base_url}/{topic}
frontend EmailAccount with Priority:high, Tags:bell
Notification + reminder_ header
email_to
```
A single reminder always pushes to the in-app notification queue regardless of channel — the user might be looking at the tab when the email arrives, and the toast confirms the reminder fired ([routes/note_routes.py:393-411]()). The dedupe state is a JSON file keyed per-owner (`data/note_pings_{owner_slug}.json`); both the route-level dispatcher and the background scanner write to it, and the entry remembers which channel it last fired on so a failed email send can be retried by the next scanner tick instead of being silently muted by a frontend-only entry ([routes/note_routes.py:139-174, 418-441](), [src/builtin_actions.py:1436-1452]()).
Calendar reminders deliberately go through this same path: the scheduler exposes `_event_pings_loop` but it's not started, with a comment explaining that calendar reminders are represented as Notes by the calendar UI, so the Notes scanner is the single dispatch path — running both produced duplicate emails for the same event ([src/task_scheduler.py:341-347, 411-429]()).
The ntfy integration is read out of `data/integrations.json` rather than its own settings stanza: any enabled integration with `preset == "ntfy"` and a `base_url` qualifies, and an `api_key` (if set) becomes `Authorization: Bearer …` ([routes/note_routes.py:367-391]()). Topic defaults to `"reminders"` but is configurable via `reminder_ntfy_topic`.
Sources: [routes/note_routes.py:111-450](), [src/builtin_actions.py:1421-1565]()
## The cron grammar: more than "every Wednesday at 9"
`ScheduledTask` supports five schedule shapes — `once`, `daily`, `weekly`, `monthly`, and full `cron` via `croniter` — plus three trigger types: `schedule`, `event` (fired off the in-process event bus when N matching events accumulate), and `webhook` (a tokenized POST endpoint at `/api/tasks/{id}/webhook/{token}`) ([routes/task_routes.py:21-99](), [src/task_scheduler.py:62-165]()).
The interesting wrinkle is timezone handling. `compute_next_run` takes an IANA `tz_name`, interprets `scheduled_time` as wall-clock in that zone, and converts the resulting datetime to naive UTC for storage. Without a tz the legacy naive-UTC behavior is preserved so existing tasks don't shift ([src/task_scheduler.py:62-103, 130-163]()). The timezone is sourced from the linked `CrewMember.timezone` if any — i.e., scheduling is per-persona, not per-server ([src/task_scheduler.py:168-179]()).
Three task types are dispatched:
- **`action`** — calls into `BUILTIN_ACTIONS[task.action]`. No LLM. This is how `summarize_emails`, `extract_email_events`, `classify_events`, `check_email_urgency`, and friends run on a `cron_expression` like `"0 */2 * * *"` ([src/task_scheduler.py:856-877](), [src/builtin_actions.py:2138-2159]()).
- **`research`** — drives the deep-research pipeline.
- **`llm`** — runs through the agent loop with full tool access; this is what a user-written prompt-task becomes.
`HOUSEKEEPING_DEFAULTS` seeds each owner with a canonical set of built-in tasks (Email Summary, Email AI Auto Reply, Email Calendar Events, Calendar Classify Events, Email Mark Boundaries, Email Tags, etc.). The UI flags `is_builtin` and `is_modified` so a user who tweaked a default can revert ([src/task_scheduler.py:182-198](), [routes/task_routes.py:101-117]()).
Sources: [src/task_scheduler.py:62-198, 535-877](), [routes/task_routes.py:21-117]()
## Scheduled send: the SMTP path that survives a restart
The `POST /api/email/schedule` endpoint isn't a wrapper around the task scheduler — it's a separate SQLite table (`scheduled_emails`) drained by its own 30-second poller. The handler validates `send_at` against now (with a 30s grace), refuses past timestamps to stop the poller from immediately firing `1970-01-01` mistakes, and stamps the row with the originating `account_id` and an `odysseus_kind` tag that flows through to the eventual `X-Odysseus-Kind` header ([routes/email_routes.py:1841-1895]()).
`_scheduled_poll_once` builds a multipart/alternative message (mixed if attachments are present), appends to the user's IMAP Sent folder so the message appears there, marks the row `sent` (or `failed` with the error), and is also exposed as a `odysseus-mail poll-scheduled` CLI for cron-driven deployments — set `ODYSSEUS_INPROCESS_POLLERS=0` so the two don't race on the same SQLite ([routes/email_pollers.py:848-980]()).
Sources: [routes/email_pollers.py:848-980](), [routes/email_routes.py:1841-1932]()
## What builders should notice
A few patterns are reusable beyond Odysseus:
- **Same path for "feature" and "action."** `dispatch_reminder` is what the note scanner calls, what the calendar feature ultimately reaches, and what an agent tool could call. The reminder dedupe cache lives on disk in the same shape regardless of who fired the reminder, which is why "fire from UI, then again from scanner 2 min later" can't double-send.
- **Fenced LLM output beats JSON-only.** `<<<REPLY>>>...<<<END>>>` survives chain-of-thought leakage better than asking for clean JSON, and the same extractor handles summaries and replies. Compare to the calendar-extraction prompt which *does* require JSON — and which has fallback regex matching against `r'\[\s*\{[^[\]]*?"action"…'` for when the model wraps the array in commentary anyway ([routes/email_pollers.py:443-449]()).
- **Owner-scope every fallback.** Almost every security comment in this subsystem is variations on the same theme: a "default" lookup that was fine for single-user installs ended up leaking to a second user. The fix is always passing `owner` into the resolution function and OR-matching legacy null-owner rows by their mailbox identity, not by trust ([routes/email_helpers.py:478-484]()).
- **Idempotent IDs for external state.** `caldav-<sha256(url)[:24]>` for remote calendar rows, VEVENT `UID` for events, `(folder, uid)`-hashed `<synth-…@local>` for messages missing a Message-ID ([src/caldav_sync.py:40-44](), [routes/email_pollers.py:222-228]()) — every "I might re-sync this thing" surface picks a deterministic key.
- **One semaphore makes the agent loop sane.** A single shared `Semaphore(1)` for LLM tasks plus a bypass for pure-infra actions means the scheduler can dispatch as many concurrent things as it wants without ever running two model calls at once on the same machine — the only exception is the `ping_notes` scanner, which is allowed to fire reminders out-of-band because it doesn't touch the model ([src/task_scheduler.py:223-224, 535-540]()).
What's *not* here is also instructive: there is no hosted-service dependency, no API key for "the Odysseus cloud," no per-vendor SDK. CalDAV is the protocol it actually speaks; ntfy is an HTTP POST it just makes; SMTP/IMAP are stdlib `imaplib`/`smtplib`. The LLM resolution goes through `resolve_endpoint("utility")` which lets the user point at OpenAI, an Ollama box on their LAN, or anything else with a chat-completions shape — so the whole productivity side is provider-neutral by construction.
Sources: [routes/email_pollers.py:188-194](), [src/caldav_sync.py:1-23](), [routes/note_routes.py:365-391]()
---
## 10. Builder Takeaways: What's Surprising, What's Hard, What to Watch
> Closing synthesis: the monolith-with-sidecars architecture, the breadth of admin-only surface area, the security implications called out in SECURITY.md, and the roadmap items the maintainers explicitly want help on.
- Page Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/pages/10-builder-takeaways-what-s-surprising-what-s-hard-what-to-watch.md
- Generated: 2026-05-31T19:53:05.561Z
### Source Files
- `ROADMAP.md`
- `SECURITY.md`
- `ACKNOWLEDGMENTS.md`
- `docker-compose.yml`
- `app.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [ROADMAP.md](ROADMAP.md)
- [SECURITY.md](SECURITY.md)
- [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md)
- [docker-compose.yml](docker-compose.yml)
- [app.py](app.py)
</details>
# Builder Takeaways: What's Surprising, What's Hard, What to Watch
Odysseus is a self-hosted AI workspace that, on first inspection, looks like a "kitchen sink" personal assistant — chat, memory, research, gallery, notes, calendar, email, vault, a shell, a model serving control panel — all behind one login. Reading the code makes the shape clearer: it is a single FastAPI process that owns nearly all behaviour, with three optional Docker sidecars and a small set of external tools (Ollama, Radicale, Dovecot, tmux, SSH) that the app shells out to. That choice is what shapes the rest of the page — the breadth of admin-only surface area, the security posture, and the roadmap items the maintainer is explicitly asking for help on.
This page is a closing synthesis: the architecture in one diagram, the privileged-surface picture from `SECURITY.md`, the explicit "help wanted" list from `ROADMAP.md`, and the surprising details a README-only reader would miss.
## Architecture: a monolith with three sidecars
`app.py` is the only entry point. It imports about 47 routers from `routes/` and wires them into one FastAPI app, then connects a handful of long-lived managers (auth, sessions, memory, RAG, webhooks, MCP, task scheduler) into them. The `docker-compose.yml` file adds three sidecars whose only job is to be a backend Odysseus can talk to over the network.
```text
┌───────────────────────── docker-compose.yml ──────────────────────────┐
│ │
│ ┌──────────── odysseus (FastAPI, port 7000) ─────────────┐ │
│ │ app.py orchestrator │ │
│ │ ├─ Middleware: CORS, SecurityHeaders, │ │
│ │ │ RequestTimeout (45s, with exemptions), │ │
│ │ │ AuthMiddleware (cookie + Bearer ody_) │ │
│ │ ├─ Routers (~47): chat, research, memory, shell, │ │
│ │ │ cookbook, hwfit, mcp, calendar, contacts, email, │ │
│ │ │ vault, notes, tasks, gallery, document, … │ │
│ │ └─ Managers: AuthManager, SessionManager, │ │
│ │ MemoryManager, McpManager, WebhookManager, │ │
│ │ TaskScheduler │ │
│ └─────────┬───────────────┬──────────────────┬────────────┘ │
│ │ │ │ │
│ ┌──────▼─────┐ ┌──────▼─────┐ ┌───────▼──────┐ │
│ │ searxng │ │ chromadb │ │ ntfy │ │
│ │ metasearch│ │ vectors │ │ push notif. │ │
│ └────────────┘ └────────────┘ └──────────────┘ │
└───────────────────────────────────────────────────────────────────────┘
│ │
└──── shells out to: ollama, ssh, tmux, mbsync ────┘
(declared in ACKNOWLEDGMENTS.md, not bundled)
```
A few things worth noticing about this shape:
- All cross-cutting concerns — auth, CSRF-ish security headers, hard request timeout, CSP nonce injection into static HTML — live as Starlette middlewares inside the same process. Streaming and long-running endpoints (`/api/chat`, `/api/shell/stream`, `/api/research`, `/api/model/probe`, etc.) are exempted from the 45 s hard timeout by an explicit allowlist. Sources: [app.py:64-102]()
- The Bearer-token path keeps a prefix-keyed in-memory cache of bcrypted API tokens with a dirty-bit invalidation hook (`app.state.invalidate_token_cache`). The cookie path is the more common one; the bearer path exists for external API integrations and is rate-limited indirectly by bcrypt cost per prefix bucket. Sources: [app.py:136-254]()
- There is a deliberate loopback "internal-tool" header that lets the agent layer impersonate a user when calling back into admin-gated routes from inside the same process — gated to `127.0.0.1`/`::1` plus a matching token. This is the kind of detail a README-only reader would miss but a security reviewer should care about. Sources: [app.py:171-188]()
- Vector RAG is wired in but `rag_manager = None` and `rag_available = False` at startup — the comment notes the ChromaDB client could not even initialise against the installed pydantic, and all callers were already guarded. Useful pattern: a feature flag that the runtime owns, not the user. Sources: [app.py:348-356]()
Sources: [app.py:1-102](), [app.py:163-271](), [app.py:409-598](), [docker-compose.yml:1-79]()
## The breadth of admin-only surface area
The single biggest surprise reading this codebase is *how much* of it the operator-as-admin is responsible for. `SECURITY.md` lists the privileged surfaces explicitly:
> Leave high-risk agent tools restricted to admins: shell, Python, file read/write, email send/read, MCP, app API, task/skill/memory management, settings, tokens, and model serving.
That sentence is not aspirational — `routes/` includes individual modules for each of those concerns, and grepping for `require_admin`/`is_admin` checks finds ~137 occurrences across roughly 18 router files (shell, model, cookbook, mcp, vault, contacts, email, backup, webhooks, tasks, admin_wipe, …). In other words, "admin" in Odysseus is not just "settings page access" — it is the gate for an entire shadow operating system: shelling out to the host, downloading and serving local models, reading the user's IMAP mailbox, decrypting the vault, exposing a webhook surface, and wiping data.
A few representative router wirings from `app.py`:
| Surface | Router (in `routes/`) | Why it is privileged |
|---|---|---|
| Shell command exec / SSE stream | `shell_routes.py` | Direct host command execution |
| Cookbook (local model download/serve) | `cookbook_routes.py` | Spawns tmux, pulls model weights, shells via SSH |
| Hardware fit ("What Fits?") | `hwfit_routes.py` | Reads host hardware to score model viability |
| MCP servers | `mcp_routes.py` | Runs Model Context Protocol child processes |
| Vault | `vault_routes.py` | Encrypted secrets store |
| Email (IMAP/SMTP) | `email_routes.py` | Reads + sends real mail |
| Calendar / Contacts (CalDAV/CardDAV) | `calendar_routes.py`, `contacts_routes.py` | Speaks to Radicale or similar |
| Admin "Danger Zone" wipes | `admin_wipe_routes.py` | Deletes user data classes |
| API tokens | `api_token_routes.py` | Mints long-lived bearer tokens |
| Webhooks | `webhook_routes.py` | Outbound HTTP from the box |
Sources: [SECURITY.md:9-21](), [app.py:409-598]()
The operational reading of `SECURITY.md` is short: do not expose this without auth, do not run it as a public service, keep `AUTH_ENABLED=true`, put it behind a reverse proxy and HTTPS, and treat admin accounts as root-equivalent. There is a publish-a-fork checklist that includes `git check-ignore` for `.env`, `data/auth.json`, `data/app.db`, `odysseus.db`, and a `git grep` regex for common API-key shapes (`sk-…`, `xox[baprs]-…`, `AIza…`, `Bearer …`). It is a nice example of a small, copy-pasteable hygiene gate. Sources: [SECURITY.md:22-32]()
## What the maintainer is explicitly asking for help on
`ROADMAP.md` is unusually candid — its subtitle is *"Roadmap / Help Wanted"* and it opens with "I dont know what I'm doing hlep". That tone is useful because it labels the parts of the codebase the maintainer is least confident in. For builders dropping in, this is also where contributions are most welcome.
The high-priority list, paraphrased and tagged:
- **Reliability** — squash bugs; fresh Docker smoke tests on Linux, macOS, and Windows. Sources: [ROADMAP.md:8-11]()
- **Integrations** — audit which integrations actually work end-to-end, document the rest, hide or remove what does not. The hint that this work is needed is itself a flag: a long list of providers and protocols is wired in faster than they can be verified. Sources: [ROADMAP.md:13]()
- **Self-host troubleshooting cookbook** — the maintainer explicitly enumerates "30-second fixes that become 30-minute searches": Dovecot cleartext auth for local stacks, ntfy Android Instant Delivery, clipboard limits on plain-HTTP Tailscale URLs, Radicale collection URLs. These are the foot-guns of running a real self-hosted stack — worth lifting straight into a `docs/solutions/` entry if the project picks up a solved-problem doc convention. Sources: [ROADMAP.md:14]()
- **Provider probing** — the call-outs for Anthropic, Gemini, Groq, xAI, OpenRouter, OpenAI, and DeepSeek echo the request-timeout exemption for `/api/model/probe` in `app.py` (probes can take up to 8 s each and iterate). Probe correctness is a known weak spot. Sources: [ROADMAP.md:20](), [app.py:80-85]()
- **Skill audit** — "how does your model respond to skill injection, does it follow? Does its parsing miss?" — a frank admission that the skill/prompt-injection surface needs an adversarial pass. Sources: [ROADMAP.md:18]()
- **Degraded-state reporting** — ChromaDB, SearXNG, email, ntfy, and provider probes are exactly the components running as Docker sidecars or external services, so the failure modes are network-shaped and easy to surface badly. Sources: [ROADMAP.md:19]()
Refactor targets are also informative: `static/style.css` is openly called "Calypso's island", the onboarding tours are copy-pasted scaffolding asking for a shared `tour-core.js`, and mobile `@media` overrides are flagged as the root of a class of "CSS did not move" bugs. The Backend section ends with "Security hardening around admin-only tools and clear docs for their risk." — which is the next page over from the architecture point above. Sources: [ROADMAP.md:22-46]()
## Surprising details a README-only reader misses
- **Most of the code was written with AI.** The acknowledgments list `gpt-oss-120b`, Qwen3-235B, DeepSeek V3.1/V4 Pro/Flash, Claude, and Codex as collaborators, alongside human contributors. That is unusual to call out so explicitly and frames a lot of the breadth-versus-depth tradeoffs you see in the code. Sources: [ACKNOWLEDGMENTS.md:158-168]()
- **The "agent loop" came from opencode; the "what model fits?" engine came from `llmfit`; the deep-research pipeline came from Tongyi DeepResearch.** These are concrete, attributable transplants under MIT and Apache-2.0, with the adapted-to file paths listed (`services/hwfit/`, `routes/cookbook_*.py`, `services/search/`, `api/research_*.py`). Sources: [ACKNOWLEDGMENTS.md:12-39]()
- **The license story is deliberately permissive — with one AGPL trapdoor.** `pypdf` and `charset-normalizer` replaced earlier copyleft choices, and `chardet` was removed entirely. PyMuPDF is the one AGPL dependency left, and it is *optional* and lazy-imported, used only by PDF form-filling in `src/pdf_forms.py` and the form endpoints in `routes/document_routes.py`. If you install it for that feature, AGPL's network clause then applies to that feature for your deployment. Sources: [ACKNOWLEDGMENTS.md:139-156]()
- **Static assets ship without a build step.** `_RevalidatingStatic` in `app.py` forces `Cache-Control: no-cache` for `.js`/`.css`/`.html` because the app serves raw ES modules with no hashed URLs — without it, browsers happily served stale modules across deploys. Sources: [app.py:277-293]()
- **Sidecars are bound differently on purpose.** ChromaDB is published on `8100:8000` (host-wide), ntfy on `8091:80` (host-wide), but SearXNG is bound to `127.0.0.1:8080:8080` — only loopback. That is the right default for a metasearch you do not want to expose, and a small but real piece of the security posture you would not see without reading `docker-compose.yml`. Sources: [docker-compose.yml:38-74]()
- **The container drops privileges via `PUID`/`PGID`.** The entrypoint chowns `/app/data` and `/app/logs` to match the configured user/group before running uvicorn, which is what keeps bind-mounted files editable from the host. Sources: [docker-compose.yml:20-30]()
## What builders should watch
Pulling the threads together, this is the punch list a builder dropping into Odysseus should keep in mind:
1. **Treat any admin account as host-level access.** The privileged-tools list in `SECURITY.md` is not a configuration knob — it is the design. If you are reviewing security, the high-leverage targets are `routes/shell_routes.py`, `routes/cookbook_routes.py`, `routes/mcp_routes.py`, `routes/vault_routes.py`, `routes/email_routes.py`, and the loopback `INTERNAL_TOOL_HEADER` path in the auth middleware.
2. **Probing and degraded-state reporting are weak spots the maintainer flagged.** If you are writing a contribution that touches providers or sidecars, that work will be unusually welcome — and it lines up with the timeout exemptions already coded into `app.py`.
3. **Architecture additions should respect "monolith + sidecar".** New integrations belong as a router in `routes/`, a manager in `src/`, and (if heavyweight) a new compose service. There is no event-bus or microservice convention to honour — staying with the existing shape is the path of least resistance.
4. **Provider neutrality is a real design property, not a marketing claim.** Models, search, and notifications are all swappable: `searxng`, `chromadb`, `ntfy`, and any of the listed model providers can be replaced without touching the orchestrator. If you are integrating a knowledge profile or wiki layer on top, mirror that posture — keep the source of truth in files, not in a single vendor.
5. **`ROADMAP.md` is the contributor map.** It is short, specific, and includes the maintainer's own confidence about each area. If you want to land a PR with high signal, start there.
The summary, in one line: Odysseus is a single FastAPI process pretending to be a workspace, three sidecars pretending to be infrastructure, and a long list of privileged tools that only make sense if you take the security model seriously. The roadmap tells you exactly where the seams are.
---