# The FastAPI Monolith: app.py and the 45-Router Mount

> How a single ~950-line app.py wires authentication, middleware, and ~45 route modules into one server — and what the routes/ directory reveals about the surface area (chat, sessions, documents, memory, MCP, webhooks, vaults, and more).

- Repository: pewdiepie-archdaemon/odysseus
- GitHub: https://github.com/pewdiepie-archdaemon/odysseus
- Human wiki: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124
- Complete Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/llms-full.txt

## Source Files

- `app.py`
- `core/middleware.py`
- `core/auth.py`
- `core/database.py`
- `core/session_manager.py`
- `src/app_initializer.py`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [app.py](app.py)
- [core/middleware.py](core/middleware.py)
- [core/auth.py](core/auth.py)
- [core/database.py](core/database.py)
- [core/session_manager.py](core/session_manager.py)
- [src/app_initializer.py](src/app_initializer.py)
- [routes/chat_routes.py](routes/chat_routes.py)
</details>

Odysseus is a self-hosted, multi-modal AI workstation, and almost every HTTP entry point flows through a single `app.py` file. At ~957 lines it is unapologetically a monolith: one `FastAPI` instance, three custom middlewares stacked above the framework's CORS handler, a single auth gate that understands both browser session cookies and `Bearer ody_` API tokens, and roughly 45 `setup_*_routes(...)` factories pulled from the `routes/` directory and mounted with `app.include_router(...)`. If you want to know what this product *does*, you read `app.py` top-to-bottom and the answer is right there.

This page walks the seam between `app.py` and `routes/` — how the orchestrator wires authentication, security headers, request timeouts, and per-feature routers into one server, what the route filenames tell you about the product surface, and the small but pointed engineering details (a hand-rolled API-token cache, an internal-tool loopback escape hatch, a per-request CSP nonce, two named ghost sessions purged at startup) that turn an otherwise vanilla FastAPI app into something specific to this codebase.

## The shape of `app.py`

`app.py` calls itself a "slim orchestrator" in its first comment, which is a generous reading: it is slim only in the sense that the heavy logic lives elsewhere. The file is laid out as a single linear script — no factory, no `create_app()` — punctuated by `# =========` banner comments that double as a table of contents:

```
# ========= LOGGING =========
# ========= APP =========
# ========= CORS =========
# ========= SECURITY HEADERS MIDDLEWARE =========
# ========= REQUEST TIMEOUT (FALLBACK FOR HUNG HANDLERS) =========
# ========= AUTH =========
# ========= STATIC FILES =========
# ========= GENERATED IMAGES =========
# ========= YOUTUBE INIT =========
# ========= RAG (vector document RAG — DISABLED) =========
# ========= IMPORT CONFIG =========
# ========= COMPONENT INITIALIZATION =========
# ========= EXCEPTION HANDLERS =========
# ========= WEBHOOK MANAGER =========
# ========= INCLUDE ROUTERS =========
# ========= ROUTES (kept in app.py) =========
# ========= LIFECYCLE =========
```

The mental model is: configure the app, plug in the middleware stack, build the dependency graph in one shot via `initialize_managers(...)`, then `include_router` every feature. The `@app.on_event("startup")` handler at the bottom owns the background work that keeps running once the server is up: MCP server discovery, background-job monitoring, the scheduled task runner, a keep-alive loop for upstream LLM endpoints, an Ollama auto-detector, and a nightly skill-audit loop.

Sources: [app.py:1-48](app.py), [app.py:599-672](app.py), [app.py:675-931](app.py)

## The middleware sandwich

Three custom middlewares sit on top of FastAPI's `CORSMiddleware`. The diagram below shows execution order, which is the *reverse* of registration order: Starlette makes the most recently added middleware the outermost layer, so it sees the request first:

```
request
  └─► AuthMiddleware                  (cookie/bearer gate, exempt list)
       └─► _RequestTimeoutMiddleware  (45s default, path-exempt for streams)
            └─► SecurityHeadersMiddleware (per-request CSP nonce)
                 └─► CORSMiddleware
                      └─► route handler
```
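
The "last added wraps first" rule is easy to get backwards, so here is a minimal stdlib-only sketch of it. `TinyApp` is a hypothetical stand-in, not Starlette's API, and the registration order used below is an assumption inferred from the diagram rather than read from `app.py`.

```python
from typing import Callable, List

Handler = Callable[[str], str]


class TinyApp:
    """Hypothetical stand-in that mimics how a middleware stack is built."""

    def __init__(self, endpoint: Handler) -> None:
        self.endpoint = endpoint
        self.user_middleware: List[str] = []

    def add_middleware(self, name: str) -> None:
        # New middleware goes to the front, so it becomes the outermost layer.
        self.user_middleware.insert(0, name)

    def build(self) -> Handler:
        handler = self.endpoint
        # Wrap from the innermost layer outwards.
        for name in reversed(self.user_middleware):
            handler = self._wrap(name, handler)
        return handler

    @staticmethod
    def _wrap(name: str, inner: Handler) -> Handler:
        def layer(trace: str) -> str:
            # Record that this layer saw the request, then pass it inwards.
            return inner(f"{trace}{name} > ")

        return layer


app = TinyApp(lambda trace: trace + "route handler")

# Assumed registration order: CORS first, Auth last.
for name in ["CORS", "SecurityHeaders", "RequestTimeout", "Auth"]:
    app.add_middleware(name)

execution_order = app.build()("")
print(execution_order)
```

Registering `Auth` last is what puts it at the front of the printed chain, matching the diagram.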

`SecurityHeadersMiddleware` generates a fresh `secrets.token_hex(16)` nonce on every request, attaches it to `request.state.csp_nonce`, and templates it into the page's `script-src` directive. The `_serve_html_with_nonce` helper in `app.py` substitutes the nonce into the literal token `{{CSP_NONCE}}` inside `static/index.html` before sending it, so inline scripts the SPA needs can run while arbitrary injected scripts cannot. Tool-render iframes (`/api/tools/.../render`) and self-contained research reports (`/api/research/report/...`) get different CSPs entirely — the iframes lean on `sandbox="allow-scripts"` for isolation and the reports relax `script-src` to `'unsafe-inline'` because they're standalone HTML artifacts.
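
The nonce flow can be sketched with the standard library alone. The template string and the exact policy text below are assumptions for illustration; only `secrets.token_hex(16)` and the `{{CSP_NONCE}}` placeholder come from the description above.

```python
import secrets

# Stand-in for static/index.html; the real file is much larger.
INDEX_TEMPLATE = '<script nonce="{{CSP_NONCE}}">boot()</script>'


def new_csp_nonce() -> str:
    # 16 random bytes rendered as 32 hex characters.
    return secrets.token_hex(16)


def build_csp_header(nonce: str) -> str:
    # Hypothetical policy; the real directive list lives in core/middleware.py.
    return f"default-src 'self'; script-src 'self' 'nonce-{nonce}'"


def render_index(template: str, nonce: str) -> str:
    # Substitute the literal placeholder before the page is sent.
    return template.replace("{{CSP_NONCE}}", nonce)


nonce = new_csp_nonce()
html = render_index(INDEX_TEMPLATE, nonce)
header = build_csp_header(nonce)
```

Because the same nonce appears in both the header and the `<script>` tag, the browser runs that inline script and refuses any injected script that lacks the attribute.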

`_RequestTimeoutMiddleware` is the most pragmatic piece in the file. The comment explains why it exists: *"Without this, a single hung `subprocess.run` or missing-timeout `httpx` call locks up the entire server for everyone."* The default cap is 45 seconds (`REQUEST_HARD_TIMEOUT`), and there is a hard-coded prefix exemption list for endpoints that legitimately stream or run for minutes — `/api/chat`, `/api/shell/stream`, `/api/research`, `/api/model/download`, `/api/cookbook/setup`, `/api/upload`, image diffusion proxies, and the model-probe SSE. Everything else gets `asyncio.wait_for` wrapped around it and returns a 504 on timeout.
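
A rough sketch of the timeout logic follows, stripped of the ASGI plumbing. The `dispatch` function and the `(status, body)` tuple are hypothetical simplifications, and the prefix tuple contains only a few of the exemptions named above.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

REQUEST_HARD_TIMEOUT = 45.0

# A subset of the exempt prefixes; the full list is hard-coded in app.py.
TIMEOUT_EXEMPT_PREFIXES: Tuple[str, ...] = (
    "/api/chat",
    "/api/shell/stream",
    "/api/research",
    "/api/upload",
)

Response = Tuple[int, str]
Handler = Callable[[], Awaitable[Response]]


async def dispatch(path: str, handler: Handler,
                   timeout: float = REQUEST_HARD_TIMEOUT) -> Response:
    # Streaming and long-running endpoints skip the cap entirely.
    if path.startswith(TIMEOUT_EXEMPT_PREFIXES):
        return await handler()
    try:
        return await asyncio.wait_for(handler(), timeout=timeout)
    except asyncio.TimeoutError:
        return 504, "Gateway Timeout"


async def slow_handler() -> Response:
    await asyncio.sleep(0.2)
    return 200, "done"


async def _demo() -> Tuple[Response, Response]:
    capped = await dispatch("/api/notes", slow_handler, timeout=0.05)
    exempt = await dispatch("/api/chat/stream", slow_handler, timeout=0.05)
    return capped, exempt


capped, exempt = asyncio.run(_demo())
```

One caveat applies to any `asyncio.wait_for` approach: cancellation is only delivered at an `await` point, so it bounds how long the caller waits rather than forcibly stopping blocking work.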

Sources: [app.py:51-102](app.py), [app.py:601-617](app.py), [core/middleware.py:47-100](core/middleware.py)

## Auth: cookies, bearer tokens, and a loopback bypass

The `AuthMiddleware` defined inline in `app.py:163-266` is the single gate every request passes through. It is only added when `AUTH_ENABLED` is true, so turning that env var off removes the gate entirely. When active, it recognizes three credential types (internal-tool token, API bearer token, session cookie) alongside a path exemption list and an opt-in localhost bypass:

| Check | What it does | Where |
|---|---|---|
| Exact-match exemption | `/login`, `/api/health`, `/api/version`, `/api/auth/*` setup/login/status, etc. pass through unauthenticated. | `AUTH_EXEMPT_EXACT` / `_is_auth_exempt` |
| Loopback internal-tool token | If the request comes from `127.0.0.1`/`::1` *and* carries `X-Odysseus-Internal-Token` matching the per-process secret, it is admitted as `internal-tool` — or impersonates the user from `X-Odysseus-Owner`. | `app.py:172-188`, `core/middleware.py:16-17` |
| `LOCALHOST_BYPASS=true` | Lets unauthenticated localhost requests through. Off by default, and it should stay off whenever the server is fronted by a reverse proxy or Tailscale Funnel. | `app.py:110, 191-194` |
| `Bearer ody_...` API token | Bcrypt-checks against an in-memory prefix cache, attributes the request to the token's owner. | `app.py:202-254` |
| Session cookie | The default browser path — `auth_manager.validate_token(cookie)` decides. | `app.py:257-266` |

The API-token path is where the real engineering is. Tokens are 47 characters (`ody_` + 43 base64 chars), and naively each request would need a full table scan plus a bcrypt verify per row. Instead, `_token_cache` is a `dict[prefix → list[(id, hash, owner, scopes)]]` that is lazily rebuilt whenever the `_token_cache_dirty` flag is set. The flag is set by `app.state.invalidate_token_cache`, which `routes/api_token_routes` calls on create/revoke. Inside the request handler bcrypt still runs, but only against candidates whose 8-char prefix matches the incoming token, so the cost is O(candidates sharing that prefix), effectively O(1). `last_used_at` is updated fire-and-forget via `asyncio.create_task(_touch_last_used(...))` so the request doesn't wait on the extra commit.
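
The prefix-cache idea can be sketched as follows. This is not the project's code: `TokenStore` is hypothetical, PBKDF2 from the standard library stands in for bcrypt so the example has no dependencies, and generating tokens with `secrets.token_urlsafe(32)` and slicing the first 8 characters of the full token are both assumptions.

```python
import hashlib
import hmac
import os
import secrets
from typing import Dict, List, NamedTuple, Optional, Tuple

PREFIX_LEN = 8  # prefix length named in the text


class CachedToken(NamedTuple):
    token_id: int
    salt: bytes
    digest: bytes
    owner: str


def _slow_hash(token: str, salt: bytes) -> bytes:
    # Stand-in for bcrypt (stdlib only); the project itself uses bcrypt.
    return hashlib.pbkdf2_hmac("sha256", token.encode(), salt, 10_000)


class TokenStore:
    """Pretend database table plus a lazily rebuilt prefix cache."""

    def __init__(self) -> None:
        self._db: List[Tuple[str, CachedToken]] = []
        self._cache: Dict[str, List[CachedToken]] = {}
        self._dirty = True
        self.rebuilds = 0

    def create(self, owner: str) -> str:
        token = "ody_" + secrets.token_urlsafe(32)
        salt = os.urandom(16)
        row = CachedToken(len(self._db) + 1, salt, _slow_hash(token, salt), owner)
        self._db.append((token[:PREFIX_LEN], row))
        self._dirty = True  # plays the role of invalidate_token_cache
        return token

    def _refresh_if_dirty(self) -> None:
        if not self._dirty:
            return
        self._cache = {}
        for prefix, row in self._db:
            self._cache.setdefault(prefix, []).append(row)
        self._dirty = False
        self.rebuilds += 1

    def verify(self, token: str) -> Optional[str]:
        self._refresh_if_dirty()
        # Only rows sharing the prefix pay for the expensive hash.
        for row in self._cache.get(token[:PREFIX_LEN], []):
            if hmac.compare_digest(row.digest, _slow_hash(token, row.salt)):
                return row.owner
        return None


store = TokenStore()
alice_token = store.create("alice")
owner = store.verify(alice_token)
rejected = store.verify("ody_" + "x" * 43)
store.verify(alice_token)  # served from the cache, no rebuild
```

The `rebuilds` counter is there only to make the laziness visible: three lookups after one create trigger a single rebuild.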

The "internal-tool" loopback deserves a callout: when the in-app agent makes tool calls that need admin-gated routes (e.g. creating a note in another user's account), it doesn't have a session cookie, so it loops back via HTTP with the per-process `INTERNAL_TOOL_TOKEN` from `core/middleware.py`. The token is regenerated on every process start via `secrets.token_hex(32)` if `ODYSSEUS_INTERNAL_TOKEN` isn't set, and `require_admin()` accepts either the header directly or a request whose middleware already stamped `request.state.current_user = "internal-tool"`. The impersonation header (`X-Odysseus-Owner`) was added so agent-created records land with the *user's* ownership instead of being orphaned to a generic `internal-tool` owner.
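
A minimal sketch of the loopback check, assuming a plain mapping of headers and a client host string. The function name `resolve_internal_user` and the constant-time comparison are illustrative choices, not confirmed details of `app.py`.

```python
import hmac
import os
import secrets
from typing import Mapping, Optional

# Regenerated on every process start unless the env var pins it.
INTERNAL_TOOL_TOKEN = os.environ.get("ODYSSEUS_INTERNAL_TOKEN") or secrets.token_hex(32)
LOOPBACK_HOSTS = {"127.0.0.1", "::1"}


def resolve_internal_user(client_host: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the user to attribute the request to, or None if this path does not apply."""
    if client_host not in LOOPBACK_HOSTS:
        return None
    presented = headers.get("X-Odysseus-Internal-Token", "")
    # Compare as bytes in constant time to avoid leaking the secret via timing.
    if not hmac.compare_digest(presented.encode(), INTERNAL_TOOL_TOKEN.encode()):
        return None
    # Impersonate the named owner if given, else fall back to the generic identity.
    return headers.get("X-Odysseus-Owner") or "internal-tool"


good = {"X-Odysseus-Internal-Token": INTERNAL_TOOL_TOKEN}
as_tool = resolve_internal_user("127.0.0.1", good)
as_alice = resolve_internal_user("::1", {**good, "X-Odysseus-Owner": "alice"})
remote = resolve_internal_user("10.0.0.5", good)
wrong = resolve_internal_user("127.0.0.1", {"X-Odysseus-Internal-Token": "nope"})
```

Both conditions must hold: a valid token from a non-loopback address is rejected, as is a loopback request with the wrong token.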

Sources: [app.py:104-271](app.py), [core/middleware.py:12-44](core/middleware.py)

## The dependency graph in one call

Component construction is centralized in `src/app_initializer.py:initialize_managers(...)`. `app.py` calls it once, unpacks the returned dict into module globals, and from then on every router factory receives explicit dependencies. There is no FastAPI `Depends()` graph here — it's plain constructor injection, with each `setup_*_routes(...)` taking whatever managers it needs and returning an `APIRouter`.

```
                    ┌────────────────────────────────────────┐
                    │  src/app_initializer.initialize_managers │
                    └────────────────────────────────────────┘
                                       │
              ┌────────────┬───────────┼───────────┬────────────────┐
              ▼            ▼           ▼           ▼                ▼
       MemoryManager  SkillsManager SessionManager UploadHandler  PresetManager
              │            │           │           │                │
              └────────┬───┴───────────┼───────────┴──────┬─────────┘
                       ▼               ▼                  ▼
                ChatProcessor    ChatHandler         ModelDiscovery
                       │               │
                       └───── plus MemoryVectorStore (Chroma, optional) ─┘
```
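
The constructor-injection pattern described above can be sketched without FastAPI installed. `Router` here is a hypothetical stand-in for `APIRouter`, and `setup_session_routes`, its signature, and the `/api/sessions` path are assumptions that merely follow the naming convention.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Router:
    """Hypothetical stand-in for fastapi.APIRouter so the sketch has no dependencies."""

    tags: List[str]
    routes: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def get(self, path: str) -> Callable:
        def register(fn: Callable) -> Callable:
            self.routes[path] = fn
            return fn

        return register


class SessionManager:
    def __init__(self) -> None:
        self.sessions = {"s1": "First chat"}


def initialize_managers() -> Dict[str, object]:
    # The real function builds many more components; one is enough here.
    return {"session_manager": SessionManager()}


def setup_session_routes(session_manager: SessionManager) -> Router:
    router = Router(tags=["sessions"])

    @router.get("/api/sessions")
    def list_sessions() -> List[str]:
        # The handler closes over the injected manager; no Depends() involved.
        return sorted(session_manager.sessions)

    return router


managers = initialize_managers()
router = setup_session_routes(managers["session_manager"])
result = router.routes["/api/sessions"]()
```

The handlers capture their dependencies through the closure, which is why the factories need nothing from the framework's dependency system.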

Two intentional fragilities are worth noting. `rag_manager` is hard-wired to `None` (with `rag_available = False`) in `app.py:354-356` because the ChromaDB vector-document RAG was never indexed in practice and its 1.4.1 client cost ~30s of startup time. Personal-doc routes still receive the `None` and degrade cleanly because every consumer guards on `rag_available`. The `MemoryVectorStore` (also Chroma-backed, but for memory rather than documents) is initialized in `app_initializer.py:55-74` and is kept — when healthy, it rebuilds its index from existing memories on first start.

Sources: [app.py:344-385](app.py), [src/app_initializer.py:32-114](src/app_initializer.py)

## The 45-router mount: what the surface area reveals

`app.py:412-597` calls `app.include_router(...)` exactly 40 times. The `routes/` directory holds 47 Python files, about 45 of which are real route modules; the rest expose helpers rather than routers. The pattern is uniform: every module exposes a `setup_<feature>_routes(...)` factory that returns an `APIRouter`, instantiated with whatever managers the feature needs:

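```python
# routes/chat_routes.py:90-101 (excerpt)
from fastapi import APIRouter


def setup_chat_routes(
    session_manager,
    chat_handler,
    chat_processor,
    # ... remaining managers elided
) -> APIRouter:
    router = APIRouter(tags=["chat"])
    # ... route handlers are registered on `router` here
    return router
```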

This is the entire abstraction. There is no plugin registry, no automatic discovery, no decorator-based registration — just a literal list of imports and `include_router` calls in `app.py`. Adding a feature is an explicit, single-file diff to the orchestrator, which is why a fresh reader can know the system's full HTTP surface by scrolling one screen.

Grouping the routers by responsibility shows what kind of product this actually is:

| Group | Routers | What it covers |
|---|---|---|
| Auth & identity | `auth_routes`, `api_token_routes`, `prefs_routes` | First-run setup, login/logout, OAuth-like API tokens (`ody_...`), per-user preferences. |
| Chat core | `chat_routes`, `chat_helpers`, `session_routes`, `history_routes`, `compare_routes` | Streaming chat, sessions, message history, A/B model comparison. |
| Knowledge & memory | `memory_routes`, `skills_routes`, `personal_routes`, `embedding_routes`, `note_routes` | Long-term memory, skill packs (Markdown SKILL.md files), personal documents, notes/todos. |
| Documents & media | `document_routes`, `upload_routes`, `gallery_routes`, `signature_routes`, `editor_draft_routes`, `emoji_routes`, `font_routes` | Canvas/artifact documents, file uploads, image gallery + drafts, sig stamps, Twemoji proxy, custom fonts. |
| Voice & vision | `tts_routes`, `stt_routes` | Text-to-speech, speech-to-text. |
| Models & infra | `model_routes`, `cookbook_routes`, `hwfit_routes`, `diagnostics_routes`, `cleanup_routes`, `backup_routes`, `admin_wipe_routes` | Endpoint discovery, model download/serve via the "cookbook", hardware fit calculator, diagnostics, danger-zone wipes, export/import. |
| Integrations | `mcp_routes`, `webhook_routes`, `email_routes`, `email_pollers`, `calendar_routes`, `contacts_routes`, `vault_routes`, `search_routes`, `research_routes` | Model Context Protocol servers, webhooks, IMAP email + pollers, CalDAV, CardDAV, password vault, web search, deep research. |
| Agent & automation | `assistant_routes`, `task_routes`, `shell_routes`, `preset_routes` | Personal assistant, scheduled tasks, user-facing shell exec, preset prompts. |

The mix tells the story: this isn't a chat wrapper, it's a personal AI workstation that has accreted features into one server — notes, calendar, contacts, password vault, image gallery, shell, scheduled tasks, MCP, webhooks. Every one of those gets a router, and every router is mounted explicitly in `app.py`.

Sources: [app.py:412-597](app.py), [routes/chat_routes.py:90-101](routes/chat_routes.py)

## Routes that live in `app.py` itself

A small handful of routes are *not* factored out to the `routes/` directory:

- `/` and about a dozen tool deep-links (including `/notes`, `/calendar`, `/cookbook`, `/email`, `/memory`, `/gallery`, `/tasks`, `/library`) all serve the same `static/index.html` SPA. The client-side router reads `window.location.pathname` and auto-opens the matching modal. The deep-link comment explains the bookmark-UX motivation: each route can pin a unique favicon and title.
- `/backgrounds` and `/login` serve standalone HTML files.
- `/api/health` and `/api/version` are kept inline because they are tiny and must be exempt from auth anyway.
- `/api/generated-image/{filename}` is a single ownership-checked file server: filenames must match a content-hash regex, ownership is verified against the `GalleryImage` table (a missing row is treated as "not yet imported, allow"), and the response sets `Cache-Control: public, max-age=31536000, immutable` because the bytes behind a content-hash filename never change.
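
The generated-image rules above can be sketched as one pure function. The regex, the status codes, and the `owners` dict standing in for the `GalleryImage` table are all assumptions; only the cache header value and the "missing row means allow" rule come from the description.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical pattern: 64 hex characters plus an image extension.
CONTENT_HASH_RE = re.compile(r"[0-9a-f]{64}\.(png|jpg|jpeg|webp)")
IMMUTABLE_CACHE = "public, max-age=31536000, immutable"


def authorize_image(filename: str, user: str,
                    owners: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, Cache-Control value) for a generated-image request."""
    # A strict full match also rejects traversal attempts such as ../../etc/passwd.
    if not CONTENT_HASH_RE.fullmatch(filename):
        return 404, None
    owner = owners.get(filename)
    if owner is not None and owner != user:
        return 403, None
    # No row yet means "not yet imported", which is allowed.
    return 200, IMMUTABLE_CACHE


name = "a" * 64 + ".png"
own_image = authorize_image(name, "alice", {name: "alice"})
other_image = authorize_image(name, "bob", {name: "alice"})
unimported = authorize_image(name, "bob", {})
traversal = authorize_image("../../etc/passwd", "alice", {})
```

The year-long `immutable` cache lifetime is safe here only because the filename is derived from the content, so a changed image always gets a new URL.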

Sources: [app.py:296-342](app.py), [app.py:601-671](app.py)

## The startup choreography

`startup_event` runs during ASGI startup, before the first request is served, so anything slow or network-bound is spawned from it as a background task and the UI starts serving immediately. A list under `app.state._startup_tasks` holds strong references so the GC doesn't reap fire-and-forget tasks before they finish. The notable startup work:

| Stage | Why | Source |
|---|---|---|
| Purge `Nobody` and `Incognito` sessions | They're ephemeral by design and must not survive a restart. | `app.py:682-696` |
| `start_bg_monitor()` | Always-on monitor that re-invokes the agent when `#!bg` shell jobs finish. | `app.py:706-709` |
| `register_builtin_servers` + `mcp_manager.connect_all_enabled()` (20s cap) | MCP server discovery is wrapped in `wait_for(..., 20)` because local tooling can block it. | `app.py:712-725` |
| Tool index pre-warm | Loads the local embedding model + ChromaDB and runs a dummy query so the *first* user message doesn't pay the 1-3s cost. | `app.py:732-743` |
| LLM endpoint warmup + 60s keep-alive | Pings each endpoint's `/models` to prime connections and prevent cold starts. | `app.py:745-773` |
| `_ensure_default_tasks` | Reconciles built-in scheduled tasks per user; the comment notes this also sweeps stale demo/deleted-user rows that would otherwise fire forever. | `app.py:775-820` |
| Skill owner backfill | Disk-backed Markdown skills aren't covered by the DB legacy-owner sweep, so ownerless `SKILL.md` files are assigned to the primary admin. | `app.py:825-842` |
| `task_scheduler.start()` (gated) | Skipped when `ODYSSEUS_INPROCESS_TASKS=0` so an external cron worker can drive task firing instead. | `app.py:847-854` |
| Hourly null-owner sweep | Re-runs the legacy-owner assignment so data created while auth was disabled gets claimed by the admin instead of staying world-visible. | `app.py:858-868` |
| Nightly skill audit | At ~02:00 local, the LLM judges and auto-fixes a batch of stale skills. | `app.py:875-898` |
| Ollama auto-detect | If the `ollama` binary is on PATH and `localhost:11434` responds, an endpoint row is auto-added. | `app.py:900-930` |
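
Two of the patterns in the table, holding strong task references and capping a slow step with `wait_for`, can be sketched together. The module-level list plays the role of `app.state._startup_tasks`; `spawn`, `capped_connect`, and the shortened delays are hypothetical.

```python
import asyncio
from typing import Coroutine, List

startup_tasks: List[asyncio.Task] = []  # plays the role of app.state._startup_tasks


def spawn(coro: Coroutine) -> asyncio.Task:
    task = asyncio.create_task(coro)
    # The event loop keeps only weak references, so hold a strong one here.
    startup_tasks.append(task)
    return task


async def connect_mcp_servers(delay: float) -> str:
    await asyncio.sleep(delay)  # pretend discovery work
    return "connected"


async def capped_connect(delay: float, cap: float) -> str:
    try:
        return await asyncio.wait_for(connect_mcp_servers(delay), timeout=cap)
    except asyncio.TimeoutError:
        return "skipped after timeout"


async def startup_event() -> None:
    # Returns immediately; the spawned work continues in the background.
    spawn(capped_connect(delay=0.01, cap=0.5))
    spawn(capped_connect(delay=0.5, cap=0.05))


async def _demo() -> List[str]:
    await startup_event()
    return list(await asyncio.gather(*startup_tasks))


results = asyncio.run(_demo())
```

The second task hits its cap and reports a skip instead of raising, which is the behaviour a startup step needs if one slow dependency must not take the rest down with it.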

`shutdown_event` is comparatively boring — cancel the upload-cleanup task, stop the scheduler, close the webhook manager, disconnect MCP servers — but everything is wrapped in try/except so a single misbehaving subsystem can't block a clean process exit.
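
The guarded shutdown described above amounts to running every step inside its own try/except. A small sketch, with `run_shutdown` and the three step functions as hypothetical names:

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("shutdown-sketch")


def run_shutdown(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run every step, logging failures instead of propagating them."""
    failed: List[str] = []
    for name, step in steps:
        try:
            step()
        except Exception:
            logger.exception("shutdown step %s failed", name)
            failed.append(name)
    return failed


closed: List[str] = []


def stop_scheduler() -> None:
    closed.append("scheduler")


def close_webhooks() -> None:
    raise RuntimeError("socket already closed")


def disconnect_mcp() -> None:
    closed.append("mcp")


failed = run_shutdown([
    ("scheduler", stop_scheduler),
    ("webhooks", close_webhooks),
    ("mcp", disconnect_mcp),
])
```

The failing webhook step is logged and recorded, and the MCP disconnect after it still runs.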

Sources: [app.py:675-957](app.py)

## What builders should take away

The interesting choice in this codebase isn't any individual feature — it's the deliberate refusal to hide the wiring. There is no `create_app()` factory, no plugin manifest, no auto-discovery of routes; just an explicit list of `include_router` calls. The cost is that adding a feature touches `app.py`. The benefit is that the entire HTTP surface, the full middleware stack, the exemption rules for auth, and every startup task are all readable in one file. For a personal-server project where the operator is also the developer, that's a defensible trade.

The places where it stops being naive are exactly where the operator would feel pain otherwise: a request timeout middleware so one hung subprocess doesn't freeze the whole event loop; a prefix-cached API-token path so external integrations don't pay a linear DB scan and a bcrypt check per row on every request; a per-request CSP nonce templated into the SPA so XSS surface stays small; an in-process loopback token so the agent can call its own admin routes without a cookie; a startup task list with strong references so warmups don't get garbage-collected. Each one reads like a "we got bit by this once" fix, and the inline comments mostly say so.

Sources: [app.py:64-99](app.py), [app.py:130-254](app.py), [core/middleware.py:47-100](core/middleware.py)
