# Deep Research: A Multi-Step Synthesis Engine in 800 Lines

> How the deep_research module adapts Alibaba's Tongyi DeepResearch pattern into a self-contained loop that plans, searches, reads, and produces a visual report, and how SearXNG, the search service, and the visual_report renderer fit together.

- Repository: pewdiepie-archdaemon/odysseus
- GitHub: https://github.com/pewdiepie-archdaemon/odysseus
- Human wiki: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124
- Complete Markdown: https://grok-wiki.com/public/wiki/pewdiepie-archdaemon-odysseus-8b8805c93124/llms-full.txt

## Source Files

- `src/deep_research.py`
- `src/research_handler.py`
- `src/research_utils.py`
- `src/visual_report.py`
- `services/research/service.py`
- `services/search/core.py`
- `routes/research_routes.py`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/deep_research.py](src/deep_research.py)
- [src/research_handler.py](src/research_handler.py)
- [src/research_utils.py](src/research_utils.py)
- [src/visual_report.py](src/visual_report.py)
- [services/research/service.py](services/research/service.py)
- [services/search/core.py](services/search/core.py)
- [routes/research_routes.py](routes/research_routes.py)
- [src/goal_based_extractor.py](src/goal_based_extractor.py)
- [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md)
</details>


Odysseus ships a "Deep Research" mode that, on paper, sounds heavy — plan a topic, loop through web searches, read pages, decide when it's done, and hand back a magazine-style report with a hero image. In practice, the entire loop lives in a single ~820-line file (`src/deep_research.py`) backed by a handler, a search core, and a self-contained HTML renderer. The pattern is adapted from Alibaba-NLP's [Tongyi DeepResearch](https://github.com/Alibaba-NLP/DeepResearch) under Apache-2.0 (`ACKNOWLEDGMENTS.md:33-38`) and rewritten so the LLM — not hard-coded heuristics — drives every step.

Why this matters: most "deep research" agents either bolt onto a hosted API or carry a heavy framework. Odysseus's version is bring-your-own-LLM and bring-your-own-search: the same loop runs against a local Ollama model and SearXNG, or against a hosted OpenAI-compatible endpoint with Brave/Tavily as a fallback. The interesting engineering is in the cooperation between the four moving parts — `DeepResearcher`, `ResearchHandler`, the search provider chain, and `visual_report` — and the failure modes they were each built to survive.

## The Pattern: Plan → Think → Search → Extract → Synthesize → Decide

`DeepResearcher.research()` is the whole engine. After an optional one-shot category classification and a single planning call, it runs a `for round_num in range(1, max_rounds + 1)` loop in which every decision (what to search for, what to keep, when to stop) is a prompted LLM call (`src/deep_research.py:224-338`):

1. **Plan** (`_create_plan`) — a single call turns the question into `sub_questions`, `key_topics`, and a `success_criteria` sentence (`src/deep_research.py:25-46`, `:361-386`).
2. **Think** (`_generate_queries`) — produces 4 broad queries on round 1, then 3 targeted follow-ups per round, with the round-specific instruction switched in the prompt (`src/deep_research.py:422-462`). Already-used queries are deduped via `self.queries_used`.
3. **Search** (`_search`) — runs each query against the configured provider chain (see below) and only adds a provider to `self.providers_used` if it actually returned results, so the visual report can later credit the engines that carried the work (`src/deep_research.py:209-213`, `:511-544`).
4. **Extract** (`_fetch_and_extract`) — fetches each new URL with `src.search.fetch_webpage_content`, truncates to `max_content_chars` at a paragraph boundary, and asks the LLM to return JSON `{summary, evidence, rational}` using `EXTRACTOR_PROMPT` from `goal_based_extractor.py` (`src/deep_research.py:546-602`).
5. **Synthesize** (`_synthesize`) — folds the last `synthesis_window` (default 10) findings into the evolving report in a single prompt that says "remove redundancy, resolve contradictions, keep URLs as inline citations" (`src/deep_research.py:68-85`, `:607-632`).
6. **Decide** (`_should_stop`) — once `round_num >= min_rounds`, a tiny YES/NO prompt asks the LLM if the report is comprehensive enough. The parser tolerates `**YES**`, leading quotes, and thinking-block prefixes (`src/deep_research.py:87-106`, `:637-663`).
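The tolerant YES/NO parsing described in the Decide step can be sketched as follows. This is a minimal illustration, not the code in `_should_stop`: the helper name `parse_stop_answer` and the exact set of stripped characters are assumptions.

```python
import re

# Hypothetical helper; the real parsing inside _should_stop may differ.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def parse_stop_answer(raw: str) -> bool:
    """Return True only when the model's answer starts with YES."""
    text = _THINK_BLOCK.sub("", raw)            # drop a reasoning-model preamble
    text = text.strip().lstrip("*_\"'` ")       # drop markdown bold and leading quotes
    return text.upper().startswith("YES")


print(parse_stop_answer("**YES**"))                        # bold answer
print(parse_stop_answer("<think>is it done?</think>\nNO"))  # thinking prefix
```

Anything that does not clearly start with YES is treated as "keep going", which is the safer default when `max_rounds` already bounds the loop.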

When the loop exits, `_final_report` rewrites the evolving notes into a long-form article using `FINAL_REPORT_PROMPT`. If the result is under 400 words, a follow-up message asks the model to expand it; the longer version is kept only if it actually grew (`src/deep_research.py:108-127`, `:668-714`).
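The "keep the longer version only if it actually grew" rule is easy to get wrong, so here is a small synchronous sketch of it. The function name `expand_if_short` and the `expand` callback are hypothetical; the real `_final_report` is asynchronous and talks to the LLM directly.

```python
from typing import Callable

MIN_WORDS = 400  # threshold quoted in the paragraph above


def expand_if_short(report: str, expand: Callable[[str], str],
                    min_words: int = MIN_WORDS) -> str:
    """Ask for an expansion of a short report, keeping it only if it grew."""
    if len(report.split()) >= min_words:
        return report                      # long enough, no extra call
    longer = expand(report)                # hypothetical follow-up LLM call
    if len(longer.split()) > len(report.split()):
        return longer
    return report                          # expansion failed or shrank: keep original
```

Comparing word counts rather than trusting the follow-up call means a refusal or a truncated reply cannot replace a usable draft.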

```text
            ┌─────────── DeepResearcher.research() ──────────┐
            │  plan ─▶ ┌──── for each round ────┐ ─▶ final  │
            │           │ think ─▶ search ─▶    │           │
            │           │ extract ─▶ synth ─▶   │           │
            │           │ decide (YES → break)  │           │
            │           └────────────────────────┘           │
            └────────────────────────────────────────────────┘
```

## Category-Aware Output: Five Prompts, One Engine

A separate one-shot LLM call (`_classify_category`) buckets the question into `product`, `comparison`, `howto`, `factcheck`, or `general`, and the matching `CATEGORY_PROMPTS` snippet is appended to the final-report prompt as an `IMPORTANT FORMAT OVERRIDE` block (`src/deep_research.py:129-160`, `:674-676`). The same evidence ends up structured very differently:

| Category    | Output shape forced by the override prompt                          |
|-------------|---------------------------------------------------------------------|
| `product`   | Ranked list with quick-compare table and **Verdict** section        |
| `comparison`| Criteria-by-option markdown table + per-option strengths            |
| `howto`     | Quick Guide, Prerequisites, numbered detailed steps, Common Mistakes|
| `factcheck` | The Claim, Evidence For/Against, Verdict (Supported/Mixed/Unsupported)|

The classifier is defensive: weak local models often answer "the category is product" instead of `product`, so it falls back to a substring scan before defaulting to `None` (`src/deep_research.py:404-414`).
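A minimal sketch of that exact-match-then-substring fallback, assuming the five category names listed above. The function name `parse_category` and the stripped punctuation are illustrative, not taken from the source.

```python
from typing import Optional

CATEGORIES = ("product", "comparison", "howto", "factcheck", "general")


def parse_category(raw: str) -> Optional[str]:
    """Map a possibly chatty model reply onto a known category, else None."""
    answer = raw.strip().lower().strip("\"'`*. ")
    if answer in CATEGORIES:               # clean one-word answer
        return answer
    for cat in CATEGORIES:                 # "the category is product"
        if cat in answer:
            return cat
    return None                            # caller falls back to the default format


print(parse_category("The category is product."))
```

The substring scan returns the first listed category that appears, so a reply naming two categories resolves by list order. That is a simplification in this sketch.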

## SearXNG, With a Fallback Chain

The research loop never hard-codes a search engine. `_search` reads `research_search_provider` (or the global `search_provider`, default `searxng`) and then calls `_build_provider_chain` to assemble the order to try (`src/deep_research.py:511-544`, `services/search/core.py:91-105`).

```text
primary (searxng | brave | tavily | …) ─▶ user fallbacks ─▶ default [duckduckgo]
```
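The ordering in the diagram above can be sketched as a small list builder. This is an assumption-laden illustration: the name `build_provider_chain`, the normalization, and the de-duplication are guesses at what a chain builder needs, not a copy of `_build_provider_chain`.

```python
from typing import Iterable, List

DEFAULT_FALLBACKS = ["duckduckgo"]  # the default tail shown in the diagram


def build_provider_chain(primary: str,
                         user_fallbacks: Iterable[str] = ()) -> List[str]:
    """Primary first, then user fallbacks, then defaults, without repeats."""
    chain: List[str] = []
    for name in [primary, *user_fallbacks, *DEFAULT_FALLBACKS]:
        name = name.strip().lower()
        if name and name not in chain:     # skip blanks and duplicates
            chain.append(name)
    return chain


print(build_provider_chain("searxng", ["brave", "tavily"]))
```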

`_call_provider` dispatches to one of six provider modules — `searxng`, `brave`, `duckduckgo`, `google_pse`, `tavily`, `serper` (`services/search/core.py:71-85`). For non-research callers, `searxng_search_results` adds query-keyed disk caching with per-query expiry, a 2-attempt retry per provider, and post-call re-ranking (`services/search/core.py:111-189`). The research engine itself wraps the synchronous provider call in `asyncio.to_thread` and fans queries out in parallel (`src/deep_research.py:467-509`).
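The "synchronous call wrapped in `asyncio.to_thread`, fanned out in parallel" pattern looks roughly like this. `fake_search` and `search_all` are hypothetical stand-ins; the decision to skip failed queries rather than abort the round is an assumption of this sketch.

```python
import asyncio
import time
from typing import Callable, Dict, List

Result = Dict[str, str]


def fake_search(query: str) -> List[Result]:
    """Hypothetical blocking provider call."""
    time.sleep(0.05)
    return [{"url": "https://example.com/" + query.replace(" ", "-"), "title": query}]


async def search_all(queries: List[str],
                     search: Callable[[str], List[Result]]) -> List[Result]:
    """Run one blocking search per query in worker threads, concurrently."""
    tasks = [asyncio.to_thread(search, q) for q in queries]
    batches = await asyncio.gather(*tasks, return_exceptions=True)
    results: List[Result] = []
    for batch in batches:
        if isinstance(batch, Exception):
            continue                       # one failed query should not sink the round
        results.extend(batch)
    return results


print(asyncio.run(search_all(["local llm", "searxng setup"], fake_search)))
```

`asyncio.gather` preserves input order, so results stay aligned with the query list even though the threads finish in arbitrary order.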

A subtle failure mode is handled: if two consecutive rounds return zero new findings (`max_empty_rounds`), the loop emits a `phase="error"` event with the captured `_last_search_error` and either returns a "Search unavailable" message (no findings at all) or stops gracefully and synthesizes what it has (`src/deep_research.py:295-308`).
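The empty-round guard reduces to a streak counter plus one branch on whether anything was found at all. The sketch below simulates it over a list of per-round finding counts; the function name and the outcome labels are invented for illustration.

```python
from typing import Sequence, Tuple


def run_rounds(new_findings_per_round: Sequence[int],
               max_empty_rounds: int = 2) -> Tuple[str, int]:
    """Return (outcome, rounds_run) for a sequence of per-round finding counts."""
    total = 0
    empty_streak = 0
    for round_num, count in enumerate(new_findings_per_round, start=1):
        total += count
        empty_streak = empty_streak + 1 if count == 0 else 0
        if empty_streak >= max_empty_rounds:
            # Nothing at all: report search as unavailable.
            # Something earlier: stop and synthesize what exists.
            outcome = "synthesize_partial" if total else "search_unavailable"
            return outcome, round_num
    return "completed", len(new_findings_per_round)


print(run_rounds([0, 0, 5]))
print(run_rounds([3, 0, 0, 4]))
```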

## The Handler: Background Tasks, Hard Timeouts, Resumable Reports

`ResearchHandler` is the operational shell around `DeepResearcher`. Three things make it the part that survives production rather than just running once (`src/research_handler.py:25-270`):

- **Probe-before-commit.** Every research call first sends a minimal "hi" prompt to the model. A 401 becomes "Model X requires an API key"; a connection error becomes "Cannot reach model X". Both are raised before any expensive round starts (`src/research_handler.py:552-578`).
- **Hard wall-clock timeout.** The whole run sits inside `asyncio.wait_for(..., timeout=hard_timeout)`. If it fires, the handler checks the researcher's `evolving_report` and saves whatever was already synthesized, rather than discarding the run (`src/research_handler.py:208-259`).
- **Resumable + ownership-stamped persistence.** Each completed run is written to `data/deep_research/<session_id>.json` with `query`, `result`, `raw_report`, `sources`, `raw_findings`, `stats`, `category`, and `owner` (`src/research_handler.py:445-480`). The "spinoff" endpoint can then call `researcher.research(prior_report=..., prior_findings=..., prior_urls=...)` to continue a previous report instead of starting over (`src/deep_research.py:224-260`).
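The hard-timeout bullet above describes salvaging partial work when `asyncio.wait_for` fires. A minimal sketch of that idea follows. `FakeResearcher`, `run_with_hard_timeout`, and the status strings are hypothetical; only the `evolving_report` attribute name comes from the text.

```python
import asyncio


class FakeResearcher:
    """Hypothetical stand-in that exposes only an evolving_report attribute."""

    def __init__(self) -> None:
        self.evolving_report = ""

    async def research(self, rounds: int, stall_after: int = 0) -> str:
        for i in range(1, rounds + 1):
            await asyncio.sleep(0)
            self.evolving_report += f"Round {i} notes.\n"
            if i == stall_after:
                await asyncio.sleep(3600)  # simulate a hung LLM or fetch call
        return "FINAL: " + self.evolving_report


async def run_with_hard_timeout(researcher, hard_timeout: float, **kwargs) -> dict:
    try:
        result = await asyncio.wait_for(researcher.research(**kwargs),
                                        timeout=hard_timeout)
        return {"status": "done", "result": result}
    except asyncio.TimeoutError:
        partial = researcher.evolving_report.strip()
        if partial:                        # keep what was already synthesized
            return {"status": "partial", "result": partial}
        return {"status": "timeout", "result": ""}


print(asyncio.run(run_with_hard_timeout(FakeResearcher(), 0.05,
                                        rounds=3, stall_after=1)))
```

The salvage works because the researcher mutates its own state as it goes; the handler reads that state after the cancelled call rather than relying on a return value.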

Two small but telling defenses live in `research_utils.py`: `strip_thinking` removes `<think>` blocks from reasoning models (without this, `_should_stop` would see `<THINK>...` and never stop), and `is_low_quality` filters extracted summaries containing markers like `"insufficient to"`, `"cookie"`, `"boilerplate"`, so the synthesis prompt isn't fed junk (`src/research_utils.py:12-56`, `src/deep_research.py:584-589`).
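Both defenses are short enough to sketch. The marker tuple below contains only the three examples quoted above (the real list is presumably longer), and treating an empty summary as low quality is an assumption of this sketch rather than something the source states.

```python
import re

# Only the markers quoted in the text; the actual list may contain more.
LOW_QUALITY_MARKERS = ("insufficient to", "cookie", "boilerplate")


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks emitted by reasoning models."""
    cleaned = re.sub(r"<think>.*?</think>", "", text,
                     flags=re.DOTALL | re.IGNORECASE)
    return cleaned.strip()


def is_low_quality(summary: str) -> bool:
    """Flag summaries that are empty or contain a known junk marker."""
    lowered = summary.lower()
    return not lowered.strip() or any(m in lowered for m in LOW_QUALITY_MARKERS)


findings = [
    "SearXNG aggregates results from many engines.",
    "The page content is insufficient to answer the question.",
]
print([f for f in findings if not is_low_quality(f)])
```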

## The HTTP Surface: Start, Stream, Hide-an-Image

`routes/research_routes.py` exposes the engine over a small REST + Server-Sent-Events surface (`routes/research_routes.py:48-433`):

| Endpoint                                  | Purpose                                                 |
|-------------------------------------------|---------------------------------------------------------|
| `POST /api/research/start`                | Allocate `rp-<uuid>`, resolve endpoint, kick off task   |
| `GET  /api/research/stream/{sid}`         | SSE feed of `progress` events at ~1.5s tick             |
| `GET  /api/research/status/{sid}`         | One-shot status                                         |
| `POST /api/research/cancel/{sid}`         | Cooperative cancel via `researcher.cancel()`            |
| `POST /api/research/result/{sid}`         | Final markdown                                          |
| `GET  /api/research/report/{sid}`         | Visual HTML report                                      |
| `POST /api/research/{sid}/hide-image`     | Persist a hidden image URL in the JSON                  |
| `POST /api/research/spinoff/{sid}`        | Continue research from a prior report                   |

Endpoint resolution prefers a dedicated `research` endpoint, then falls back through `utility`, `default`, `chat`, and finally the first enabled `ModelEndpoint`. A small but important filter — `_first_chat_model` — skips known non-chat model names (`text-embedding`, `whisper`, `dall-e`, etc.) so research doesn't accidentally try to "complete" with `text-embedding-ada-002`, which was the actual bug the comment documents (`routes/research_routes.py:21-34`, `:351-388`).
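The non-chat filter amounts to a substring blocklist over model names. A sketch, using only the three markers named above (the source says "etc.", so the real list is longer) and a hypothetical function name:

```python
from typing import Iterable, Optional

# Markers named in the text; the actual list in the routes file is longer.
NON_CHAT_MARKERS = ("text-embedding", "whisper", "dall-e")


def first_chat_model(model_names: Iterable[str]) -> Optional[str]:
    """Return the first model name that does not look like a non-chat model."""
    for name in model_names:
        if not any(marker in name.lower() for marker in NON_CHAT_MARKERS):
            return name
    return None


print(first_chat_model(["text-embedding-ada-002", "whisper-1", "llama3:8b"]))
```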

Ownership is enforced at every read: `_owns_in_memory` checks the in-memory task's `owner` first, then falls back to the on-disk JSON's `owner` field for finished runs (`routes/research_routes.py:61-74`). Without this, refresh-after-completion would leak other users' reports.
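The memory-first, disk-second ownership check can be sketched like this. The function name, the shape of the in-memory task dict, and the choice to deny access on a missing or unreadable file are assumptions; only the `owner` field and the `<session_id>.json` layout come from the text.

```python
import json
import tempfile
from pathlib import Path


def owns_session(sid: str, user: str, active_tasks: dict, data_dir: str) -> bool:
    """Check the live task first, then the persisted JSON for finished runs."""
    task = active_tasks.get(sid)
    if task is not None:
        return task.get("owner") == user
    path = Path(data_dir) / f"{sid}.json"
    if not path.is_file():
        return False                       # unknown session: deny
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False                       # unreadable file: deny
    return isinstance(saved, dict) and saved.get("owner") == user


with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "rp-demo.json").write_text(json.dumps({"owner": "alice"}),
                                              encoding="utf-8")
    print(owns_session("rp-demo", "alice", {}, demo_dir))
    print(owns_session("rp-demo", "bob", {}, demo_dir))
```

A real route would also need to validate `sid` before building a path from it; that step is omitted here.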

A second consumer of the engine — `services/research/service.py` — exposes the same handler as a clean `async def research(topic, ...) -> ResearchResult` API, with a dataclass-shaped result (`services/research/service.py:30-117`). It's a thin façade over `ResearchHandler.call_research_service`, useful when the engine is embedded by other Python code rather than driven over HTTP.

## The Visual Report: One Big String of HTML

`src/visual_report.py` is bigger than the engine itself (~1,833 lines) but it's almost entirely a self-contained HTML template — no remote fonts, no external CSS, dark/light via `prefers-color-scheme`. The Python is the templating layer:

- `_md_to_html` autolinks bare URLs, runs `markdown` with `extra`/`codehilite`/`toc`/`tables`/`sane_lists`, and rewrites every external `<a href="https://...">` to add `target="_blank" rel="noopener noreferrer"` (`src/visual_report.py:33-63`).
- `_extract_headings` builds a TOC from `##`/`###` and, if the model emitted bold lines instead of headings, falls back to promoting `**Lead-in:**` lines to `h2` (`src/visual_report.py:66-89`, `:1642-1649`).
- `_inject_images` walks Open-Graph images collected from each source, picks index 0 as a hero, and inserts the rest as `<figure>` elements after every second `</h2>` boundary — with a guarded blocklist for icons/favicons/logos (`src/visual_report.py:1661-1707`).
- Every injected `<figure>` carries a reroll/hide button overlay (`_IMG_OVERLAY_BTNS`). The spare image pool is embedded in the page so the reroll button can swap an irrelevant image client-side without another server round-trip (`src/visual_report.py:95-138`, `:1703-1708`).
- Hidden images are persisted: `POST /api/research/{sid}/hide-image` appends to `hidden_images` in the JSON, and the next render filters them out (`src/research_handler.py:519-535`, `routes/research_routes.py:158-178`).
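The hero-plus-figures placement from the `_inject_images` bullet can be approximated with a split on `</h2>`. This sketch reads "after every second `</h2>`" as after the 2nd, 4th, and so on, which is one plausible interpretation. The function name, the blocklist contents, and the bare `<figure>` markup (no overlay buttons) are simplifications.

```python
import re
from html import escape
from typing import List, Optional, Tuple

BLOCKED = ("icon", "favicon", "logo")  # simplified blocklist


def inject_images(html_body: str, image_urls: List[str]) -> Tuple[Optional[str], str]:
    """Return (hero_url, html) with figures placed after every second </h2>."""
    usable = [u for u in image_urls if not any(b in u.lower() for b in BLOCKED)]
    if not usable:
        return None, html_body
    hero, rest = usable[0], iter(usable[1:])
    out: List[str] = []
    seen = 0
    for part in re.split(r"(</h2>)", html_body):
        out.append(part)
        if part == "</h2>":
            seen += 1
            if seen % 2 == 0:
                url = next(rest, None)     # stop inserting once the pool is empty
                if url:
                    out.append(f'<figure><img src="{escape(url, quote=True)}" alt=""></figure>')
    return hero, "".join(out)
```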

The stats bar at the top — Duration, Rounds, Queries, URLs Analyzed, Model, Search — is built directly from `DeepResearcher.get_stats()`, which is why the only providers shown are the ones that actually returned hits during the run (`src/deep_research.py:806-820`, `src/visual_report.py:1720-1727`).

## The Architecture, in One Picture

```mermaid
flowchart TB
  subgraph UI["Browser"]
    panel["Research panel"]
    sse["SSE: /api/research/stream"]
    report["Visual HTML report"]
  end

  subgraph API["FastAPI routes (routes/research_routes.py)"]
    start["POST /api/research/start"]
    stream["GET  /api/research/stream/{sid}"]
    htmlR["GET  /api/research/report/{sid}"]
    hide["POST /api/research/{sid}/hide-image"]
  end

  subgraph Handler["src/research_handler.py"]
    rh["ResearchHandler<br/>_active_tasks, probe, hard timeout"]
    disk[("data/deep_research/<br/>{sid}.json")]
  end

  subgraph Engine["src/deep_research.py"]
    dr["DeepResearcher<br/>plan → loop → final"]
    prompts["RESEARCH_PLAN_PROMPT<br/>QUERY_GEN_PROMPT<br/>SYNTHESIZE_PROMPT<br/>STOP_PROMPT<br/>FINAL_REPORT_PROMPT<br/>CATEGORY_PROMPTS"]
  end

  subgraph Search["services/search/core.py"]
    chain["_build_provider_chain"]
    providers["searxng | brave | duckduckgo<br/>google_pse | tavily | serper"]
  end

  subgraph Render["src/visual_report.py"]
    md2html["_md_to_html"]
    inject["_inject_images + TOC"]
    tmpl["_TEMPLATE (HTML+CSS+JS)"]
  end

  llm[("BYO LLM<br/>OpenAI-compatible")]

  panel --> start --> rh --> dr
  dr -->|prompts| llm
  dr --> chain --> providers
  rh --> disk
  panel --> sse --> stream --> rh
  panel --> htmlR --> rh --> Render --> report
  report -->|reroll/hide| hide --> rh --> disk
```

Sources: `src/deep_research.py:163-338`, `src/research_handler.py:25-270`, `services/search/core.py:71-105`, `routes/research_routes.py:48-433`, `src/visual_report.py:1621-1727`

## What Builders Should Notice

A few details are worth pulling out for anyone building something similar:

- **The LLM owns every decision.** Query generation, extraction, synthesis, stop, and final formatting are all prompts — there's no rules engine or scoring heuristic deciding when to stop. The price is a `min_rounds`/`max_rounds` safety frame plus the empty-rounds counter; the reward is that a smarter local model produces a smarter loop without code changes.
- **Defensive JSON parsing.** `_parse_json_array` first tries `json.loads`, then a greedy `[…]` regex, then — if the array was truncated mid-string — recovers complete quoted items with a regex sweep so an aborted response still yields usable queries (`src/deep_research.py:741-773`).
- **Probe before you spend.** The 5-token probe in `_probe_endpoint` is cheap insurance against burning 5 minutes on a misconfigured endpoint, and the wrapped error messages turn raw HTTP failures into actionable settings advice (`src/research_handler.py:552-578`).
- **Provider neutrality is enforced by chains, not branches.** The same `_build_provider_chain` is used by deep research and ad-hoc search; swapping SearXNG for Brave or chaining `["brave","duckduckgo"]` is a setting, not a code path (`services/search/core.py:91-105`).
- **The HTML report is a single string.** Because `_TEMPLATE` is inlined CSS+JS with no remote assets, the rendered page is portable — saveable, shareable, and printable without losing styling. The reroll button works because the spare image pool is embedded in the page; the hide button works because the server persists the dismissal back into the run's JSON.
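The three-stage recovery described for `_parse_json_array` in the list above could look like the following. It is a sketch under assumptions: the function name is illustrative, items are coerced to strings, and salvaged items keep any raw escape sequences.

```python
import json
import re
from typing import List


def parse_json_array(raw: str) -> List[str]:
    """Parse a JSON array of strings, tolerating chatter and truncation."""
    # 1. Happy path: the whole reply is valid JSON.
    try:
        data = json.loads(raw)
        if isinstance(data, list):
            return [str(x) for x in data]
    except json.JSONDecodeError:
        pass
    # 2. Greedy [...] match: array wrapped in prose or code fences.
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            if isinstance(data, list):
                return [str(x) for x in data]
        except json.JSONDecodeError:
            pass
    # 3. Truncated array: salvage every fully quoted item after the first "[".
    start = raw.find("[")
    if start == -1:
        return []
    return re.findall(r'"((?:[^"\\]|\\.)*)"', raw[start:])


print(parse_json_array('["first query", "second query", "third que'))
```

The truncated input above yields the two complete items and drops the unfinished third, so a response cut off by a token limit still produces usable queries.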

For an engine whose core loop fits in roughly 800 lines, this is a lot of moving parts wired with restraint: a single loop file, a single handler file, a single search-core file, and a single template file, each owning one boundary and each written to fail gracefully when the LLM, the search engine, or the page fetch misbehaves.

Sources: `src/research_handler.py:208-259`, `src/research_handler.py:552-578`, `src/deep_research.py:741-773`, `services/search/core.py:91-189`, `src/visual_report.py:145-200`