# Karpathy LLM wiki workflow

> Source-grounded knowledge base pattern with `research/` and `articles/` folders, starter packs, and MCP `workflow` kinds `ingest`, `research`, and `consolidate`.

- Repository: sashimikun/open-knowledge
- GitHub: https://github.com/sashimikun/open-knowledge
- Human docs: https://grok-wiki.com/public/docs/sashimikun-open-knowledge-5c45105c876e
- Complete Markdown: https://grok-wiki.com/public/docs/sashimikun-open-knowledge-5c45105c876e/llms-full.txt

## Source Files

- `docs/content/workflows/karpathy-llm-wiki.mdx`
- `packages/server/src/mcp/tools/workflow.ts`
- `packages/server/src/mcp/tools/ingest-body.ts`
- `packages/server/src/mcp/tools/research-body.ts`
- `packages/server/src/mcp/tools/consolidate-body.ts`
- `packages/cli/src/commands/seed.ts`
- `docs/content/features/agent-activity.mdx`

---


Open Knowledge implements Andrej Karpathy's LLM-curated wiki pattern as its default starter pack, `knowledge-base`, installed with `ok seed`. It scaffolds three content folders under the project's `content.dir` and wires the MCP `workflow` tool's `ingest`, `research`, and `consolidate` kinds to procedural guides in `packages/server/src/mcp/tools/`. The pack splits Karpathy's single `wiki/` layer into `research/` (provisional) and `articles/` (canonical), so promotion is an explicit `consolidate` step rather than accidental drift.

## Pattern overview

Karpathy's gist describes three persistent layers plus operational metadata:

| Layer | Role | Karpathy name | Open Knowledge name |
| --- | --- | --- | --- |
| Raw sources | Verbatim external material; read, never edited | `raw/` | `external-sources/` |
| Wiki pages | LLM-synthesized summaries and cross-links | `wiki/` (single layer) | `research/` + `articles/` (split) |
| Audit trail | Append-only record of ingest, query, and lint activity | `log.md` | `log.md` at pack root |
| Schema | Tells the agent how folders behave | `CLAUDE.md` / `AGENTS.md` | `<folder>/.ok/frontmatter.yml` + per-folder templates |

Open Knowledge adds an explicit promotion gate: `research/` articles carry `status: provisional`; `articles/` carry `status: canonical` and a `supersedes:` chain back to the research they replace.
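
The promotion gate can be read as an invariant: every path in an article's `supersedes:` list names a research doc, and that doc points back with `superseded_by:`. The check below is a minimal sketch of that invariant, not Open Knowledge code. It assumes frontmatter has already been parsed into plain objects and that `superseded_by` holds a single path; the interface names and `checkPromotion` are hypothetical.

```typescript
// Hypothetical shapes: only `status`, `sources`, `supersedes`, and
// `superseded_by` are field names taken from this page.
interface ResearchDoc {
  path: string;
  status: "provisional";
  sources: string[];
  superseded_by?: string; // assumption: a single path, not a list
}

interface ArticleDoc {
  path: string;
  status: "canonical";
  supersedes: string[];
}

// Returns a list of problems; an empty list means the chain is intact.
function checkPromotion(article: ArticleDoc, research: ResearchDoc[]): string[] {
  const byPath = new Map<string, ResearchDoc>();
  for (const doc of research) byPath.set(doc.path, doc);

  const problems: string[] = [];
  for (const target of article.supersedes) {
    const doc = byPath.get(target);
    if (doc === undefined) {
      problems.push(`${article.path}: supersedes unknown doc ${target}`);
    } else if (doc.superseded_by !== article.path) {
      problems.push(`${target}: superseded_by does not point at ${article.path}`);
    }
  }
  return problems;
}
```

A check like this could run after a `consolidate` pass to confirm that both ends of the chain were written.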

```mermaid
flowchart TB
  subgraph ingest_layer["external-sources/"]
    ES["Raw wrappers + binaries<br/>tags: source, immutable, layer-ingest"]
  end
  subgraph research_layer["research/"]
    R["Provisional synthesis<br/>status: provisional<br/>sources: frontmatter list"]
  end
  subgraph canonical_layer["articles/"]
    A["Canonical decisions<br/>status: canonical<br/>supersedes: research paths"]
  end
  LOG["log.md — append-only audit"]
  WF["MCP workflow tool"]
  WF -->|kind: ingest| ES
  WF -->|kind: research| R
  WF -->|kind: consolidate| A
  ES -->|cite local paths| R
  R -->|promote after decision| A
  ES & R & A --> LOG
```

<Note>
The knowledge base is **closed-loop**: downstream docs cite local paths under `external-sources/`, not bare web URLs. Agent-initiated fetches for grounding claims follow the same `ingest` discipline as user-shared URLs.
</Note>

## Starter pack layout

`ok seed` (default pack `knowledge-base`) creates three folders, three templates, per-folder frontmatter, and `log.md`. The suggested subfolder is `brain/`; pass `--root` to nest the pack under a subfolder, or leave it blank to seed at the project root.

:::files
your-project/
├── external-sources/
│   └── .ok/
│       ├── frontmatter.yml
│       └── templates/clip.md
├── research/
│   └── .ok/
│       ├── frontmatter.yml
│       └── templates/research-log.md
├── articles/
│   └── .ok/
│       ├── frontmatter.yml
│       └── templates/article.md
└── log.md
:::

| Folder | Template | `status` | Produced by |
| --- | --- | --- | --- |
| `external-sources/` | `clip` | — (immutable source) | `workflow({ kind: "ingest" })` |
| `research/` | `research-log` | `provisional` | `workflow({ kind: "research" })` |
| `articles/` | `article` | `canonical` | `workflow({ kind: "consolidate" })` |

Each folder's `.ok/frontmatter.yml` carries a `description` that the agent reads on every `exec("ls …")` listing. This distributes the schema across folders instead of keeping it in a single root `AGENTS.md`. The `open-knowledge-pack-knowledge-base` skill (installed by `ok seed --pack knowledge-base`) holds the workflow rules, so template bodies stay purely structural.
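
To check a seeded project against the layout above, it can help to enumerate the expected files. The following is a rough sketch only: `PACK_FOLDERS` and `expectedPackPaths` are hypothetical helpers, and the assumption that `--root` simply prefixes every path is inferred from the description of the flag rather than confirmed by the CLI source.

```typescript
// Folder -> template name, as listed in the table above.
const PACK_FOLDERS: Record<string, string> = {
  "external-sources": "clip",
  "research": "research-log",
  "articles": "article",
};

// `root` mirrors the `--root` flag: "" means project root, "brain" nests the pack.
// Assumption: nesting only prefixes each path with the root folder.
function expectedPackPaths(root: string): string[] {
  const prefix = root === "" ? "" : root.replace(/\/+$/, "") + "/";
  const paths: string[] = [];
  for (const [folder, template] of Object.entries(PACK_FOLDERS)) {
    paths.push(`${prefix}${folder}/.ok/frontmatter.yml`);
    paths.push(`${prefix}${folder}/.ok/templates/${template}.md`);
  }
  paths.push(`${prefix}log.md`);
  return paths;
}
```

Under these assumptions, `expectedPackPaths("brain")` yields seven paths: two per folder plus `brain/log.md`.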

## MCP `workflow` tool

The `workflow` tool returns numbered procedural guides (instructional text, not data). The agent executes the guide using `exec`, `write`, `edit`, `links`, and other MCP tools. `previewUrl` is always `null` for workflow responses.

<ParamField body="kind" type="enum" required>
`ingest` | `research` | `consolidate` | `discover` | `wiki`. Karpathy's three-layer pipeline uses the first three.
</ParamField>

<ParamField body="source" type="string">
Required when `kind` is `ingest`. URL, local file path, or identifier to capture verbatim.
</ParamField>

<ParamField body="topic" type="string">
Required when `kind` is `research` or `consolidate`. Question, topic, or anchor URL.
</ParamField>

<ParamField body="cwd" type="string">
Optional project root override. Defaults to the connected project's `content.dir` from `.ok/config.yml`.
</ParamField>
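
The parameter rules above amount to a small conditional schema. The sketch below restates them as a TypeScript type and a validator. It is illustrative, not the schema from `workflow.ts`: `WorkflowInput` and `validateWorkflowInput` are hypothetical names, and no rule is applied to `discover` or `wiki` because this page does not describe their requirements.

```typescript
type WorkflowKind = "ingest" | "research" | "consolidate" | "discover" | "wiki";

interface WorkflowInput {
  kind: WorkflowKind;
  source?: string; // required for `ingest`
  topic?: string;  // required for `research` and `consolidate`
  cwd?: string;    // optional project root override
}

// Returns an error message, or null when the input satisfies the rules above.
function validateWorkflowInput(input: WorkflowInput): string | null {
  if (input.kind === "ingest" && !input.source) {
    return "source is required when kind is ingest";
  }
  if ((input.kind === "research" || input.kind === "consolidate") && !input.topic) {
    return `topic is required when kind is ${input.kind}`;
  }
  // `discover` and `wiki` are not covered here: their rules are not stated on this page.
  return null;
}
```

For example, `validateWorkflowInput({ kind: "ingest" })` returns an error string, while `validateWorkflowInput({ kind: "research", topic: "agent-framework evaluation" })` returns `null`.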

### `ingest` — preserve raw sources

`workflow({ kind: "ingest", source: "https://…" })` returns a guide that:

- Sanity-checks scope, duplicate sources, and intent (preserve vs analyze)
- Classifies binary vs text sources; downloads binaries to `external-sources/<slug>.<ext>` with `sha256` and `bytes` metadata
- Writes a markdown wrapper with `preservation: binary` or `preservation: text-extracted`, `source_url`, and tags `source`, `immutable`, `layer-ingest`
- Refuses analysis, summarization, and silent chaining into `research`
- On sha256 mismatch for an existing slug, appends dated siblings (`<slug>.YYYY-MM-DD.<ext>`) with `supersedes:` — the layer is append-only by convention

Binary wrappers embed assets via `![[file.ext]]`; text wrappers preserve extracted content verbatim in the body.
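
The dated-sibling rule for binaries can be sketched as a pure function. This is an illustration under stated assumptions, not the logic in `ingest-body.ts`: `targetPath` and `wrapperFrontmatter` are hypothetical helpers, the "identical hash means nothing new to write" branch is an assumption, and the frontmatter field order is made up for the example.

```typescript
interface ExistingCapture {
  sha256: string;
}

// Decide where a binary capture should land. `today` is passed in
// (YYYY-MM-DD) so the function stays deterministic.
function targetPath(
  slug: string,
  ext: string,
  sha256: string,
  existing: ExistingCapture | undefined,
  today: string,
): string | null {
  if (existing === undefined) return `external-sources/${slug}.${ext}`; // first capture
  if (existing.sha256 === sha256) return null; // assumption: unchanged content, no new file
  return `external-sources/${slug}.${today}.${ext}`; // changed content: dated sibling
}

// Frontmatter for a binary wrapper, using the field names listed above.
function wrapperFrontmatter(sourceUrl: string, sha256: string, bytes: number): string {
  return [
    "---",
    `source_url: ${JSON.stringify(sourceUrl)}`, // JSON string quoting is valid YAML
    "preservation: binary",
    `sha256: ${sha256}`,
    `bytes: ${bytes}`,
    "tags: [source, immutable, layer-ingest]",
    "---",
  ].join("\n");
}
```

Because an existing file is never overwritten in this sketch, the layer stays append-only: a changed source produces a new dated file next to the original.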

### `research` — provisional synthesis

`workflow({ kind: "research", topic: "…" })` returns a guide that:

- Creates checkpoint tasks before any external fetch
- Scans existing coverage (`grep`, `ls`, `cat`) and routes to Path A (new article), Path B (chat-only answer), or Path C (update existing)
- Stops at a scoping gate in supervised mode until the user confirms a research rubric
- Calls `ingest` per source before analyzing — persist-as-you-go is mandatory
- Writes under `research/<slug>.md` with `status: provisional` and a `sources:` frontmatter list
- Validates with `links({ kind: "dead" })` and cross-links neighbor docs
- Files valuable Q&A back as short pages (Karpathy's query step)

Default article structure: Question, Context, Findings (with inline source links), Trade-offs, Open questions, Tentative recommendation, Further reading.
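
As a concrete picture of the default structure, the sketch below builds an empty research page with the frontmatter and section headings named above. It is illustrative only: `slugify` and `researchSkeleton` are hypothetical, the slug rule and the `##` heading level are assumptions, and the real page would normally come from the `research-log` template.

```typescript
const RESEARCH_SECTIONS = [
  "Question",
  "Context",
  "Findings",
  "Trade-offs",
  "Open questions",
  "Tentative recommendation",
  "Further reading",
];

// Assumption: slugs are lowercase with runs of non-alphanumerics collapsed to "-".
function slugify(topic: string): string {
  return topic.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Build the path and empty body for a provisional research page.
function researchSkeleton(topic: string, sources: string[]): { path: string; body: string } {
  const lines: string[] = ["---", "status: provisional", "sources:"];
  for (const source of sources) lines.push(`  - ${source}`);
  lines.push("---", "", `# ${topic}`, "");
  for (const section of RESEARCH_SECTIONS) lines.push(`## ${section}`, "");
  return { path: `research/${slugify(topic)}.md`, body: lines.join("\n") };
}
```

With this slug rule, `researchSkeleton("agent-framework evaluation", [...]).path` is `research/agent-framework-evaluation.md`.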

### `consolidate` — promote to canonical

`workflow({ kind: "consolidate", topic: "…" })` returns a guide that:

- **STOP gate:** confirms a real team decision before any write; returns early if still provisional
- Loads research articles and their `sources:` chain
- Writes under `articles/<slug>.md` with `status: canonical` and `supersedes: [research-path.md]`
- Updates superseded research with `superseded_by:` — research is never deleted
- Uses direct voice (decisions stated as decisions, not options)

Canonical structure: Summary, Context, Decision, Rationale, Trade-offs, Alternatives considered, Implementation notes, Further reading.
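
The frontmatter bookkeeping in a `consolidate` pass can be sketched as a single transformation. This is a simplified model, not the guide's implementation: `promote` and both interfaces are hypothetical, the `decided` flag stands in for the STOP gate, and `superseded_by` is assumed to hold one path.

```typescript
interface Research {
  path: string;
  status: "provisional";
  sources: string[];
  superseded_by?: string;
}

interface Article {
  path: string;
  status: "canonical";
  supersedes: string[];
}

// Returns null when no decision exists (the STOP gate); otherwise returns the
// new article frontmatter plus updated copies of the research docs.
function promote(
  slug: string,
  research: Research[],
  decided: boolean,
): { article: Article; research: Research[] } | null {
  if (!decided) return null;
  const article: Article = {
    path: `articles/${slug}.md`,
    status: "canonical",
    supersedes: research.map((doc) => doc.path),
  };
  // Research docs are kept and annotated, never deleted.
  const updated: Research[] = research.map((doc) => ({ ...doc, superseded_by: article.path }));
  return { article, research: updated };
}
```

Note that the research docs keep `status: provisional` in this sketch; the page only says they gain `superseded_by:`.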

## End-to-end procedure

<Steps>
<Step title="Initialize the project">

Run `ok init` to scaffold `.ok/`, git, skills, and MCP registration. See [Initialize a project](/initialize-project).

</Step>

<Step title="Seed the knowledge-base pack">

```bash
ok seed --pack knowledge-base --yes
# or nest under a subfolder:
ok seed --pack knowledge-base --root brain --yes
```

Verify folders with `exec("ls external-sources research articles")` or the editor sidebar. Seeding is idempotent — existing entries are skipped.

</Step>

<Step title="Wire an MCP agent">

Confirm `workflow` appears in the agent's tool list alongside `exec`, `write`, `edit`, and `links`. See [Wire agent editors](/wire-agent-editors).

</Step>

<Step title="Ingest sources">

For each external URL or file:

```
workflow({ kind: "ingest", source: "https://example.com/spec" })
```

Expect wrappers under `external-sources/` with `source_url`, `date_fetched`, and preservation metadata. Five sources → five files.

</Step>

<Step title="Research the topic">

```
workflow({ kind: "research", topic: "agent-framework evaluation" })
```

Confirm `research/<slug>.md` exists with `status: provisional`, populated `sources:`, and inline citations to `external-sources/` paths.

</Step>

<Step title="Query and file answers">

Ask the agent questions against the research doc. Valuable answers become new `research/` pages or short Q&A files — not chat-only history.

</Step>

<Step title="Consolidate after a decision">

Only when the team has decided:

```
workflow({ kind: "consolidate", topic: "agent-framework evaluation" })
```

Confirm `articles/<slug>.md` with `status: canonical`, `supersedes:` chain, and `superseded_by:` on the research doc.

</Step>
</Steps>

## Promotion rhythm

| Trigger | Tool | Output layer |
| --- | --- | --- |
| Source arrives (URL, PDF, transcript) | `ingest` | `external-sources/` |
| Agent fetched the web to ground a claim | `ingest` | `external-sources/` |
| Synthesizing 2+ sources into findings | `research` | `research/` (`provisional`) |
| Team decided; position is canonical | `consolidate` | `articles/` (`canonical`) |
| Scratch note or runbook | `write` | Any folder (outside the three-kind pipeline) |
| Useful one-off query answer | `write` to `research/` | Provisional page, not chat |

<Warning>
Avoid these anti-patterns:

**Consolidating too early.** Canonical articles written before a real decision need constant rewrites. Keep uncertainty in `research/` until the decision is firm.

**Ingesting your own thoughts.** `ingest` is for external sources preserved verbatim. Reflections belong in `research/` or a separate notes folder.

**Leaving bare web URLs in body text.** An inline `https://…` link inside a knowledge-base doc is a TODO: run `ingest` first, then cite `./external-sources/<slug>.md`.
</Warning>
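
The bare-URL rule lends itself to a simple lint. The sketch below is a hypothetical helper (`findBareUrls` is not an Open Knowledge tool) that lists `http(s)` URLs in a doc body. It assumes frontmatter is delimited by `---` lines and skips it, since wrappers are expected to carry `source_url` there; callers would also skip files under `external-sources/`, whose verbatim bodies may legitimately contain URLs.

```typescript
// Returns every http(s) URL found in the body of a markdown doc.
function findBareUrls(markdown: string): string[] {
  // Drop a leading frontmatter block so `source_url:` is not flagged.
  const body = markdown.replace(/^---\n[\s\S]*?\n---\n/, "");
  const matches = body.match(/https?:\/\/[^\s)>\]]+/g);
  if (matches === null) return [];
  // Trim sentence punctuation that directly follows a URL.
  return matches.map((url) => url.replace(/[.,;:!?]+$/, ""));
}
```

Each URL returned is a candidate for `workflow({ kind: "ingest", source })` followed by a rewrite to a local `./external-sources/` link.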

## Log discipline

`log.md` at the pack root is append-only. After any turn that creates, edits, or restructures content, append one dated entry covering:

- `ingest`, `research`, or `consolidate` runs
- Direct `write` / `edit` / `move` / `delete` outside the three workflow kinds
- Folder restructures and `.ok/config.yml` changes

Reference touched docs as markdown links (`[doc](./path/doc.md)`) so entries register in `links({ kind: "backlinks" })`.
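
One way to keep entries uniform is to format them with a small helper. The entry layout below (a dated bullet with an action label) is an assumption, since this page does not prescribe a format; `formatLogEntry` and `appendLogEntry` are hypothetical names. The only rule taken from the text is that touched docs appear as relative markdown links.

```typescript
// Format one log line; `touched` holds paths relative to the pack root.
function formatLogEntry(date: string, action: string, touched: string[]): string {
  const links = touched.map((path) => {
    const file = path.split("/").pop() || path;
    return `[${file.replace(/\.md$/, "")}](./${path})`;
  });
  return `- ${date} ${action}: ${links.join(", ")}`;
}

// Append-only: existing log text is kept verbatim and the entry goes at the end.
function appendLogEntry(log: string, entry: string): string {
  const base = log === "" || log.endsWith("\n") ? log : log + "\n";
  return base + entry + "\n";
}
```

For example, `formatLogEntry("2025-01-15", "ingest", ["external-sources/spec.md"])` produces `- 2025-01-15 ingest: [spec](./external-sources/spec.md)`.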

## Composing features

| Concern | Surface |
| --- | --- |
| Agent-readable folder rules | `<folder>/.ok/frontmatter.yml` on `exec("ls")` |
| Template instantiation | `write({ document: { path, template: "research-log" } })` |
| Edit attribution | Agent activity panel + shadow-repo timeline |
| Link graph hygiene | `links` (`dead`, `orphans`, `backlinks`) |
| Closed-loop citations | `sources:` frontmatter + inline `./external-sources/…` links |
| Crash-safe long research | Persist-as-you-go: ingest per source, edit article section-by-section |

## Cadence

| When | Action |
| --- | --- |
| As sources arrive | `ingest` (~30 s per source) |
| Weekly | `research` pass on recent ingests; flag contradictions |
| Per decision | `consolidate` to `articles/` |
| Monthly | `links({ kind: ["dead", "orphans"] })` lint pass on `articles/` |

End each ingest session with one synthesis query, and file the answer in `research/` so the knowledge base compounds.

## Karpathy mapping reference

| Karpathy operation | Open Knowledge equivalent |
| --- | --- |
| Ingest (touch many wiki pages) | `ingest` → optional neighbor link updates (Step 7 in ingest guide) |
| Query (search + synthesize + file back) | Agent `search` + `exec` + `research`; file Q&A as new pages |
| Lint (contradictions, orphans, stale claims) | `links` tool + periodic agent prompts |
| Index (`index.md`) | Dynamic via `exec("ls")` enrichment + sidebar; optional hand-written `index.md` |
| Schema (`CLAUDE.md`) | Per-folder `.ok/frontmatter.yml` + `knowledge-base` pack skill |

## Related pages

<CardGroup>
<Card title="Quickstart" href="/quickstart">
Run `ok init`, `ok start --open`, and confirm MCP tools respond before seeding the pack.
</Card>
<Card title="Initialize a project" href="/initialize-project">
Scaffold `.ok/`, git, skills, and editor MCP registration.
</Card>
<Card title="Folders and templates" href="/folders-and-templates">
Folder frontmatter cascade and `write({ document: { template } })` resolution.
</Card>
<Card title="MCP tools reference" href="/mcp-tools-reference">
Full `workflow` input schema and the other sixteen MCP tools.
</Card>
<Card title="CLI reference" href="/cli-reference">
`ok seed` flags (`--pack`, `--root`, `--dry-run`, `--list-packs`).
</Card>
<Card title="Entity vault workflow" href="/entity-vault">
Alternative starter pack for people, companies, and meeting timelines instead of source-grounded research.
</Card>
</CardGroup>
