# The Mental Model: Graph-First Codebase Understanding

> The simplest model of the whole system: a CLI plugin triggers a sequential multi-agent pipeline that emits one JSON knowledge graph, which is then rendered as an interactive React Flow dashboard. Everything else (tree-sitter extractors, parsers, staleness checks, the Zustand store) supports this central invariant. Understanding this flow lets you predict where any new feature, bug, or change lands.

- Repository: Lum1104/Understand-Anything
- GitHub: https://github.com/Lum1104/Understand-Anything
- Human wiki: https://grok-wiki.com/public/wiki/lum1104-understand-anything-3b923df96896
- Complete Markdown: https://grok-wiki.com/public/wiki/lum1104-understand-anything-3b923df96896/llms-full.txt

## Source Files

- `README.md`
- `understand-anything-plugin/skills/understand/SKILL.md`
- `understand-anything-plugin/skills/understand-dashboard/SKILL.md`
- `understand-anything-plugin/package.json`
- `understand-anything-plugin/packages/core/src/types.ts`
- `understand-anything-plugin/packages/core/src/types.test.ts`
- `understand-anything-plugin/packages/core/src/schema.ts`
- `understand-anything-plugin/packages/core/src/search.ts`
- `understand-anything-plugin/packages/core/src/staleness.ts`
- `understand-anything-plugin/packages/dashboard/src/App.tsx`
- `understand-anything-plugin/packages/dashboard/src/store.ts`
- `understand-anything-plugin/packages/dashboard/src/components/GraphView.tsx`
- `understand-anything-plugin/agents/project-scanner.md`

---

Understand Anything is organized around one central invariant: a CLI skill (`/understand`) triggers a sequential multi-agent pipeline that produces a single JSON file — the **knowledge graph** — which a React Flow dashboard then renders as an interactive, explorable visualization. Every subsystem in the project either helps produce that JSON, validate it, keep it fresh, or display it. Understanding this pipeline-to-graph-to-dashboard flow lets you confidently predict where any feature, bug, or change belongs.

This page explains the end-to-end flow, the data contract that holds it together, and the architectural rules that keep each boundary clean. It is the starting point before reading any other part of the codebase.

---

## The Central Invariant

```text
CLI skill invoked
      │
      ▼
┌─────────────────────────────────────────┐
│         Multi-Agent Pipeline            │
│  (project-scanner → file-analyzer ×N    │
│   → architecture-analyzer → tour-builder│
│   → [reviewer])                         │
└────────────────┬────────────────────────┘
                 │ writes
                 ▼
    .understand-anything/
        knowledge-graph.json     ◄── THE OUTPUT
                 │
                 │ loaded by
                 ▼
┌─────────────────────────────────────────┐
│         React Flow Dashboard            │
│  (GraphView · sidebar · search ·        │
│   layers · tour · diff overlay)         │
└─────────────────────────────────────────┘
```

The `knowledge-graph.json` artifact is the only substantive handoff between the pipeline and the UI (the dashboard also reads a few optional sidecar files, covered under Load Path). Agents never talk to the dashboard directly; the dashboard never re-runs agents. This strict separation means agents can be swapped, retried, or run incrementally without touching any UI code, and the dashboard can run anywhere that can serve the JSON files it reads.

Sources: [understand-anything-plugin/skills/understand/SKILL.md:683](understand-anything-plugin/skills/understand/SKILL.md), [understand-anything-plugin/packages/dashboard/src/App.tsx:133-163](understand-anything-plugin/packages/dashboard/src/App.tsx)

---

## The Knowledge Graph Schema: The Contract

The `KnowledgeGraph` type is the data contract every agent writes to and every dashboard component reads from. It is defined once in `@understand-anything/core`:

```typescript
// understand-anything-plugin/packages/core/src/types.ts:91-99
export interface KnowledgeGraph {
  version: string;
  kind?: "codebase" | "knowledge";
  project: ProjectMeta;
  nodes: GraphNode[];
  edges: GraphEdge[];
  layers: Layer[];
  tour: TourStep[];
}
```

Besides `version` and the optional `kind` field (`"codebase"` or `"knowledge"`), the top-level structure has five concerns:

| Field | Purpose | Populated by |
|---|---|---|
| `project` | Name, languages, frameworks, timestamp, git hash | Phase 0/1 (project-scanner) |
| `nodes` | Every file, function, class, config, service, etc. | Phase 2 (file-analyzer) |
| `edges` | All relationships between nodes | Phase 2 (file-analyzer) |
| `layers` | Architectural groupings (API, Service, Data, UI…) | Phase 4 (architecture-analyzer) |
| `tour` | Ordered learning steps referencing node IDs | Phase 5 (tour-builder) |

Sources: [understand-anything-plugin/packages/core/src/types.ts:38-99](understand-anything-plugin/packages/core/src/types.ts), [understand-anything-plugin/packages/core/src/types.test.ts:4-28](understand-anything-plugin/packages/core/src/types.test.ts)

### Node Types

The schema supports 21 node types across four categories:

| Category | Types |
|---|---|
| Code | `file`, `function`, `class`, `module`, `concept` |
| Non-code | `config`, `document`, `service`, `table`, `endpoint`, `pipeline`, `schema`, `resource` |
| Domain | `domain`, `flow`, `step` |
| Knowledge | `article`, `entity`, `topic`, `claim`, `source` |

Node IDs follow a `<type>:<relative-path>` convention, with an extra `:<name>` suffix for symbols defined inside a file (e.g., `file:src/auth/login.ts`, `function:src/auth/login.ts:handleLogin`). This makes IDs human-readable and stable across incremental updates.

Sources: [understand-anything-plugin/packages/core/src/types.ts:1-7](understand-anything-plugin/packages/core/src/types.ts)
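The ID convention can be captured by two small helpers. This is a sketch, not code from the repository: `makeNodeId` and `parseNodeId` are hypothetical names, and treating the last `:` segment as a symbol name is inferred from the `function:src/auth/login.ts:handleLogin` example.

```typescript
// Hypothetical helpers illustrating the `<type>:<relative-path>[:<symbol>]` ID convention.
interface ParsedNodeId {
  type: string;
  path: string;
  symbol?: string;
}

function makeNodeId(type: string, path: string, symbol?: string): string {
  // Assumption: normalize Windows separators so IDs stay stable across platforms.
  const relPath = path.replace(/\\/g, "/");
  return symbol ? `${type}:${relPath}:${symbol}` : `${type}:${relPath}`;
}

function parseNodeId(id: string): ParsedNodeId | null {
  const first = id.indexOf(":");
  if (first <= 0) return null; // missing type prefix
  const type = id.slice(0, first);
  const rest = id.slice(first + 1);
  // Assumption: a trailing ":<identifier>" is a symbol name; relative paths rarely contain ":".
  const last = rest.lastIndexOf(":");
  if (last > 0) {
    return { type, path: rest.slice(0, last), symbol: rest.slice(last + 1) };
  }
  return { type, path: rest };
}
```

Keeping construction and parsing in one place is what makes the IDs safe to rebuild during incremental updates: the same file always yields the same ID.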

### Edge Types

Thirty-five edge types span eight semantic categories; representative members of each are:

```text
Structural:     imports, exports, contains, inherits, implements
Behavioral:     calls, subscribes, publishes, middleware
Data flow:      reads_from, writes_to, transforms, validates
Dependencies:   depends_on, tested_by, configures
Semantic:       related, similar_to
Infrastructure: deploys, serves, provisions, triggers
Domain:         contains_flow, flow_step, cross_domain
Knowledge:      cites, contradicts, builds_on, exemplifies, categorized_under, authored_by
```

Every edge has a `weight` (0–1) encoding relationship strength, and a `direction` (`forward`, `backward`, `bidirectional`). The schema layer (`schema.ts`) normalizes LLM-generated aliases at load time — for example `"func"` → `"function"`, `"container"` → `"service"` — so downstream code only ever sees canonical types.

Sources: [understand-anything-plugin/packages/core/src/types.ts:9-19](understand-anything-plugin/packages/core/src/types.ts), [understand-anything-plugin/packages/core/src/schema.ts:1-60](understand-anything-plugin/packages/core/src/schema.ts)
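A minimal sketch of load-time alias normalization follows. Only the `func` and `container` aliases come from the text above; the `NODE_TYPE_ALIASES` table, the `normalizeNodeType` name, and the case-insensitive lookup are illustrative assumptions rather than the contents of `schema.ts`.

```typescript
// Canonical node types (the 21 listed in the Node Types table).
const NODE_TYPES = [
  "file", "function", "class", "module", "concept",
  "config", "document", "service", "table", "endpoint", "pipeline", "schema", "resource",
  "domain", "flow", "step",
  "article", "entity", "topic", "claim", "source",
] as const;
type NodeType = (typeof NODE_TYPES)[number];

const CANONICAL = new Set<string>(NODE_TYPES);

// Hypothetical alias table; the real one in schema.ts may be larger or differ.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const NODE_TYPE_ALIASES = new Map<string, NodeType>([
  ["func", "function"],
  ["container", "service"],
]);

// Returns the canonical type, or null when the value cannot be mapped.
function normalizeNodeType(raw: string): NodeType | null {
  const key = raw.trim().toLowerCase(); // assumption: matching is case-insensitive
  if (CANONICAL.has(key)) return key as NodeType;
  return NODE_TYPE_ALIASES.get(key) ?? null;
}
```

Returning `null` instead of throwing leaves the caller free to decide whether an unmappable type is auto-fixable or fatal.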

---

## The Multi-Agent Pipeline

The `/understand` skill orchestrates a **sequential, phase-gated pipeline** with parallelism inside Phase 2. Phases hand off via files written to `.understand-anything/intermediate/` — no agent returns data directly to the orchestrating skill context.

```mermaid
sequenceDiagram
    participant User
    participant Skill as /understand (SKILL.md)
    participant Scanner as project-scanner
    participant Analyzers as file-analyzer ×N (≤5 concurrent)
    participant Assembler as merge-batch-graphs.py
    participant ArchAgent as architecture-analyzer
    participant TourAgent as tour-builder
    participant Reviewer as inline validator / graph-reviewer
    participant FS as .understand-anything/

    User->>Skill: /understand [options]
    Skill->>Skill: Phase 0 — pre-flight, staleness check
    Skill->>Scanner: Phase 1 — scan project
    Scanner->>FS: scan-result.json
    Skill->>Analyzers: Phase 2 — analyze batches (parallel)
    Analyzers->>FS: batch-0.json … batch-N.json
    Skill->>Assembler: merge-batch-graphs.py
    Assembler->>FS: assembled-graph.json
    Skill->>ArchAgent: Phase 4 — layer assignment
    ArchAgent->>FS: layers.json
    Skill->>TourAgent: Phase 5 — tour generation
    TourAgent->>FS: tour.json
    Skill->>Reviewer: Phase 6 — validate assembled graph
    Reviewer->>FS: review.json
    Skill->>FS: Phase 7 — write knowledge-graph.json
    Skill->>User: summary + auto-launch /understand-dashboard
```

### Phase 0: Pre-flight and Staleness

Before any agent is dispatched, the skill:
1. Resolves `PROJECT_ROOT` (supports cross-worktree redirect to avoid ephemeral paths).
2. Reads `.understand-anything/meta.json` to get the last `gitCommitHash`.
3. Runs `git diff <lastHash>..HEAD --name-only` to detect changed files.
4. Chooses between **full rebuild**, **incremental update** (only changed files), or **no-op** (graph current).

The staleness logic is also available as a library function in core:

```typescript
// understand-anything-plugin/packages/core/src/staleness.ts:34-43
export function isStale(projectDir: string, lastCommitHash: string): StalenessResult {
  const changedFiles = getChangedFiles(projectDir, lastCommitHash);
  return { stale: changedFiles.length > 0, changedFiles };
}
```

Sources: [understand-anything-plugin/skills/understand/SKILL.md:25-160](understand-anything-plugin/skills/understand/SKILL.md), [understand-anything-plugin/packages/core/src/staleness.ts:1-43](understand-anything-plugin/packages/core/src/staleness.ts)
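The three-way decision in step 4 can be expressed as a pure function over the stored hash and the `git diff --name-only` output. `chooseUpdateMode`, `parseNameOnly`, and the `forceFull` override are hypothetical; the skill's actual rules, including any conditions for escalating a large diff to a full rebuild, are defined in SKILL.md.

```typescript
type UpdateMode =
  | { kind: "full" }
  | { kind: "incremental"; files: string[] }
  | { kind: "noop" };

interface PreflightInput {
  lastCommitHash?: string; // from .understand-anything/meta.json, if present
  graphExists: boolean;    // whether knowledge-graph.json is already on disk
  changedFiles: string[];  // parsed output of `git diff <lastHash>..HEAD --name-only`
  forceFull?: boolean;     // hypothetical override flag
}

// Split `git diff --name-only` output into file paths, ignoring blank lines.
function parseNameOnly(stdout: string): string[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical decision function mirroring Phase 0, step 4.
function chooseUpdateMode(input: PreflightInput): UpdateMode {
  if (input.forceFull || !input.lastCommitHash || !input.graphExists) {
    return { kind: "full" }; // nothing trustworthy to update incrementally
  }
  if (input.changedFiles.length === 0) {
    return { kind: "noop" }; // equivalent to isStale(...).stale === false
  }
  return { kind: "incremental", files: input.changedFiles };
}
```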

### Phase 1: Scan

The `project-scanner` agent writes `scan-result.json` containing the full file list with `fileCategory` per file (`code`, `config`, `docs`, `infra`, `data`, `script`, `markup`), detected languages and frameworks, and a pre-resolved `importMap`. The skill stores `importMap` in memory for injection into Phase 2 batches, avoiding redundant import resolution work by agents.

Sources: [understand-anything-plugin/agents/project-scanner.md:1-8](understand-anything-plugin/agents/project-scanner.md), [understand-anything-plugin/skills/understand/SKILL.md:214-248](understand-anything-plugin/skills/understand/SKILL.md)
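The handoff from Phase 1 to Phase 2 might look like the following sketch, which slices `scan-result.json` into batches and gives each batch only its own `importMap` entries. The `ScanResult` shape is trimmed to fields the text names; modelling `importMap` as a map from file path to resolved import paths, the `makeBatches` name, and the default batch size of 25 (inside the stated 20–30 range) are assumptions.

```typescript
type FileCategory = "code" | "config" | "docs" | "infra" | "data" | "script" | "markup";

// Trimmed, assumed shape of scan-result.json (only fields mentioned in the text).
interface ScanResult {
  files: { path: string; fileCategory: FileCategory }[];
  languages: string[];
  frameworks: string[];
  importMap: Record<string, string[]>; // assumption: file path -> resolved import paths
}

interface Batch {
  index: number;
  files: string[];
  importMap: Record<string, string[]>; // only entries for this batch's files
}

// Hypothetical batcher: fixed-size chunks, each carrying its own importMap slice.
function makeBatches(scan: ScanResult, batchSize = 25): Batch[] {
  const batches: Batch[] = [];
  for (let i = 0; i < scan.files.length; i += batchSize) {
    const files = scan.files.slice(i, i + batchSize).map((f) => f.path);
    const importMap: Record<string, string[]> = {};
    for (const f of files) {
      if (Object.prototype.hasOwnProperty.call(scan.importMap, f)) {
        importMap[f] = scan.importMap[f];
      }
    }
    batches.push({ index: batches.length, files, importMap });
  }
  return batches;
}
```

Dispatching these batches at most five at a time would then match the Phase 2 concurrency cap.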

### Phase 2: Parallel File Analysis

Files are batched in groups of 20–30. Up to **5 `file-analyzer` agents run concurrently**, each producing a `batch-N.json`. After all batches complete, `merge-batch-graphs.py` runs a single-pass merge that:

- Combines nodes and edges across all batches
- Normalizes node IDs (strips double prefixes, adds missing type prefixes)
- Normalizes complexity values (`low` → `simple`, `high` → `complex`)
- Deduplicates by ID and by `(source, target, type)` triple
- Drops dangling edges referencing missing nodes
- Runs a `tested_by` linker that canonicalizes test-coverage edges and flips LLM-inverted ones

Output is `intermediate/assembled-graph.json`.

Sources: [understand-anything-plugin/skills/understand/SKILL.md:259-328](understand-anything-plugin/skills/understand/SKILL.md)
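The merge rules listed above amount to a single pass over the batches. The TypeScript sketch below mirrors those rules but is not `merge-batch-graphs.py`: the node and edge shapes are trimmed, keeping the first occurrence of a duplicate ID is an assumption, and the `tested_by` linker is left out.

```typescript
interface MiniNode { id: string; type: string; name: string; complexity?: string }
interface MiniEdge { source: string; target: string; type: string; weight: number }
interface BatchGraph { nodes: MiniNode[]; edges: MiniEdge[] }

// Complexity aliases named in the text.
const COMPLEXITY_ALIASES = new Map<string, string>([
  ["low", "simple"],
  ["high", "complex"],
]);

// "file:file:x" -> "file:x" (double prefix) and "x" -> "file:x" (missing prefix).
function normalizeId(id: string, type: string): string {
  const prefix = `${type}:`;
  let out = id;
  while (out.startsWith(prefix + prefix)) out = out.slice(prefix.length);
  return out.startsWith(prefix) ? out : prefix + out;
}

function mergeBatches(batches: BatchGraph[]): BatchGraph {
  const nodes = new Map<string, MiniNode>();
  const idRemap = new Map<string, string>(); // raw ID as emitted -> normalized ID
  for (const batch of batches) {
    for (const n of batch.nodes) {
      const id = normalizeId(n.id, n.type);
      idRemap.set(n.id, id);
      const complexity =
        n.complexity === undefined ? undefined : (COMPLEXITY_ALIASES.get(n.complexity) ?? n.complexity);
      // Dedupe by ID; assumption: the first occurrence wins.
      if (!nodes.has(id)) nodes.set(id, { ...n, id, complexity });
    }
  }
  const edges = new Map<string, MiniEdge>();
  for (const batch of batches) {
    for (const e of batch.edges) {
      const source = idRemap.get(e.source) ?? e.source;
      const target = idRemap.get(e.target) ?? e.target;
      if (!nodes.has(source) || !nodes.has(target)) continue; // drop dangling edge
      const key = `${source}|${target}|${e.type}`; // dedupe by (source, target, type)
      if (!edges.has(key)) edges.set(key, { ...e, source, target });
    }
  }
  return { nodes: Array.from(nodes.values()), edges: Array.from(edges.values()) };
}
```

Normalizing node IDs before edges matters: an edge that names a node by its unnormalized ID would otherwise be dropped as dangling.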

### Phase 4: Architecture Layer Assignment

The `architecture-analyzer` agent takes all file-level nodes and all edges, applies language/framework context files (from `languages/` and `frameworks/` subdirectories), and assigns each node to an architectural layer. The skill normalizes the output: unwraps envelope JSON, renames `nodes` → `nodeIds`, synthesizes missing IDs, converts raw file paths to typed node IDs, and drops dangling references.

Layers are the only structural metadata that groups nodes for visualization. Without valid layers, no node appears in the layer-cluster view.

Sources: [understand-anything-plugin/skills/understand/SKILL.md:369-437](understand-anything-plugin/skills/understand/SKILL.md)
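The layer clean-up can be sketched as one normalization function. The `layers` envelope key, the `layer-<n>` pattern for synthesized IDs, the default names, and the `file:` prefix applied to bare paths are assumptions for illustration; SKILL.md holds the skill's actual rules.

```typescript
interface Layer { id: string; name: string; nodeIds: string[] }

// Raw agent output: an envelope object or a bare array, possibly using `nodes` instead of `nodeIds`.
type RawLayer = { id?: string; name?: string; nodeIds?: string[]; nodes?: string[] };
type RawLayerOutput = RawLayer[] | { layers: RawLayer[] };

// Hypothetical normalizer covering the steps described for Phase 4.
function normalizeLayers(raw: RawLayerOutput, knownNodeIds: Set<string>): Layer[] {
  const list = Array.isArray(raw) ? raw : raw.layers; // unwrap envelope JSON
  return list.map((l, i) => {
    const members = l.nodeIds ?? l.nodes ?? []; // rename nodes -> nodeIds
    const nodeIds = members
      .map((m) => (m.includes(":") ? m : `file:${m}`)) // bare file path -> typed node ID (assumed prefix)
      .filter((id) => knownNodeIds.has(id)); // drop dangling references
    return {
      id: l.id ?? `layer-${i}`, // synthesize a missing ID (assumed pattern)
      name: l.name ?? `Layer ${i + 1}`,
      nodeIds,
    };
  });
}
```

Any file node that ends up in no layer would then be reported by the Phase 6 check for nodes missing from layers.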

### Phase 5: Tour Generation

The `tour-builder` agent produces an ordered array of `TourStep` objects. Each step has a title, description, and a list of node IDs to highlight. The skill normalizes field names (`nodesToInspect` → `nodeIds`), drops dangling node references, and sorts by `order`. Tours power the Learn persona in the dashboard sidebar.

Sources: [understand-anything-plugin/skills/understand/SKILL.md:450-518](understand-anything-plugin/skills/understand/SKILL.md), [understand-anything-plugin/packages/core/src/types.ts:71-78](understand-anything-plugin/packages/core/src/types.ts)
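The tour clean-up follows the same pattern. The `TourStep` shape here is trimmed to the fields the text mentions, and `normalizeTour` is a hypothetical name.

```typescript
interface TourStep { order: number; title: string; description: string; nodeIds: string[] }

// Agents sometimes emit `nodesToInspect` instead of `nodeIds`.
interface RawTourStep {
  order: number;
  title: string;
  description: string;
  nodeIds?: string[];
  nodesToInspect?: string[];
}

// Hypothetical normalizer: rename the field, drop dangling node IDs, sort by `order`.
function normalizeTour(raw: RawTourStep[], knownNodeIds: Set<string>): TourStep[] {
  return raw
    .map((s) => ({
      order: s.order,
      title: s.title,
      description: s.description,
      nodeIds: (s.nodeIds ?? s.nodesToInspect ?? []).filter((id) => knownNodeIds.has(id)),
    }))
    .sort((a, b) => a.order - b.order);
}
```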

### Phase 6: Validation

By default the skill runs an inline Node.js validator (written to `.understand-anything/tmp/ua-inline-validate.cjs`) that checks structural integrity: required node fields, duplicate IDs, dangling edge references, nodes missing from layers, and tour steps referencing absent nodes. With `--review`, a full LLM `graph-reviewer` agent runs instead. Issues trigger automated fixes (remove dangling edges, fill missing fields) before the final save.

Sources: [understand-anything-plugin/skills/understand/SKILL.md:554-677](understand-anything-plugin/skills/understand/SKILL.md)
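The structural checks of the inline validator can be approximated as below. This is a TypeScript sketch of the listed checks, not the generated `ua-inline-validate.cjs`; the required-field list, restricting the layer check to `file` nodes, and the issue messages are assumptions.

```typescript
interface VNode { id: string; type: string; name: string }
interface VEdge { source: string; target: string; type: string }
interface VGraph {
  nodes: VNode[];
  edges: VEdge[];
  layers: { id: string; nodeIds: string[] }[];
  tour: { order: number; nodeIds: string[] }[];
}

// Hypothetical approximation of the Phase 6 structural checks.
function validateStructure(g: VGraph): string[] {
  const issues: string[] = [];
  const ids = new Set<string>();
  for (const n of g.nodes) {
    if (!n.id || !n.type || !n.name) issues.push(`node missing required field: ${JSON.stringify(n)}`);
    if (ids.has(n.id)) issues.push(`duplicate node id: ${n.id}`);
    ids.add(n.id);
  }
  for (const e of g.edges) {
    if (!ids.has(e.source) || !ids.has(e.target)) {
      issues.push(`dangling edge: ${e.source} -> ${e.target}`);
    }
  }
  const layered = new Set<string>();
  for (const l of g.layers) for (const id of l.nodeIds) layered.add(id);
  for (const n of g.nodes) {
    // Assumption: only file-level nodes are expected to belong to a layer.
    if (n.type === "file" && !layered.has(n.id)) issues.push(`node not in any layer: ${n.id}`);
  }
  for (const step of g.tour) {
    for (const id of step.nodeIds) {
      if (!ids.has(id)) issues.push(`tour step ${step.order} references missing node: ${id}`);
    }
  }
  return issues;
}
```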

### Phase 7: Save and Cleanup

The skill writes `knowledge-graph.json`, generates a structural fingerprints baseline (required for correct future incremental updates; see the issue #152 note in SKILL.md), writes `meta.json` with the current git hash, and then removes all `intermediate/` and `tmp/` files. After a successful save, `/understand-dashboard` is auto-launched.

Sources: [understand-anything-plugin/skills/understand/SKILL.md:683-736](understand-anything-plugin/skills/understand/SKILL.md)
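Because the Phase 7 ordering is easy to break, here it is as a sketch with file I/O injected as callbacks. `savePhase7`, `Phase7Io`, and the `fingerprints.json` file name are hypothetical; only the order (graph, fingerprints baseline, `meta.json`, cleanup) comes from the text.

```typescript
// Injected I/O so the ordering can be tested without touching the file system.
interface Phase7Io {
  writeJson(relPath: string, value: unknown): void;
  removeDir(relPath: string): void;
}

// Hypothetical Phase 7 sequence. The fingerprints baseline must exist before meta.json
// records the new commit hash; otherwise auto-update has nothing to diff against (issue #152).
function savePhase7(
  io: Phase7Io,
  graph: unknown,
  fingerprints: Record<string, string>,
  gitCommitHash: string,
): void {
  io.writeJson(".understand-anything/knowledge-graph.json", graph);
  io.writeJson(".understand-anything/fingerprints.json", fingerprints); // assumed file name
  io.writeJson(".understand-anything/meta.json", { gitCommitHash });
  io.removeDir(".understand-anything/intermediate");
  io.removeDir(".understand-anything/tmp");
}
```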

---

## The Dashboard: Rendering the Graph

The dashboard (`packages/dashboard`) is a **React + Vite SPA**. It has no knowledge of agents or Node.js pipeline internals. Its only coupling to the pipeline is the JSON file it fetches at startup.

### Load Path

```typescript
// understand-anything-plugin/packages/dashboard/src/App.tsx:133-163
useEffect(() => {
  fetch(dataUrl("knowledge-graph.json", accessToken))
    .then((res) => res.json())
    .then((data: unknown) => {
      const result = validateGraph(data);   // schema validation
      if (result.success && result.data) {
        setGraph(result.data);              // Zustand store
        ...
      }
    });
}, [setGraph]);
```

The dashboard fetches `knowledge-graph.json` (and optionally `domain-graph.json`, `diff-overlay.json`, `config.json`, `meta.json`) from the Vite dev server. A token gate (`TokenGate`) protects the endpoints — the Vite server generates a one-time token printed to the terminal. In demo mode (`VITE_DEMO_MODE=true`), the token gate is bypassed and URLs come from environment variables.

Sources: [understand-anything-plugin/packages/dashboard/src/App.tsx:49-163](understand-anything-plugin/packages/dashboard/src/App.tsx)

### Schema Validation on Load

Before the graph reaches any React component, it passes through `validateGraph()` from `@understand-anything/core/schema`. Auto-correctable issues (alias normalization, missing optional fields) are silently fixed and logged; fatal structural errors surface as a `WarningBanner` in the UI.

Sources: [understand-anything-plugin/packages/dashboard/src/App.tsx:136-152](understand-anything-plugin/packages/dashboard/src/App.tsx), [understand-anything-plugin/packages/core/src/schema.ts:17-60](understand-anything-plugin/packages/core/src/schema.ts)
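The split between auto-corrected and fatal issues suggests a result type along these lines. `ValidationResult` and `validateGraphSketch` are assumed names and shapes; the load path above shows only that the real `validateGraph()` returns `success` and `data`.

```typescript
interface ValidationResult<T> {
  success: boolean;
  data?: T;
  fixes: string[];  // auto-corrections applied (would be logged)
  errors: string[]; // fatal problems (would surface in a WarningBanner)
}

interface LoadedNode { id: string; type: string }
interface LoadedGraph { version: string; nodes: LoadedNode[] }

// Hypothetical alias table, kept in a Map to avoid Object.prototype key collisions.
const TYPE_ALIASES = new Map<string, string>([
  ["func", "function"],
  ["container", "service"],
]);

function validateGraphSketch(input: unknown): ValidationResult<LoadedGraph> {
  const fixes: string[] = [];
  const errors: string[] = [];
  if (typeof input !== "object" || input === null) {
    return { success: false, fixes, errors: ["graph is not an object"] };
  }
  const raw = input as { version?: unknown; nodes?: unknown };
  if (typeof raw.version !== "string") errors.push("`version` must be a string");
  if (!Array.isArray(raw.nodes)) errors.push("`nodes` must be an array");
  if (errors.length > 0) return { success: false, fixes, errors }; // fatal: nothing to render

  const nodes = (raw.nodes as LoadedNode[]).map((n) => {
    const canonical = TYPE_ALIASES.get(n.type);
    if (canonical === undefined) return n;
    fixes.push(`node ${n.id}: type "${n.type}" -> "${canonical}"`); // auto-fix, recorded
    return { ...n, type: canonical };
  });
  return { success: true, data: { version: raw.version as string, nodes }, fixes, errors };
}
```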

### Zustand Store: Single State Owner

The dashboard's entire runtime state lives in a Zustand store (`store.ts`). It owns the loaded `KnowledgeGraph`, the selected node, filters, personas, view mode, and navigation level. All components derive their display from this store via selectors.

Two layer indexes are maintained in the store simultaneously and intentionally kept separate:

- `nodeIdToLayerId`: first-wins mapping, used for navigation (`drillIntoLayer`, sidebar history)
- `nodeIdToLayerIds`: all-layers set, used for filter queries (a node in multiple layers should survive if any selected layer matches)

Sources: [understand-anything-plugin/packages/dashboard/src/store.ts:54-95](understand-anything-plugin/packages/dashboard/src/store.ts)
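Both indexes can be built in a single pass over `layers`. `buildLayerIndexes` and `passesLayerFilter` are hypothetical stand-ins for logic in `store.ts`, and treating an empty selection as "no filter" is an assumption; the point is the differing semantics of the two maps.

```typescript
interface Layer { id: string; nodeIds: string[] }

// Hypothetical builder for the two store indexes.
function buildLayerIndexes(layers: Layer[]) {
  const nodeIdToLayerId = new Map<string, string>();       // first layer wins (navigation)
  const nodeIdToLayerIds = new Map<string, Set<string>>(); // every layer the node is in (filtering)
  for (const layer of layers) {
    for (const nodeId of layer.nodeIds) {
      if (!nodeIdToLayerId.has(nodeId)) nodeIdToLayerId.set(nodeId, layer.id);
      let set = nodeIdToLayerIds.get(nodeId);
      if (!set) {
        set = new Set<string>();
        nodeIdToLayerIds.set(nodeId, set);
      }
      set.add(layer.id);
    }
  }
  return { nodeIdToLayerId, nodeIdToLayerIds };
}

// Filter query: a node survives if ANY of its layers is selected.
function passesLayerFilter(
  nodeId: string,
  selected: Set<string>,
  nodeIdToLayerIds: Map<string, Set<string>>,
): boolean {
  if (selected.size === 0) return true; // assumption: empty selection means "no layer filter"
  const layers = nodeIdToLayerIds.get(nodeId);
  if (!layers) return false;
  return Array.from(layers).some((id) => selected.has(id));
}
```

Had the filter consulted the first-wins map instead, a node listed in both an API layer and a Data layer would disappear whenever only the Data layer was selected.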

### React Flow and the Graph Layout

`GraphView.tsx` uses `@xyflow/react` (React Flow) for the interactive canvas. It translates `GraphNode[]` and `GraphEdge[]` from the store into React Flow `Node` and `Edge` objects, runs an ELK layout pass, and renders four custom node types:

| React Flow node type | Represents |
|---|---|
| `custom` | Individual graph nodes (file, function, class, etc.) |
| `layer-cluster` | A collapsed architectural layer |
| `portal` | Cross-layer entry/exit point |
| `container` | An expanded layer showing all member nodes |

The `ViewMode` in the store switches between `"structural"` (the main dependency graph), `"domain"` (business domains via `DomainGraphView`), and `"knowledge"` (wiki/knowledge base via `KnowledgeGraphView`). The same `KnowledgeGraph` JSON is reinterpreted for each view mode.

Sources: [understand-anything-plugin/packages/dashboard/src/components/GraphView.tsx:1-60](understand-anything-plugin/packages/dashboard/src/components/GraphView.tsx), [understand-anything-plugin/packages/dashboard/src/store.ts:17](understand-anything-plugin/packages/dashboard/src/store.ts)
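The translation step can be sketched without the library by mirroring the minimal fields React Flow nodes and edges carry (`id`, `type`, `position`, `data` for nodes; `id`, `source`, `target` for edges). `FlowNode`, `FlowEdge`, and `toFlowElements` are hypothetical; in the real component these would be `@xyflow/react`'s `Node` and `Edge` types, and the placeholder `{ x: 0, y: 0 }` positions stand in for coordinates the ELK layout pass assigns.

```typescript
interface GraphNode { id: string; type: string; name: string; summary?: string }
interface GraphEdge { source: string; target: string; type: string; weight: number }

// Minimal local mirrors of React Flow's node and edge shapes (assumption, not the library types).
interface FlowNode {
  id: string;
  type: "custom" | "layer-cluster" | "portal" | "container";
  position: { x: number; y: number };
  data: { label: string; nodeType: string; summary?: string };
}
interface FlowEdge {
  id: string;
  source: string;
  target: string;
  data: { type: string; weight: number };
}

function toFlowElements(nodes: GraphNode[], edges: GraphEdge[]): { nodes: FlowNode[]; edges: FlowEdge[] } {
  const flowNodes = nodes.map((n): FlowNode => ({
    id: n.id,
    type: "custom", // individual graph nodes use the "custom" renderer
    position: { x: 0, y: 0 }, // placeholder until a layout pass assigns coordinates
    data: { label: n.name, nodeType: n.type, summary: n.summary },
  }));
  const flowEdges = edges.map((e, i): FlowEdge => ({
    id: `${e.source}->${e.target}:${e.type}:${i}`, // React Flow needs unique edge IDs
    source: e.source,
    target: e.target,
    data: { type: e.type, weight: e.weight },
  }));
  return { nodes: flowNodes, edges: flowEdges };
}
```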

### Search Engine

The `SearchEngine` class in `@understand-anything/core/search` wraps Fuse.js with a multi-field weighted index over `GraphNode[]`. It uses an OR-token strategy: a query like `"auth controller"` becomes `"auth | controller"` so either token matches.

```typescript
// understand-anything-plugin/packages/core/src/search.ts:14-25
const FUSE_OPTIONS = {
  keys: [
    { name: "name", weight: 0.4 },
    { name: "tags", weight: 0.3 },
    { name: "summary", weight: 0.2 },
    { name: "languageNotes", weight: 0.1 },
  ],
  threshold: 0.4,
  ...
};
```

The store instantiates a `SearchEngine` whenever a new graph is loaded. Search results reference node IDs, which the store uses to highlight matching nodes in the graph view.

Sources: [understand-anything-plugin/packages/core/src/search.ts:1-50](understand-anything-plugin/packages/core/src/search.ts), [understand-anything-plugin/packages/dashboard/src/store.ts:2](understand-anything-plugin/packages/dashboard/src/store.ts)
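The OR-token rewrite is a small string transform, sketched here as a hypothetical `toOrQuery` helper. In Fuse.js, `|` is only treated as an OR operator when extended search is enabled (`useExtendedSearch: true`), so the elided part of `FUSE_OPTIONS` presumably turns it on; that is an inference, not something shown above.

```typescript
// Hypothetical helper: "auth controller" -> "auth | controller" (Fuse.js extended-search OR).
function toOrQuery(query: string): string {
  return query
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0) // "".split(/\s+/) yields [""], so drop empties
    .join(" | ");
}
```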

---

## Dependency Direction and Module Boundaries

The dependency graph flows strictly one way:

```text
understand-anything-plugin/
├── packages/core          ← shared types, schema, search, staleness (no UI, no Node.js in browser exports)
│     └── subpath exports: ./types  ./schema  ./search
├── packages/dashboard     ← imports core via subpath exports ONLY
│     (never imports core's main entry; avoids pulling Node.js modules into the browser)
├── src/                   ← skill TypeScript source (Node.js, imports core freely)
├── agents/                ← LLM agent definitions (markdown), no TypeScript
└── skills/                ← skill definitions (markdown), no TypeScript
```

The critical rule: the dashboard must only import from core's browser-safe subpath exports (`./types`, `./schema`, `./search`). The main entry point pulls in Node.js-only APIs (for example `execFileSync` from `child_process`, used in `staleness.ts`) that break Vite's browser build.

Sources: [CLAUDE.md](CLAUDE.md), [understand-anything-plugin/packages/dashboard/src/store.ts:1-9](understand-anything-plugin/packages/dashboard/src/store.ts), [understand-anything-plugin/packages/dashboard/src/components/GraphView.tsx:28-32](understand-anything-plugin/packages/dashboard/src/components/GraphView.tsx)
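The rule is mechanical enough to check automatically. A minimal sketch, assuming import specifiers have already been collected from dashboard sources (how they are collected, for example by a lint rule, is not specified here): any `@understand-anything/core` import outside the three allow-listed subpaths is flagged. `findUnsafeCoreImports` is a hypothetical name.

```typescript
const BROWSER_SAFE_SUBPATHS = new Set([
  "@understand-anything/core/types",
  "@understand-anything/core/schema",
  "@understand-anything/core/search",
]);

// Hypothetical guard: returns specifiers that could pull Node.js-only code into the browser bundle.
function findUnsafeCoreImports(specifiers: string[]): string[] {
  return specifiers.filter(
    (s) =>
      (s === "@understand-anything/core" || s.startsWith("@understand-anything/core/")) &&
      !BROWSER_SAFE_SUBPATHS.has(s),
  );
}
```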

---

## Incremental Update Invariant

The graph is git-aware by design. `meta.json` stores the `gitCommitHash` at analysis time. On every subsequent `/understand` invocation, the skill compares that hash to HEAD. If files changed, only those files are re-analyzed; the merge script then surgically removes and replaces their nodes and edges in the existing graph.

A fingerprints baseline (written in Phase 7 before `meta.json`) feeds the `change-classifier` during auto-update mode. **If `meta.json` is written before the fingerprints baseline**, the auto-updater sees a fresh commit hash with no baseline to compare against and escalates every future commit to a full rebuild (issue #152). The SKILL.md enforces this ordering explicitly.

Sources: [understand-anything-plugin/skills/understand/SKILL.md:685-706](understand-anything-plugin/skills/understand/SKILL.md), [understand-anything-plugin/packages/core/src/staleness.ts:44-55](understand-anything-plugin/packages/core/src/staleness.ts)
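The "remove and replace" step can be sketched as follows. This assumes each node records its `filePath` and that an edge belongs to the file of its source node, so edges leaving a changed file are discarded and re-emitted by the fresh analysis, while edges pointing into it survive if their target still exists. `applyIncrementalUpdate` is a hypothetical name, not the merge script's API.

```typescript
interface INode { id: string; filePath: string } // assumption: nodes record their source file
interface IEdge { source: string; target: string; type: string }
interface IGraph { nodes: INode[]; edges: IEdge[] }

function applyIncrementalUpdate(existing: IGraph, changedFiles: string[], fresh: IGraph): IGraph {
  const changed = new Set(changedFiles);
  const removedIds = new Set<string>();
  for (const n of existing.nodes) if (changed.has(n.filePath)) removedIds.add(n.id);

  // Drop nodes from changed files, then append their re-analyzed replacements.
  const nodes = existing.nodes.filter((n) => !removedIds.has(n.id)).concat(fresh.nodes);
  const liveIds = new Set(nodes.map((n) => n.id));

  // Assumption: an edge belongs to its source's file. Edges leaving changed files are
  // replaced by the fresh analysis; any edge whose endpoint no longer exists is dropped.
  const edges = existing.edges
    .filter((e) => !removedIds.has(e.source))
    .concat(fresh.edges)
    .filter((e) => liveIds.has(e.source) && liveIds.has(e.target));
  return { nodes, edges };
}
```

Keeping incoming edges is the subtle part: an unchanged file that imports a changed one is not re-analyzed, so deleting its edge would silently lose the relationship.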

---

## What Breaks If the Design Changes

| Invariant | What breaks if violated |
|---|---|
| All agents write to `intermediate/`, never return data directly | The orchestrating skill's context window grows unbounded for large projects; retry logic breaks because there is no file to reuse |
| `knowledge-graph.json` is the single handoff | Dashboard has no way to load partial results; real-time agent streaming would require a protocol redesign |
| Dashboard imports only core subpath exports | Node.js-only code (such as `execFileSync` from `child_process`) is bundled into the Vite build and crashes in the browser |
| `meta.json` written after fingerprints baseline | Auto-update sees every file as structurally changed and runs a full rebuild on every commit |
| Node IDs follow `<type>:<path>` convention | Tour step and layer `nodeIds` arrays reference IDs that do not exist after incremental updates; the validator's dangling-ref checks catch this but it degrades the graph |
| Schema validation happens at dashboard load | Silent data corruption from buggy agents reaches the React Flow renderer and produces invisible nodes or crashes |

---

## Summary

The entire system is an expression of one idea: a sequential, file-mediated multi-agent pipeline produces one well-typed JSON artifact, and a read-only React dashboard renders it. The `KnowledgeGraph` schema, defined in `packages/core/src/types.ts` and exported from `@understand-anything/core` with 21 node types, 35 edge types, layers, and tour steps, is the contract that makes every phase independently testable and every component independently replaceable. Any new feature either enriches that JSON (add a field, a node type, or an edge category) or reads from it (add a dashboard panel, a filter, or a search mode).

Sources: [understand-anything-plugin/packages/core/src/types.ts:1-99](understand-anything-plugin/packages/core/src/types.ts)
