# Staleness Detection & Incremental Updates

> The knowledge graph is stored as `.understand-anything/knowledge-graph.json` alongside `meta.json` (which records the last analyzed commit hash) and `config.json` (which holds user preferences such as `outputLanguage` and `autoUpdate`). On each `/understand` invocation, `staleness.ts` runs `git diff <lastCommitHash>..HEAD`; if files changed, only the affected nodes are removed and re-analyzed (incremental mode). `--full` forces a complete rebuild. When `autoUpdate` is true, the auto-update hooks re-trigger analysis after git commits and at session start. Worktree redirect is a critical invariant: graphs generated inside a Claude Code worktree are redirected to the main repo root so they are not lost when the ephemeral worktree is removed.

- Repository: Lum1104/Understand-Anything
- GitHub: https://github.com/Lum1104/Understand-Anything
- Human wiki: https://grok-wiki.com/public/wiki/lum1104-understand-anything-3b923df96896
- Complete Markdown: https://grok-wiki.com/public/wiki/lum1104-understand-anything-3b923df96896/llms-full.txt

## Source Files

- `understand-anything-plugin/packages/core/src/staleness.ts`
- `understand-anything-plugin/skills/understand/SKILL.md`
- `understand-anything-plugin/hooks/hooks.json`
- `understand-anything-plugin/hooks/auto-update-prompt.md`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [understand-anything-plugin/packages/core/src/staleness.ts](understand-anything-plugin/packages/core/src/staleness.ts)
- [understand-anything-plugin/packages/core/src/fingerprint.ts](understand-anything-plugin/packages/core/src/fingerprint.ts)
- [understand-anything-plugin/packages/core/src/change-classifier.ts](understand-anything-plugin/packages/core/src/change-classifier.ts)
- [understand-anything-plugin/packages/core/src/types.ts](understand-anything-plugin/packages/core/src/types.ts)
- [understand-anything-plugin/hooks/hooks.json](understand-anything-plugin/hooks/hooks.json)
- [understand-anything-plugin/hooks/auto-update-prompt.md](understand-anything-plugin/hooks/auto-update-prompt.md)
- [understand-anything-plugin/skills/understand/SKILL.md](understand-anything-plugin/skills/understand/SKILL.md)
</details>

Understand Anything maintains its knowledge graph as `.understand-anything/knowledge-graph.json` alongside two small sidecar files — `meta.json` (recording the last analyzed git commit hash and file count) and `config.json` (storing user preferences such as `autoUpdate` and `outputLanguage`). Every time `/understand` or the auto-update hook runs, the system compares the stored commit hash against `HEAD` to decide what work is actually necessary. The goal is to spend zero LLM tokens on cosmetic edits and the minimum possible tokens on genuine structural changes, while a full rebuild is reserved for sweeping rearchitecture.

This page explains the full staleness pipeline: how git-diff-based detection drives an incremental vs. full decision, how the structural fingerprint system distinguishes cosmetic from structural changes within a changed file set, how the graph is surgically patched without touching untouched nodes, and how two automatic hooks (post-commit and session-start) keep the graph current with no manual invocation.

---

## Storage Layout

```text
<project-root>/
└── .understand-anything/
    ├── knowledge-graph.json   ← the graph (nodes, edges, layers, tour)
    ├── meta.json              ← { gitCommitHash, lastAnalyzedAt, version, analyzedFiles }
    ├── config.json            ← { autoUpdate, outputLanguage }
    ├── fingerprints.json      ← per-file structural fingerprints baseline
    └── intermediate/          ← scratch space (cleaned up after each run)
```

`meta.json` is the staleness anchor. Its `gitCommitHash` is written only after `fingerprints.json` has been successfully built, so the two are always in sync. Writing `meta.json` before the fingerprint baseline would cause every subsequent auto-update to escalate to `FULL_UPDATE` (documented as issue #152 in `auto-update-prompt.md`).

`config.json` carries opt-in flags. The relevant TypeScript type is:

```ts
// understand-anything-plugin/packages/core/src/types.ts:117-119
export interface ProjectConfig {
  autoUpdate: boolean;
  outputLanguage?: string;
}
```

Sources: [understand-anything-plugin/packages/core/src/types.ts:80-119]()

---

## Staleness Detection

Staleness is determined purely by git: the system runs `git diff <lastCommitHash>..HEAD --name-only` and considers the graph stale if any files are listed in the output.

```ts
// understand-anything-plugin/packages/core/src/staleness.ts:13-29
export function getChangedFiles(
  projectDir: string,
  lastCommitHash: string,
): string[] {
  try {
    const output = execFileSync('git', ['diff', `${lastCommitHash}..HEAD`, '--name-only'], {
      cwd: projectDir,
      encoding: "utf-8",
    });
    return output
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
  } catch {
    return [];
  }
}
```

`isStale()` wraps this into a `StalenessResult` with a boolean `stale` flag and the `changedFiles` list. On any git error (e.g., the directory is not a git repository), `getChangedFiles()` returns an empty array rather than throwing, so the graph is reported as not stale.
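
As a rough sketch of how such a wrapper might look (the real `isStale()` in `staleness.ts` may differ; `isStaleSketch` and the injected `listChanged` callback are hypothetical names, chosen so the example runs without a git repository):

```ts
// Hypothetical sketch; not the actual staleness.ts implementation.
interface StalenessResult {
  stale: boolean;
  changedFiles: string[];
}

// `listChanged` stands in for getChangedFiles(projectDir, lastCommitHash).
function isStaleSketch(listChanged: () => string[]): StalenessResult {
  const changedFiles = listChanged();
  // The graph counts as stale exactly when the diff lists at least one file.
  return { stale: changedFiles.length > 0, changedFiles };
}

// A git failure surfaces as an empty list, so the result is "not stale".
const afterGitError = isStaleSketch(() => []);
// A single changed file is enough to mark the graph stale.
const afterEdit = isStaleSketch(() => ["src/index.ts"]);
```

One consequence of this design is that a failed git call and a genuinely unchanged repository are indistinguishable to the caller.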

Sources: [understand-anything-plugin/packages/core/src/staleness.ts:1-43]()

---

## Decision Logic at `/understand` Invocation

Phase 0 of the `/understand` skill reads both `meta.json` and the existing graph, then picks one of four actions based on the conditions below (two of the five conditions lead to a full analysis):

| Condition | Action |
|---|---|
| `--full` flag present | Full analysis (all phases) |
| No existing graph or `meta.json` | Full analysis (all phases) |
| `--review` + existing graph + same commit hash | Skip directly to graph reviewer |
| Existing graph + same commit hash | Ask user: rebuild, review, or do nothing |
| Existing graph + changed files | Incremental update (re-analyze changed files only) |

For incremental updates, the skill uses the same `git diff <lastCommitHash>..HEAD --name-only` call and passes the changed file list to a targeted re-analysis pipeline.
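
The routing table can be expressed as a small pure function. This is only an illustration of the table, not code from the skill: `routePhase0`, `Phase0Input`, and the route names are hypothetical, and the sketch assumes that a differing commit hash is what "changed files" means in practice.

```ts
// Hypothetical names throughout; this only mirrors the routing table above.
type Route = "FULL_ANALYSIS" | "REVIEW_ONLY" | "ASK_USER" | "INCREMENTAL_UPDATE";

interface Phase0Input {
  fullFlag: boolean;   // --full was passed
  reviewFlag: boolean; // --review was passed
  hasGraph: boolean;   // knowledge-graph.json exists
  hasMeta: boolean;    // meta.json exists
  storedHash?: string; // gitCommitHash read from meta.json
  headHash: string;    // current HEAD
}

function routePhase0(input: Phase0Input): Route {
  if (input.fullFlag) return "FULL_ANALYSIS";
  if (!input.hasGraph || !input.hasMeta) return "FULL_ANALYSIS";
  const sameCommit = input.storedHash === input.headHash;
  if (sameCommit) return input.reviewFlag ? "REVIEW_ONLY" : "ASK_USER";
  // Assumption: a differing hash is treated as "existing graph + changed files".
  return "INCREMENTAL_UPDATE";
}
```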

Sources: [understand-anything-plugin/skills/understand/SKILL.md:141-159]()

---

## Structural Fingerprint System

Not every changed file represents a structural change. A developer reformatting a function body or fixing a comment should not trigger node re-analysis. The fingerprint system provides a zero-LLM-token pre-filter that classifies each changed file into one of three levels.

### FileFingerprint Shape

```ts
// understand-anything-plugin/packages/core/src/fingerprint.ts:9-39
export interface FileFingerprint {
  filePath: string;
  contentHash: string;
  functions: FunctionFingerprint[];  // name, params, returnType, exported, lineCount
  classes: ClassFingerprint[];       // name, methods, properties, exported, lineCount
  imports: ImportFingerprint[];      // source, specifiers
  exports: string[];
  totalLines: number;
  hasStructuralAnalysis: boolean;
}
```

The baseline `FingerprintStore` (written to `fingerprints.json` in Phase 7 of `/understand`) uses tree-sitter for precise extraction. During auto-update (Phase 1), a temporary Node.js script uses regex-based extraction instead, which is faster and less precise but still sufficient for signature-level detection.

### Change Classification

`compareFingerprints()` applies a three-level decision:

| Level | Condition |
|---|---|
| `NONE` | SHA-256 content hash is identical — file unchanged |
| `COSMETIC` | Content changed but all function/class/import/export signatures match |
| `STRUCTURAL` | Any signature-level difference: new/removed function or class, changed params, changed return type, changed export status, changed imports/exports, or no structural analysis available (conservative) |

```ts
// understand-anything-plugin/packages/core/src/fingerprint.ts:131-246
export function compareFingerprints(
  oldFp: FileFingerprint,
  newFp: FileFingerprint,
): FileChangeResult {
  // Fast path: identical content
  if (oldFp.contentHash === newFp.contentHash) {
    return { filePath: newFp.filePath, changeLevel: "NONE", details: [] };
  }
  // Conservative: no structural analysis → STRUCTURAL
  if (!oldFp.hasStructuralAnalysis || !newFp.hasStructuralAnalysis) {
    return { ..., changeLevel: "STRUCTURAL", ... };
  }
  // ... compare function, class, import, export signatures ...
}
```
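
To make the three levels concrete, here is a reduced sketch that collapses all signatures into plain strings. `MiniFingerprint` and `classifyChange` are hypothetical simplifications, and the order-insensitive comparison is an assumption; the real `compareFingerprints()` compares functions, classes, imports, and exports field by field.

```ts
// Simplified, hypothetical fingerprint: the real FileFingerprint has more fields.
type ChangeLevel = "NONE" | "COSMETIC" | "STRUCTURAL";

interface MiniFingerprint {
  contentHash: string;
  signatures: string[]; // e.g. "fn add(a, b): number [exported]"
  hasStructuralAnalysis: boolean;
}

function classifyChange(oldFp: MiniFingerprint, newFp: MiniFingerprint): ChangeLevel {
  // Fast path: identical bytes, nothing to do.
  if (oldFp.contentHash === newFp.contentHash) return "NONE";
  // Conservative path: without structural data, assume the worst.
  if (!oldFp.hasStructuralAnalysis || !newFp.hasStructuralAnalysis) return "STRUCTURAL";
  // Assumption: signature order does not matter, so compare sorted copies.
  const oldSigs = oldFp.signatures.slice().sort();
  const newSigs = newFp.signatures.slice().sort();
  const same =
    oldSigs.length === newSigs.length && oldSigs.every((sig, i) => sig === newSigs[i]);
  return same ? "COSMETIC" : "STRUCTURAL";
}
```

Under this sketch, reformatting a function body changes `contentHash` but not `signatures`, which yields `COSMETIC`; adding a parameter changes a signature string and yields `STRUCTURAL`.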

Sources: [understand-anything-plugin/packages/core/src/fingerprint.ts:131-246]()

---

## Update Classification

After fingerprinting all changed files, `classifyUpdate()` in `change-classifier.ts` maps the aggregate analysis to an action:

```ts
// understand-anything-plugin/packages/core/src/change-classifier.ts:21-87
export function classifyUpdate(
  analysis: ChangeAnalysis,
  totalFilesInGraph: number,
  allKnownFiles: string[] = [],
): UpdateDecision {
  // ... body elided ...
}
```

| Action | Trigger condition |
|---|---|
| `SKIP` | All changed files are `NONE` or `COSMETIC` |
| `PARTIAL_UPDATE` | ≤10 structural files, no new/removed top-level directories |
| `ARCHITECTURE_UPDATE` | 11 to 30 structural files, or new/removed top-level directories with at most 30 structural files |
| `FULL_UPDATE` | >30 structural files, or structural changes exceed 50% of all files in the graph |

`FULL_UPDATE` does not perform re-analysis automatically; it reports the situation and tells the user to run `/understand --full`. This prevents an auto-update from silently spending large amounts of LLM tokens.
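
The thresholds in the table can be sketched as follows. This is not the body of `classifyUpdate()`: the function name, its flattened parameters, and especially the order in which the checks run (full-update thresholds first) are assumptions made for illustration.

```ts
type UpdateAction = "SKIP" | "PARTIAL_UPDATE" | "ARCHITECTURE_UPDATE" | "FULL_UPDATE";

// Hypothetical, flattened version of the decision; the real function takes
// a ChangeAnalysis object and returns an UpdateDecision.
function classifyUpdateSketch(
  structuralCount: number,
  totalFilesInGraph: number,
  topLevelDirsChanged: boolean,
): UpdateAction {
  // Every changed file was NONE or COSMETIC.
  if (structuralCount === 0) return "SKIP";
  // Assumption: the full-update thresholds are checked before the others.
  const overHalf = totalFilesInGraph > 0 && structuralCount / totalFilesInGraph > 0.5;
  if (structuralCount > 30 || overHalf) return "FULL_UPDATE";
  if (structuralCount > 10 || topLevelDirsChanged) return "ARCHITECTURE_UPDATE";
  return "PARTIAL_UPDATE";
}
```

For example, 6 structural files in a 10-file graph exceed the 50% threshold and would be reported as `FULL_UPDATE` under this ordering, even though the absolute count is small.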

Sources: [understand-anything-plugin/packages/core/src/change-classifier.ts:1-143]()

---

## Incremental Graph Merge

When an incremental update re-analyzes a subset of files, the results must be merged back into the existing graph without disturbing untouched nodes.

`mergeGraphUpdate()` in `staleness.ts` performs a surgical replacement:

1. Identify nodes whose `filePath` is in the changed-file set.
2. Collect their IDs into `removedNodeIds`.
3. Retain all nodes whose ID is **not** in `removedNodeIds`.
4. Retain all edges whose `source` **and** `target` are **not** in `removedNodeIds`.
5. Append the freshly analyzed nodes and edges.
6. Update `project.gitCommitHash` and `project.analyzedAt`.

```ts
// understand-anything-plugin/packages/core/src/staleness.ts:54-90
export function mergeGraphUpdate(
  existingGraph: KnowledgeGraph,
  changedFilePaths: string[],
  newNodes: GraphNode[],
  newEdges: GraphEdge[],
  newCommitHash: string,
): KnowledgeGraph {
  const changedSet = new Set(changedFilePaths);

  const removedNodeIds = new Set(
    existingGraph.nodes
      .filter((node) => node.filePath !== undefined && changedSet.has(node.filePath))
      .map((node) => node.id),
  );

  const retainedNodes = existingGraph.nodes.filter(
    (node) => !removedNodeIds.has(node.id),
  );

  const retainedEdges = existingGraph.edges.filter(
    (edge) => !removedNodeIds.has(edge.source) && !removedNodeIds.has(edge.target),
  );

  return {
    ...existingGraph,
    project: { ...existingGraph.project, gitCommitHash: newCommitHash, analyzedAt: new Date().toISOString() },
    nodes: [...retainedNodes, ...newNodes],
    edges: [...retainedEdges, ...newEdges],
  };
}
```

This invariant — removing both the affected nodes and any edge touching them — prevents dangling references in the merged graph.
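
A simple way to check the no-dangling-edges property on any graph is sketched below. `findDanglingEdges` is a hypothetical helper, not something the repository is known to provide, and the `file:` node ID format is invented for the example. It could be useful as a sanity check after a merge, since the freshly analyzed edges are appended as-is.

```ts
// Minimal hypothetical shapes; the real GraphNode and GraphEdge carry more fields.
interface MiniNode { id: string }
interface MiniEdge { source: string; target: string }
interface MiniGraph { nodes: MiniNode[]; edges: MiniEdge[] }

// Returns every edge with an endpoint that is not a node in the graph.
function findDanglingEdges(graph: MiniGraph): MiniEdge[] {
  const ids = new Set(graph.nodes.map((node) => node.id));
  return graph.edges.filter((edge) => !ids.has(edge.source) || !ids.has(edge.target));
}

const exampleGraph: MiniGraph = {
  nodes: [{ id: "file:a.ts" }, { id: "file:b.ts" }],
  edges: [
    { source: "file:a.ts", target: "file:b.ts" },
    { source: "file:a.ts", target: "file:deleted.ts" }, // target is not a node
  ],
};
const dangling = findDanglingEdges(exampleGraph);
```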

Sources: [understand-anything-plugin/packages/core/src/staleness.ts:47-90]()

---

## Auto-Update Hooks

Two hooks in `hooks.json` allow the system to automatically detect and respond to graph staleness without user invocation.

### PostToolUse: Post-Commit Hook

```json
// understand-anything-plugin/hooks/hooks.json:4-13
"PostToolUse": [
  {
    "matcher": "Bash",
    "hooks": [
      {
        "type": "command",
        "command": "printf '%s' \"$TOOL_INPUT\" | grep -qE 'git\\s+(commit|merge|cherry-pick|rebase)' && [ -f .understand-anything/config.json ] && grep -q '\"autoUpdate\".*true' .understand-anything/config.json && [ -f .understand-anything/knowledge-graph.json ] && echo \"[understand-anything] Commit detected with auto-update enabled. You MUST read the file at ${CLAUDE_PLUGIN_ROOT}/hooks/auto-update-prompt.md and execute its instructions...\" || true"
      }
    ]
  }
]
```

This hook fires after every Bash tool use. It checks whether the Bash input looks like a git commit/merge/cherry-pick/rebase, verifies that `autoUpdate` is true in `config.json`, and that a graph already exists. If all conditions are met, it injects an instruction into the Claude session to run the incremental update prompt.
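
For readers more comfortable with TypeScript than shell, the hook's predicate can be paraphrased roughly as below. `shouldPromptAutoUpdate` is a hypothetical name; the two regular expressions are transcribed from the `grep` patterns in the hook, and file existence is passed in as arguments so the sketch stays free of filesystem access.

```ts
// Hypothetical TypeScript rendering of the hook's shell predicate.
function shouldPromptAutoUpdate(
  toolInput: string,         // the Bash command that just ran
  configText: string | null, // contents of config.json, or null if absent
  graphExists: boolean,      // whether knowledge-graph.json exists
): boolean {
  const looksLikeCommit = /git\s+(commit|merge|cherry-pick|rebase)/.test(toolInput);
  // Mirrors grep -q '"autoUpdate".*true': a same-line text match, not a JSON parse.
  const autoUpdateOn = configText !== null && /"autoUpdate".*true/.test(configText);
  return looksLikeCommit && autoUpdateOn && graphExists;
}
```

Because the check is a text match rather than a JSON parse, a single-line config such as `{"autoUpdate": false, "other": true}` would also match in this sketch.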

### SessionStart: Stale-on-Session-Open Hook

```json
// understand-anything-plugin/hooks/hooks.json:14-23
"SessionStart": [
  {
    "hooks": [
      {
        "type": "command",
        "command": "[ -f .understand-anything/config.json ] && grep -q '\"autoUpdate\".*true' ... && [ \"$(node -p ...)\" != \"$(git rev-parse HEAD 2>/dev/null)\" ] && echo \"[understand-anything] Knowledge graph is stale. You MUST read...\" || true"
      }
    ]
  }
]
```

At session start, if `autoUpdate` is enabled and the stored `gitCommitHash` in `meta.json` differs from the current `HEAD`, Claude is prompted to run the incremental update immediately. This handles the case where commits were made outside of an active Claude session.
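
The hash comparison at the heart of this hook amounts to something like the sketch below. `isGraphStaleAtSessionStart` is a hypothetical name, and treating an unreadable `meta.json` as stale is an assumption of the sketch, not something the hook is documented to do.

```ts
// Hypothetical sketch of the session-start comparison.
function isGraphStaleAtSessionStart(metaText: string, headHash: string): boolean {
  try {
    const meta = JSON.parse(metaText) as { gitCommitHash?: string };
    // Stale when the recorded commit differs from the current HEAD.
    return meta.gitCommitHash !== headHash;
  } catch {
    // Assumption: an unreadable meta.json counts as stale.
    return true;
  }
}
```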

Sources: [understand-anything-plugin/hooks/hooks.json:1-25]()

---

## Auto-Update Execution (Three-Phase Protocol)

When a hook fires, Claude reads `auto-update-prompt.md` and follows a protocol designed to minimize LLM token usage: a Phase 0 pre-check followed by three phases.

```mermaid
stateDiagram-v2
    [*] --> Phase0: Hook triggered
    Phase0 --> STOP_NoGraph: No knowledge-graph.json
    Phase0 --> STOP_UpToDate: Hashes match
    Phase0 --> STOP_NonSource: Only non-source files changed
    Phase0 --> Phase1: Source files changed

    Phase1 --> SKIP: All NONE/COSMETIC
    Phase1 --> PARTIAL_UPDATE: ≤10 structural files, same dirs
    Phase1 --> ARCHITECTURE_UPDATE: >10 structural or new dirs
    Phase1 --> FULL_UPDATE_STOP: >30 structural or >50% of graph

    SKIP --> SaveMeta: Update meta.json only
    PARTIAL_UPDATE --> Phase2: Re-analyze changed files
    ARCHITECTURE_UPDATE --> Phase2: Re-analyze + rerun architecture
    Phase2 --> Phase3: Merge results
    Phase3 --> SaveGraph: Write knowledge-graph.json + meta.json + fingerprints
```

**Phase 0 (Zero token cost):** Checks hashes, enumerates changed files, filters to source extensions (`.ts`, `.tsx`, `.js`, `.py`, `.go`, `.rs`, etc.), and applies `.understandignore` exclusions. If no relevant source files remain after filtering, `meta.json` is updated and execution stops.

**Phase 1 (Zero LLM tokens):** Runs a Node.js fingerprint-check script against stored `fingerprints.json` to classify each file as `NONE`, `COSMETIC`, or `STRUCTURAL`. The outcome drives the action decision (`SKIP`, `PARTIAL_UPDATE`, `ARCHITECTURE_UPDATE`, or `FULL_UPDATE`).

**Phase 2 (Minimal LLM tokens):** Re-dispatches the `file-analyzer` agent only for structurally changed files. Results are merged using the same node-removal logic described above for `mergeGraphUpdate()`.

**Phase 3:** Saves the final graph, updates `meta.json`, and performs a load-patch-save update of `fingerprints.json` (never overwriting the full dict — only patching re-analyzed entries to avoid issue #152).
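
The Phase 0 filtering step can be sketched as a pure function. The extension list below contains only the extensions named in the text (the real list is longer), and modeling `.understandignore` rules as plain path prefixes is an assumption; the actual matching is likely richer. `filterRelevantFiles` is a hypothetical name.

```ts
// Partial list taken from the text; the real set is longer ("etc.").
const SOURCE_EXTENSIONS = [".ts", ".tsx", ".js", ".py", ".go", ".rs"];

// Assumption: ignore rules are modeled as plain path prefixes.
function filterRelevantFiles(changed: string[], ignoredPrefixes: string[]): string[] {
  return changed.filter((file) => {
    const isSource = SOURCE_EXTENSIONS.some((ext) => file.endsWith(ext));
    const isIgnored = ignoredPrefixes.some((prefix) => file.startsWith(prefix));
    return isSource && !isIgnored;
  });
}

// Only source files outside ignored paths survive the filter.
const relevant = filterRelevantFiles(
  ["README.md", "src/a.ts", "dist/b.js", "cmd/main.go"],
  ["dist/"],
);
```

If `relevant` comes back empty, the run can stop after updating `meta.json`, which is what keeps documentation-only commits free of token cost.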

Sources: [understand-anything-plugin/hooks/auto-update-prompt.md:1-321]()

---

## Worktree Redirect Invariant

A critical invariant governs where the graph is written when `/understand` runs inside a Claude Code git worktree.

Claude Code creates temporary worktrees for tasks. Any `.understand-anything/` directory written inside a worktree is destroyed when the session ends — taking the knowledge graph with it (documented as issue #133). To prevent this, Phase 0 of `/understand` detects worktrees by comparing `git rev-parse --git-dir` against `git rev-parse --git-common-dir`:

```bash
# understand-anything-plugin/skills/understand/SKILL.md:36-51
COMMON_DIR=$(git -C "$PROJECT_ROOT" rev-parse --git-common-dir 2>/dev/null)
GIT_DIR=$(git -C "$PROJECT_ROOT" rev-parse --git-dir 2>/dev/null)
if [ -n "$COMMON_DIR" ] && [ -n "$GIT_DIR" ]; then
  COMMON_ABS=$(cd "$PROJECT_ROOT" && cd "$COMMON_DIR" 2>/dev/null && pwd -P)
  GIT_ABS=$(cd "$PROJECT_ROOT" && cd "$GIT_DIR" 2>/dev/null && pwd -P)
  if [ -n "$COMMON_ABS" ] && [ "$COMMON_ABS" != "$GIT_ABS" ]; then
    MAIN_ROOT=$(dirname "$COMMON_ABS")
    PROJECT_ROOT="$MAIN_ROOT"   # redirect output to main repo root
  fi
fi
```

In a normal checkout or submodule, `--git-dir` and `--git-common-dir` resolve to the same path and no redirect occurs. In a worktree they differ; the parent of `--git-common-dir` is the main repo root, and `PROJECT_ROOT` is updated accordingly. The redirect can be suppressed with `UNDERSTAND_NO_WORKTREE_REDIRECT=1` for the rare case of wanting a per-worktree graph.
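
The same decision, written as a pure function over already-resolved absolute paths, might look like this. `resolveOutputRoot` is a hypothetical name, the `noRedirect` parameter stands in for `UNDERSTAND_NO_WORKTREE_REDIRECT=1`, and the sketch assumes POSIX-style paths.

```ts
// Hypothetical sketch of the worktree redirect, assuming POSIX-style paths.
function resolveOutputRoot(
  projectRoot: string,
  gitDirAbs: string,    // resolved `git rev-parse --git-dir`
  commonDirAbs: string, // resolved `git rev-parse --git-common-dir`
  noRedirect = false,   // stands in for UNDERSTAND_NO_WORKTREE_REDIRECT=1
): string {
  if (noRedirect || gitDirAbs === "" || commonDirAbs === "") return projectRoot;
  // Normal checkout: both paths are identical, so nothing to redirect.
  if (gitDirAbs === commonDirAbs) return projectRoot;
  // Worktree: the parent of the shared git directory is the main repo root.
  const cut = commonDirAbs.lastIndexOf("/");
  if (cut < 0) return projectRoot;
  return cut === 0 ? "/" : commonDirAbs.slice(0, cut);
}
```

For a worktree at `/tmp/wt` whose git directory is `/repo/.git/worktrees/wt` and whose common directory is `/repo/.git`, the sketch returns `/repo`.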

Sources: [understand-anything-plugin/skills/understand/SKILL.md:33-53]()

---

## Fingerprint Integrity: The LOAD-PATCH-SAVE Rule

The most critical invariant during fingerprint updates is that `fingerprints.json` must be loaded fully before patching, and saved in full afterward. A naive implementation that writes only the freshly computed entries discards every other file's fingerprint; on the next auto-update, those files have no stored baseline, get classified as `STRUCTURAL`, and the system escalates to `FULL_UPDATE` permanently.

The correct pattern (from `auto-update-prompt.md` Phase 3d) is:

```javascript
// 1. LOAD ALL existing entries
const all = existsSync(fpPath) ? JSON.parse(readFileSync(fpPath, 'utf-8')) : {};
const before = Object.keys(all).length;

// 2. PATCH only the re-analyzed paths
for (const filePath of filesToReanalyze) { /* update or delete all[filePath] */ }

// 3. GUARD: if file existed and non-empty but loaded as {}, abort
if (existedAndNonEmpty && before === 0) {
  throw new Error('fingerprints.json existed and was non-empty but loaded as {} — refusing to overwrite');
}

// 4. SAVE ALL entries back
writeFileSync(fpPath, JSON.stringify(all, null, 2));
```

The guard on step 3 prevents a silent read failure from clobbering the store. This is the same failure mode as issue #152, caught at the write boundary rather than the read boundary.
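
A self-contained variant of the same rule, operating on file contents instead of the filesystem, is sketched below. `patchFingerprintStore` is a hypothetical name, and treating "non-empty" as "more than two characters after trimming" (that is, more than a bare `{}`) is an assumption of the sketch.

```ts
type FingerprintMap = Record<string, object>;

// `existingText` is the raw file content, or null when the file is absent.
// `patches` maps a path to its new fingerprint, or to null to delete the entry.
function patchFingerprintStore(
  existingText: string | null,
  patches: Record<string, object | null>,
): string {
  // 1. LOAD: a parse failure silently degrades to {} (the failure being guarded).
  let all: FingerprintMap = {};
  try {
    if (existingText !== null) all = JSON.parse(existingText) as FingerprintMap;
  } catch {
    all = {};
  }
  // 2. GUARD: refuse to continue if real content was lost during loading.
  // Assumption: "non-empty" means more than a bare "{}".
  const existedAndNonEmpty = existingText !== null && existingText.trim().length > 2;
  if (existedAndNonEmpty && Object.keys(all).length === 0) {
    throw new Error("fingerprint store was non-empty but loaded as {}; refusing to overwrite");
  }
  // 3. PATCH: touch only the re-analyzed paths.
  for (const path of Object.keys(patches)) {
    const fp = patches[path];
    if (fp === null) delete all[path];
    else all[path] = fp;
  }
  // 4. SAVE: serialize every entry, patched or not.
  return JSON.stringify(all, null, 2);
}
```

The caller would write the returned string back to `fingerprints.json`; entries for files that were not re-analyzed pass through untouched.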

Sources: [understand-anything-plugin/hooks/auto-update-prompt.md:243-290]()

---

## Summary

Staleness detection in Understand Anything is a multi-layered pipeline in which each layer costs nothing when nothing significant has changed. The git-diff check (`staleness.ts`) determines whether any files have changed since the last analysis; the fingerprint system (`fingerprint.ts`, `change-classifier.ts`) then classifies whether those file-level changes affect signatures the knowledge graph cares about; and `mergeGraphUpdate()` (`staleness.ts:54-90`) applies a minimal surgical replacement, removing stale nodes and their incident edges and then appending fresh ones, so that the vast majority of the graph survives unchanged. Two hooks make this automatic: a post-commit hook for changes made inside a Claude session, and a session-start hook for changes made between sessions. The worktree redirect keeps the graph out of ephemeral worktree directories unless explicitly disabled, and the LOAD-PATCH-SAVE fingerprint rule keeps the incremental system from entering a degenerate state that forces unnecessary full rebuilds.
