# Freshness, Storage & What to Try Next

> The closing map for the first 30 minutes: understand the SQLite schema, query and traversal layers, file watcher, git hook sync path, safety checks, and the concrete next experiments that prove the local graph is fresh and useful.

- Repository: colbymchenry/codegraph
- GitHub: https://github.com/colbymchenry/codegraph
- Human wiki: https://grok-wiki.com/public/wiki/colbymchenry-codegraph-89e8b2c4d43a
- Complete Markdown: https://grok-wiki.com/public/wiki/colbymchenry-codegraph-89e8b2c4d43a/llms-full.txt

## Source Files

- `src/db/schema.sql`
- `src/db/index.ts`
- `src/db/queries.ts`
- `src/db/sqlite-adapter.ts`
- `src/graph/traversal.ts`
- `src/graph/queries.ts`
- `src/sync/watcher.ts`
- `src/sync/git-hooks.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [src/db/schema.sql](src/db/schema.sql)
- [src/db/index.ts](src/db/index.ts)
- [src/db/queries.ts](src/db/queries.ts)
- [src/db/sqlite-adapter.ts](src/db/sqlite-adapter.ts)
- [src/graph/traversal.ts](src/graph/traversal.ts)
- [src/graph/queries.ts](src/graph/queries.ts)
- [src/sync/watcher.ts](src/sync/watcher.ts)
- [src/sync/git-hooks.ts](src/sync/git-hooks.ts)
- [src/sync/watch-policy.ts](src/sync/watch-policy.ts)
- [src/sync/index.ts](src/sync/index.ts)
- [src/index.ts](src/index.ts)
- [src/extraction/index.ts](src/extraction/index.ts)
- [src/bin/codegraph.ts](src/bin/codegraph.ts)
- [src/utils.ts](src/utils.ts)
- [__tests__/sync.test.ts](__tests__/sync.test.ts)
- [__tests__/watcher.test.ts](__tests__/watcher.test.ts)
- [__tests__/git-hooks.test.ts](__tests__/git-hooks.test.ts)
</details>


This page is the closing map for the first 30 minutes in CodeGraph: where the local graph lives, how it is queried, how it stays fresh, and which experiments prove that the graph is both current and useful. Freshness is local and file-backed: CodeGraph stores indexed symbols in `.codegraph/codegraph.db`, watches or syncs source changes, and exposes status, search, and context commands without depending on a hosted model provider.

The provider-neutral posture matters for BYOC/BYOK workflows: the storage, sync, and query layers are ordinary repository files plus SQLite state. Any assistant or integration can consume the same graph through CLI or MCP surfaces without assuming a specific model vendor.

## Storage: The Local SQLite Graph

CodeGraph stores four main data shapes: `nodes`, `edges`, `files`, and `unresolved_refs`. Nodes are code symbols with source ranges and metadata; edges connect symbols; files record content hashes and indexing timestamps; unresolved refs hold references that need a later resolution pass. The schema also includes `schema_versions`, `project_metadata`, indexes for common lookups, and an FTS5 table kept current by triggers on `nodes`.

```mermaid
erDiagram
  files ||--o{ nodes : "file_path"
  nodes ||--o{ edges : "source"
  nodes ||--o{ edges : "target"
  nodes ||--o{ unresolved_refs : "from_node_id"
  schema_versions {
    integer version PK
    integer applied_at
    text description
  }
  files {
    text path PK
    text content_hash
    text language
    integer indexed_at
    integer node_count
  }
  nodes {
    text id PK
    text kind
    text name
    text qualified_name
    text file_path
    integer updated_at
  }
  edges {
    integer id PK
    text source
    text target
    text kind
    text provenance
  }
  unresolved_refs {
    integer id PK
    text from_node_id
    text reference_name
    text file_path
    text language
  }
```

Sources: [src/db/schema.sql:19-81](), [src/db/schema.sql:87-150]()
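
As a reading aid, the rows in the diagram can be pictured as TypeScript shapes. This is a minimal sketch: the interface names, the camelCased field names, and the `groupNodesByFile` helper are assumptions made for illustration, and the real schema has more columns than the diagram shows.

```typescript
// Hypothetical row shapes that mirror the ER diagram; the real schema has
// more columns, and these interface names are illustrative, not CodeGraph's.
interface FileRow {
  path: string; // primary key
  contentHash: string;
  language: string;
  indexedAt: number;
  nodeCount: number;
}

interface NodeRow {
  id: string; // primary key
  kind: string;
  name: string;
  qualifiedName: string;
  filePath: string; // joins to FileRow.path
  updatedAt: number;
}

interface EdgeRow {
  id: number;
  source: string; // NodeRow.id
  target: string; // NodeRow.id
  kind: string;
  provenance: string;
}

interface UnresolvedRefRow {
  id: number;
  fromNodeId: string; // NodeRow.id
  referenceName: string;
  filePath: string;
  language: string;
}

// Group symbols by the file that owns them, following the files -> nodes link.
function groupNodesByFile(nodes: NodeRow[]): Map<string, NodeRow[]> {
  const byFile = new Map<string, NodeRow[]>();
  for (const node of nodes) {
    const bucket = byFile.get(node.filePath) ?? [];
    bucket.push(node);
    byFile.set(node.filePath, bucket);
  }
  return byFile;
}
```

The grouping helper reflects why `files.path` matters for freshness: when a file changes, every node that shares its `file_path` is a candidate for replacement.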

The default database path is `.codegraph/codegraph.db`. Initialization creates the parent directory, opens SQLite, enables foreign keys, sets WAL-oriented pragmas for the native backend, executes `schema.sql`, and records the current schema version when needed. Existing databases are opened through the same backend factory and then migrated if their schema version is behind.

Sources: [src/db/index.ts:32-68](), [src/db/index.ts:73-101](), [src/db/index.ts:181-190]()
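
The "migrate if the schema version is behind" step can be sketched as a version-ordered loop. This is not the project's migration code: the `Migration` shape and the `migrate` function are hypothetical and only illustrate the general technique of applying every migration newer than the recorded version.

```typescript
// Hypothetical migration record; CodeGraph's real migration API may differ.
interface Migration {
  version: number;
  description: string;
  apply: () => void;
}

// Apply every migration newer than currentVersion, in ascending order,
// and return the version the database ends up at.
function migrate(currentVersion: number, all: Migration[]): number {
  const pending = all
    .filter((m) => m.version > currentVersion)
    .sort((a, b) => a.version - b.version);
  let version = currentVersion;
  for (const m of pending) {
    m.apply();
    version = m.version; // a real store would also record this in schema_versions
  }
  return version;
}
```

A database already at the latest version yields an empty `pending` list, so opening it is a no-op apart from the version check.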

### Backend Choice: Native First, WASM Fallback

The SQLite adapter presents one interface over `better-sqlite3` and `node-sqlite3-wasm`. It tries the native backend first and falls back to WASM if native loading fails. The WASM adapter rewrites named parameters, emulates transactions, adapts pragmas, and finalizes open statements on close so file locks are released.

The user-facing signal to remember: `codegraph status` shows `Backend: native` or a WASM warning. WASM is supported deliberately for portability, and the status output calls it out because indexing and sync are slower on that backend.

Sources: [src/db/sqlite-adapter.ts:8-23](), [src/db/sqlite-adapter.ts:72-123](), [src/db/sqlite-adapter.ts:124-229](), [src/db/sqlite-adapter.ts:231-267](), [src/index.ts:615-623](), [src/bin/codegraph.ts:721-735]()
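
The native-first, WASM-fallback choice follows a common loader pattern. The sketch below assumes hypothetical `BackendLoader` objects instead of importing `better-sqlite3` or `node-sqlite3-wasm` directly, so it shows only the ordering and error handling, not the adapter's real interface.

```typescript
type BackendName = "native" | "wasm";

// Hypothetical loader shape: each loader either opens a database or throws.
interface BackendLoader<T> {
  name: BackendName;
  open: (dbPath: string) => T;
}

interface OpenedBackend<T> {
  backend: BackendName;
  db: T;
}

// Try loaders in order and remember which one succeeded, so a status
// command can report the active backend later.
function openWithFallback<T>(dbPath: string, loaders: BackendLoader<T>[]): OpenedBackend<T> {
  const failures: string[] = [];
  for (const loader of loaders) {
    try {
      return { backend: loader.name, db: loader.open(dbPath) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${loader.name}: ${message}`);
    }
  }
  throw new Error(`No SQLite backend could be loaded (${failures.join("; ")})`);
}
```

Returning the backend name alongside the handle is what makes a line like `Backend: native` cheap to print.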

## Query Layers: Prepared Storage APIs, Then Graph Views

`QueryBuilder` is the low-level database API. It converts SQLite rows into typed objects, keeps a small node cache, and owns prepared statements for insert/update/delete, node lookup, edge lookup, file records, unresolved refs, stats, and metadata. File freshness depends on the `files.content_hash` values written during extraction and compared during sync.

A short storage excerpt:

```sql
-- Excerpt from src/db/queries.ts: upsert one row per indexed file
INSERT INTO files (path, content_hash, language, size, modified_at, indexed_at, node_count, errors)
VALUES (@path, @contentHash, @language, @size, @modifiedAt, @indexedAt, @nodeCount, @errors)
ON CONFLICT(path) DO UPDATE SET
  content_hash = @contentHash,
  indexed_at = @indexedAt,
  node_count = @nodeCount
```

Sources: [src/db/queries.ts:84-151](), [src/db/queries.ts:193-253](), [src/db/queries.ts:1004-1048](), [src/db/queries.ts:1074-1149](), [src/db/queries.ts:1361-1407]()

`GraphTraverser` is the traversal layer. It uses `QueryBuilder` to perform BFS, DFS, callers, callees, call graphs, type hierarchy, usages, impact radius, paths, ancestors, and children. BFS has bounded defaults, supports direction and edge/node filters, and prioritizes `contains` and `calls` edges before broader references.

Sources: [src/graph/traversal.ts:13-20](), [src/graph/traversal.ts:48-125](), [src/graph/traversal.ts:207-227](), [src/graph/traversal.ts:229-345](), [src/graph/traversal.ts:456-581](), [src/graph/traversal.ts:590-640]()
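
A bounded BFS with edge prioritization can be sketched as follows. The option names, the priority table, and the in-memory edge list are assumptions for illustration; `GraphTraverser` reads edges through `QueryBuilder` and supports more filters than this sketch does.

```typescript
interface Edge {
  source: string;
  target: string;
  kind: string;
}

// Hypothetical option names; the real defaults live in src/graph/traversal.ts.
interface BfsOptions {
  maxDepth: number;
  maxNodes: number;
  edgeKinds?: string[];
}

// Lower number is expanded first; kinds not listed here sort last.
const EDGE_PRIORITY: Record<string, number> = { contains: 0, calls: 1 };

function edgeRank(kind: string): number {
  const rank = EDGE_PRIORITY[kind];
  return rank === undefined ? 99 : rank;
}

// Breadth-first walk over outgoing edges, bounded by depth and node count.
function bfs(start: string, edges: Edge[], opts: BfsOptions): string[] {
  const outgoing = new Map<string, Edge[]>();
  for (const e of edges) {
    if (opts.edgeKinds && opts.edgeKinds.indexOf(e.kind) === -1) continue;
    const list = outgoing.get(e.source) ?? [];
    list.push(e);
    outgoing.set(e.source, list);
  }

  const visited = new Set<string>([start]);
  const order: string[] = [start];
  let frontier: string[] = [start];

  for (let depth = 0; depth < opts.maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      const sorted = (outgoing.get(id) ?? []).slice().sort((a, b) => edgeRank(a.kind) - edgeRank(b.kind));
      for (const e of sorted) {
        if (visited.has(e.target)) continue;
        if (order.length >= opts.maxNodes) return order; // node budget exhausted
        visited.add(e.target);
        order.push(e.target);
        next.push(e.target);
      }
    }
    frontier = next;
  }
  return order;
}
```

Sorting each node's edges before expansion means that when the node budget runs out, structural and call relationships have already been kept in preference to looser references.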

`GraphQueryManager` builds higher-level questions from traversal and direct queries. It can assemble a node context, file dependencies and dependents, exported symbols, module structure, circular dependencies, node metrics, filtered subgraphs, and dead-code candidates. These are graph conveniences, not a separate store.

Sources: [src/graph/queries.ts:14-21](), [src/graph/queries.ts:23-108](), [src/graph/queries.ts:110-188](), [src/graph/queries.ts:230-330](), [src/graph/queries.ts:332-427]()
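
To make one of these higher-level questions concrete, circular file dependencies can be found with a depth-first search that tracks the current path. This is a generic sketch of the technique over an in-memory map, not `GraphQueryManager`'s implementation; the `findCycle` name and the input shape are hypothetical.

```typescript
// Input: file path -> files it depends on (hypothetical in-memory shape).
function findCycle(graph: Map<string, string[]>): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  function visit(file: string): string[] | null {
    const seen = state.get(file);
    if (seen === "done") return null;
    if (seen === "visiting") {
      // The file is already on the current path, so the path closes a cycle.
      return stack.slice(stack.indexOf(file)).concat(file);
    }
    state.set(file, "visiting");
    stack.push(file);
    for (const dep of graph.get(file) ?? []) {
      const cycle = visit(dep);
      if (cycle !== null) return cycle;
    }
    stack.pop();
    state.set(file, "done");
    return null;
  }

  for (const file of Array.from(graph.keys())) {
    const cycle = visit(file);
    if (cycle !== null) return cycle;
  }
  return null;
}
```

The function returns the first cycle it encounters, with the starting file repeated at the end. The recursion is fine for a sketch; a very deep dependency chain would call for an explicit stack.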

## Freshness: Hashes, Sync, Watcher, Git Hooks

Freshness starts in extraction. CodeGraph hashes file contents with SHA-256, skips storing unchanged files, deletes prior rows for changed files, inserts valid nodes/edges/unresolved refs, and writes a fresh `FileRecord` with `contentHash`, size, `modifiedAt`, `indexedAt`, node count, and any errors.

Sources: [src/extraction/index.ts:89-94](), [src/extraction/index.ts:1154-1225]()
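
The hash-and-skip decision can be sketched with Node's built-in `crypto` module. The function names and the hex encoding are assumptions; the source only states that contents are hashed with SHA-256 and that unchanged files are not stored again.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the file contents, hex-encoded (the encoding is an assumption).
function hashContent(content: string | Buffer): string {
  return createHash("sha256").update(content).digest("hex");
}

type FreshnessDecision = "new" | "unchanged" | "changed";

// Compare current contents against the hash stored for this path, if any.
function classifyFile(content: string, storedHash: string | undefined): FreshnessDecision {
  if (storedHash === undefined) return "new";
  return hashContent(content) === storedHash ? "unchanged" : "changed";
}
```

Only the `"changed"` case needs the delete-then-insert work described above; `"unchanged"` files can be skipped without touching their rows.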

`sync()` has two paths. In a git repo it runs `git status --porcelain --no-renames` and inspects only the changed, added, and deleted files. Modified and untracked files are still hashed and compared against the DB, so an untracked file that has already been indexed is not treated as new on every sync. Outside git, or if the git command fails, it scans the current file set and compares it against the tracked `files` records.

Sources: [src/extraction/index.ts:223-268](), [src/extraction/index.ts:1227-1371](), [src/extraction/index.ts:1374-1465]()
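
The git path depends on reading porcelain output, where each line is a two-character status, a space, and a path. The parser below is a simplified sketch with hypothetical names: it assumes `--no-renames` (so no `old -> new` entries) and does not handle the quoting git applies to paths with special characters.

```typescript
interface GitChanges {
  modified: string[];
  added: string[];
  deleted: string[];
}

// Parse `git status --porcelain --no-renames` output (porcelain v1).
function parsePorcelain(output: string): GitChanges {
  const changes: GitChanges = { modified: [], added: [], deleted: [] };
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // blank or malformed line
    const indexStatus = line.charAt(0);
    const worktreeStatus = line.charAt(1);
    const path = line.slice(3);
    if (indexStatus === "D" || worktreeStatus === "D") {
      changes.deleted.push(path); // gone from the index or the working tree
    } else if ((indexStatus === "?" && worktreeStatus === "?") || indexStatus === "A") {
      changes.added.push(path); // untracked or newly staged
    } else {
      changes.modified.push(path);
    }
  }
  return changes;
}
```

Files in `added` and `modified` would then go through the content-hash comparison, which is what keeps an already-indexed untracked file from being reprocessed on every run.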

At the `CodeGraph` class level, `indexAll()`, `indexFiles()`, and `sync()` are protected by an in-process mutex and a cross-process lock file. That is important because the CLI, MCP server, and git hooks can all try to write the same SQLite database. If a lock looks stale, the lock utility can remove it; otherwise it tells users to run `codegraph unlock`.

Sources: [src/index.ts:139-164](), [src/index.ts:370-410](), [src/index.ts:417-490](), [src/utils.ts:177-239](), [src/utils.ts:241-260]()
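
A PID-based lock file with stale detection is a common way to get this kind of cross-process exclusion. The sketch below illustrates the general technique, not CodeGraph's `FileLock`: the function names and the lock file name are hypothetical, and it assumes the holder's PID is the only thing written to the file.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, unlinkSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Signal 0 performs the existence check without delivering a signal.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Returns true if this process now holds the lock.
function tryAcquireLock(lockPath: string): boolean {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      const fd = openSync(lockPath, "wx"); // "wx" fails with EEXIST if the file exists
      writeSync(fd, String(process.pid));
      closeSync(fd);
      return true;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      const holder = Number.parseInt(readFileSync(lockPath, "utf8"), 10);
      if (Number.isInteger(holder) && holder > 0 && isProcessAlive(holder)) return false;
      unlinkSync(lockPath); // stale or unreadable lock: remove it and retry once
    }
  }
  return false;
}

function releaseLock(lockPath: string): void {
  try {
    unlinkSync(lockPath);
  } catch {
    // The lock file is already gone; nothing to release.
  }
}

// Usage: a second acquire is refused while a live process holds the lock.
const demoLockPath = join(mkdtempSync(join(tmpdir(), "codegraph-lock-")), "demo.lock");
console.log(tryAcquireLock(demoLockPath)); // first caller wins
console.log(tryAcquireLock(demoLockPath)); // refused while held
releaseLock(demoLockPath);
```

Creating the file with the exclusive `wx` flag makes acquisition atomic at the filesystem level, and the PID check is what lets a crashed holder's lock be reclaimed instead of blocking every later writer.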

### Watcher Path

`FileWatcher` uses recursive `fs.watch`, filters events through the same include/exclude rules as extraction, ignores `.codegraph/` writes, and debounces changes before calling `sync()`. If a sync is already running, later changes set `hasChanges` and schedule another pass after the current sync finishes.

Sources: [src/sync/watcher.ts:40-49](), [src/sync/watcher.ts:82-139](), [src/sync/watcher.ts:168-206](), [src/index.ts:505-528]()
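
The debounce and "changes arrived during a sync" behavior can be sketched as two small pieces. The `debounce` helper and the `SyncCoalescer` class are hypothetical names, and the callback-style `startSync` is an assumption chosen to keep the state machine easy to follow; the real watcher awaits `sync()`.

```typescript
// Collapse a burst of calls into one call after delayMs of quiet.
function debounce(fn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(fn, delayMs);
  };
}

// Ensures only one sync runs at a time and that changes seen during a
// running sync trigger exactly one follow-up pass.
class SyncCoalescer {
  private running = false;
  private hasChanges = false;
  private readonly startSync: (done: () => void) => void;

  constructor(startSync: (done: () => void) => void) {
    this.startSync = startSync;
  }

  trigger(): void {
    if (this.running) {
      this.hasChanges = true; // remember the change; do not start a second sync
      return;
    }
    this.running = true;
    this.startSync(() => this.finish());
  }

  private finish(): void {
    this.running = false;
    if (this.hasChanges) {
      this.hasChanges = false;
      this.trigger(); // one follow-up pass covers everything that arrived meanwhile
    }
  }
}
```

In use, a file-system event handler would call something like `debounce(() => coalescer.trigger(), 300)`, where the 300 ms delay is an illustrative value. However many events arrive while a sync is running, they collapse into a single extra pass.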

The watch policy can disable live watching in known-problem environments. `CODEGRAPH_NO_WATCH=1` is checked first and always wins, `CODEGRAPH_FORCE_WATCH=1` overrides auto-detection, and under WSL a project on a Windows drive mount such as `/mnt/c` has recursive watching disabled because watcher setup can be too slow for MCP startup. The sync module exports the watcher, watch policy, and git hook helpers from one place.

Sources: [src/sync/watch-policy.ts:71-98](), [src/sync/index.ts:1-25]()
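
The precedence described above can be written as a small pure function. The function name, the return shape, and the `isWsl` parameter are assumptions; how the real policy detects WSL and drive mounts is not shown here.

```typescript
interface WatchDecision {
  watch: boolean;
  reason: string;
}

// Decide whether to start the live watcher. isWsl is supplied by the
// caller because WSL detection itself is outside this sketch.
function decideWatch(
  env: Record<string, string | undefined>,
  projectPath: string,
  isWsl: boolean,
): WatchDecision {
  if (env["CODEGRAPH_NO_WATCH"] === "1") {
    return { watch: false, reason: "CODEGRAPH_NO_WATCH=1" };
  }
  if (env["CODEGRAPH_FORCE_WATCH"] === "1") {
    return { watch: true, reason: "CODEGRAPH_FORCE_WATCH=1" };
  }
  // A path such as /mnt/c/... is a Windows drive mounted inside WSL.
  if (isWsl && /^\/mnt\/[a-z](\/|$)/.test(projectPath)) {
    return { watch: false, reason: "WSL Windows drive mount" };
  }
  return { watch: true, reason: "default" };
}
```

Keeping the decision pure makes the precedence testable without touching the file system: the opt-out is evaluated before the force flag, and auto-detection only runs when neither is set.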

### Git Hook Sync Path

Git hooks are the fallback freshness path when the live watcher is not desirable. The default hooks are `post-commit`, `post-merge`, and `post-checkout`. The inserted shell block checks whether `codegraph` is on `PATH` and runs `codegraph sync` in the background, so git operations are not blocked. The installer uses marker comments so repeated installs replace CodeGraph’s block and preserve user-authored hook content.

```sh
# src/sync/git-hooks.ts
if command -v codegraph >/dev/null 2>&1; then
  ( codegraph sync >/dev/null 2>&1 & ) >/dev/null 2>&1
fi
```

Sources: [src/sync/git-hooks.ts:1-14](), [src/sync/git-hooks.ts:23-35](), [src/sync/git-hooks.ts:72-84](), [src/sync/git-hooks.ts:116-159](), [src/sync/git-hooks.ts:161-208]()
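
Marker-based installation can be sketched as an idempotent string rewrite. The marker strings, the default shebang, and the `upsertHookBlock` name are hypothetical; only the shell body is taken from the excerpt above.

```typescript
// Hypothetical marker comments; the real markers live in src/sync/git-hooks.ts.
const BEGIN_MARKER = "# >>> codegraph hook >>>";
const END_MARKER = "# <<< codegraph hook <<<";

const HOOK_BODY = [
  "if command -v codegraph >/dev/null 2>&1; then",
  "  ( codegraph sync >/dev/null 2>&1 & ) >/dev/null 2>&1",
  "fi",
].join("\n");

// Replace an existing marked block, or append one, leaving other content intact.
function upsertHookBlock(existing: string): string {
  const block = `${BEGIN_MARKER}\n${HOOK_BODY}\n${END_MARKER}`;
  const start = existing.indexOf(BEGIN_MARKER);
  const end = existing.indexOf(END_MARKER);
  if (start !== -1 && end > start) {
    return existing.slice(0, start) + block + existing.slice(end + END_MARKER.length);
  }
  const base = existing.length > 0 ? existing : "#!/bin/sh\n";
  const separator = base.endsWith("\n") ? "" : "\n";
  return `${base}${separator}${block}\n`;
}
```

Running the function twice yields the same text as running it once, which is the property that keeps repeated installs from stacking duplicate blocks on top of user-authored hook lines.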

## Safety Checks Worth Knowing

| Safety check | Where it lives | Why it matters |
| --- | --- | --- |
| Project paths and file paths are validated against the root | `validatePathWithinRoot()` and extraction calls | Blocks path traversal before file reads |
| Sensitive system and home directories are rejected as project roots | `validateProjectPath()` | Avoids accidental indexing of dangerous locations |
| Cross-process writes use a PID lock file | `FileLock` plus `CodeGraph.indexAll()`/`sync()` | Prevents CLI, MCP, and hooks from writing concurrently |
| Watcher ignores `.codegraph/` | `FileWatcher.start()` | Avoids feedback loops from DB writes |
| Sync filters include/exclude patterns | `shouldIncludeFile()` and git change detection | Keeps unsupported or excluded files out of refresh work |
| Parser work has size checks, worker timeouts, and worker recycling | extraction orchestrator | Keeps large or pathological files from freezing the whole index |

Sources: [src/utils.ts:49-106](), [src/extraction/index.ts:104-126](), [src/extraction/index.ts:751-823](), [src/extraction/index.ts:602-731](), [src/sync/watcher.ts:106-118](), [src/index.ts:370-490]()
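
The first row of the table, containment within the project root, is commonly implemented by resolving both paths and inspecting the relative path between them. This sketch uses a hypothetical `isWithinRoot` helper and does not resolve symlinks; it is not the body of `validatePathWithinRoot()`.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// True if candidate, resolved against root, stays inside root.
function isWithinRoot(root: string, candidate: string): boolean {
  const absRoot = resolve(root);
  const absCandidate = resolve(absRoot, candidate);
  const rel = relative(absRoot, absCandidate);
  if (rel === "") return true; // the root itself
  const escapes = rel === ".." || rel.startsWith(".." + sep);
  return !escapes && !isAbsolute(rel);
}
```

Checking for `..` followed by the path separator, instead of any leading `..`, keeps a legitimately named entry such as `..cache/` inside the root from being rejected.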

## What to Try Next

Use these as concrete experiments, not just commands to memorize.

| Experiment | Command or action | What proves it worked |
| --- | --- | --- |
| Build the graph from scratch | `codegraph init -i` | `.codegraph/codegraph.db` exists and `codegraph status` reports files, nodes, edges, DB size, and backend |
| Check freshness before editing | `codegraph status --json` | `pendingChanges` is all zero when the DB hashes match the working tree |
| Modify a known source file | Edit a function name, then run `codegraph status` | Pending changes show a modified file |
| Apply the change | `codegraph sync` | The command reports changed files, then `codegraph status` says the index is up to date |
| Prove old symbols disappear | Search for the old name after sync | The old symbol is gone and the new symbol is searchable |
| Verify deletion handling | Delete an indexed file, then run `codegraph sync` | Removed count increments and symbols from that file disappear |
| Inspect backend health | `codegraph status` | `Backend: native` is ideal; `wasm` means the portable fallback is active |
| Test watcher behavior through MCP/server mode | `codegraph serve --mcp` or use the configured MCP server | Watcher diagnostics say whether auto-sync is active or unavailable |
| Test hook fallback | Install/init in a git repo, then inspect hooks | Hooks include one CodeGraph marker block and invoke `codegraph sync` guarded by `command -v codegraph` |

Sources: [src/bin/codegraph.ts:391-472](), [src/bin/codegraph.ts:602-662](), [src/bin/codegraph.ts:664-775](), [src/bin/codegraph.ts:1098-1153](), [__tests__/sync.test.ts:92-152](), [__tests__/sync.test.ts:200-304](), [__tests__/git-hooks.test.ts:42-80]()
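
The freshness check in the second row can be automated by parsing the JSON output. The sketch assumes only that `pendingChanges` is an object of numeric counters; the counter names used in the usage note are hypothetical, and `isIndexFresh` is not part of CodeGraph.

```typescript
// Returns true when every counter under pendingChanges is zero.
// Assumption: pendingChanges is an object whose values are numbers.
function isIndexFresh(statusJson: string): boolean {
  const parsed: unknown = JSON.parse(statusJson);
  if (typeof parsed !== "object" || parsed === null) return false;
  const pending = (parsed as Record<string, unknown>)["pendingChanges"];
  if (typeof pending !== "object" || pending === null) return false;
  const counters = pending as Record<string, unknown>;
  return Object.keys(counters).every((key) => counters[key] === 0);
}
```

Fed the text printed by `codegraph status --json`, a payload shaped like `{"pendingChanges":{"added":0,"modified":0,"removed":0}}` would count as fresh, and any non-zero counter would not.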

## First-30-Minute Reading Order

1. Start with `src/db/schema.sql` to understand what can be known: symbols, relationships, files, unresolved references, and metadata.
2. Read `src/extraction/index.ts` around `storeExtractionResult()` and `sync()` to understand freshness by content hash.
3. Read `src/db/queries.ts` for the storage API that every graph operation uses.
4. Read `src/graph/traversal.ts` and `src/graph/queries.ts` to see how callers, callees, impact, file dependencies, and context are assembled.
5. Read `src/sync/watcher.ts`, `src/sync/watch-policy.ts`, and `src/sync/git-hooks.ts` to understand automatic refresh versus fallback refresh.
6. Finish with `src/bin/codegraph.ts` status/sync output so you know how implementation state becomes user-facing diagnostics.

Sources: [src/db/schema.sql:19-81](), [src/extraction/index.ts:1154-1371](), [src/db/queries.ts:143-183](), [src/graph/traversal.ts:31-39](), [src/graph/queries.ts:11-21](), [src/sync/watcher.ts:40-49](), [src/sync/watch-policy.ts:71-98](), [src/bin/codegraph.ts:602-775]()

## Closing Summary

The mental model is simple: CodeGraph is a local SQLite graph plus a freshness loop. Extraction writes file hashes and symbol relationships, `QueryBuilder` reads and updates them, traversal builds useful graph answers, and freshness is maintained by manual sync, a debounced native watcher, or opt-in git hooks. The fastest confidence check is to edit one source file, run `codegraph status`, run `codegraph sync`, and verify that both search results and pending-change counts reflect the new state.

Sources: [src/extraction/index.ts:1227-1371](), [src/bin/codegraph.ts:641-651](), [src/bin/codegraph.ts:757-773]()
