# Start Here

> What this repo is, the fastest read order, the first files to open, and the vocabulary behind nodes, edges, projects, tools, and local-first indexing.

- Repository: colbymchenry/codegraph
- GitHub: https://github.com/colbymchenry/codegraph
- Human wiki: https://grok-wiki.com/public/wiki/colbymchenry-codegraph-89e8b2c4d43a
- Complete Markdown: https://grok-wiki.com/public/wiki/colbymchenry-codegraph-89e8b2c4d43a/llms-full.txt

## Source Files

- `README.md`
- `package.json`
- `src/index.ts`
- `src/types.ts`
- `src/config.ts`
- `src/directory.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [package.json](package.json)
- [src/index.ts](src/index.ts)
- [src/types.ts](src/types.ts)
- [src/config.ts](src/config.ts)
- [src/directory.ts](src/directory.ts)
- [src/bin/codegraph.ts](src/bin/codegraph.ts)
- [src/mcp/tools.ts](src/mcp/tools.ts)
- [src/db/index.ts](src/db/index.ts)
- [src/db/schema.sql](src/db/schema.sql)
- [src/extraction/index.ts](src/extraction/index.ts)
- [src/sync/watcher.ts](src/sync/watcher.ts)
- [src/context/index.ts](src/context/index.ts)
</details>


CodeGraph is a TypeScript CLI and library for building a local semantic knowledge graph of a codebase. Its public package exposes a `codegraph` binary, compiles TypeScript into `dist`, copies SQLite schema and tree-sitter WASM assets at build time, and supports Node.js 20 through 24. Sources: [package.json:1-25](), [package.json:34-58]().

This page gives a first-30-minutes map: what to read first, what the core words mean, how indexing works, and where the CLI and agent-facing tools enter the system. Repository code is the source of truth here. No `STRATEGY.md` or `docs/solutions/**` sources were present in this checkout, so this page does not cite strategy or solved-problem notes.

## What This Repo Is

At the center is `CodeGraph`, a class that opens or initializes a project, creates a local `.codegraph` directory, stores graph data in SQLite, extracts symbols from source files, resolves references into edges, and serves query/context APIs. The class wires together the database, extraction orchestrator, resolver, graph manager, traverser, context builder, file lock, and optional file watcher. Sources: [src/index.ts:123-170](), [src/index.ts:185-216](), [src/index.ts:255-286]().

The repo has four main surfaces:

| Surface | Where to start | What it does |
|---|---|---|
| Library API | `src/index.ts` | Owns project lifecycle, indexing, sync, graph queries, context building, and exports public types. |
| CLI | `src/bin/codegraph.ts` | Provides commands such as `init`, `index`, `sync`, `status`, `query`, `files`, `context`, and `serve --mcp`. |
| MCP tools | `src/mcp/tools.ts` | Exposes agent-facing tools like `codegraph_context`, `codegraph_search`, `codegraph_node`, and `codegraph_files`. |
| Local storage | `src/db/index.ts`, `src/db/schema.sql` | Creates and opens `.codegraph/codegraph.db` with tables for files, nodes, edges, unresolved refs, and metadata. |

Sources: [src/bin/codegraph.ts:1-19](), [src/bin/codegraph.ts:537-705](), [src/bin/codegraph.ts:787-880](), [src/bin/codegraph.ts:1052-1135](), [src/mcp/tools.ts:240-248](), [src/db/index.ts:181-191](), [src/db/schema.sql:19-81]().

## Fastest Read Order

Read in this order if you are new:

1. `README.md` for the product promise, install flow, supported agents, and high-level feature list.
2. `package.json` for runtime constraints, binary name, build/test commands, and dependencies.
3. `src/index.ts` for the core `CodeGraph` lifecycle and how subsystems are composed.
4. `src/types.ts` for the domain vocabulary: node kinds, edge kinds, languages, records, config, context, and traversal options.
5. `src/directory.ts` and `src/config.ts` for what “a CodeGraph project” means on disk.
6. `src/extraction/index.ts` for indexing phases, file inclusion, hashing, git-aware scanning, and sync inputs.
7. `src/mcp/tools.ts` and `src/bin/codegraph.ts` for the user-facing and agent-facing operations.

Sources: [README.md:24-37](), [README.md:102-113](), [package.json:15-25](), [src/index.ts:53-82](), [src/types.ts:11-60](), [src/types.ts:429-490](), [src/directory.ts:10-34](), [src/config.ts:13-23](), [src/extraction/index.ts:52-87](), [src/mcp/tools.ts:248-442]().

## First Files to Open

### `src/index.ts`: The Spine

`CodeGraph.init()` creates the directory structure, writes config, initializes the database, and optionally indexes immediately. `CodeGraph.open()` checks that the project is initialized, validates the directory, loads config, opens the database, and optionally syncs. Indexing and sync are guarded by an in-process mutex and a cross-process file lock so CLI, MCP, and hooks do not write the graph at the same time. Sources: [src/index.ts:185-216](), [src/index.ts:255-286](), [src/index.ts:370-410](), [src/index.ts:417-490]().

```ts
// src/index.ts
static async init(projectRoot: string, options: InitOptions = {}): Promise<CodeGraph>
static async open(projectRoot: string, options: OpenOptions = {}): Promise<CodeGraph>
async indexAll(options: IndexOptions = {}): Promise<IndexResult>
async sync(options: IndexOptions = {}): Promise<SyncResult>
```
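The in-process half of the guard described above can be pictured as a promise chain that lets only one write task run at a time. The following is a minimal sketch under that assumption: `AsyncMutex`, `runExclusive`, and `fakeWrite` are hypothetical names, not identifiers from `src/index.ts`, and the cross-process file lock is not modeled.

```ts
// Hypothetical in-process mutex: runs async tasks one at a time, in call order.
// Illustrative only; not CodeGraph's actual implementation.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  // Starts `task` only after every previously queued task has settled.
  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if `task` rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const writeMutex = new AsyncMutex();
const writeOrder: string[] = [];

// Stand-in for a graph write such as a full index or a sync.
async function fakeWrite(label: string, delayMs: number): Promise<void> {
  writeOrder.push(`${label}:start`);
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  writeOrder.push(`${label}:end`);
}

// Both calls are issued together, but the second waits for the first to finish.
const bothDone = Promise.all([
  writeMutex.runExclusive(() => fakeWrite("index", 20)),
  writeMutex.runExclusive(() => fakeWrite("sync", 1)),
]);
```

Without the mutex, the shorter `sync` task would start and finish while `index` is still running. With it, the two writes never interleave.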

### `src/types.ts`: The Vocabulary

Nodes represent symbols and code elements. Edges represent relationships between nodes. Files record indexed source files and hashes for change detection. Context combines a focal node with ancestors, children, incoming and outgoing refs, related types, and imports. Sources: [src/types.ts:18-60](), [src/types.ts:97-186](), [src/types.ts:188-215](), [src/types.ts:376-400]().

| Term | Meaning in this repo |
|---|---|
| Node | A code symbol or structural element such as `file`, `class`, `function`, `route`, or `component`. |
| Edge | A relationship such as `contains`, `calls`, `imports`, `extends`, `implements`, `references`, or `decorates`. |
| FileRecord | The tracked source file, content hash, language, size, timestamps, node count, and extraction errors. |
| Subgraph | A focused subset of nodes and edges with root entry points. |
| Context | A task-ready bundle around a focal symbol and its relevant graph neighborhood. |
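To make the vocabulary concrete, here is a simplified sketch of how nodes and edges could be modeled and queried. The field names (`id`, `filePath`, `source`, `target`) and the helper `callersOf` are assumptions for illustration. The authoritative definitions are in `src/types.ts` and contain more kinds and fields than shown here.

```ts
// Hypothetical, simplified shapes; the real definitions live in src/types.ts.
type NodeKind = "file" | "class" | "function" | "route" | "component";
type EdgeKind =
  | "contains" | "calls" | "imports" | "extends"
  | "implements" | "references" | "decorates";

interface GraphNode {
  id: string;
  kind: NodeKind;
  name: string;
  filePath: string;
}

interface GraphEdge {
  source: string; // id of the node the relationship starts from
  target: string; // id of the node the relationship points to
  kind: EdgeKind;
}

// Returns the nodes that have a `calls` edge pointing at `targetId`.
function callersOf(targetId: string, nodes: GraphNode[], edges: GraphEdge[]): GraphNode[] {
  const callerIds = new Set(
    edges.filter((e) => e.kind === "calls" && e.target === targetId).map((e) => e.source),
  );
  return nodes.filter((n) => callerIds.has(n.id));
}

const sampleNodes: GraphNode[] = [
  { id: "file:form.ts", kind: "file", name: "form.ts", filePath: "src/form.ts" },
  { id: "fn:handleSubmit", kind: "function", name: "handleSubmit", filePath: "src/form.ts" },
  { id: "fn:save", kind: "function", name: "save", filePath: "src/store.ts" },
];

const sampleEdges: GraphEdge[] = [
  { source: "file:form.ts", target: "fn:handleSubmit", kind: "contains" },
  { source: "fn:handleSubmit", target: "fn:save", kind: "calls" },
];

const saveCallers = callersOf("fn:save", sampleNodes, sampleEdges).map((n) => n.name);
```

The `contains` edge is ignored by the query because only `calls` edges count as caller relationships.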

## Projects and Local Files

A project is considered initialized only when it has both a `.codegraph/` directory and `.codegraph/codegraph.db`. The directory module also walks upward from a starting path to find the nearest initialized project, similar to how Git discovers a repository root. Sources: [src/directory.ts:10-34](), [src/directory.ts:36-64]().
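The upward walk can be sketched with Node's `fs` and `path` modules. `isInitialized` and `findProjectRoot` are hypothetical names chosen for this example and may not match what `src/directory.ts` exports. The demo at the end builds a throwaway project in the OS temp directory.

```ts
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: a directory counts as initialized when
// .codegraph/codegraph.db exists inside it.
function isInitialized(dir: string): boolean {
  return fs.existsSync(path.join(dir, ".codegraph", "codegraph.db"));
}

// Walks from `startDir` toward the filesystem root and returns the first
// initialized directory, or null if none is found.
function findProjectRoot(startDir: string): string | null {
  let current = path.resolve(startDir);
  while (true) {
    if (isInitialized(current)) return current;
    const parent = path.dirname(current);
    if (parent === current) return null; // reached the filesystem root
    current = parent;
  }
}

// Demo: create a fake initialized project and search from a nested folder.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "codegraph-demo-"));
fs.mkdirSync(path.join(demoRoot, ".codegraph"));
fs.writeFileSync(path.join(demoRoot, ".codegraph", "codegraph.db"), "");
const nestedDir = path.join(demoRoot, "src", "deep");
fs.mkdirSync(nestedDir, { recursive: true });

const discoveredRoot = findProjectRoot(nestedDir);
```

Stopping when `path.dirname` returns its own input is what prevents an infinite loop at the filesystem root.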

The `.codegraph` directory is intentionally local. `createDirectory()` writes a `.gitignore` that ignores database files, WAL/SHM files, cache, logs, and hook markers. This supports a local-first workflow where indexes are machine-local artifacts rather than committed project source. Sources: [src/directory.ts:66-106](), [src/db/index.ts:181-191]().

```text
project-root/
  .codegraph/
    codegraph.db        local SQLite graph database
    .gitignore          ignores DB, WAL/SHM, cache, logs, hook markers
  src/
  package.json
```

Sources: [src/directory.ts:83-105](), [src/db/schema.sql:19-81]().

## Indexing in One Mental Model

Indexing starts with file discovery, filters files through config include/exclude patterns, hashes contents, parses supported languages, stores files/nodes/edges, then resolves unresolved references into graph edges. Full indexing resolves all unresolved refs in batches; sync scopes resolution to changed files when possible and falls back to batched resolution when git change data is unavailable. Sources: [src/extraction/index.ts:28-50](), [src/extraction/index.ts:90-126](), [src/extraction/index.ts:178-216](), [src/index.ts:375-410](), [src/index.ts:437-490]().

```text
source files
  -> include/exclude filter
  -> language detection + tree-sitter extraction
  -> files / nodes / unresolved_refs in SQLite
  -> resolver creates edges
  -> context, search, callers, callees, impact, files
```
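The hashing step is what lets sync skip unchanged files. The sketch below compares stored hashes against current contents to classify files as added, modified, or removed. It assumes SHA-256 and in-memory maps purely for illustration; the source does not state which hash algorithm or data structures the extraction code in `src/extraction/index.ts` uses, and `detectChanges` is a hypothetical name.

```ts
import { createHash } from "node:crypto";

// Assumption: SHA-256 over UTF-8 content. The real algorithm may differ.
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

interface ChangeSet {
  added: string[];
  modified: string[];
  removed: string[];
}

// `stored` maps path -> hash recorded at the last index.
// `current` maps path -> file content as it is on disk now.
function detectChanges(stored: Map<string, string>, current: Map<string, string>): ChangeSet {
  const changes: ChangeSet = { added: [], modified: [], removed: [] };
  current.forEach((content, filePath) => {
    const previousHash = stored.get(filePath);
    if (previousHash === undefined) {
      changes.added.push(filePath);
    } else if (previousHash !== hashContent(content)) {
      changes.modified.push(filePath);
    }
  });
  stored.forEach((_hash, filePath) => {
    if (!current.has(filePath)) changes.removed.push(filePath);
  });
  return changes;
}

const storedHashes = new Map<string, string>([
  ["src/a.ts", hashContent("export const a = 1;")],
  ["src/b.ts", hashContent("export const b = 2;")],
  ["src/old.ts", hashContent("// legacy")],
]);

const currentFiles = new Map<string, string>([
  ["src/a.ts", "export const a = 1;"], // unchanged
  ["src/b.ts", "export const b = 3;"], // edited
  ["src/new.ts", "export const c = 4;"], // newly created
]);

const detected = detectChanges(storedHashes, currentFiles);
```

Only the files in `detected` would need re-extraction, and only their references would need re-resolution.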

The watcher keeps the same local graph fresh by using native `fs.watch`, filtering out `.codegraph/` changes, applying include/exclude rules, and debouncing sync work. Starting the watcher returns `false` instead of crashing when recursive watching is unavailable or disabled for the environment. Sources: [src/sync/watcher.ts:1-9](), [src/sync/watcher.ts:40-49](), [src/sync/watcher.ts:82-138](), [src/sync/watcher.ts:168-206]().
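The filter-then-debounce behavior can be sketched without touching the filesystem. `DebouncedSync`, `notify`, and `flush` are hypothetical names, the 300 ms default is an arbitrary assumption, and include/exclude matching is left out. The real logic lives in `src/sync/watcher.ts`.

```ts
// Hypothetical debouncer: collects changed paths and triggers one sync per burst.
class DebouncedSync {
  private pending = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly onSync: (paths: string[]) => void;
  private readonly delayMs: number;

  constructor(onSync: (paths: string[]) => void, delayMs = 300) {
    this.onSync = onSync;
    this.delayMs = delayMs;
  }

  // Records a change event; events inside .codegraph/ are ignored.
  notify(relativePath: string): void {
    const normalized = relativePath.split("\\").join("/");
    if (normalized === ".codegraph" || normalized.startsWith(".codegraph/")) return;
    this.pending.add(normalized);
    // Restart the quiet-period timer on every relevant event.
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Runs the sync callback immediately with everything collected so far.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.size === 0) return;
    const paths = Array.from(this.pending).sort();
    this.pending.clear();
    this.onSync(paths);
  }
}

const syncBatches: string[][] = [];
const debouncer = new DebouncedSync((paths) => {
  syncBatches.push(paths);
}, 50);

debouncer.notify("src/a.ts");
debouncer.notify(".codegraph/codegraph.db-wal"); // ignored: the graph's own files
debouncer.notify("src\\b.ts"); // Windows-style separator is normalized
debouncer.notify("src/a.ts"); // duplicate is collapsed
debouncer.flush(); // force the batch instead of waiting for the timer
```

Ignoring `.codegraph/` matters because each sync writes to the database, and those writes would otherwise trigger another sync.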

## Config Defaults That Matter

Configuration lives at `.codegraph/config.json`, but `rootDir` is derived from the actual project path when config is loaded or saved. Validation requires version, root, include/exclude arrays, language and framework arrays, max file size, docstring extraction, and call-site tracking flags. Custom regex patterns are checked for compilability and basic ReDoS risk before acceptance. Sources: [src/config.ts:13-23](), [src/config.ts:25-48](), [src/config.ts:50-111](), [src/config.ts:134-191]().
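The validation described above can be sketched as a function that returns a list of problems. This is a hedged illustration only: the field names (`version`, `include`, `exclude`, `maxFileSize`, `customPatterns`) and the nested-quantifier heuristic are assumptions, cover only a subset of the listed requirements, and may not match the checks in `src/config.ts`.

```ts
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns a problem description for a regex source, or null if it looks acceptable.
function checkPattern(source: string): string | null {
  try {
    new RegExp(source);
  } catch {
    return "does not compile";
  }
  // Naive heuristic (assumption): flag a quantified group that is itself
  // quantified, such as (a+)+, which can backtrack catastrophically.
  if (/\([^()]*[+*]\)[+*]/.test(source)) return "nested quantifier";
  return null;
}

// Hypothetical validator over a subset of config fields.
function validateConfig(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["config must be an object"];
  const config = raw as Record<string, unknown>;
  const issues: string[] = [];

  if (typeof config.version !== "number") issues.push("version must be a number");
  if (!isStringArray(config.include)) issues.push("include must be an array of strings");
  if (!isStringArray(config.exclude)) issues.push("exclude must be an array of strings");
  if (typeof config.maxFileSize !== "number" || config.maxFileSize <= 0) {
    issues.push("maxFileSize must be a positive number");
  }
  if (config.customPatterns !== undefined) {
    if (!isStringArray(config.customPatterns)) {
      issues.push("customPatterns must be an array of strings");
    } else {
      for (const pattern of config.customPatterns) {
        const problem = checkPattern(pattern);
        if (problem !== null) issues.push(`customPatterns: "${pattern}" ${problem}`);
      }
    }
  }
  return issues;
}

const goodConfigIssues = validateConfig({
  version: 1,
  include: ["**/*.ts"],
  exclude: ["node_modules/**"],
  maxFileSize: 1048576,
  customPatterns: ["^foo\\d+$"],
});

const badConfigIssues = validateConfig({
  version: 1,
  include: ["**/*.ts"],
  exclude: "node_modules", // wrong type: should be an array
  maxFileSize: 0,
  customPatterns: ["(a+)+", "[unclosed"],
});
```

Collecting every issue instead of throwing on the first one lets a caller report all config problems in one pass.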

The default config includes many source extensions across TypeScript, JavaScript, Python, Go, Rust, Java, C/C++, C#, PHP, Ruby, Swift, Kotlin, Dart, Svelte, Vue, Liquid, Pascal/Delphi, and Scala. It excludes version control, dependencies, build outputs, framework caches, virtualenvs, and language-specific generated directories. Sources: [src/types.ts:492-548](), [src/types.ts:549-620]().

## Tools and Agent Workflows

The CLI is the human/operator surface:

- `codegraph init -i` creates a project and indexes it.
- `codegraph index` runs a full index.
- `codegraph sync` applies incremental updates.
- `codegraph status` reports counts and backend status.
- `codegraph query` searches symbols.
- `codegraph files` shows the indexed file structure.
- `codegraph context` builds task-ready context.
- `codegraph serve --mcp` starts the server for agent integration.

Sources: [src/bin/codegraph.ts:391-460](), [src/bin/codegraph.ts:537-662](), [src/bin/codegraph.ts:664-705](), [src/bin/codegraph.ts:787-880](), [src/bin/codegraph.ts:1052-1135]().

The MCP tools are the agent surface. `codegraph_context` is explicitly described as the primary first call for architecture, feature, and bug-context questions. Other tools are narrower: search symbols, inspect one node, find callers/callees, estimate impact, explore related source grouped by file, get status, and list indexed files. All MCP tools support an optional `projectPath` for querying another initialized local project. Sources: [src/mcp/tools.ts:232-248](), [src/mcp/tools.ts:250-442]().

## Provider-Neutral and BYOC/BYOK-Friendly Boundaries

The implementation keeps model-provider concerns outside the graph core. The package dependencies list local parsing, CLI, config parsing, SQLite, glob matching, and tree-sitter packages; it does not declare a model-provider SDK dependency. The integration boundary is CLI/MCP tooling, and MCP tools accept `projectPath` so the same installed tool can query different local repositories. Sources: [package.json:34-55](), [src/mcp/tools.ts:232-248](), [src/mcp/tools.ts:507-520]().

That makes the architecture portable across bring-your-own-compute and bring-your-own-key setups: CodeGraph builds and queries local repository indexes, while whichever assistant, key, or runtime invokes the MCP server remains outside the indexing/storage layer. The README also frames the system as local, with no API keys or external services and a SQLite database. Sources: [README.md:102-113](), [src/db/index.ts:181-191](), [src/db/schema.sql:19-81]().

## What to Remember

Start with `src/index.ts` for lifecycle, `src/types.ts` for vocabulary, `src/directory.ts` plus `src/config.ts` for project shape, and `src/mcp/tools.ts` plus `src/bin/codegraph.ts` for the two user-facing surfaces. The key mental model is simple: CodeGraph creates a local `.codegraph/codegraph.db`, extracts source symbols into nodes, resolves relationships into edges, and exposes that graph through CLI and MCP tools for faster codebase understanding. Sources: [src/index.ts:123-170](), [src/types.ts:97-186](), [src/directory.ts:23-34](), [src/mcp/tools.ts:240-442]().
