# Explain It Simply

> What Graphify does in plain language: it reads a pile of project material, finds the named things and relationships, and draws a map that agents and humans can query later.

- Repository: safishamsi/graphify
- GitHub: https://github.com/safishamsi/graphify
- Human wiki: https://grok-wiki.com/public/wiki/safishamsi-graphify-af19ef9fd72d
- Complete Markdown: https://grok-wiki.com/public/wiki/safishamsi-graphify-af19ef9fd72d/llms-full.txt

## Source Files

- `README.md`
- `pyproject.toml`
- `ARCHITECTURE.md`
- `graphify/__main__.py`
- `graphify/skill-codex.md`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [pyproject.toml](pyproject.toml)
- [ARCHITECTURE.md](ARCHITECTURE.md)
- [graphify/__main__.py](graphify/__main__.py)
- [graphify/skill-codex.md](graphify/skill-codex.md)
- [graphify/detect.py](graphify/detect.py)
- [graphify/extract.py](graphify/extract.py)
- [graphify/build.py](graphify/build.py)
- [graphify/cluster.py](graphify/cluster.py)
- [graphify/analyze.py](graphify/analyze.py)
- [graphify/report.py](graphify/report.py)
- [graphify/export.py](graphify/export.py)
- [graphify/serve.py](graphify/serve.py)
- [graphify/llm.py](graphify/llm.py)
- [graphify/wiki.py](graphify/wiki.py)
- [tests/test_cli_export.py](tests/test_cli_export.py)
</details>

Graphify turns a messy pile of project material into a map. The pile can include code, Markdown, PDFs, images, office files, videos, and URLs. The map is a knowledge graph: named things become nodes, relationships become edges, and the finished graph can be opened, queried, exported, or committed for a team.

This page explains Graphify in plain language first, then maps each simple idea back to the actual modules and commands in the repository. The goal is to help a smart newcomer understand what Graphify does before they learn every internal detail.

Sources: [README.md:26-41](), [README.md:208-221](), [ARCHITECTURE.md:5-12]()

## The Short Version

Imagine dumping a project folder onto a table. Graphify sorts the pile, writes labels on the important things, draws strings between related things, groups nearby things into neighborhoods, and saves the result so people and agents can ask better questions later.

In real repository terms:

```text
project files
  -> graphify.detect       finds supported files and skips noise
  -> graphify.extract      reads code structure with tree-sitter
  -> graphify.llm          reads docs/papers/images through a configured backend
  -> graphify.build        turns nodes + edges into a NetworkX graph
  -> graphify.cluster      groups related nodes into communities
  -> graphify.analyze      finds central nodes and surprising links
  -> graphify.export/report/wiki
                           writes graph.json, graph.html, reports, and wiki pages
```

The architecture document describes this as a staged pipeline where each stage is its own module and stages pass plain Python dicts or NetworkX graphs rather than sharing hidden state.

Sources: [ARCHITECTURE.md:5-31](), [graphify/__main__.py:2771-2777](), [graphify/__main__.py:3209-3252]()

## What Goes In

Graphify starts by detecting files. The detector knows about categories such as code, documents, papers, images, office files, and video/audio files. It also skips common generated folders and dependency caches, including `node_modules`, `.git`, build directories, framework caches, and Graphify's own output folder.

That means the first job is not "read everything." The first job is "find the useful project material and avoid obvious noise."

| Input kind | Examples from the code |
|---|---|
| Code | `.py`, `.ts`, `.js`, `.go`, `.rs`, `.java`, `.cpp`, `.rb`, `.swift`, `.kt`, `.cs`, `.php`, `.sql`, `.json`, and more |
| Docs | `.md`, `.mdx`, `.txt`, `.rst`, `.html`, `.yaml`, `.yml` |
| Papers | `.pdf` |
| Images | `.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.svg` |
| Office | `.docx`, `.xlsx` |
| Video/audio | `.mp4`, `.mov`, `.mp3`, `.wav`, and related formats |
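
The detection step can be sketched in a few lines. This is a minimal illustration, not the code in `graphify/detect.py`: the `CATEGORIES` and `SKIP_DIRS` tables, and the `categorize` and `detect` function names, are hypothetical and cover only a subset of the extensions and folders described above.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical category table: a small subset of the extensions listed above.
CATEGORIES = {
    "code": {".py", ".ts", ".js", ".go", ".rs"},
    "docs": {".md", ".mdx", ".txt", ".rst"},
    "papers": {".pdf"},
    "images": {".png", ".jpg", ".jpeg", ".svg"},
}

# Assumed noise folders, based on the prose above rather than the real list.
SKIP_DIRS = {"node_modules", ".git", "build", "__pycache__", "graphify-out"}


def categorize(path: Path) -> Optional[str]:
    """Return the category for a file, or None if the extension is unsupported."""
    suffix = path.suffix.lower()
    for category, extensions in CATEGORIES.items():
        if suffix in extensions:
            return category
    return None


def detect(root: Path) -> Dict[str, List[str]]:
    """Walk root and group supported files by category, skipping noisy folders."""
    found: Dict[str, List[str]] = {name: [] for name in CATEGORIES}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip anything that lives under an ignored directory.
        if any(part in SKIP_DIRS for part in relative.parts[:-1]):
            continue
        if not path.is_file():
            continue
        category = categorize(path)
        if category is not None:
            found[category].append(relative.as_posix())
    return found
```

Calling `detect(Path("."))` on a project returns a dict of repo-relative paths per category, which matches the "find the useful material first" idea: unsupported extensions and ignored folders never reach the later stages.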

Sources: [graphify/detect.py:28-37](), [graphify/detect.py:537-557](), [graphify/detect.py:862-940]()

## What Gets Found

Graphify looks for two kinds of meaning.

### Structural Meaning

For code, Graphify uses tree-sitter parsers. It finds things like files, imports, classes, functions, calls, and definitions. Those become graph nodes and edges with source file and line information.

A tiny code-shaped example of the internal data looks like this:

```json
{
  "nodes": [
    {
      "id": "auth_service",
      "label": "AuthService",
      "file_type": "code",
      "source_file": "src/auth.py",
      "source_location": "L12"
    }
  ],
  "edges": [
    {
      "source": "auth_service",
      "target": "database_pool",
      "relation": "calls",
      "confidence": "EXTRACTED"
    }
  ]
}
```

The important word is `EXTRACTED`: Graphify applies that label when a relationship is directly present in the source, such as an import or a call the parser can see.
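
To make the structural step concrete, here is a rough sketch that produces nodes and edges in the shape shown above. Graphify itself uses tree-sitter; this example swaps in Python's standard-library `ast` module so it runs without extra dependencies, and it only handles Python source. The `extract_python` function and its lowercase ID scheme are assumptions for illustration, not Graphify's API.

```python
import ast


def extract_python(source: str, source_file: str) -> dict:
    """Collect top-level class/function nodes and direct call edges."""
    tree = ast.parse(source)
    nodes, edges = [], []
    defined = {}  # definition name -> node id

    # First pass: one node per top-level class or function.
    for item in tree.body:
        if isinstance(item, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            node_id = item.name.lower()  # hypothetical ID scheme
            defined[item.name] = node_id
            nodes.append({
                "id": node_id,
                "label": item.name,
                "file_type": "code",
                "source_file": source_file,
                "source_location": f"L{item.lineno}",
            })

    # Second pass: a "calls" edge for each call to another known definition.
    for item in tree.body:
        caller = defined.get(getattr(item, "name", None))
        if caller is None:
            continue
        for child in ast.walk(item):
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                callee = defined.get(child.func.id)
                if callee is not None and callee != caller:
                    edge = {
                        "source": caller,
                        "target": callee,
                        "relation": "calls",
                        # Directly visible in the syntax tree, hence EXTRACTED.
                        "confidence": "EXTRACTED",
                    }
                    if edge not in edges:
                        edges.append(edge)

    return {"nodes": nodes, "edges": edges}
```

Every edge here is labeled `EXTRACTED` because the call appears literally in the syntax tree; nothing is guessed. Method calls such as `self.pool.get()` are ignored in this sketch, which is one reason a real extractor is considerably larger.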

Sources: [ARCHITECTURE.md:33-49](), [graphify/extract.py:1310-1382](), [graphify/extract.py:7505-7528]()

### Semantic Meaning

For docs, papers, and images, Graphify can use an AI backend to extract concepts and relationships that are not available through code parsing. The CLI supports multiple backends, including Claude, Kimi, Ollama, Gemini, OpenAI, DeepSeek, Bedrock, and Claude CLI. That keeps the design provider-neutral: the graph format and pipeline do not require one hosted model or one proprietary connector.

The direct CLI path detects or validates the chosen backend before semantic extraction. Ollama can run without an API key on loopback, Bedrock can use AWS credentials, and Claude CLI can use a locally authenticated CLI.

Sources: [graphify/__main__.py:2897-2970](), [graphify/llm.py:47-118](), [README.md:471-488]()

## How The Map Is Built

After extraction, Graphify merges the AST result and semantic result. The build layer validates and normalizes the extraction shape, then creates a NetworkX graph. Edges keep relation details like `calls`, `imports`, or `uses`, plus confidence labels such as `EXTRACTED`, `INFERRED`, and `AMBIGUOUS`.

The builder also handles practical cleanup: old `links` fields can be read as `edges`, absolute source paths can be made repo-relative, and slightly mismatched IDs can be normalized so relationships are not dropped too easily.
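
The cleanup steps can be sketched as a single normalization pass. This is an illustration under stated assumptions, not the logic in `graphify/build.py`: the `normalize_id` rule (lowercase, non-alphanumerics collapsed to underscores), the `normalize_extraction` name, and the choice to drop edges whose endpoints are still unknown after normalization are all hypothetical. The step that turns the result into a NetworkX graph is omitted.

```python
import re
from pathlib import PurePosixPath


def normalize_id(raw: str) -> str:
    """Lowercase an ID and collapse runs of non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", raw.lower()).strip("_")


def normalize_extraction(extraction: dict, repo_root: str) -> dict:
    """Return a cleaned copy of an extraction dict with consistent IDs and paths."""
    root = PurePosixPath(repo_root)
    nodes = []
    for node in extraction.get("nodes", []):
        node = dict(node)  # copy so the caller's data is not mutated
        node["id"] = normalize_id(node["id"])
        source_file = node.get("source_file")
        if source_file:
            path = PurePosixPath(source_file)
            if path.is_absolute():
                try:
                    node["source_file"] = str(path.relative_to(root))
                except ValueError:
                    pass  # outside the repo: leave the path untouched
        nodes.append(node)

    known = {node["id"] for node in nodes}
    # Older payloads may use "links" where newer ones use "edges".
    raw_edges = extraction.get("edges") or extraction.get("links") or []
    edges = []
    for edge in raw_edges:
        edge = dict(edge)
        edge["source"] = normalize_id(edge["source"])
        edge["target"] = normalize_id(edge["target"])
        # Assumption: keep an edge only if both endpoints resolve to known nodes.
        if edge["source"] in known and edge["target"] in known:
            edges.append(edge)

    return {"nodes": nodes, "edges": edges}
```

With this rule, an edge that points at `"Database Pool"` still connects to a node whose ID is `database_pool`, which is the kind of near-miss the builder is described as tolerating.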

Sources: [graphify/build.py:107-189](), [graphify/build.py:192-220](), [graphify/skill-codex.md:396-429]()

## How Graphify Groups Things

Once it has a graph, Graphify clusters related nodes into communities. A community is like a neighborhood on the map: functions, classes, concepts, files, or documents that are more connected to each other than to the rest of the project.

Graphify also looks for:

| Report idea | Plain meaning |
|---|---|
| God nodes | The most-connected real entities, often core abstractions |
| Surprising connections | Links across files, file types, or communities that may not be obvious |
| Knowledge gaps | Thin or isolated areas that may need review |
| Confidence mix | How much of the graph was directly extracted versus inferred or ambiguous |
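
Two of these report ideas reduce to simple counting, sketched below. This is a simplified stand-in rather than the code in `graphify/analyze.py` or `graphify/report.py`: `god_nodes` ranks purely by degree and does not filter out non-entity nodes, and both function names are assumptions.

```python
from collections import Counter


def god_nodes(edges: list, top_n: int = 3) -> list:
    """Rank node IDs by degree, i.e. how many edge endpoints touch them."""
    degree = Counter()
    for edge in edges:
        degree[edge["source"]] += 1
        degree[edge["target"]] += 1
    # Highest degree first; ties broken by ID so the order is stable.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def confidence_mix(edges: list) -> dict:
    """Return the fraction of edges carrying each confidence label."""
    counts = Counter(edge.get("confidence", "UNKNOWN") for edge in edges)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}
```

A graph where half the edges are `EXTRACTED` and the rest are `INFERRED` or `AMBIGUOUS` would show up as `0.5` for `EXTRACTED`, which gives a reviewer a quick sense of how much of the map rests on direct evidence.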

Sources: [graphify/cluster.py:86-106](), [graphify/analyze.py:85-136](), [graphify/report.py:67-120](), [graphify/report.py:151-180]()

## What Comes Out

The README describes the default user-facing result as three core files under `graphify-out/`: an interactive HTML graph, a human-readable `GRAPH_REPORT.md`, and the full machine-readable `graph.json`.

The code and tests show additional export paths too: HTML, Obsidian vaults, wiki pages, GraphML, SVG, Neo4j Cypher, and call-flow HTML. The wiki exporter creates an `index.md`, community articles, and god-node articles, which makes the graph easier for agents to crawl as Markdown.

Sources: [README.md:34-47](), [README.md:239-265](), [graphify/export.py:1-17](), [graphify/wiki.py:1-2](), [tests/test_cli_export.py:64-116]()

## How People And Agents Query It Later

The point of saving `graph.json` is that the project does not have to be reread from scratch every time. A user can ask:

```bash
graphify query "show the auth flow"
graphify path "UserService" "DatabasePool"
graphify explain "RateLimiter"
```

Under the hood, `query` loads `graph.json`, scores matching nodes, picks seed nodes, walks the graph with BFS or DFS, and renders a compact text subgraph that fits within a token budget. `path` finds the shortest path between two matched nodes. `explain` finds a node and describes its neighbors.
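
The query and path ideas can be sketched with plain breadth-first search. This is a simplified illustration, not the implementation in `graphify/__main__.py` or `graphify/serve.py`: the word-overlap scoring is an assumption, a node-count limit (`max_nodes`) stands in for the token budget, and all function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of edge dicts."""
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge["source"], set()).add(edge["target"])
        adjacency.setdefault(edge["target"], set()).add(edge["source"])
    return adjacency


def score_nodes(nodes, question):
    """Score each node by how many question words appear in its label."""
    words = question.lower().split()
    scores = {}
    for node in nodes:
        label = node["label"].lower()
        hits = sum(1 for word in words if word in label)
        if hits:
            scores[node["id"]] = hits
    return scores


def query(nodes, edges, question, max_nodes=5):
    """BFS outward from the best-scoring nodes until the node budget is used."""
    scores = score_nodes(nodes, question)
    seeds = sorted(scores, key=lambda node_id: (-scores[node_id], node_id))
    adjacency = build_adjacency(edges)
    visited, seen, queue = [], set(seeds), deque(seeds)
    while queue and len(visited) < max_nodes:
        current = queue.popleft()
        visited.append(current)
        for neighbor in sorted(adjacency.get(current, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return visited


def shortest_path(edges, start, goal):
    """Unweighted shortest path via BFS; returns None when no path exists."""
    adjacency = build_adjacency(edges)
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adjacency.get(current, ())):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None
```

The budget is what keeps the answer scoped: the walk stops once enough nodes are collected, so an agent receives a small neighborhood around the best matches instead of the whole graph.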

Sources: [README.md:308-323](), [graphify/__main__.py:1703-1772](), [graphify/__main__.py:1852-1935](), [graphify/serve.py:314-339]()

## Why It Helps

Plain search answers "where does this word appear?" Graphify tries to answer "what is this thing connected to?" That difference matters in large projects because important relationships are often spread across files, languages, documents, and notes.

The report is also explicit about uncertainty. It shows extraction confidence, highlights ambiguous edges for review, and separates directly found relationships from inferred ones. That makes it useful for human review and safer for agents that need scoped context instead of a full-project file dump.

Sources: [README.md:198-205](), [ARCHITECTURE.md:50-57](), [graphify/report.py:35-75]()

## Provider-Neutral And Portable By Design

Graphify's core artifact is a file-backed graph, not a hosted database or single model provider. The package exposes optional extras for different capabilities, and semantic extraction can route through different backends depending on environment and CLI flags. The assistant skill files are also platform-specific wrappers around the same graph-building idea.

For Grok-Wiki-style integration, the portable flow is: read repository files, generate or update `graphify-out/graph.json`, then crawl `graphify-out/wiki/index.md` or run `graphify query`. That stays BYOC/BYOK-friendly because the integration depends on files and CLI commands, while model choice remains a backend configuration detail.
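
Because the artifact is a plain file, an integration can inspect it with nothing but a JSON parser. The sketch below assumes `graph.json` has a top-level `nodes` list plus either an `edges` or a `links` list, and that edges carry a `relation` field; the exact schema is not confirmed here, and `load_graph` and `summarize_graph` are hypothetical helper names.

```python
import json
from collections import Counter
from pathlib import Path


def load_graph(path):
    """Read a graph file and return (nodes, edges) as plain lists."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    nodes = data.get("nodes", [])
    # Accept either key, since older payloads may use "links".
    edges = data.get("edges") or data.get("links") or []
    return nodes, edges


def summarize_graph(path):
    """Return node, edge, and per-relation counts for a saved graph."""
    nodes, edges = load_graph(path)
    relations = Counter(edge.get("relation", "unknown") for edge in edges)
    return {"nodes": len(nodes), "edges": len(edges), "relations": dict(relations)}
```

A caller might run `summarize_graph("graphify-out/graph.json")` as a cheap sanity check before crawling the wiki pages or issuing queries.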

Sources: [pyproject.toml:50-69](), [pyproject.toml:94-100](), [graphify/__main__.py:206-280](), [graphify/llm.py:47-118](), [graphify/wiki.py:141-178]()

## Final Summary

Graphify is a project-mapping tool. It reads supported project material, extracts named things and relationships, builds a graph, clusters it into meaningful neighborhoods, writes useful artifacts, and lets humans or agents ask scoped questions later. The simple idea is "turn the pile into a map"; the implementation is a Python CLI and library pipeline that saves that map under `graphify-out/`.

Sources: [ARCHITECTURE.md:7-12](), [README.md:26-41](), [graphify/__main__.py:3250-3307]()
