# Start Here: What Omni Is and the Read Order

> The one-paragraph product idea (on-device, no server, airgap-capable semantic search), the three build targets (OmniKit library, Omni app, omni-verify), the vocabulary a new reader needs (towers, retrieval LoRA, bf16 store, priority gate, base+delta search), and the fastest order to read the files that follow.

- Repository: hanxiao/omni-macos
- GitHub: https://github.com/hanxiao/omni-macos
- Human wiki: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05
- Complete Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/llms-full.txt

## Source Files

- `README.md`
- `Package.swift`
- `project.yml`
- `Makefile`
- `Sources/OmniKit/OmniEngine.swift`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [Package.swift](Package.swift)
- [project.yml](project.yml)
- [Makefile](Makefile)
- [Sources/OmniKit/OmniEngine.swift](Sources/OmniKit/OmniEngine.swift)
- [Sources/OmniKit/VectorStore.swift](Sources/OmniKit/VectorStore.swift)
- [Sources/omni-verify/main.swift](Sources/omni-verify/main.swift)
</details>


This page is the orientation map for your first 30 minutes in `hanxiao/omni-macos`. It states the product idea in one paragraph, names the three artifacts the build produces, defines the five terms that recur everywhere in the code, and gives you the fastest order to read the files so the rest of the repository makes sense. Read this before opening any encoder or store file: those files assume the vocabulary below.

## What Omni is (the one-paragraph idea)

Omni is a native macOS app for **semantic search over your local files**: you type a query and it finds matching documents, code, PDFs, images, audio, and video by meaning, because every file and every query is embedded into one shared vector space. The embedding model (`jina-embeddings-v5-omni`, in a Nano ~1.9 GB or Small ~3.1 GB size) runs **in-process on Apple GPUs** via a native MLX-Swift port, with no Python and no server. The model downloads once on first launch; after that, indexing and search run with no network at all, so the Mac can be airgapped and Omni keeps working.

Sources: [README.md:13-33](README.md), [Sources/OmniKit/OmniEngine.swift:7-12](Sources/OmniKit/OmniEngine.swift)

## The three build targets

The repo is one SwiftPM package plus an XcodeGen-generated app. Three artifacts come out of it, and almost every file belongs to one of them.

| Target | Kind | Defined in | What it is |
|---|---|---|---|
| `OmniKit` | SPM library | `Package.swift` products | The engine + indexer: encoders, weight loader, crawler/extractor, SQLite vector store. All the real logic lives here (`Sources/OmniKit/`). |
| `Omni` | macOS app | `project.yml` -> `Omni.xcodeproj` | The SwiftUI front end (`App/`). Generated by XcodeGen, built through Xcode/`xcodebuild` because MLX-Swift must compile Metal shaders. |
| `omni-verify` | SPM executable | `Package.swift` products | A numeric-parity / benchmark CLI (`Sources/omni-verify/main.swift`) that checks the Swift encoder against Python reference fixtures. |

`OmniKit` depends on `mlx-swift` (the GPU runtime) and `swift-tokenizers` (a Rust-backed BPE tokenizer chosen for speed while keeping token ids identical to the reference). The app target depends only on `OmniKit`; the test target lives alongside the library and ships its parity fixtures as resources.

Sources: [Package.swift:7-46](Package.swift), [project.yml:12-19](project.yml), [README.md:87-94](README.md)

```text
   SPM package (Package.swift)                 XcodeGen (project.yml)
   ┌───────────────────────────┐               ┌──────────────────────┐
   │ OmniKit  (library)        │◀──depends on──│ Omni  (App/, SwiftUI)│
   │  encoders / indexer / DB  │               └──────────────────────┘
   └─────────────▲─────────────┘
                 │ depends on
        ┌────────┴────────┐
        │ omni-verify     │   numeric parity + searchbench CLI
        └─────────────────┘
   ext: mlx-swift (GPU) · swift-tokenizers (BPE)
```

## Vocabulary a new reader needs

These terms appear across the engine, the README, and the doc comments. Learn them once here.

### Towers
The model is three separate encoder stacks that all feed one shared backbone and land in the **same** vector space: a **Qwen3 text tower** (`OmniTextEncoder`), a **Qwen3-VL vision tower** (`OmniImageEncoder` / `OmniVisionTower`, also used for video frames and scanned-PDF pages), and a **Whisper-style audio tower** (`OmniAudioEncoder` / `OmniAudioTower`). `OmniEngine.init` constructs all three over a single shared `WeightStore`. Cross-modal alignment is enforced by wrapping media in a `Document:` prefix and appending a text end-token (`mediaSuffix`) so every modality pools at the same position.

Sources: [Sources/OmniKit/OmniEngine.swift:141-157](Sources/OmniKit/OmniEngine.swift), [Sources/OmniKit/OmniEngine.swift:122-133](Sources/OmniKit/OmniEngine.swift), [README.md:96-105](README.md)
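
The "pool at the same position, land in the same space" idea can be illustrated with a minimal sketch of last-token pooling followed by L2 normalization. Plain `[Float]` arrays stand in for MLX tensors here, and the function names `poolLastTokenAndNormalize` and `dot` are hypothetical, not taken from the repository.

```swift
/// Pools a sequence of hidden states by taking the final token's vector,
/// then L2-normalizes it so cosine similarity reduces to a dot product.
/// `hiddenStates` is [tokens][dim]; plain arrays stand in for MLX tensors.
func poolLastTokenAndNormalize(_ hiddenStates: [[Float]]) -> [Float]? {
    guard let last = hiddenStates.last, !last.isEmpty else { return nil }
    let norm = last.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    guard norm > 0 else { return nil }
    return last.map { $0 / norm }
}

/// Dot product of two equal-length vectors.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "dimension mismatch")
    return zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}
```

Because every modality ends on the same text end-token, the same pooling rule applies whether the sequence came from text, image patches, or audio frames; only the tower that produced the hidden states differs.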

### Retrieval LoRA
The published weights are a general backbone plus a small **retrieval LoRA** adapter (`adapters/retrieval/`). `WeightStore` loads the HF safetensors and **merges the LoRA into the backbone at load** (upcast to fp32, merge, cast back to bf16) so there is no per-call adapter math. This is why the engine is built for search rather than generation, and why `OmniEngine` constructs `WeightStore` with a `loraScale`.

Sources: [Sources/OmniKit/OmniEngine.swift:144-147](Sources/OmniKit/OmniEngine.swift), [README.md:96-105](README.md)
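
As a rough illustration of what "merge at load" means, the sketch below applies the standard LoRA update `W' = W + scale * (B x A)` to a dense weight. It assumes the usual LoRA shapes and uses fp32 arrays throughout (the upcast step); the cast back to bf16 is omitted. The function name `mergeLoRA` and its parameters are hypothetical and do not mirror `WeightStore`'s actual API.

```swift
/// Merges a LoRA update into a dense weight: W' = W + scale * (B x A).
/// Shapes: W is [out][in], A is [rank][in], B is [out][rank].
func mergeLoRA(weight: [[Float]], a: [[Float]], b: [[Float]], scale: Float) -> [[Float]] {
    let outDim = weight.count
    let inDim = weight.first?.count ?? 0
    let rank = a.count
    precondition(b.count == outDim, "B must have one row per output feature")
    precondition(a.allSatisfy { $0.count == inDim }, "A rows must match input width")
    precondition(b.allSatisfy { $0.count == rank }, "B columns must match rank")

    var merged = weight
    for o in 0..<outDim {
        for i in 0..<inDim {
            // Accumulate one element of the low-rank product B x A.
            var delta: Float = 0
            for r in 0..<rank {
                delta += b[o][r] * a[r][i]
            }
            merged[o][i] += scale * delta
        }
    }
    return merged
}
```

After the merge, a forward pass uses `merged` exactly like an ordinary weight, which is why no adapter math remains at query time.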

### bf16 store
Embeddings are persisted and held in memory as **bf16** (2 bytes per dimension) rather than fp32: half the size on disk and in RAM, with negligible recall loss on L2-normalized vectors. `VectorStore` keeps a single contiguous bf16 buffer (`[count*dim]`) as the source of truth, kept in sync on every insert/update/delete, and reinterprets (not converts) those bytes into a GPU matrix for scoring.

Sources: [Sources/OmniKit/VectorStore.swift:84-124](Sources/OmniKit/VectorStore.swift), [README.md:119-123](README.md)
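
To make the "2 bytes per dimension" point concrete, here is a small sketch of fp32 to bf16 conversion. bf16 keeps the top 16 bits of the IEEE-754 single-precision pattern; this version rounds to nearest-even. The helper names are hypothetical, and the rounding mode is an assumption: the repository may truncate or rely on MLX's own cast.

```swift
/// Converts an fp32 value to bf16 bits (the top 16 bits of the IEEE-754
/// pattern), rounding to nearest-even. NaN is mapped to a quiet NaN.
func bf16Bits(from value: Float) -> UInt16 {
    let bits = value.bitPattern
    if value.isNaN {
        return UInt16(truncatingIfNeeded: (bits >> 16) | 0x0040)
    }
    // Add 0x7FFF plus the lowest kept bit so exact ties round to even.
    let roundingBias: UInt32 = 0x7FFF + ((bits >> 16) & 1)
    return UInt16(truncatingIfNeeded: (bits &+ roundingBias) >> 16)
}

/// Expands bf16 bits back to fp32 by zero-filling the low 16 bits.
func float(fromBF16 bits: UInt16) -> Float {
    Float(bitPattern: UInt32(bits) << 16)
}

/// Bytes needed for a contiguous [count * dim] bf16 buffer.
func bf16StorageBytes(count: Int, dim: Int) -> Int {
    count * dim * MemoryLayout<UInt16>.size
}
```

The round trip is lossy in the low mantissa bits only, which changes each component by a small relative amount and leaves the direction of an L2-normalized vector nearly unchanged.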

### Priority gate
MLX evaluation is not thread-safe, so all GPU work is funneled through one serializer, `OmniEngine.run(highPriority:)`. It runs work one at a time, but a **high-priority** interactive query jumps ahead of pending **low-priority** indexing embeds, and an indexing call yields whenever a query is queued. Net effect: search stays responsive (a few ms) even while a long index runs, because a query waits for at most one in-flight embed to finish.

Sources: [Sources/OmniKit/OmniEngine.swift:118-180](Sources/OmniKit/OmniEngine.swift), [README.md:103-105](README.md)
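
The scheduling policy can be modelled with a deliberately simplified, single-threaded sketch: one job runs at a time, and any pending high-priority job is taken before low-priority ones. The `PriorityGate` class below is hypothetical and ignores the real concurrency machinery in `OmniEngine.run(highPriority:)`; it only shows the ordering.

```swift
/// A single-threaded model of a two-level gate: one job runs at a time,
/// and pending high-priority jobs are always taken before low-priority ones.
final class PriorityGate {
    private var high: [() -> Void] = []
    private var low: [() -> Void] = []

    /// Queues a job at the requested priority (FIFO within each level).
    func enqueue(highPriority: Bool, _ job: @escaping () -> Void) {
        if highPriority { high.append(job) } else { low.append(job) }
    }

    /// Runs exactly one job and reports whether anything ran.
    @discardableResult
    func runNext() -> Bool {
        if !high.isEmpty {
            let job = high.removeFirst()
            job()
            return true
        }
        if !low.isEmpty {
            let job = low.removeFirst()
            job()
            return true
        }
        return false
    }

    /// Runs jobs until both queues are empty.
    func drain() {
        while runNext() {}
    }
}
```

If two embeds are queued, one has already run, and a query then arrives, the query runs before the second embed. That is the "at most one in-flight embed" bound in miniature.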

### base + delta search
Search is **exact brute-force cosine**: one MLX matmul of the query against the resident bf16 matrix, no approximate index. To avoid recopying the whole matrix on every insert, `VectorStore` splits it into a GPU-resident **base** prefix plus a small **delta** of rows added since the base was built; each query scores base and delta and fuses the result. The ~0.8 GB base is rebuilt only on a structural change (delete/reload) or once the delta grows past a threshold; ordinary appends just extend the delta.

Sources: [Sources/OmniKit/VectorStore.swift:97-127](Sources/OmniKit/VectorStore.swift), [README.md:126-135](README.md)
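
A minimal CPU sketch of the base + delta idea follows, assuming row-major `[rows * dim]` buffers of already L2-normalized vectors so that a dot product equals cosine similarity. All names (`Hit`, `scores`, `search`) are hypothetical. For brevity the fusion step sorts every score, whereas the repository is described as using a bounded min-heap for top-K and an MLX matmul for scoring.

```swift
/// One scored hit: global row index and cosine score.
struct Hit: Equatable {
    let row: Int
    let score: Float
}

/// Scores `query` against a row-major `matrix` ([rows * dim]) by dot product.
/// Rows and query are assumed L2-normalized, so dot product equals cosine.
func scores(query: [Float], matrix: [Float], dim: Int) -> [Float] {
    precondition(dim > 0 && query.count == dim && matrix.count % dim == 0)
    return stride(from: 0, to: matrix.count, by: dim).map { (start: Int) -> Float in
        var sum: Float = 0
        for j in 0..<dim { sum += matrix[start + j] * query[j] }
        return sum
    }
}

/// Scores base and delta separately, then fuses them into one global top-K.
/// Delta rows are numbered after the base rows.
func search(query: [Float], base: [Float], delta: [Float], dim: Int, k: Int) -> [Hit] {
    let baseScores = scores(query: query, matrix: base, dim: dim)
    let deltaScores = scores(query: query, matrix: delta, dim: dim)
    let all = (baseScores + deltaScores).enumerated().map {
        Hit(row: $0.offset, score: $0.element)
    }
    // Sorting is a stand-in for a bounded heap; fine for a small sketch.
    return Array(all.sorted { $0.score > $1.score }.prefix(max(0, k)))
}
```

Appending a new vector only touches `delta`, so the large `base` buffer is never recopied on an ordinary insert; the cost is one extra, small scoring pass per query.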

## The fastest read order

Read top-down through the engine, then the pipeline, then the UI, then the verifier. Each step assumes the vocabulary above.

```text
1. README.md                       what + why, airgap claim, architecture map
2. OmniEngine.swift                facade: ModelLocator, towers, priority gate
3. WeightStore + encoders          OmniTextEncoder / OmniImageEncoder / OmniAudioEncoder
4. Indexer.swift                   crawl -> extract -> chunk -> embed -> store
5. VectorStore.swift               bf16 buffer + SQLite + base/delta cosine
6. App/ (OmniApp, AppModel, …)     SwiftUI shell that drives OmniKit
7. omni-verify/main.swift          parity fixtures + searchbench
```

| # | Read this | Why it is the right next file |
|---|---|---|
| 1 | `README.md` | Product idea, on-device/airgap guarantee, build prerequisites, and the four-line architecture map. Orientation, not implementation. |
| 2 | `Sources/OmniKit/OmniEngine.swift` | The public facade. `ModelLocator` shows how a model directory is found (`OMNI_MODEL_DIR`, App Support, HuggingFace cache); the `init` wires all three towers; `run(highPriority:)` is the priority gate. Begin your code reading here; it links to everything. |
| 3 | `WeightStore.swift` + the encoder files | How LoRA-merged weights become bf16 tensors, and how each tower pools the last token and L2-normalizes into the shared space. |
| 4 | `Sources/OmniKit/Indexer.swift` | The crawl -> extract -> chunk -> embed -> store pipeline: incremental by file mtime/size, a concurrent decode stage feeding the single serialized GPU embed stage, batched forwards. |
| 5 | `Sources/OmniKit/VectorStore.swift` | Where vectors live: the bf16 buffer, SQLite persistence, and the base+delta brute-force matmul with a bounded min-heap for top-K. |
| 6 | `App/` (`OmniApp.swift`, `AppModel.swift`, `ContentView.swift`) | The SwiftUI shell that drives `OmniKit`: pick folders, press Index, search, filter by kind/folder/recency. |
| 7 | `Sources/omni-verify/main.swift` | The trust anchor: how numeric parity (text cosine >= 0.999, identical token ids) and `searchbench` are actually run. |
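
The "incremental by file mtime/size" check mentioned in row 4 can be sketched as a fingerprint comparison. This is an assumption-laden illustration: `FileFingerprint`, `needsReindex`, and `fingerprint(atPath:)` are hypothetical names, and the real `Indexer` may track additional state.

```swift
import Foundation

/// What the index remembers about a file from the last run.
struct FileFingerprint: Equatable {
    let modified: Date
    let size: Int
}

/// A file needs (re)embedding when it is new or its mtime/size changed.
func needsReindex(stored: FileFingerprint?, current: FileFingerprint) -> Bool {
    guard let stored = stored else { return true }
    return stored != current
}

/// Reads the current fingerprint from disk; nil if the file is unreadable.
func fingerprint(atPath path: String) -> FileFingerprint? {
    guard let attrs = try? FileManager.default.attributesOfItem(atPath: path),
          let modified = attrs[.modificationDate] as? Date,
          let size = (attrs[.size] as? NSNumber)?.intValue else { return nil }
    return FileFingerprint(modified: modified, size: size)
}
```

Comparing two cheap attributes avoids re-reading and re-embedding unchanged files, at the cost of missing the rare edit that preserves both size and modification time.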

A useful "is it real?" check while reading: `make test` (delegated to `Scripts/run-tests.sh`) compiles the Metal shaders and asserts the parity cosines, and `make app` runs `xcodegen generate` then `xcodebuild`. The numbers in the README are produced by these, not assumed.

Sources: [Sources/OmniKit/OmniEngine.swift:14-91](Sources/OmniKit/OmniEngine.swift), [Makefile:9-24](Makefile), [README.md:76-85](README.md), [Sources/omni-verify/main.swift:8-31](Sources/omni-verify/main.swift)

## A note on portability

Omni's design is deliberately provider-neutral: there is no hosted API or model vendor in the runtime path. The "model" is just a directory of files (`model.safetensors`, `config.json`, `tokenizer.json`, `adapters/retrieval/`) resolved by `ModelLocator`, and the engine accepts either the Nano or Small variant interchangeably. Swapping models means pointing `OMNI_MODEL_DIR` at a different folder, not changing code or wiring a connector. That is what makes the airgap claim hold and what keeps any future integration a file/repository concern rather than a service dependency.

Sources: [Sources/OmniKit/OmniEngine.swift:14-59](Sources/OmniKit/OmniEngine.swift), [README.md:50-55](README.md)
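
The "a model is just a directory" idea can be sketched as a resolver that returns the first candidate folder containing the expected files. This is not the repository's `ModelLocator`: the function names are hypothetical, the lookup order is assumed from the list above, and the actual App Support and HuggingFace cache paths are left to the caller as `fallbacks` rather than guessed.

```swift
import Foundation

/// Entries a directory must contain to count as a model, per the list above.
let requiredModelEntries = [
    "model.safetensors", "config.json", "tokenizer.json", "adapters/retrieval",
]

/// True when every required file or folder exists under `url`.
func looksLikeModelDirectory(_ url: URL) -> Bool {
    requiredModelEntries.allSatisfy {
        FileManager.default.fileExists(atPath: url.appendingPathComponent($0).path)
    }
}

/// Returns the first candidate that looks like a model directory.
/// `OMNI_MODEL_DIR` is checked first; `fallbacks` are caller-supplied
/// (the real App Support / HuggingFace cache paths are not reproduced here).
func resolveModelDirectory(
    environment: [String: String] = ProcessInfo.processInfo.environment,
    fallbacks: [URL]
) -> URL? {
    var candidates: [URL] = []
    if let override = environment["OMNI_MODEL_DIR"], !override.isEmpty {
        candidates.append(URL(fileURLWithPath: override, isDirectory: true))
    }
    candidates.append(contentsOf: fallbacks)
    return candidates.first(where: looksLikeModelDirectory)
}
```

Nothing in this path touches the network, which is the property the airgap claim depends on: resolution succeeds or fails based only on what is already on disk.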

## Summary

Omni is on-device semantic search: one shared vector space, three encoder towers, a retrieval-LoRA-merged backbone, all running in-process on Apple GPUs with no server. The build produces `OmniKit` (the engine), `Omni` (the SwiftUI app), and `omni-verify` (the parity/benchmark CLI). Carry five terms with you: towers, retrieval LoRA, bf16 store, priority gate, and base+delta search. Then read in the order README -> `OmniEngine` -> encoders -> `Indexer` -> `VectorStore` -> `App/` -> `omni-verify`. That path takes you from the product claim to the GPU matmul that backs it, in dependency order.
