# Omni macOS: First 30 Minutes
> Omni is a native macOS SwiftUI app for on-device semantic search over local files, powered by an in-process MLX-Swift port of jina-embeddings-v5-omni (text, image, video, and audio towers in one shared vector space). This wiki is a guided path from first glance to a working mental model of the engine, the app, and its surfaces.
## Context Links
- [Agent index](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/llms.txt)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05)
- [GitHub repository](https://github.com/hanxiao/omni-macos)
## Repository Metadata
- Repository: hanxiao/omni-macos
- Generated: 2026-06-08T13:33:23.836Z
- Updated: 2026-06-08T13:54:18.223Z
- Runtime: Claude Code
- Format: First 30 Minutes
- Pages: 8
## Page Index
- 01. [Start Here: What Omni Is and the Read Order](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/01-start-here-what-omni-is-and-the-read-order.md) - The one-paragraph product idea (on-device, no server, airgap-capable semantic search), the three build targets (OmniKit library, Omni app, omni-verify), the vocabulary a new reader needs (towers, retrieval LoRA, bf16 store, priority gate, base+delta search), and the fastest order to read the files that follow.
- 02. [Setup Signals: Model Discovery, Signing, and First Run](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/02-setup-signals-model-discovery-signing-and-first-run.md) - What actually has to be true before Omni runs: where the model directory is located (OMNI_MODEL_DIR, Application Support, HuggingFace cache via ModelLocator), the one-time model download flow, the Apple Team ID / code-signing requirement wired through project.yml, and the onboarding path the user sees on first launch.
- 03. [OmniEngine: Embedding Towers, LoRA Merge, and the Priority Gate](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/03-omniengine-embedding-towers-lora-merge-and-the-priority-gate.md) - The heart of OmniKit: how WeightStore loads HF safetensors and merges the retrieval LoRA, how the Qwen3 text tower and Qwen3-VL / Whisper-style towers land in one shared space, and how MLX calls are serialized through a priority gate so an interactive query jumps ahead of in-flight indexing work.
- 04. [Indexing Pipeline: Crawl, Extract, Chunk, Embed, Store](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/04-indexing-pipeline-crawl-extract-chunk-embed-store.md) - The incremental ingestion path stage by stage: the file crawler and mtime/size change detection, text/PDF/media extraction, chunking, the concurrent decode stage feeding a single serialized GPU embed stage, and persistence into the SQLite-backed bf16 vector store. Includes the FSWatcher that triggers re-indexing.
- 05. [Search Path: Query Qualifiers and bf16 Matmul Scoring](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/05-search-path-query-qualifiers-and-bf16-matmul-scoring.md) - How a search-box string becomes results: the dependency-free SearchQueryParser that splits semantic text from key:value qualifiers (type, ext, in, date, after, score, sort), then exact brute-force cosine over the resident bf16 matrix split into a GPU-resident base plus a small delta, with top-K from a bounded min-heap and post-filtering.
- 06. [The SwiftUI App: AppModel, Views, and Window Commands](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/06-the-swiftui-app-appmodel-views-and-window-commands.md) - The app shell that drives the engine: the @main scene and menu-bar command groups, the AppModel state object that owns indexing and search, the main content layout (sidebar, results list, Quick Look, settings), and how onboarding and updater hooks are wired into launch.
- 07. [Local Embedding Server: OpenAI / Cohere / Gemini-Compatible APIs](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/07-local-embedding-server-openai-cohere-gemini-compatible-apis.md) - A subsystem the README never mentions: an in-app HTTP server that exposes the engine as drop-in embedding APIs. Covers the Router and auth gate, the single ServingBackend seam onto OmniEngine + VectorStore, the per-provider SchemaAdapters (/v1/embeddings, /v1/embed, /v2/embed, Gemini :embedContent, /v1/search), and the controller/tab/log that manage it.
- 08. [Verifying the Engine and Where to Go Next](https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/08-verifying-the-engine-and-where-to-go-next.md) - The closing page: how numeric parity is proven rather than assumed (omni-verify against Python-generated fixtures, cosine >= 0.999 with matching token ids, image/video/audio matching the upstream model.py), how the test suite and fixture generators are run, and a short map of what to read next after the first 30 minutes.
## Source File Index
- `App/AppModel.swift`
- `App/ContentView.swift`
- `App/OmniApp.swift`
- `App/OnboardingView.swift`
- `App/ResultsList.swift`
- `App/Serving/HTTPServer.swift`
- `App/Serving/Router.swift`
- `App/Serving/SchemaAdapters.swift`
- `App/Serving/ServingBackend.swift`
- `App/Serving/ServingController.swift`
- `App/Serving/ServingTab.swift`
- `App/SettingsView.swift`
- `App/Sidebar.swift`
- `Makefile`
- `Package.swift`
- `project.yml`
- `README.md`
- `Scripts/build-app.sh`
- `Scripts/run-tests.sh`
- `Sources/omni-verify/main.swift`
- `Sources/OmniKit/FileCrawler.swift`
- `Sources/OmniKit/FileExtractor.swift`
- `Sources/OmniKit/FSWatcher.swift`
- `Sources/OmniKit/Indexer.swift`
- `Sources/OmniKit/IndexSettings.swift`
- `Sources/OmniKit/ModelDownloader.swift`
- `Sources/OmniKit/OmniAudioEncoder.swift`
- `Sources/OmniKit/OmniEngine.swift`
- `Sources/OmniKit/OmniImageEncoder.swift`
- `Sources/OmniKit/OmniTextEncoder.swift`
- `Sources/OmniKit/Qwen3Backbone.swift`
- `Sources/OmniKit/SearchQueryParser.swift`
- `Sources/OmniKit/VectorStore.swift`
- `Sources/OmniKit/WeightStore.swift`
- `Tests/OmniKitTests/SearchQueryParserTests.swift`
- `Tests/OmniKitTests/TextEncoderTests.swift`
- `Tests/OmniKitTests/VectorStoreTests.swift`
- `Tools/gen_fixtures.py`
---
## 01. Start Here: What Omni Is and the Read Order
> The one-paragraph product idea (on-device, no server, airgap-capable semantic search), the three build targets (OmniKit library, Omni app, omni-verify), the vocabulary a new reader needs (towers, retrieval LoRA, bf16 store, priority gate, base+delta search), and the fastest order to read the files that follow.
- Page Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/01-start-here-what-omni-is-and-the-read-order.md
- Generated: 2026-06-08T13:29:54.874Z
### Source Files
- `README.md`
- `Package.swift`
- `project.yml`
- `Makefile`
- `Sources/OmniKit/OmniEngine.swift`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [Package.swift](Package.swift)
- [project.yml](project.yml)
- [Makefile](Makefile)
- [Sources/OmniKit/OmniEngine.swift](Sources/OmniKit/OmniEngine.swift)
- [Sources/OmniKit/VectorStore.swift](Sources/OmniKit/VectorStore.swift)
- [Sources/omni-verify/main.swift](Sources/omni-verify/main.swift)
</details>
# Start Here: What Omni Is and the Read Order
This page is the orientation map for your first 30 minutes in `hanxiao/omni-macos`. It states the product idea in one paragraph, names the three things the build produces, defines the five terms that recur throughout the code, and gives you the fastest order to read the files so the rest of the repository makes sense. Read this before opening any encoder or store file: those files assume the vocabulary below.
## What Omni is (the one-paragraph idea)
Omni is a native macOS app for **semantic search over your local files**: you type a query and it finds matching documents, code, PDFs, images, audio, and video by meaning, because every file and every query is embedded into one shared vector space. The embedding model (`jina-embeddings-v5-omni`, in a Nano ~1.9 GB or Small ~3.1 GB size) runs **in-process on Apple GPUs** via a native MLX-Swift port, with no Python and no server. The model downloads once on first launch; after that, indexing and search run with no network at all, so the Mac can be airgapped and Omni keeps working.
Sources: [README.md:13-33](README.md), [Sources/OmniKit/OmniEngine.swift:7-12](Sources/OmniKit/OmniEngine.swift)
## The three build targets
The repo is one SwiftPM package plus an XcodeGen-generated app. Three artifacts come out of it, and almost every file belongs to one of them.
| Target | Kind | Defined in | What it is |
|---|---|---|---|
| `OmniKit` | SPM library | `Package.swift` products | The engine + indexer: encoders, weight loader, crawler/extractor, SQLite vector store. All the real logic lives here (`Sources/OmniKit/`). |
| `Omni` | macOS app | `project.yml` -> `Omni.xcodeproj` | The SwiftUI front end (`App/`). Generated by XcodeGen, built through Xcode/`xcodebuild` because MLX-Swift must compile Metal shaders. |
| `omni-verify` | SPM executable | `Package.swift` products | A numeric-parity / benchmark CLI (`Sources/omni-verify/main.swift`) that checks the Swift encoder against Python reference fixtures. |
`OmniKit` depends on `mlx-swift` (the GPU runtime) and `swift-tokenizers` (a Rust-backed BPE tokenizer chosen for speed while keeping token ids identical to the reference). The app target depends only on `OmniKit`; the test target lives alongside the library and ships its parity fixtures as resources.
Sources: [Package.swift:7-46](Package.swift), [project.yml:12-19](project.yml), [README.md:87-94](README.md)
```text
SPM package (Package.swift)                 XcodeGen (project.yml)
┌───────────────────────────┐               ┌──────────────────────┐
│ OmniKit (library)         │◀──depends on──│ Omni (App/, SwiftUI) │
│ encoders / indexer / DB   │               └──────────────────────┘
└─────────────┬─────────────┘
              │ depends on
     ┌────────┴────────┐
     │   omni-verify   │  numeric parity + searchbench CLI
     └─────────────────┘
ext: mlx-swift (GPU) · swift-tokenizers (BPE)
```
## Vocabulary a new reader needs
These terms appear across the engine, the README, and the doc comments. Learn them once here.
### Towers
The model is three separate encoder stacks that all feed one shared backbone and land in the **same** vector space: a **Qwen3 text tower** (`OmniTextEncoder`), a **Qwen3-VL vision tower** (`OmniImageEncoder` / `OmniVisionTower`, also used for video frames and scanned-PDF pages), and a **Whisper-style audio tower** (`OmniAudioEncoder` / `OmniAudioTower`). `OmniEngine.init` constructs all three over a single shared `WeightStore`. Cross-modal alignment is enforced by wrapping media in a `Document:` prefix and appending a text end-token (`mediaSuffix`) so every modality pools at the same position.
Sources: [Sources/OmniKit/OmniEngine.swift:141-157](Sources/OmniKit/OmniEngine.swift), [Sources/OmniKit/OmniEngine.swift:122-133](Sources/OmniKit/OmniEngine.swift), [README.md:96-105](README.md)
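To make "pools at the same position" concrete, here is a minimal sketch of last-token pooling followed by L2 normalization. It assumes plain Swift arrays rather than MLX tensors, and the function name `poolLastTokenAndNormalize` is hypothetical, not taken from the repository.

```swift
// Hypothetical sketch: take the final token's hidden state and L2-normalize it.
// `hiddenStates` is [tokens][dim]; the real towers operate on MLX arrays instead.
func poolLastTokenAndNormalize(_ hiddenStates: [[Float]]) -> [Float] {
    guard let last = hiddenStates.last else { return [] }
    let norm = last.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    // Guard against a zero vector so we never divide by zero.
    guard norm > 0 else { return last }
    return last.map { $0 / norm }
}

// Two tokens with 2-dim hidden states; only the last token [3, 4] is pooled.
let pooled = poolLastTokenAndNormalize([[1, 2], [3, 4]])
```

Because every output has unit length, a dot product between any two embeddings, regardless of modality, is directly their cosine similarity.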
### Retrieval LoRA
The published weights are a general backbone plus a small **retrieval LoRA** adapter (`adapters/retrieval/`). `WeightStore` loads the HF safetensors and **merges the LoRA into the backbone at load** (upcast to fp32, merge, cast back to bf16) so there is no per-call adapter math. This is why the engine is built for search rather than generation, and why `OmniEngine` constructs `WeightStore` with a `loraScale`.
Sources: [Sources/OmniKit/OmniEngine.swift:144-147](Sources/OmniKit/OmniEngine.swift), [README.md:96-105](README.md)
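The merge itself follows the standard LoRA identity, `W' = W + scale * (B x A)`. The sketch below illustrates it on plain nested arrays; the function name `mergeLoRA`, the parameter names, and the array representation are assumptions for illustration, and how `WeightStore` derives `loraScale` is not shown in this page.

```swift
// Hypothetical sketch of a LoRA merge: W' = W + scale * (B x A).
// Shapes: weight is [out][in], loraB is [out][rank], loraA is [rank][in].
func mergeLoRA(weight: [[Float]], loraA: [[Float]], loraB: [[Float]], scale: Float) -> [[Float]] {
    var merged = weight
    for o in 0..<weight.count {
        for i in 0..<weight[o].count {
            // Accumulate one element of the low-rank product B x A.
            var delta: Float = 0
            for r in 0..<loraA.count {
                delta += loraB[o][r] * loraA[r][i]
            }
            merged[o][i] += scale * delta
        }
    }
    return merged
}

let mergedWeight = mergeLoRA(
    weight: [[1, 0], [0, 1]],
    loraA: [[1, 1]],      // rank 1, in = 2
    loraB: [[1], [2]],    // out = 2, rank 1
    scale: 0.5
)
```

After the merge the adapter matrices are no longer needed at inference time, which is the "no per-call adapter math" property described above.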
### bf16 store
Embeddings are persisted and held in memory as **bf16** (2 bytes per dimension) rather than fp32: half the size on disk and in RAM, with negligible recall loss on L2-normalized vectors. `VectorStore` keeps a single contiguous bf16 buffer (`[count*dim]`) as the source of truth, kept in sync on every insert/update/delete, and reinterprets (not converts) those bytes into a GPU matrix for scoring.
Sources: [Sources/OmniKit/VectorStore.swift:84-124](Sources/OmniKit/VectorStore.swift), [README.md:119-123](README.md)
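For readers unfamiliar with bf16: it is the top 16 bits of an IEEE-754 float32, so conversion is a bit shift plus rounding. The sketch below shows one common way to do it (round-to-nearest-even); the function names are hypothetical and the store's actual conversion routine may differ.

```swift
// Convert a Float32 to bf16 by keeping the top 16 bits, rounding to nearest even.
func floatToBF16(_ value: Float) -> UInt16 {
    let bits = value.bitPattern
    // Keep NaN a NaN after truncation by forcing a mantissa bit on.
    if value.isNaN { return UInt16(truncatingIfNeeded: bits >> 16) | 0x0040 }
    let roundingBias: UInt32 = 0x7FFF + ((bits >> 16) & 1)
    return UInt16(truncatingIfNeeded: (bits &+ roundingBias) >> 16)
}

// Widen bf16 back to Float32 by placing it in the high 16 bits.
func bf16ToFloat(_ value: UInt16) -> Float {
    Float(bitPattern: UInt32(value) << 16)
}

// Values with short mantissas survive the round trip exactly.
let roundTripped = bf16ToFloat(floatToBF16(0.5))
```

bf16 keeps float32's full exponent range and gives up mantissa bits, which is why the precision loss matters little for unit-length vectors compared by dot product.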
### Priority gate
MLX evaluation is not thread-safe, so all GPU work is funneled through one serializer, `OmniEngine.run(highPriority:)`. It runs work one at a time, but a **high-priority** interactive query jumps ahead of pending **low-priority** indexing embeds, and an indexing call yields whenever a query is queued. Net effect: search stays responsive (a few ms) even while a long index runs, because a query waits for at most one in-flight embed.
Sources: [Sources/OmniKit/OmniEngine.swift:118-180](Sources/OmniKit/OmniEngine.swift), [README.md:103-105](README.md)
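The ordering rule can be illustrated without any concurrency at all. The sketch below models only the dequeue policy (pending high-priority work before pending low-priority work); the type `GateQueue` and its methods are hypothetical, and the real gate in `OmniEngine` additionally handles suspension and resumption of callers.

```swift
// Hypothetical sketch of the ordering rule only: one job runs at a time,
// and pending high-priority jobs are always dequeued before low-priority ones.
struct GateQueue {
    private var high: [String] = []
    private var low: [String] = []

    mutating func enqueue(_ job: String, highPriority: Bool) {
        if highPriority { high.append(job) } else { low.append(job) }
    }

    // Returns the next job to run, or nil when the gate is idle.
    mutating func next() -> String? {
        if !high.isEmpty { return high.removeFirst() }
        if !low.isEmpty { return low.removeFirst() }
        return nil
    }
}

var gate = GateQueue()
gate.enqueue("embed chunk 1", highPriority: false)
gate.enqueue("embed chunk 2", highPriority: false)
gate.enqueue("search query", highPriority: true)

// Drain the gate and record the order in which jobs would run.
var runOrder: [String] = []
while let job = gate.next() { runOrder.append(job) }
```

Even though the query was enqueued last, it runs first; the two indexing embeds keep their relative order behind it.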
### base + delta search
Search is **exact brute-force cosine**: one MLX matmul of the query against the resident bf16 matrix, no approximate index. To avoid recopying the whole matrix on every insert, `VectorStore` splits it into a GPU-resident **base** prefix plus a small **delta** of rows added since the base was built; each query scores base and delta and fuses the result. The ~0.8 GB base is rebuilt only on a structural change (delete/reload) or once the delta grows past a threshold; ordinary appends just extend the delta.
Sources: [Sources/OmniKit/VectorStore.swift:97-127](Sources/OmniKit/VectorStore.swift), [README.md:126-135](README.md)
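A minimal sketch of the base + delta idea, assuming L2-normalized vectors held in plain Swift arrays. The names `searchBaseDelta` and `Hit` are hypothetical, and a full sort stands in for the bounded min-heap that the store is described as using.

```swift
typealias Hit = (row: Int, score: Float)

// Dot product; on L2-normalized vectors this equals cosine similarity.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

// Score the query against base rows and delta rows, then fuse into one top-K list.
// Row ids are global: delta rows are numbered after the base rows.
func searchBaseDelta(query: [Float], base: [[Float]], delta: [[Float]], k: Int) -> [Hit] {
    var hits: [Hit] = []
    for (i, row) in base.enumerated() {
        hits.append((row: i, score: dot(query, row)))
    }
    for (i, row) in delta.enumerated() {
        hits.append((row: base.count + i, score: dot(query, row)))
    }
    // Simplification: sort everything instead of maintaining a bounded min-heap.
    hits.sort { $0.score > $1.score }
    return Array(hits.prefix(max(k, 0)))
}

let topHits = searchBaseDelta(
    query: [1, 0],
    base: [[1, 0], [0, 1]],   // rows 0 and 1
    delta: [[0.6, 0.8]],      // row 2, appended after the base was built
    k: 2
)
```

The point of the split is that appending row 2 never touches the base rows; only the small delta grows until a rebuild folds it in.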
## The fastest read order
Read top-down through the engine, then the pipeline, then the UI, then the verifier. Each step assumes the vocabulary above.
```text
1. README.md                     what + why, airgap claim, architecture map
2. OmniEngine.swift              facade: ModelLocator, towers, priority gate
3. WeightStore + encoders        OmniTextEncoder / OmniImageEncoder / OmniAudioEncoder
4. Indexer.swift                 crawl -> extract -> chunk -> embed -> store
5. VectorStore.swift             bf16 buffer + SQLite + base/delta cosine
6. App/ (OmniApp, AppModel, …)   SwiftUI shell that drives OmniKit
7. omni-verify/main.swift        parity fixtures + searchbench
```
| # | Read this | Why it is the right next file |
|---|---|---|
| 1 | `README.md` | Product idea, on-device/airgap guarantee, build prerequisites, and the four-line architecture map. Orientation, not implementation. |
| 2 | `Sources/OmniKit/OmniEngine.swift` | The public facade. `ModelLocator` shows how a model directory is found (`OMNI_MODEL_DIR`, App Support, HuggingFace cache); the `init` wires all three towers; `run(highPriority:)` is the priority gate. Start here; it links to everything. |
| 3 | `WeightStore.swift` + the encoder files | How LoRA-merged weights become bf16 tensors, and how each tower pools the last token and L2-normalizes into the shared space. |
| 4 | `Sources/OmniKit/Indexer.swift` | The crawl -> extract -> chunk -> embed -> store pipeline: incremental by file mtime/size, a concurrent decode stage feeding the single serialized GPU embed stage, batched forwards. |
| 5 | `Sources/OmniKit/VectorStore.swift` | Where vectors live: the bf16 buffer, SQLite persistence, and the base+delta brute-force matmul with a bounded min-heap for top-K. |
| 6 | `App/` (`OmniApp.swift`, `AppModel.swift`, `ContentView.swift`) | The SwiftUI shell that drives `OmniKit`: pick folders, press Index, search, filter by kind/folder/recency. |
| 7 | `Sources/omni-verify/main.swift` | The trust anchor: how numeric parity (text cosine >= 0.999, identical token ids) and `searchbench` are actually run. |
A useful "is it real?" check while reading: `make test` (delegated to `Scripts/run-tests.sh`) compiles the Metal shaders and asserts the parity cosines, and `make app` runs `xcodegen generate` then `xcodebuild`. The numbers in the README are produced by these, not assumed.
Sources: [Sources/OmniKit/OmniEngine.swift:14-91](Sources/OmniKit/OmniEngine.swift), [Makefile:9-24](Makefile), [README.md:76-85](README.md), [Sources/omni-verify/main.swift:8-31](Sources/omni-verify/main.swift)
## A note on portability
Omni's design is deliberately provider-neutral: there is no hosted API or model vendor in the runtime path. The "model" is just a directory of files (`model.safetensors`, `config.json`, `tokenizer.json`, `adapters/retrieval/`) resolved by `ModelLocator`, and the engine accepts either the Nano or Small variant interchangeably. Swapping models means pointing `OMNI_MODEL_DIR` at a different folder, not changing code or wiring a connector. That is what makes the airgap claim hold and what keeps any future integration a file/repository concern rather than a service dependency.
Sources: [Sources/OmniKit/OmniEngine.swift:14-59](Sources/OmniKit/OmniEngine.swift), [README.md:50-55](README.md)
## Summary
Omni is on-device semantic search: one shared vector space, three encoder towers, a retrieval-LoRA-merged backbone, all running in-process on Apple GPUs with no server. The build produces `OmniKit` (the engine), `Omni` (the SwiftUI app), and `omni-verify` (the parity/benchmark CLI). Carry five terms with you (towers, retrieval LoRA, bf16 store, priority gate, and base+delta search) and read in the order README -> `OmniEngine` -> encoders -> `Indexer` -> `VectorStore` -> `App/` -> `omni-verify`. That path takes you from the product claim to the GPU matmul that backs it, in dependency order.
---
## 02. Setup Signals: Model Discovery, Signing, and First Run
> What actually has to be true before Omni runs: where the model directory is located (OMNI_MODEL_DIR, Application Support, HuggingFace cache via ModelLocator), the one-time model download flow, the Apple Team ID / code-signing requirement wired through project.yml, and the onboarding path the user sees on first launch.
- Page Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/02-setup-signals-model-discovery-signing-and-first-run.md
- Generated: 2026-06-08T13:30:24.317Z
### Source Files
- `project.yml`
- `Sources/OmniKit/OmniEngine.swift`
- `Sources/OmniKit/ModelDownloader.swift`
- `App/OnboardingView.swift`
- `Scripts/build-app.sh`
- `Makefile`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [project.yml](project.yml)
- [Sources/OmniKit/OmniEngine.swift](Sources/OmniKit/OmniEngine.swift)
- [Sources/OmniKit/ModelDownloader.swift](Sources/OmniKit/ModelDownloader.swift)
- [App/OnboardingView.swift](App/OnboardingView.swift)
- [App/AppModel.swift](App/AppModel.swift)
- [App/ContentView.swift](App/ContentView.swift)
- [Scripts/build-app.sh](Scripts/build-app.sh)
- [Makefile](Makefile)
- [README.md](README.md)
- [.github/workflows/release.yml](.github/workflows/release.yml)
</details>
# Setup Signals: Model Discovery, Signing, and First Run
Omni is a native macOS app that does semantic search entirely on-device, so before it can do anything useful two things must be true: a build identity must exist so macOS trusts the app with your files, and a complete model directory must be discoverable on disk. This page traces both. It explains where Omni looks for the model (the `OMNI_MODEL_DIR` env pointer, `~/Library/Application Support/Omni/`, and the HuggingFace cache, all resolved by `ModelLocator`), how the one-time download lands a model in Application Support, how the Apple Team ID is wired through `project.yml` at project-generation time, and what the user actually sees on first launch when no model is found yet.
If you are new to the repo, the two files that own this story are `Sources/OmniKit/OmniEngine.swift` (the `ModelLocator` discovery rules) and `App/AppModel.swift` (the launch/bootstrap state machine). `project.yml` owns signing. Read those three first.
## The four setup signals
| Signal | Owned by | What must be true |
| --- | --- | --- |
| Team ID | `project.yml` via `OMNI_TEAM_ID` env | A 10-char Apple Team ID is exported before `xcodegen generate`, so the generated project signs with a stable identity. |
| Model directory | `ModelLocator` (`OmniEngine.swift`) | A directory containing `model.safetensors` + `config.json` + `tokenizer.json` is reachable via override, App Support, or HF cache. |
| First-run UI | `AppModel.Phase` + `OnboardingView` | If no complete model is found, the app shows onboarding and offers a download or a folder picker. |
| Download | `ModelDownloader` | The runtime-required files are fetched once into `Application Support/Omni/<variant>`. |
Sources: [project.yml:30-41](project.yml), [Sources/OmniKit/OmniEngine.swift:14-91](Sources/OmniKit/OmniEngine.swift), [App/AppModel.swift:96-99](App/AppModel.swift), [Sources/OmniKit/ModelDownloader.swift:14-33](Sources/OmniKit/ModelDownloader.swift)
## Model discovery: how `ModelLocator` resolves a directory
Model location is path-based and layered, not hard-coded to one provider. `AppModel.resolvedModelDir()` is the entry point at launch: it first honors a folder the user explicitly picked (the `omni.modelDir` default, set by "Choose Model Folder"), but only if that folder is *complete*. Otherwise it delegates to `ModelLocator.resolve()`, which walks a fixed precedence chain and returns the first directory that holds a complete model.
A "complete" model is the key invariant. `firstWithWeights` deliberately requires three files together (`model.safetensors`, `config.json`, `tokenizer.json`), so a partial directory, such as an interrupted download or a `/tmp` leftover with only weights, is skipped rather than being selected and then failing later with a missing-config error.
```mermaid
flowchart TD
subgraph app["AppModel.resolvedModelDir()"]
saved["Saved picker dir<br/>UserDefaults omni.modelDir<br/>(only if complete)"]
end
subgraph locator["ModelLocator.resolve()"]
env["OMNI_MODEL_DIR env"]
legacy["~/Library/Application Support/<br/>Omni/model (legacy)"]
nano["resolve(variant: .nano)"]
small["resolve(variant: .small)"]
end
subgraph variant["resolve(variant:) order"]
appsup["App Support Omni/<variant>"]
dev["/private/tmp dev path"]
hub["HuggingFace cache snapshots<br/>~/.cache/huggingface/hub + ext volume"]
end
gate{"firstWithWeights:<br/>model.safetensors +<br/>config.json + tokenizer.json"}
saved -->|miss| env --> legacy --> nano --> small
nano --> variant
small --> variant
appsup --> dev --> hub
variant --> gate
locator --> gate
gate -->|complete dir found| ready["modelPath set -> bootstrap"]
gate -->|none| nomodel["Phase .noModel -> Onboarding"]
```
The precedence inside `ModelLocator` is:
1. **Overrides that win regardless of variant** — the `OMNI_MODEL_DIR` environment variable, then the legacy single-model path `Application Support/Omni/model`.
2. **Nano**, then **Small** as the default variant order (Nano is smaller and faster, so it wins when both are present).
3. For each variant, `resolve(variant:)` checks `Application Support/Omni/<variant>`, then a dev staging path (`/private/tmp/omni-model` for Small, `/private/tmp/omni-nano` for Nano), then HuggingFace cache snapshots under `~/.cache/huggingface/hub` and an external-volume hub root.
```swift
// Sources/OmniKit/OmniEngine.swift
private static let hubRoots = [
FileManager.default.homeDirectoryForCurrentUser.appendingPathComponent(".cache/huggingface/hub"),
URL(fileURLWithPath: "/Volumes/One Touch/ai-models/huggingface/hub"),
]
```
Because the highest-priority signal is a plain path in `OMNI_MODEL_DIR`, the discovery layer stays portable: any model directory laid out with the expected filenames works, whether it came from the HuggingFace cache, a hand-picked folder, or a CI-staged path. The `Makefile` uses exactly this seam to run tests against a fixed local model (`OMNI_MODEL_DIR='$(MODEL)'`).
Sources: [Sources/OmniKit/OmniEngine.swift:7-91](Sources/OmniKit/OmniEngine.swift), [App/AppModel.swift:932-943](App/AppModel.swift), [Makefile:18-19](Makefile)
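The completeness gate and the "first match wins" walk can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's `firstWithWeights`: the names `requiredModelFiles`, `isCompleteModelDir`, and `firstCompleteModelDir` are hypothetical, and only the three required filenames come from the text above.

```swift
import Foundation

// Files that must all be present for a directory to count as a complete model.
let requiredModelFiles = ["model.safetensors", "config.json", "tokenizer.json"]

// True only when every required file exists directly inside `dir`.
func isCompleteModelDir(_ dir: URL) -> Bool {
    requiredModelFiles.allSatisfy {
        FileManager.default.fileExists(atPath: dir.appendingPathComponent($0).path)
    }
}

// Return the first candidate that holds a complete model, in precedence order.
// Partial directories are skipped rather than selected.
func firstCompleteModelDir(in candidates: [URL]) -> URL? {
    candidates.first(where: isCompleteModelDir)
}
```

A caller would pass the candidates in precedence order (override path, App Support, dev path, cache snapshots) and treat a `nil` result as the "no model" case that leads to onboarding.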
## First run: the launch state machine
`AppModel` is an `@MainActor @Observable` model with a small `Phase` enum that the UI branches on. `init()` loads persisted settings and kicks off `bootstrap()`; `ContentView` renders one of four detail views depending on `phase`.
```mermaid
stateDiagram-v2
[*] --> loadingModel
loadingModel --> noModel: resolvedModelDir() == nil
loadingModel --> ready: engine + store loaded
loadingModel --> failed: load throws
noModel --> loadingModel: downloadModel() / setModelDir()
failed --> loadingModel: retryBootstrap()
ready --> loadingModel: switchVariant()
note right of noModel
OnboardingView:
download a variant
or pick a folder
end note
```
`bootstrap()` resolves the model directory; if none is found it sets `phase = .noModel` and returns. Otherwise it loads the `VectorStore` and `OmniEngine` concurrently (CPU index read overlapped with weight/tokenizer IO), wires the indexer and the in-process serving controller, sets `phase = .ready`, and kicks off a background index pass. Any thrown error becomes `.failed(String)`.
```swift
// App/ContentView.swift
switch model.phase {
case .loadingModel: CenteredStatus(... "Loading the Omni model" ...)
case .noModel: OnboardingView()
case .failed(let msg): EngineFailedView(message: msg)
case .ready: ready
}
```
Sources: [App/AppModel.swift:96-99](App/AppModel.swift), [App/AppModel.swift:357-370](App/AppModel.swift), [App/AppModel.swift:993-1044](App/AppModel.swift), [App/ContentView.swift:80-91](App/ContentView.swift)
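The outcome of one bootstrap attempt maps onto exactly one phase, which the following sketch makes explicit. The types `LaunchPhase` and `BootstrapOutcome` and the function `phase(after:)` are hypothetical stand-ins; only the four phase names mirror the `Phase` cases described above.

```swift
// Hypothetical mirror of the launch phases described above.
enum LaunchPhase: Equatable {
    case loadingModel
    case noModel
    case failed(String)
    case ready
}

// Outcome of one bootstrap attempt (names are illustrative).
enum BootstrapOutcome {
    case modelMissing
    case loaded
    case loadError(String)
}

// Map a bootstrap outcome to the phase the UI should branch on next.
func phase(after outcome: BootstrapOutcome) -> LaunchPhase {
    switch outcome {
    case .modelMissing: return .noModel
    case .loaded: return .ready
    case .loadError(let message): return .failed(message)
    }
}
```

Retrying, downloading, or picking a folder each send the app back to the loading phase and run this mapping again.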
## Onboarding: what the user sees with no model
When `phase == .noModel`, `OnboardingView` greets the user and offers two paths to get a model on disk:
- **Download a variant.** Two buttons, `Download Omni Nano` (~1.9 GB, faster, the prominent default) and `Download Omni Small` (~3.1 GB, higher quality). Each calls `model.downloadModel(variant)`. While downloading, the buttons are replaced by a progress bar bound to `model.downloadFraction` and a monospaced byte-count label.
- **Choose an existing folder.** "Choose Model Folder…" opens an `NSOpenPanel` (directories only) and routes the chosen URL into `model.setModelDir(url)`, which persists it and re-runs `bootstrap()`.
The footer reinforces the privacy model: indexing and search run on Apple silicon, files never leave the device, and the model downloads once after which no internet is required. A download error surfaces in red via `model.downloadFailed` / `downloadLabel`.
Sources: [App/OnboardingView.swift:1-80](App/OnboardingView.swift), [App/AppModel.swift:925-929](App/AppModel.swift)
## The one-time download flow
`ModelDownloader` fetches only the files the Swift runtime needs (no Python) from the HuggingFace Hub. The repo name is derived from the variant (`jinaai/jina-embeddings-v5-omni-<variant>-mlx`), and files land in `Application Support/Omni/<variant>` via `installDir(for:)`.
```swift
// Sources/OmniKit/ModelDownloader.swift
public static let files = [
"config.json",
"tokenizer.json",
"tokenizer_config.json",
"adapters/retrieval/adapter_config.json",
"adapters/retrieval/adapter_model.safetensors",
"model.safetensors", // by far the largest
]
```
The download is resumable at file granularity: a file already present with non-zero size is skipped, so an interrupted download picks up at the first missing file rather than restarting from scratch. Each file streams to a stable temp location and is moved into place; HTML error pages returned with a non-200 status are rejected as model errors. `AppModel.downloadModel` drives progress (only the large `model.safetensors` advances the fraction), then on success refreshes `installedVariants`, sets the active variant, and calls `setModelDir(dest)` to load it.
Sources: [Sources/OmniKit/ModelDownloader.swift:14-123](Sources/OmniKit/ModelDownloader.swift), [App/AppModel.swift:959-991](App/AppModel.swift)
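The skip rule (present with non-zero size means already downloaded) can be sketched as a filter over the file list. The helper names `isAlreadyDownloaded` and `pendingDownloads` are hypothetical, and the network fetch itself is deliberately left out.

```swift
import Foundation

// A file counts as already downloaded when it exists with a non-zero size.
func isAlreadyDownloaded(_ file: URL) -> Bool {
    let size = (try? file.resourceValues(forKeys: [.fileSizeKey]))?.fileSize ?? 0
    return size > 0
}

// Relative paths that still need fetching into `dest`, in their original order.
func pendingDownloads(_ relativePaths: [String], dest: URL) -> [String] {
    relativePaths.filter { !isAlreadyDownloaded(dest.appendingPathComponent($0)) }
}
```

Treating a zero-byte file as missing means an empty placeholder left by a crash is fetched again instead of being mistaken for a finished file.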
## Signing: the Apple Team ID wired through `project.yml`
Omni reads Documents, Downloads, and Desktop, which macOS gates behind TCC. To keep the user's grant from being re-prompted on every rebuild, the app is signed with a *stable* identity rather than ad-hoc. The repo ships no Apple credentials; instead `project.yml` reads the Team ID from the environment at generation time.
```yaml
# project.yml (target Omni settings)
CODE_SIGN_IDENTITY: "Apple Development"
CODE_SIGN_STYLE: Manual
DEVELOPMENT_TEAM: ${OMNI_TEAM_ID} # set before `xcodegen generate`
CODE_SIGNING_REQUIRED: YES
ENABLE_HARDENED_RUNTIME: YES
ENABLE_APP_SANDBOX: NO
```
The workflow is: `export OMNI_TEAM_ID=XXXXXXXXXX` then `xcodegen generate`, which bakes the team into `Omni.xcodeproj`. A *free* Apple ID's personal team is enough to build and run locally.
| Context | Team ID source | Signing identity | Extra |
| --- | --- | --- | --- |
| Local dev | `OMNI_TEAM_ID` env (free Apple ID OK) | `Apple Development` (manual) | Hardened runtime on, sandbox off |
| CI release | `APPLE_TEAM_ID` GitHub secret | `Developer ID Application` (paid program) | Notarization via Apple notary service |
CI does not rely on the `project.yml` default; the release workflow passes `DEVELOPMENT_TEAM` and a `Developer ID Application` identity as `xcodebuild` overrides, because the self-hosted runner has no "Apple Development" cert. `Scripts/build-app.sh` forwards any trailing args straight to `xcodebuild` for exactly these signing overrides, and also injects the swift-tokenizers Rust artifact module map that a plain `xcodebuild` would otherwise miss.
Sources: [project.yml:30-47](project.yml), [README.md:57-73](README.md), [.github/workflows/release.yml:84-99](.github/workflows/release.yml), [Scripts/build-app.sh:22-29](Scripts/build-app.sh)
## Putting it together
A clean first run is: export `OMNI_TEAM_ID`, generate and build (signed with your team so TCC remembers the grant), launch. `AppModel.bootstrap()` asks `ModelLocator` for a complete model directory; finding none, it lands on `.noModel` and `OnboardingView` offers a one-time download into `Application Support/Omni/<variant>` or a folder picker. Once a directory with `model.safetensors`, `config.json`, and `tokenizer.json` is resolvable, bootstrap loads the engine, flips to `.ready`, and indexing begins in the background. The deliberate seams — a path-based `OMNI_MODEL_DIR` override, an env-fed Team ID, and a completeness gate on the model directory — are what keep the setup portable and let CI, tests, and local builds each supply their own model and identity without anything proprietary baked into the repo.
Sources: [App/AppModel.swift:993-1044](App/AppModel.swift), [Sources/OmniKit/OmniEngine.swift:84-90](Sources/OmniKit/OmniEngine.swift)
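The completeness gate described above can be sketched as a pure function. The `Candidate` type and `resolve(_:)` are hypothetical stand-ins for whatever `ModelLocator` does internally, and the candidate order shown (environment override, Application Support, HuggingFace cache) is an assumption taken from the order in which this wiki lists the locations.

```swift
// Hypothetical model of the completeness gate; the real check lives in ModelLocator.
let requiredModelFiles = ["model.safetensors", "config.json", "tokenizer.json"]

struct Candidate {
    let label: String        // where the directory came from
    let files: Set<String>   // file names found in it
}

func isComplete(_ candidate: Candidate) -> Bool {
    requiredModelFiles.allSatisfy { candidate.files.contains($0) }
}

/// First complete candidate wins; nil corresponds to the `.noModel` state.
func resolve(_ candidates: [Candidate]) -> Candidate? {
    candidates.first(where: isComplete)
}

let candidates = [
    Candidate(label: "OMNI_MODEL_DIR", files: ["config.json"]),  // partial: rejected
    Candidate(label: "Application Support",
              files: ["model.safetensors", "config.json", "tokenizer.json"]),
    Candidate(label: "HuggingFace cache", files: []),
]
let chosen = resolve(candidates)
```

A partially populated directory is treated the same as a missing one, which is what sends a first launch to onboarding instead of a failed engine load.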
---
## 03. OmniEngine: Embedding Towers, LoRA Merge, and the Priority Gate
> The heart of OmniKit: how WeightStore loads HF safetensors and merges the retrieval LoRA, how the Qwen3 text tower and Qwen3-VL / Whisper-style towers land in one shared space, and how MLX calls are serialized through a priority gate so an interactive query jumps ahead of in-flight indexing work.
- Page Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/03-omniengine-embedding-towers-lora-merge-and-the-priority-gate.md
- Generated: 2026-06-08T13:30:35.456Z
### Source Files
- `Sources/OmniKit/OmniEngine.swift`
- `Sources/OmniKit/WeightStore.swift`
- `Sources/OmniKit/OmniTextEncoder.swift`
- `Sources/OmniKit/OmniImageEncoder.swift`
- `Sources/OmniKit/Qwen3Backbone.swift`
- `Sources/OmniKit/OmniAudioEncoder.swift`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [Sources/OmniKit/OmniEngine.swift](Sources/OmniKit/OmniEngine.swift)
- [Sources/OmniKit/WeightStore.swift](Sources/OmniKit/WeightStore.swift)
- [Sources/OmniKit/OmniTextEncoder.swift](Sources/OmniKit/OmniTextEncoder.swift)
- [Sources/OmniKit/OmniImageEncoder.swift](Sources/OmniKit/OmniImageEncoder.swift)
- [Sources/OmniKit/Qwen3Backbone.swift](Sources/OmniKit/Qwen3Backbone.swift)
- [Sources/OmniKit/OmniAudioEncoder.swift](Sources/OmniKit/OmniAudioEncoder.swift)
- [Sources/OmniKit/OmniConfig.swift](Sources/OmniKit/OmniConfig.swift)
- [Sources/OmniKit/OmniAudioTower.swift](Sources/OmniKit/OmniAudioTower.swift)
</details>
# OmniEngine: Embedding Towers, LoRA Merge, and the Priority Gate
`OmniEngine` is the public face of the embedding engine: one object that loads the `jina-embeddings-v5-omni` model once and turns text, images, video, and audio into L2-normalized vectors that all land in the **same** retrieval space. Everything in OmniKit that produces a vector, from the indexer crawling your Documents folder to the search box you type into, calls through this one class. This page traces three things that make it work: how `WeightStore` loads Hugging Face safetensors and **bakes the retrieval LoRA into the backbone at load time**, how the Qwen3 text tower and the Qwen3-VL / Qwen2.5-Omni media towers all run through the same `Qwen3Backbone` weights so a text query can match a scanned PDF, and how every MLX evaluation is serialized behind a **priority gate** so an interactive search jumps ahead of queued indexing work.
If you are new to the repo, read `OmniEngine.swift` first (the facade and the gate), then `WeightStore.swift` (what "the model" actually is in memory), then `Qwen3Backbone.swift` (the shared compute core). The three media encoders are thin wrappers that all delegate to that backbone.
## The shape of the engine
```mermaid
flowchart TB
subgraph app["Callers"]
Q["Search box<br/>embedQuery / embedFileQuery"]
IDX["Indexer<br/>embedText / embedImages / embedAudioMelBatch"]
end
subgraph engine["OmniEngine (facade + priority gate)"]
GATE["run(highPriority:)<br/>NSCondition: busy / highWaiting"]
end
subgraph encoders["Encoders (one shared backbone)"]
TE["OmniTextEncoder<br/>Qwen3 text tower"]
IE["OmniImageEncoder<br/>Qwen3-VL ViT + merger"]
AE["OmniAudioEncoder<br/>Qwen2.5-Omni audio tower + projector"]
BB["Qwen3Backbone<br/>28-layer GQA, last-token pool, L2"]
end
WS["WeightStore<br/>safetensors + merged retrieval LoRA"]
Q --> GATE
IDX --> GATE
GATE --> TE & IE & AE
TE --> BB
IE --> BB
AE --> BB
WS -. weights .-> TE & IE & AE & BB
```
`OmniEngine.init` loads the config, parses the BPE tokenizer concurrently with the synchronous weight load, then constructs the three encoders over one shared `WeightStore`. The embedding dimension is just the text hidden size (1024). The image and audio encoders are failable: `OmniImageEncoder?` and `OmniAudioEncoder?` are `nil` when their tower weights are absent, which drives `supportsImages` / `supportsAudio`.
Sources: [Sources/OmniKit/OmniEngine.swift:114-165](Sources/OmniKit/OmniEngine.swift), [Sources/OmniKit/OmniConfig.swift:7-21](Sources/OmniKit/OmniConfig.swift)
## WeightStore: loading safetensors and merging the retrieval LoRA
`WeightStore` is where "the model" becomes a `[String: MLXArray]` dictionary in memory. Its job is to mirror the reference Python `JinaMultiTaskModel` + `sanitize` so the Swift vectors match the Python ones numerically. Three things happen, in order.
**1. Load and prune.** It loads `model.safetensors`, then drops keys the app won't run: `position_ids`, and (optionally) `audio_tower.*` / `audio_projector.*` / `vision_tower.*` / `merger.*`. `OmniEngine` keeps all towers (`keepVision: true, keepAudio: true`), so pruning here mostly removes `position_ids`.
**2. Merge the retrieval LoRA in fp32.** The retrieval adapter (`adapters/retrieval/adapter_model.safetensors`) is a set of low-rank `A`/`B` pairs. For each target weight `W`, the merge computes `W += loraScale * (B @ A)`. The `loraScale` is `alpha/r = 1.0` for retrieval. This is a permanent, baked-in merge — there is no runtime adapter switching; the engine only ever produces retrieval embeddings.
```swift
// Sources/OmniKit/WeightStore.swift
let a = aArr.asType(.float32) // [r, in]
let b = bArr.asType(.float32) // [out, r]
let delta = matmul(b, a) // [out, in]
w[baseKey] = base + (loraScale * delta)
```
**3. Pay fp32 only where it matters.** The reference upcasts the entire backbone to fp32. This implementation is smarter by default: it first scans the adapter to learn exactly which `language_model.*` linears the LoRA touches (`loraTargets`), upcasts only those to fp32 for the merge, then casts them back to bf16. Every non-target weight is a bf16→fp32→bf16 identity round-trip, so skipping it is byte-identical at a fraction of the load memory. The exact fp32 parity path (`OMNI_BACKBONE_BF16=0`, used by the fp32 fixtures) still upcasts everything.
Finally, only the merged language backbone is force-evaluated; vision/audio tower weights stay **lazily memory-mapped** until the first image/audio embed, so launching and running a text-only query never pays to materialize towers it won't use.
Sources: [Sources/OmniKit/WeightStore.swift:9-86](Sources/OmniKit/WeightStore.swift), [Sources/OmniKit/OmniConfig.swift:55](Sources/OmniKit/OmniConfig.swift)
| Weight class | fp32 round-trip? | When materialized |
| --- | --- | --- |
| `language_model.*` LoRA targets | yes (merge), cast back to bf16 | eagerly at load (`eval`) |
| other `language_model.*` | no (identity, default bf16 path) | eagerly at load |
| `vision_tower.*` / `merger.*` | kept in stored dtype | lazily on first image embed |
| `audio_tower.*` / `audio_projector.*` | kept in stored dtype | lazily on first audio embed |
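The claim that skipping the upcast for non-target weights is byte-identical rests on bf16 → fp32 → bf16 being an identity. A minimal sketch that checks this over every finite bf16 bit pattern follows; the function names are illustrative and the narrowing uses round-to-nearest-even, which is an assumption about the cast MLX performs.

```swift
/// Widen a bf16 bit pattern to fp32: bf16 is the top 16 bits of an IEEE-754 float.
func bf16ToFloat(_ bits: UInt16) -> Float {
    Float(bitPattern: UInt32(bits) << 16)
}

/// Narrow fp32 to bf16 with round-to-nearest-even (finite inputs only in this sketch).
func floatToBF16(_ value: Float) -> UInt16 {
    let bits = value.bitPattern
    let lsb = (bits >> 16) & 1            // lowest bit that survives the narrowing
    let rounded = bits &+ 0x7FFF &+ lsb   // ties go to the even neighbour
    return UInt16(rounded >> 16)
}

/// True when every finite bf16 value survives bf16 -> fp32 -> bf16 unchanged.
func roundTripIsIdentity() -> Bool {
    for raw in 0...UInt16.max {
        let widened = bf16ToFloat(raw)
        if widened.isFinite && floatToBF16(widened) != raw { return false }
    }
    return true
}
```

Because a widened bf16 value has all sixteen low mantissa bits clear, the rounding step never carries into the kept bits, so the round trip cannot change a weight that the LoRA does not touch.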
## One shared space: how every tower lands on the same vector
The reason text can search images is structural: every modality pools at the **last token** of a sequence run through the same `Qwen3Backbone`, and every sequence is wrapped so that last token lands in a comparable position. Two pieces enforce the alignment.
First, the **retrieval prefix** (`"Query: "` / `"Document: "`) is applied to *every* modality, not just text — the v5-omni model card applies the Query/Document distinction across all inputs. Media is indexed with the `Document:` prefix; a file used as a search query gets the `Query:` prefix instead.
Second, the **media suffix**. Last-token pooling is only meaningful if every modality pools at the *same kind* of token. The text path's tokenizer post-processor may append trailing special tokens (Nano appends `<|end_of_text|>`; Small appends nothing). `OmniTextEncoder` recovers exactly those trailing tokens by diffing `encode("x")` with and without special tokens, and the media encoders append the identical suffix. Without it, an image sequence would pool at `<|vision_end|>` while text pools at end-of-text — different positions, leaving the two modalities in near-orthogonal regions of the space. The embedding version string `omni-2-mediasuffix` records this fix.
```text
text : [Query:/Document:] + BPE(text) + [suffix] -> pool @ last
image : [Query:/Document:] + <|vision_start|> + ViT feats + <|vision_end|> + [suffix] -> pool @ last
audio : [Query:/Document:] + <|audio_start|> + aud feats + <|audio_end|> + [suffix] -> pool @ last
^ injected tower features ^ same trailing token everywhere
```
The injection itself is a concatenation, not a scatter: each media encoder builds `inputs_embeds` by concatenating the embedded prefix, the start token, the raw tower features (`[1, N, dim]`), the end token, and the suffix, then runs the shared backbone forward and last-token pools.
Sources: [Sources/OmniKit/OmniEngine.swift:93-100,124-132](Sources/OmniKit/OmniEngine.swift), [Sources/OmniKit/OmniTextEncoder.swift:18-34](Sources/OmniKit/OmniTextEncoder.swift), [Sources/OmniKit/OmniImageEncoder.swift:122-136](Sources/OmniKit/OmniImageEncoder.swift), [Sources/OmniKit/OmniAudioEncoder.swift:38-58](Sources/OmniKit/OmniAudioEncoder.swift)
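The suffix recovery and the sequence layout can be sketched without a tokenizer. Everything here is illustrative: `trailingSuffix` and `mediaLayout` are hypothetical names, and the token IDs in the usage are made up rather than taken from the model's vocabulary.

```swift
/// Tokens the post-processor appended after the plain encoding.
/// `plain` = encode without special tokens, `special` = encode with them.
func trailingSuffix(plain: [Int], special: [Int]) -> [Int] {
    guard !plain.isEmpty, special.count >= plain.count else { return [] }
    // Find the last position where `plain` occurs inside `special`.
    var start = special.count - plain.count
    while start >= 0 {
        if Array(special[start ..< start + plain.count]) == plain {
            return Array(special[(start + plain.count)...])
        }
        start -= 1
    }
    return []
}

enum Piece: Equatable {
    case tokens([Int])          // looked up in the token embedding table
    case features(count: Int)   // raw tower features, concatenated as-is
}

/// Layout of a media sequence: prefix, start marker, features, end marker, suffix.
func mediaLayout(prefix: [Int], start: Int, featureCount: Int,
                 end: Int, suffix: [Int]) -> [Piece] {
    var pieces: [Piece] = [
        .tokens(prefix + [start]),
        .features(count: featureCount),
        .tokens([end]),
    ]
    if !suffix.isEmpty { pieces.append(.tokens(suffix)) }
    return pieces
}

// Made-up IDs: 87 stands for "x", 9001 for a trailing end-of-text token.
let suffix = trailingSuffix(plain: [87], special: [87, 9001])
let layout = mediaLayout(prefix: [10, 11], start: 500, featureCount: 64,
                         end: 501, suffix: suffix)
```

With a non-empty suffix the final piece is the same trailing token the text path ends on; with an empty suffix (the Small case described above) the sequence ends at the end marker.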
### The towers, briefly
- **Text** — `OmniTextEncoder` runs `prefix -> Qwen2 BPE tokenize -> Qwen3 backbone -> last-token pool -> L2`, verified at cosine 1.00000 against the Python reference.
- **Image / video** — `OmniImageEncoder` runs the Qwen3-VL ViT plus a merger to produce per-image features, then injects them into the backbone. Video reuses the exact same path with `grid_t > 1` temporal features.
- **Audio** — `OmniAudioEncoder` runs the Qwen2.5-Omni audio encoder (a Whisper-style mel + conv stem with sinusoidal positions) plus a fused `audio_projector` (Linear 1280 -> 1024), then injects.
All three construct their **own** `Qwen3Backbone` over the same shared `WeightStore`, so the language weights are not duplicated — the backbone object is a thin stateless wrapper around the weight dictionary.
Sources: [Sources/OmniKit/OmniTextEncoder.swift:6-9](Sources/OmniKit/OmniTextEncoder.swift), [Sources/OmniKit/OmniImageEncoder.swift:5-40](Sources/OmniKit/OmniImageEncoder.swift), [Sources/OmniKit/OmniAudioTower.swift:6-20](Sources/OmniKit/OmniAudioTower.swift)
### Inside the backbone
`Qwen3Backbone` is 28 layers of grouped-query attention with RoPE (theta 3.5M) and last-token pooling. Two model-shape switches matter, both read from config rather than hardcoded:
- **Per-head q/k RMSNorm** is a Qwen3 feature (Small). Nano is Qwen2-style and omits those weights, so the norm is applied only when the weight is present.
- **Attention mask**: Small is causal (`isCausal`); Nano is bidirectional and, when batched, uses a padding-only additive mask (`-1e9` on pad columns) matching the reference `_bidi_mask`.
Compute precision is bf16 by default (faster, half the VRAM) with RMSNorm variance and the pooled output kept in fp32; `OMNI_BF16_COMPUTE=0` forces the exact fp32 path used by parity fixtures. Two opt-in performance levers exist: `OMNI_ASYNC_EVAL` double-buffers a batch's GPU forward over the prior batch's host readout, and `OMNI_COMPILE_BLOCK` fuses each transformer layer into one compiled kernel — both documented as bit-identical (or within cos 0.99995) to the eager path.
Sources: [Sources/OmniKit/Qwen3Backbone.swift:6-42,99-112,237-264](Sources/OmniKit/Qwen3Backbone.swift), [Sources/OmniKit/OmniTextEncoder.swift:101-149](Sources/OmniKit/OmniTextEncoder.swift)
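Last-token pooling followed by L2 normalization is what makes a plain dot product equal to cosine similarity downstream. A minimal sketch on plain Swift arrays (the real code operates on `MLXArray`; these function names are illustrative):

```swift
/// Pool a [tokens][dim] hidden-state matrix at its last token, then L2-normalize.
func lastTokenPoolL2(_ hidden: [[Float]]) -> [Float] {
    guard let last = hidden.last else { return [] }
    let norm = last.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    guard norm > 0 else { return last }
    return last.map { $0 / norm }
}

/// Dot product; equals cosine similarity when both inputs are unit length.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

// Two tokens, two dims: only the last row [3, 4] contributes.
let pooled = lastTokenPoolL2([[9, 9], [3, 4]])
```

Here `pooled` comes out as approximately `[0.6, 0.8]`, a unit vector, so its dot product with itself is approximately 1.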
## The priority gate: keeping search responsive during indexing
MLX evaluation is not safe to run concurrently from multiple threads, so every embed must be serialized. The naive serialization — one global lock — would make an interactive search wait behind the entire indexing queue. The gate solves this with a small `NSCondition` state machine that lets a **high-priority** query jump ahead of pending **low-priority** indexing work.
```swift
// Sources/OmniKit/OmniEngine.swift
private func run<T>(highPriority: Bool, _ work: () -> T) -> T {
cond.lock()
if highPriority { highWaiting += 1 }
while busy || (!highPriority && highWaiting > 0) { cond.wait() }
busy = true
if highPriority { highWaiting -= 1 }
cond.unlock()
let result = work()
cond.lock(); busy = false; cond.broadcast(); cond.unlock()
return result
}
```
The invariant is one line of logic: a low-priority call blocks whenever `highWaiting > 0`, so the instant a query registers itself it leapfrogs every waiting indexing embed. A query still cannot preempt the *one* embed already running (MLX has no mid-eval cancellation), so a search waits for **at most one in-flight embed**: the delay is bounded by a single embed, not by the length of the indexing queue.
```mermaid
sequenceDiagram
participant Idx as Indexer (low)
participant Gate as run() / NSCondition
participant Q as Search (high)
Idx->>Gate: embedText (low) -> busy=true, runs
Idx->>Gate: embedText (low) -> waits (busy)
Q->>Gate: embedQuery (high) -> highWaiting=1, waits (busy)
Note over Gate: in-flight embed finishes -> broadcast
Gate-->>Q: highWaiting>0 wins -> runs first
Gate-->>Idx: low embed resumes only after query clears
```
Which calls are high vs low is decided by intent, not modality:
| Call | Priority | Notes |
| --- | --- | --- |
| `embedQuery`, `embedImageQuery`, `embedVideoQuery`, `embedAudioQuery`, `embedFileQuery` | high | interactive search; excluded from the tok/s counter |
| `embedText`/`embedTextBatch` with `.query` | high | `highPriority: type == .query` |
| `embedText`/`embedTextBatch` with `.passage` | low | indexing |
| `embedImage(s)`, `embedVideoFrames`, `embedAudio*` | low | indexing |
Indexing calls also feed a separate, lock-guarded `tokensProcessed` counter (queries are deliberately excluded) that the UI samples for live tok/s. The `embedFileQuery` entry point closes the loop on cross-modal search: it detects a dropped file's modality, reuses the *indexing-path* decoders at high priority, and picks `queryPrefix` vs `docPrefix` based on whether the search is asymmetric ("search by this file") or symmetric ("find similar").
Sources: [Sources/OmniKit/OmniEngine.swift:118-201,273-317](Sources/OmniKit/OmniEngine.swift)
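The leapfrog behaviour can be observed with a small runnable demo. It wraps the same state machine as `run(highPriority:)` in a hypothetical `PriorityGate` class; the `EventLog` helper, the `pendingHigh` accessor, and the task labels exist only for this sketch.

```swift
import Foundation

/// Same state machine as `run(highPriority:)`, wrapped in a hypothetical class.
final class PriorityGate: @unchecked Sendable {
    private let cond = NSCondition()
    private var busy = false
    private var highWaiting = 0

    func run<T>(highPriority: Bool, _ work: () -> T) -> T {
        cond.lock()
        if highPriority { highWaiting += 1 }
        while busy || (!highPriority && highWaiting > 0) { cond.wait() }
        busy = true
        if highPriority { highWaiting -= 1 }
        cond.unlock()
        let result = work()
        cond.lock(); busy = false; cond.broadcast(); cond.unlock()
        return result
    }

    /// Demo-only accessor: number of registered high-priority waiters.
    var pendingHigh: Int {
        cond.lock(); defer { cond.unlock() }
        return highWaiting
    }
}

/// Thread-safe event log for the demo.
final class EventLog: @unchecked Sendable {
    private let lock = NSLock()
    private var events: [String] = []
    func add(_ event: String) { lock.lock(); events.append(event); lock.unlock() }
    var all: [String] { lock.lock(); defer { lock.unlock() }; return events }
}

func demoOrder() -> [String] {
    let gate = PriorityGate()
    let log = EventLog()
    let started = DispatchSemaphore(value: 0)
    let release = DispatchSemaphore(value: 0)
    let group = DispatchGroup()
    let queue = DispatchQueue(label: "gate-demo", attributes: .concurrent)

    // One indexing embed takes the gate and holds it until released.
    queue.async(group: group) {
        gate.run(highPriority: false) {
            log.add("index-0"); started.signal(); release.wait()
        }
    }
    started.wait()
    // Two more indexing embeds and one query arrive while the gate is busy.
    for i in 1...2 {
        queue.async(group: group) { gate.run(highPriority: false) { log.add("index-\(i)") } }
    }
    queue.async(group: group) { gate.run(highPriority: true) { log.add("query") } }
    // Wait until the query has registered, then let the in-flight embed finish.
    while gate.pendingHigh == 0 { Thread.sleep(forTimeInterval: 0.001) }
    release.signal()
    group.wait()
    return log.all
}

let order = demoOrder()
```

The first entry is always the embed that already held the gate and the second is always the query; the two remaining indexing embeds follow in either order.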
## Memory and batching discipline
Because this runs in-process on unified memory alongside the OS and the app UI, the engine bounds peak GPU memory deliberately. `omniSetMemoryLimit` caps MLX memory globally (cache set to half the limit). The image encoder caps each block-diagonal vision-tower forward to a `patchBudget` (default 8192 packed patches, `OMNI_IMAGE_PATCH_BUDGET`), splitting larger image sets into successive bounded forwards, and evaluates the packed tower features before the backbone allocates, so the tower's large activations are freed first. The audio batch path bounds attention to per-chunk windows rather than an `O(L_total^2)` packed matrix. Crucially, batched media keeps the **backbone** pass at `B=1` per image for numerical stability (a batched bidirectional Nano forward over packed vision features was measured to be unstable), so every batched vector is bit-identical to its single-item counterpart.
Sources: [Sources/OmniKit/OmniEngine.swift:103-109,235-245](Sources/OmniKit/OmniEngine.swift), [Sources/OmniKit/OmniImageEncoder.swift:20-119](Sources/OmniKit/OmniImageEncoder.swift), [Sources/OmniKit/OmniAudioEncoder.swift:60-132](Sources/OmniKit/OmniAudioEncoder.swift)
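Splitting an image set into bounded forwards can be sketched as a greedy packer. `splitByBudget` is a hypothetical name, the in-order greedy strategy is an assumption, and so is the rule that an item larger than the budget runs alone (the source states that rule for audio clips, not explicitly for images).

```swift
/// Greedily pack items (by patch count) into consecutive groups whose total stays
/// within `budget`. An item larger than the budget gets a group of its own.
/// Returns groups of indices into `patchCounts`.
func splitByBudget(_ patchCounts: [Int], budget: Int) -> [[Int]] {
    var groups: [[Int]] = []
    var current: [Int] = []
    var used = 0
    for (index, count) in patchCounts.enumerated() {
        if !current.isEmpty && used + count > budget {
            groups.append(current)   // close the group before it overflows
            current = []
            used = 0
        }
        current.append(index)
        used += count
    }
    if !current.isEmpty { groups.append(current) }
    return groups
}

// Five images against the default budget of 8192 packed patches.
let forwards = splitByBudget([3000, 3000, 3000, 9000, 100], budget: 8192)
// forwards == [[0, 1], [2], [3], [4]]
```

Each inner array corresponds to one vision-tower forward, so peak activation memory scales with the budget rather than with the size of the whole image set.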
## Summary
`OmniEngine` is a single facade over a single weight store, and three design choices give it its character. **The LoRA is baked at load time** — `WeightStore` merges the retrieval adapter into the backbone in fp32 and pays that precision cost only on the linears the adapter touches, leaving towers lazily mapped. **One backbone, one space** — text, vision, and audio towers all inject into the same `Qwen3Backbone` and pool at the same trailing token via a shared prefix and media suffix, which is exactly what makes text-to-PDF and file-to-file search possible. And **the priority gate** turns mandatory MLX serialization from a latency liability into a feature: an interactive query waits at most one in-flight embed, never the indexing backlog. To extend the engine, the load path is `WeightStore.init`, the compute core is `Qwen3Backbone.forward`/`pool`, and the concurrency contract lives entirely in `OmniEngine.run(highPriority:)`.
---
## 04. Indexing Pipeline: Crawl, Extract, Chunk, Embed, Store
> The incremental ingestion path stage by stage: the file crawler and mtime/size change detection, text/PDF/media extraction, chunking, the concurrent decode stage feeding a single serialized GPU embed stage, and persistence into the SQLite-backed bf16 vector store. Includes the FSWatcher that triggers re-indexing.
- Page Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/04-indexing-pipeline-crawl-extract-chunk-embed-store.md
- Generated: 2026-06-08T13:30:36.006Z
### Source Files
- `Sources/OmniKit/Indexer.swift`
- `Sources/OmniKit/FileCrawler.swift`
- `Sources/OmniKit/FileExtractor.swift`
- `Sources/OmniKit/FSWatcher.swift`
- `Sources/OmniKit/IndexSettings.swift`
- `Sources/OmniKit/VectorStore.swift`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [Sources/OmniKit/Indexer.swift](Sources/OmniKit/Indexer.swift)
- [Sources/OmniKit/FileCrawler.swift](Sources/OmniKit/FileCrawler.swift)
- [Sources/OmniKit/FileExtractor.swift](Sources/OmniKit/FileExtractor.swift)
- [Sources/OmniKit/FSWatcher.swift](Sources/OmniKit/FSWatcher.swift)
- [Sources/OmniKit/IndexSettings.swift](Sources/OmniKit/IndexSettings.swift)
- [Sources/OmniKit/VectorStore.swift](Sources/OmniKit/VectorStore.swift)
- [App/AppModel.swift](App/AppModel.swift)
</details>
# Indexing Pipeline: Crawl, Extract, Chunk, Embed, Store
This page traces a file from the disk to a searchable vector, stage by stage. Omni indexes the user's folders (Documents, Downloads, Desktop by default) incrementally: it walks each root, decides what changed, pulls embeddable content out of each file, splits text into overlapping chunks, runs everything through the in-process MLX embedder, and persists L2-normalized vectors into a SQLite-backed store that doubles as an in-memory bf16 matrix for cosine search. The whole flow lives in `OmniKit` and is driven by one method, `Indexer.index(roots:settings:force:onProgress:)`.
The central design idea is a **two-stage pipeline**: many CPU cores decode files concurrently (extraction, PDF rasterization, video frame sampling, audio mel STFT, image patchify), while a *single* serialized consumer thread owns the GPU and does only embedding. That split keeps the GPU continuously fed without ever running two forward passes at once, which is what lets an interactive search stay responsive while a full index is in flight. A separate, lighter path — the `FSWatcher` plus `AppModel` — keeps the index current after the first full pass by reacting to filesystem events.
Start reading at `Indexer.index(...)` ([Indexer.swift:124-341](Sources/OmniKit/Indexer.swift#L124-L341)); the `pipeline(...)` and `decode(...)` helpers below it are where the concurrency model lives.
## The shape of the pipeline
```mermaid
flowchart TB
subgraph trigger["Triggers"]
full["AppModel.startIndexing\n(full / resume pass)"]
fsw["FSWatcher (FSEvents)\n1.5s coalesced bursts"]
end
subgraph crawl["Crawl + change detection (per root)"]
fc["FileCrawler.walk\nskip hidden / bundles / node_modules,.git,...\nmtime+size -> CrawledFile"]
end
subgraph cpu["Concurrent decode stage (up to activeProcessorCount cores)"]
dec["Indexer.decode(file)\nFileExtractor.extract / chunk()\nPDF render, video keyframes,\naudio mel, image patchify"]
end
subgraph gpu["Serial embed stage (single thread, owns GPU)"]
emb["Embedder.embedTextBatches /\nembedImages / embedAudioMelBatch\nlength-bucketed, batched MLX forward"]
end
subgraph store["Persistence"]
vs["VectorStore.replace / replaceMany\nSQLite chunks table (bf16 BLOB)\n+ in-memory flat16 matrix"]
end
full --> fc
fsw -.path-only events.-> upd["Indexer.update(paths:)\n(no crawl)"]
fc -->|"CrawledFile, bounded by semaphore"| dec
dec -->|"DecodedItem mailbox\n(ordered, ReadyBox)"| emb
emb --> vs
upd --> dec
```
`FileCrawler` produces `CrawledFile` values; `pipeline(...)` fans them across cores into `decode(...)`, which yields a `DecodedItem`; the serial consumer embeds and calls `VectorStore`.
Sources: [Indexer.swift:84-117](Sources/OmniKit/Indexer.swift#L84-L117), [Indexer.swift:397-479](Sources/OmniKit/Indexer.swift#L397-L479)
## Stage 1 - Crawl and change detection
`FileCrawler.walk` recursively enumerates each root with `FileManager.enumerator`, skipping hidden files, package bundles, and a fixed denylist of noise directories (`node_modules`, `.git`, `Library`, `Pods`, `.build`, `DerivedData`, `venv`, `__pycache__`, `dist`, `build`, `target`, and more). Only regular files that are a *supported, enabled* kind and under `maxFileSize` (200 MB) survive, each becoming a `CrawledFile` carrying its `url`, `modified` (mtime, epoch seconds), and `size`.
```swift
// FileCrawler.swift - the per-entry filter
guard vals.isRegularFile == true,
FileExtractor.isSupported(url, enabledKinds: enabledKinds, disabledExtensions: disabledExtensions)
else { continue }
let size = vals.fileSize ?? 0
if size > maxFileSize { continue }
let mtime = vals.contentModificationDate?.timeIntervalSince1970 ?? 0
onFile(CrawledFile(url: url, modified: mtime, size: size))
```
`index(...)` walks every root exactly once (a deliberate single pass - an earlier version stat-counted then re-walked, doubling traversal before the first embed), publishing each root's file count as its determinate progress total. Files are then **interleaved round-robin across roots** so every folder makes progress from the start, and **grouped by modality** so a whole kind is processed in one uniform phase. Incremental skipping happens inside `pipeline(...)`: a file whose stored `(modified, size)` still match `store.indexedFiles()` is marked `unchanged` and never decoded.
Sources: [FileCrawler.swift:39-63](Sources/OmniKit/FileCrawler.swift#L39-L63), [Indexer.swift:124-189](Sources/OmniKit/Indexer.swift#L124-L189), [Indexer.swift:407-417](Sources/OmniKit/Indexer.swift#L407-L417)
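The two ideas in this stage, change detection by `(modified, size)` and round-robin interleaving across roots, can be sketched as follows. `FileStamp` is a hypothetical stand-in for `CrawledFile` (it uses a path string instead of a URL), and the exact equality comparison on mtime is an assumption.

```swift
struct FileStamp: Equatable {
    let path: String
    let modified: Double   // mtime, epoch seconds
    let size: Int
}

/// A file is skipped when its stored (modified, size) pair still matches.
func isUnchanged(_ file: FileStamp,
                 indexed: [String: (modified: Double, size: Int)]) -> Bool {
    guard let prior = indexed[file.path] else { return false }
    return prior.modified == file.modified && prior.size == file.size
}

/// Round-robin interleave: take one item from each root in turn until all are drained.
func interleave<T>(_ roots: [[T]]) -> [T] {
    var result: [T] = []
    let longest = roots.map(\.count).max() ?? 0
    for i in 0 ..< longest {
        for root in roots where i < root.count {
            result.append(root[i])
        }
    }
    return result
}

let mixed = interleave([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]])
// mixed == ["a1", "b1", "c1", "a2", "c2", "a3"]
```

Every root contributes a file before any root contributes its second, which is why each folder's progress bar starts moving immediately.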
## Stage 2 - Extraction by modality
`FileExtractor.kind(for:)` maps an extension to one of four `FileKind`s, and `extract(...)` dispatches to the right reader. The text path dominates but each modality has a distinct decode:
| Kind | Source path | What `decode` produces | Notes |
|------|-------------|------------------------|-------|
| Text | `extractText` reads up to `maxTextBytes` (2 MB), UTF-8 then ISO-Latin1 fallback | `.text([chunks])` | Covers code, markdown, config, logs |
| Text (PDF) | `extractPDF`: real text if `>= minCharsPerPage * pageCount`, else rasterize up to `maxScanPages` (8) pages to images | `.text` or `.imagePatches` | Scanned PDFs fall through to the vision tower |
| Text (office) | `extractOffice` via `NSAttributedString` (rtf, docx, pages, ...) | `.text` | AppKit-only |
| Image | `loadImage` thumbnails to `maxImageDimension` (1568) then `OmniVisionPreprocess.preprocessRaw` | `.imagePatches([RawPatches])` | Patchify runs in the decode stage, off the GPU thread |
| Video | `videoFrames`: dense sampling + average-hash dedup keeps up to `maxVideoFrames` (6) distinct keyframes | `.images([CGImage])` | One temporal clip -> one embedding |
| Audio | `OmniAudioPreprocess.melFeatures` computes the mel buffer | `.audioMel([Float], frames)` | STFT runs on background cores |
A key efficiency choice: still images and PDF pages are *preprocessed* (resize + parallel patchify) inside `decode`, so the serialized GPU thread only runs the vision tower, not the heavy CPU patchify. Video frames stay as raw `CGImage`s because they form one clip embedded together.
Sources: [FileExtractor.swift:65-106](Sources/OmniKit/FileExtractor.swift#L65-L106), [FileExtractor.swift:121-237](Sources/OmniKit/FileExtractor.swift#L121-L237), [Indexer.swift:438-479](Sources/OmniKit/Indexer.swift#L438-L479)
Per-modality minimum thresholds from `IndexSettings` (`minTextChars`, `minImageDimension`, `minAudioSeconds`, `minVideoSeconds`) are applied here; a file below its threshold returns an empty `DecodedItem` and is counted as skipped rather than embedded.
Sources: [IndexSettings.swift:23-31](Sources/OmniKit/IndexSettings.swift#L23-L31), [Indexer.swift:443-470](Sources/OmniKit/Indexer.swift#L443-L470)
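The text-versus-scan decision for PDFs in the table above can be written as a small pure function. `PDFPlan` and `planPDF` are hypothetical names, and the `minCharsPerPage` value of 50 used in the usage is a made-up number (the source does not state the real default).

```swift
enum PDFPlan: Equatable {
    case text                    // embed the extracted text as chunks
    case rasterize(pages: Int)   // render pages and send them to the vision tower
}

/// Real text wins when it averages at least `minCharsPerPage` per page;
/// otherwise rasterize, capped at `maxScanPages`.
func planPDF(extractedChars: Int, pageCount: Int,
             minCharsPerPage: Int, maxScanPages: Int = 8) -> PDFPlan {
    if extractedChars >= minCharsPerPage * pageCount {
        return .text
    }
    return .rasterize(pages: min(pageCount, maxScanPages))
}

// A 30-page scan with almost no text layer: only the first 8 pages are rendered.
let scanPlan = planPDF(extractedChars: 40, pageCount: 30, minCharsPerPage: 50)
```

The cap keeps a long scanned document from monopolizing the vision tower, at the cost of indexing only its leading pages.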
## Stage 3 - Chunking text
Text longer than `maxCharsPerChunk` (user-set, default 1800, floored at 200) is split into overlapping windows. Each window starts `limit - chunkOverlap` characters after the previous one (overlap 200, so a step of 1600 at the default), and a file yields at most `maxChunksPerFile` (40) chunks, so very long files contribute a bounded number of chunks while preserving cross-boundary context.
```swift
// Indexer.swift - chunk(_:)
let limit = max(200, active.maxCharsPerChunk)
let scalars = Array(text)
if scalars.count <= limit { return [text] }
let step = max(1, limit - chunkOverlap)
while start < scalars.count && chunks.count < maxChunksPerFile {
let end = min(start + limit, scalars.count)
chunks.append(String(scalars[start ..< end]))
if end == scalars.count { break }
start += step
}
```
Each chunk also gets a `snippet` (newline/tab-collapsed prefix of `snippetLength` = 220 chars) for the UI.
Sources: [Indexer.swift:558-577](Sources/OmniKit/Indexer.swift#L558-L577)
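For experimenting with the window arithmetic, here is a self-contained version of the chunking loop. It mirrors the excerpt above but takes the settings as parameters with the documented defaults; the local declarations and the free-function form are assumptions, not the repo's exact code.

```swift
/// Overlapping character windows over `text`.
func chunk(_ text: String, maxCharsPerChunk: Int = 1800, chunkOverlap: Int = 200,
           maxChunksPerFile: Int = 40) -> [String] {
    let limit = max(200, maxCharsPerChunk)
    let chars = Array(text)
    if chars.count <= limit { return [text] }
    let step = max(1, limit - chunkOverlap)
    var chunks: [String] = []
    var start = 0
    while start < chars.count && chunks.count < maxChunksPerFile {
        let end = min(start + limit, chars.count)
        chunks.append(String(chars[start ..< end]))
        if end == chars.count { break }
        start += step
    }
    return chunks
}

// 5000 characters -> windows starting at 0, 1600, and 3200.
let pieces = chunk(String(repeating: "a", count: 5000))
print(pieces.count)  // 3
```

With the defaults, 40 chunks cover at most 1800 + 39 × 1600 = 64,200 characters, so anything beyond that point in a very long file is not embedded.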
## Stage 4 - Concurrent decode feeding a single serialized embed
This is the heart of the pipeline. `pipeline(...)` runs a producer that bounds outstanding work with a semaphore sized to `activeProcessorCount`, dispatches `decode(...)` onto a concurrent queue, and drops each result into an index-keyed mailbox (`ReadyBox`) guarded by an `NSCondition`. The consumer pulls items **in strict file order**, serially, on the calling thread - so embedding is never concurrent, even though decode is.
```swift
// Indexer.swift - pipeline(...)
let maxInFlight = max(2, ProcessInfo.processInfo.activeProcessorCount)
let sem = DispatchSemaphore(value: maxInFlight)
producerQ.async {
for (i, file) in files.enumerated() {
sem.wait() // bound decoding + decoded-not-consumed
... decodeQ.async { ready.items[i] = self.decode(file); cond.signal() }
}
}
for i in 0 ..< files.count {
cond.lock(); while ready.items[i] == nil { cond.wait() }
let item = ready.items.removeValue(forKey: i)!; cond.unlock()
sem.signal()
if item.abandoned { continue } // paused: don't consume/count
consume(item)
}
```
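The excerpt above elides some locking and setup. The following is a minimal, runnable sketch of the same pattern (bounded concurrent decode, strictly ordered serial consume) with generic stand-ins for files and decoded items. `ReadySlots` and `orderedPipeline` are hypothetical names, and pause/abandon handling is left out.

```swift
import Foundation

/// Mailbox keyed by input index; all access happens under the condition's lock.
final class ReadySlots<Out>: @unchecked Sendable {
    var items: [Int: Out] = [:]
}

/// Decode concurrently (at most `maxInFlight` outstanding), consume serially in input order.
func orderedPipeline<In, Out>(_ inputs: [In], maxInFlight: Int,
                              decode: @escaping (In) -> Out,
                              consume: (Out) -> Void) {
    let cond = NSCondition()
    let ready = ReadySlots<Out>()
    let sem = DispatchSemaphore(value: max(1, maxInFlight))
    let producerQ = DispatchQueue(label: "pipeline.producer")
    let decodeQ = DispatchQueue(label: "pipeline.decode", attributes: .concurrent)

    producerQ.async {
        for (i, input) in inputs.enumerated() {
            sem.wait()   // bounds items that are decoding or decoded-but-not-consumed
            decodeQ.async {
                let out = decode(input)
                cond.lock(); ready.items[i] = out; cond.broadcast(); cond.unlock()
            }
        }
    }
    for i in 0 ..< inputs.count {
        cond.lock()
        while ready.items[i] == nil { cond.wait() }
        let item = ready.items.removeValue(forKey: i)!
        cond.unlock()
        sem.signal()     // free a slot so the producer can dispatch the next decode
        consume(item)    // runs on the calling thread only, one item at a time
    }
}
```

Because the consumer always waits for index `i` before `i + 1`, results arrive in input order even when later items finish decoding first, and the semaphore keeps decoded-but-unconsumed items from piling up in memory.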
Because each modality is processed as a uniform phase, the consumer can **batch across files** into one GPU forward instead of many tiny ones:
- **Text** buffers chunks from many files into a staging window (`textBatchSize * 6`), **length-buckets** the window (sorting by length so each batch pads to a near-uniform max length), carves it into `textBatchSize` (default 16) groups, and hands the whole set to `embedTextBatches` in one serialized call. A file is stored only once all of its chunks return, preserving per-file atomicity. The batch size of 16 is a measured responsiveness sweet spot: a query's GPU eval queues behind the in-flight forward, so smaller batches (16 vs 48) cut the search p95 tail (~164 ms vs ~385 ms) with flat-to-better index throughput.
- **Audio** stages decoded mels and embeds up to `audioMaxClipsPerBatch` (16) clips bounded by a total-frame budget (`audioFrameBudget` = 24000) in one tower + backbone forward; a clip larger than the budget embeds alone.
- **Images / scanned PDFs** embed all pages of a file in one block-diagonal vision forward.
The order of phases is `IndexSettings.kindOrder` (default `[.image, .audio, .video, .text]` - media first because it is slower, so its results surface sooner).
Sources: [Indexer.swift:190-295](Sources/OmniKit/Indexer.swift#L190-L295), [Indexer.swift:397-436](Sources/OmniKit/Indexer.swift#L397-L436), [Indexer.swift:96-110](Sources/OmniKit/Indexer.swift#L96-L110), [IndexSettings.swift:18-21](Sources/OmniKit/IndexSettings.swift#L18-L21)
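Length bucketing for the text path can be sketched as sort-then-carve over chunk indices. The function names are illustrative, and "length" here is whatever unit the batcher sorts by (the source does not say whether that is characters or tokens).

```swift
/// Sort chunk indices by length (ties broken by index), then carve into batches.
func lengthBuckets(_ lengths: [Int], batchSize: Int) -> [[Int]] {
    precondition(batchSize > 0)
    let order = lengths.indices.sorted { (lengths[$0], $0) < (lengths[$1], $1) }
    return stride(from: 0, to: order.count, by: batchSize).map {
        Array(order[$0 ..< min($0 + batchSize, order.count)])
    }
}

/// Total padded slots: each batch pads every member to its longest member.
func paddedCost(_ batches: [[Int]], lengths: [Int]) -> Int {
    batches.reduce(0) { total, batch in
        total + batch.count * (batch.map { lengths[$0] }.max() ?? 0)
    }
}

let lengths = [5, 100, 7, 98, 6, 102]
let arrivalOrder = [[0, 1, 2], [3, 4, 5]]               // batches in arrival order
let bucketed = lengthBuckets(lengths, batchSize: 3)     // [[0, 4, 2], [3, 1, 5]]
print(paddedCost(arrivalOrder, lengths: lengths))       // 606
print(paddedCost(bucketed, lengths: lengths))           // 327
```

In this toy window, bucketing nearly halves the padded work because short chunks no longer pad up to a long neighbour.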
### Cancellation and resume
`DecodedItem` carries two flags that keep pause/resume correct. `unchanged` means already indexed and current (counted, nothing to do); `abandoned` means produced after a cancel and never consumed - it is *not* counted as a skip and will be re-indexed on resume. Crucially, deletion reconciliation only runs on a fully completed pass, never on a cancelled one, because a paused run has not seen every file and must not purge "unseen" paths.
Sources: [Indexer.swift:65-80](Sources/OmniKit/Indexer.swift#L65-L80), [Indexer.swift:302-337](Sources/OmniKit/Indexer.swift#L302-L337)
## Stage 5 - Persisting into the bf16 vector store
`VectorStore` is SQLite-backed but keeps every vector resident in memory as **bf16** (2 bytes/dim, half the residency of fp32 with negligible recall loss on L2-normalized vectors). `replace(path:chunks:)` is atomic per file: inside one transaction it deletes the path's old rows and inserts the new chunks, storing each embedding as a bf16 `BLOB`. It then mirrors the rows into the in-memory `flat16` matrix.
The store is tuned for the dominant indexing case - appending a brand-new file:
- A `presentPaths` set lets `replace` know in O(1) whether a path pre-exists; a new file skips the O(N) buffer rebuild entirely and just appends (geometric growth, amortized O(1)).
- New rows append *past* `baseRows` and are scored as a small "delta" matmul per query, so an ordinary indexing append does **not** rebuild the ~0.8 GB resident base matrix - that happens only on a structural change (delete/reload) or when the delta exceeds `foldThreshold` (50 000).
- Schema is a rebuildable cache: WAL journaling, `synchronous=NORMAL`, 256 MB mmap and page cache. On a `schemaVersion` mismatch the `chunks` table is simply dropped and recreated; display-metadata columns (`width`, `height`, `duration`) were added via lazy `ADD COLUMN` migrations without forcing a reindex.
```swift
// VectorStore.swift - new-file fast path after COMMIT
if presentPaths.contains(path) { removeRowsLocked { $0.path == path } }
for (i, c) in chunks.enumerated() {
rows.append(Row(...)); flat16.append(contentsOf: bfs[i]); fileID.append(internPath(c.path))
}
presentPaths.insert(path)
// No invalidateBase(): a new path's rows are scored as delta.
```
`replaceMany(...)` handles file-watcher bursts (bulk edit, git checkout, synced folder) as one transaction and one in-memory rebuild instead of O(N) per file, and `deletePaths(...)` does the same for batched deletions.
Sources: [VectorStore.swift:84-131](Sources/OmniKit/VectorStore.swift#L84-L131), [VectorStore.swift:219-272](Sources/OmniKit/VectorStore.swift#L219-L272), [VectorStore.swift:279-370](Sources/OmniKit/VectorStore.swift#L279-L370), [VectorStore.swift:159-206](Sources/OmniKit/VectorStore.swift#L159-L206)
Non-finite embeddings are filtered before storage; a file with zero finite chunks is logged and counted as skipped, while a store failure increments `failed`. `Sources: [Indexer.swift:174-184](Sources/OmniKit/Indexer.swift#L174-L184)`
## The FSWatcher: keeping the index live
After the first full pass, re-indexing is event-driven. `FSWatcher` wraps FSEvents with `kFSEventStreamCreateFlagFileEvents`, coalescing bursts with a 1.5 s latency. It persists `lastEventId` so a relaunch can replay changes missed while the app was closed (`since:` is read from `UserDefaults` at construction).
```swift
// FSWatcher.swift
let flags = UInt32(kFSEventStreamCreateFlagFileEvents | kFSEventStreamCreateFlagNoDefer
                   | kFSEventStreamCreateFlagUseCFTypes | kFSEventStreamCreateFlagWatchRoot)
FSEventStreamCreate(kCFAllocatorDefault, fsEventsCallback, &context,
                    paths as CFArray, sinceWhen, 1.5 /* coalesce bursts */, flags)
```
`AppModel.handleFSChange` is the consumer. It drops events under paused folders, **buffers** changes while a full index is running (draining them on completion), and otherwise calls `Indexer.update(paths:settings:)` on a detached utility task. `update(...)` performs a *targeted* re-embed with **no crawl**: each path is re-classified by its current `(modified, size)`; unchanged files are ignored, now-unsupported/deleted files are queued for deletion, and a directory event triggers a scoped `FileCrawler.walk` so freshly added subtrees get indexed. The whole batch is applied via `deletePaths` + `replaceMany`. `Sources: [FSWatcher.swift:13-51](Sources/OmniKit/FSWatcher.swift#L13-L51), [AppModel.swift:1338-1381](App/AppModel.swift#L1338-L1381), [Indexer.swift:345-390](Sources/OmniKit/Indexer.swift#L345-L390)`
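The per-path decision inside a targeted update can be sketched as a pure function. This is a minimal illustration, not the code in `Indexer.update`: `FileStamp`, `UpdateAction`, and `classify` are hypothetical names, and only the `(modified, size)` comparison and the three outcomes come from the description above.

```swift
import Foundation

// Hypothetical types for illustration; the real Indexer may model this differently.
struct FileStamp: Equatable {
    var modified: Date
    var size: Int64
}

enum UpdateAction: Equatable {
    case skip      // unchanged, or never indexed and no longer relevant
    case reembed   // new or changed file
    case delete    // indexed before, but now missing or unsupported
}

/// `indexed` is what the store remembers; `onDisk` is nil when the file is gone.
func classify(indexed: FileStamp?, onDisk: FileStamp?, supported: Bool) -> UpdateAction {
    guard let current = onDisk, supported else {
        // Nothing to remove if the path was never indexed.
        return indexed == nil ? .skip : .delete
    }
    // Same (modified, size) pair means the stored chunks are still valid.
    return indexed == current ? .skip : .reembed
}
```

In a sketch like this, the `.delete` paths would be collected for `deletePaths` and the `.reembed` paths for `replaceMany`, so the whole event batch still lands in a single store transaction.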
## Configuration surface
`IndexSettings` is the single struct carrying everything the indexer needs. `AppModel.effectiveSettings()` populates it from user preferences before each pass.
| Field | Default | Effect |
|-------|---------|--------|
| `enabledKinds` | all four | Which modalities are crawled and indexed |
| `disabledExtensions` | empty | Extensions turned off within an enabled kind |
| `kindOrder` | `[image, audio, video, text]` | Phase order; sets which modality embeds first |
| `maxImageDimension` | 1568 | Largest image/PDF-page side decoded |
| `maxVideoFrames` | 6 | Keyframes sampled per video |
| `maxCharsPerChunk` | 1800 | Longest text slice per chunk before overlap split |
| `minImageDimension` / `minAudioSeconds` / `minVideoSeconds` / `minTextChars` | 0 | Per-modality skip thresholds |
`Sources: [IndexSettings.swift:1-56](Sources/OmniKit/IndexSettings.swift#L1-L56), [AppModel.swift:1323-1334](App/AppModel.swift#L1323-L1334)`
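To make the table concrete, here is a hedged sketch of a settings struct that carries the same defaults. The field names and default values are taken from the table; the property types, the `FileKind` enum, and the `phases()` helper are assumptions for illustration and may not match `IndexSettings.swift`.

```swift
// Hypothetical mirror of the table above; types are assumed, defaults are from the table.
enum FileKind: CaseIterable { case image, audio, video, text }

struct IndexSettingsSketch {
    var enabledKinds: Set<FileKind> = Set(FileKind.allCases)
    var disabledExtensions: Set<String> = []
    var kindOrder: [FileKind] = [.image, .audio, .video, .text]
    var maxImageDimension = 1568
    var maxVideoFrames = 6
    var maxCharsPerChunk = 1800
    var minImageDimension = 0
    var minAudioSeconds = 0.0
    var minVideoSeconds = 0.0
    var minTextChars = 0

    /// Phase order actually run: `kindOrder` restricted to the enabled kinds.
    func phases() -> [FileKind] {
        kindOrder.filter { enabledKinds.contains($0) }
    }
}
```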
## Summary
Omni's ingestion path is deliberately asymmetric: cheap, parallelizable work (crawling, extraction, rasterization, frame sampling, mel STFT, image patchify) is spread across all CPU cores, while the one expensive, non-shareable resource - the MLX GPU forward - is fed by a single serialized consumer that batches across files for throughput yet stays small enough (text batch 16) to keep concurrent search responsive. Incrementality is enforced end to end by `(mtime, size)` change detection, per-file atomic `replace`, abandon-on-cancel semantics, and deletion reconciliation gated on a complete pass. Persistence keeps SQLite as durable truth while a resident bf16 matrix with a base/delta split absorbs ordinary appends without rebuilding. The `FSWatcher` closes the loop, turning filesystem events into targeted, crawl-free updates so the index stays current after the initial build. `Sources: [Indexer.swift:84-341](Sources/OmniKit/Indexer.swift#L84-L341), [VectorStore.swift:84-273](Sources/OmniKit/VectorStore.swift#L84-L273)`
---
## 05. Search Path: Query Qualifiers and bf16 Matmul Scoring
> How a search-box string becomes results: the dependency-free SearchQueryParser that splits semantic text from key:value qualifiers (type, ext, in, date, after, score, sort), then exact brute-force cosine over the resident bf16 matrix split into a GPU-resident base plus a small delta, with top-K from a bounded min-heap and post-filtering.
- Page Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/05-search-path-query-qualifiers-and-bf16-matmul-scoring.md
- Generated: 2026-06-08T13:32:45.601Z
### Source Files
- `Sources/OmniKit/SearchQueryParser.swift`
- `Sources/OmniKit/VectorStore.swift`
- `Sources/OmniKit/OmniEngine.swift`
- `App/ResultsList.swift`
- `Tests/OmniKitTests/SearchQueryParserTests.swift`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [Sources/OmniKit/SearchQueryParser.swift](Sources/OmniKit/SearchQueryParser.swift)
- [Sources/OmniKit/VectorStore.swift](Sources/OmniKit/VectorStore.swift)
- [Sources/OmniKit/OmniEngine.swift](Sources/OmniKit/OmniEngine.swift)
- [App/AppModel.swift](App/AppModel.swift)
- [App/ResultsList.swift](App/ResultsList.swift)
- [Tests/OmniKitTests/SearchQueryParserTests.swift](Tests/OmniKitTests/SearchQueryParserTests.swift)
</details>
# Search Path: Query Qualifiers and bf16 Matmul Scoring
This page follows a single search-box string all the way to a sorted result list. Two things happen to that string, and they are deliberately kept on separate channels. The free-text part becomes a *semantic* query: it is embedded into a vector and scored against every indexed chunk by exact brute-force cosine. The structured `key:value` parts (`type:image`, `in:~/Documents`, `after:30d`, `score:50%`, `sort:date`) become *qualifiers*: they tune filters and post-processing without ever touching the embedding. The split is done by `SearchQueryParser`, a pure, dependency-free parser, and the scoring is done by `VectorStore`, which keeps every embedding resident as a `bf16` matrix and scores a query with one or two GPU matmuls.
If you are new to the codebase, read these three in order: `SearchQueryParser.parse` (how text splits into two channels), `AppModel.applyParsedQuery` (how qualifiers become filter state), and `VectorStore.search` / `reduceTopK` (how scoring and top-K selection work). Everything else on this page is detail around those three.
## The two-channel pipeline
A query string is never interpreted by one component end to end. It crosses three ownership boundaries: a pure parser in `OmniKit`, the `AppModel` view-model in the app target, and the `VectorStore` back in `OmniKit`. The semantic channel and the qualifier channel are disjoint by construction and only rejoin at the very end, where post-filters (`minScore`, `sortOrder`) re-rank the hits the store returned.
```mermaid
flowchart TB
raw["raw search-box string"]
subgraph parser["SearchQueryParser (OmniKit, pure)"]
parse["parse(raw)"]
parsed["ParsedQuery: semanticText + qualifiers"]
end
subgraph model["AppModel (App target)"]
apply["applyParsedQuery: qualifiers -> filter state"]
filt["currentFilter() -> SearchFilter (kinds/folder/ext/since)"]
embed["embedQuery(semanticText) -> [Float]"]
recompute["recomputeResults (minScore + sortOrder)"]
end
subgraph store["VectorStore (OmniKit)"]
matmul["base matmul + delta matmul (bf16, GPU)"]
reduce["reduceTopK (per-file min-heap)"]
end
view["ResultsList (SwiftUI)"]
raw --> parse --> parsed
parsed -->|semanticText| embed
parsed -->|qualifiers| apply
apply --> filt
apply -->|minScore, sortOrder| recompute
embed --> matmul
filt --> matmul
matmul --> reduce --> recompute --> view
```
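Sources: [Sources/OmniKit/SearchQueryParser.swift:62-107](Sources/OmniKit/SearchQueryParser.swift#L62-L107), [App/AppModel.swift:779-921](App/AppModel.swift#L779-L921), [Sources/OmniKit/VectorStore.swift:485-523](Sources/OmniKit/VectorStore.swift#L485-L523)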
## SearchQueryParser: splitting text from qualifiers
`SearchQueryParser` is an `enum` with only static methods, intentionally pure and dependency-free so it is unit-testable and can be shared across layers. Its output is a `ParsedQuery` carrying a `semanticText` string and an array of `Qualifier { key, value, negated }`. The grammar follows the lexical-qualifier convention of GitHub / Gmail / Spotlight: `key:value`, quoted values, multi-value (`type:image,video`), and negation (`-type:audio`).
The make-or-break rule is **colon adjacency plus a whitelist**. A run is a qualifier only if (a) its key canonicalizes to a known qualifier, and (b) a value follows the colon with no whitespace on either side. That single condition is what keeps prose like `notes about type: theory`, a time like `12:30`, a ratio like `3:1`, and a URL like `http://x` out of the filter channel.
```swift
// Sources/OmniKit/SearchQueryParser.swift
// Qualifier candidate: `word:value`, key whitelisted, value present, no space around `:`.
if i < n, s[i] == ":", let canon = canonicalKey(word), i + 1 < n, !s[i + 1].isWhitespace {
    i += 1  // consume ':'
    // ... read a quoted "..." value or a bare run to the next whitespace ...
    if !value.isEmpty {
        quals.append(.init(key: canon, value: value, negated: negated))
        continue
    }
}
// Not a qualifier: take the whole run (incl. leading '-' and inner ':') as one free-text span.
```
Anything that is not a qualifier is appended verbatim as a free-text span; `semanticText` is those spans joined in original order with single spaces. The two channels never overlap.
### Canonical keys and aliases
`canonicalKey` is the whitelist. It also folds aliases onto a canonical key, so the rest of the system only sees seven keys.
| User types | Canonical key | Meaning |
|------------|---------------|---------|
| `type`, `kind` | `type` | file category (image/video/audio/text) |
| `ext`, `extension` | `ext` | file extension (no dot) |
| `in`, `folder`, `path` | `in` | restrict to a folder |
| `date` | `date` | named date bucket |
| `after`, `since` | `after` | relative/absolute "newer than" |
| `score`, `relevance`, `min` | `score` | minimum relevance threshold |
| `sort` | `sort` | result ordering |
Keys are matched case-insensitively (the word is lowercased before lookup), but the value's case is preserved (`TYPE:Image` -> `type=Image`). Quoted values may contain spaces and support `\"` / `\\` escapes, which is how `in:"~/Documents/Project X"` survives as a single qualifier.
Sources: [Sources/OmniKit/SearchQueryParser.swift:5-107](Sources/OmniKit/SearchQueryParser.swift#L5-L107)
### What the tests pin down
The parser's correctness gate is explicit because a parser that turns `12:30` into a filter is worse than no parser. The tests lock in the disjoint-channel behavior and every edge case of the adjacency rule:
| Input | qualifiers | semanticText |
|-------|-----------|--------------|
| `sunset photos type:image after:30d` | `type=image`, `after=30d` | `sunset photos` |
| `notes about type: theory` | (none) | `notes about type: theory` |
| `meeting at 12:30 ratio 3:1 see http://example.com` | (none) | unchanged |
| `color:red running shoes` | (none, unknown key) | `color:red running shoes` |
| `type:image,video beach` | `type=image,video` | `beach` |
| `meeting notes -type:audio` | `type=audio` (negated) | `meeting notes` |
| `budget type:` | (none, no value yet) | `budget type:` |
| `-france report` | (none) | `-france report` |
Note the last two: a trailing `type:` with no value (mid-typing) stays prose, and a bare leading `-` stays literal because Omni has no full-text exclusion.
Sources: [Tests/OmniKitTests/SearchQueryParserTests.swift:11-104](Tests/OmniKitTests/SearchQueryParserTests.swift#L11-L104)
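The adjacency-plus-whitelist rule is small enough to sketch in full. The version below is a simplified stand-in, not `SearchQueryParser.parse`: it splits on whitespace first, so it handles bare values only and does not support quoted values or escapes. `parseSketch` and `canonicalKeys` are illustrative names; the alias table is copied from this page.

```swift
struct Qualifier: Equatable {
    let key: String
    let value: String
    let negated: Bool
}

// Alias -> canonical key, following the alias table on this page.
let canonicalKeys: [String: String] = [
    "type": "type", "kind": "type",
    "ext": "ext", "extension": "ext",
    "in": "in", "folder": "in", "path": "in",
    "date": "date",
    "after": "after", "since": "after",
    "score": "score", "relevance": "score", "min": "score",
    "sort": "sort",
]

/// Simplified sketch: whitespace-delimited tokens, bare (unquoted) values only.
func parseSketch(_ raw: String) -> (semanticText: String, qualifiers: [Qualifier]) {
    var spans: [String] = []
    var quals: [Qualifier] = []
    for token in raw.split(whereSeparator: { $0.isWhitespace }) {
        var body = token[...]
        let negated = body.hasPrefix("-")
        if negated { body = body.dropFirst() }
        if let colon = body.firstIndex(of: ":") {
            let key = body[..<colon].lowercased()          // keys are case-insensitive
            let value = body[body.index(after: colon)...]  // value keeps its case
            if let canon = canonicalKeys[key], !value.isEmpty {
                quals.append(Qualifier(key: canon, value: String(value), negated: negated))
                continue
            }
        }
        spans.append(String(token))  // not a qualifier: keep the whole run verbatim
    }
    return (spans.joined(separator: " "), quals)
}
```

Because tokens are split on whitespace, `type: theory` yields a `type:` token with an empty value and falls through to prose, which is the same outcome the adjacency rule produces in the real parser.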
### Inline tinting mirrors, never drives, the filter
For cosmetic highlighting of qualifier tokens in the AppKit text field, the parser also exposes `qualifierNSRanges`, a regex-based pass that returns UTF-16 `NSRange`s. It is compiled once (regex compilation dwarfs matching, and this runs per keystroke on the main thread) and intentionally separate from `parse`. The regex mirrors `parse`'s notion of a qualifier, so any drift between them only mis-tints; it can never mis-filter, because filtering always goes through `parse`.
Sources: [Sources/OmniKit/SearchQueryParser.swift:43-60](Sources/OmniKit/SearchQueryParser.swift#L43-L60)
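A compile-once tinting pass could look roughly like the sketch below. The pattern is an assumption that approximates the parser's rule (whitelisted key, colon, adjacent bare or quoted value); it is not the regex in `SearchQueryParser.swift`, and `TintSketch` is a hypothetical name.

```swift
import Foundation

enum TintSketch {
    // Compiled once as a static; the pattern itself is an illustrative assumption.
    static let regex = try! NSRegularExpression(
        pattern: #"(?<!\S)-?(?:type|kind|ext|extension|in|folder|path|date|after|since|score|relevance|min|sort):(?:"[^"]*"|\S+)"#,
        options: [.caseInsensitive])

    /// UTF-16 ranges of qualifier-looking tokens, suitable for attributed-string tinting.
    static func qualifierRanges(in text: String) -> [NSRange] {
        let whole = NSRange(text.startIndex..<text.endIndex, in: text)
        return regex.matches(in: text, options: [], range: whole).map { $0.range }
    }
}
```

The negative lookbehind `(?<!\S)` forces a match to start at a token boundary, which is what keeps `12:30` and `http://x` untinted in this sketch.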
## From qualifiers to filter state: AppModel.applyParsedQuery
`SearchQueryParser` decides *what* the qualifiers are; `AppModel.applyParsedQuery` decides what they *do*. It re-parses the raw box on every user edit and maps each qualifier onto a filter dimension. The governing principle is "the box owns only what it mentions": a filter the box previously set but no longer names is reset, while a filter set through the toolbar menu is left untouched (tracked via `stringOwnedFilters`).
| Qualifier | AppModel target | Mapping helper / notes |
|-----------|-----------------|------------------------|
| `type` | `filterKinds` | `mapKind` aliases (`photos`->image, etc.); `,`-split for multi-value; negated -> exclude set; `-type:x` becomes "everything but x" |
| `ext` | `filterExt` | strips a leading dot |
| `in` | `filterFolder` | `resolveFolder` expands `~` |
| `date` | `dateRange` | `DateRange(rawValue:)` |
| `after` | `dateRange` | `mapAfter`: named bucket or relative `7d/2w/3m/1y`, snapped to week/month/year |
| `score` | `minScore` | `mapScore`: `50%` or `0..1`, clamped |
| `sort` | `sortOrder` | `mapSort`: relevance/name/dateModified |
Two of these never reach the vector store. `currentFilter()` builds a `SearchFilter` from only `kinds`, `folderPrefix`, `ext`, and `since`; `score` and `sort` are deliberately excluded and applied afterward (see [Post-filtering](#post-filtering-minscore-and-sortorder)). A "literal mode" toggle short-circuits the whole mapping: it releases every box-owned filter and embeds the raw string as-is.
Sources: [App/AppModel.swift:779-921](App/AppModel.swift#L779-L921)
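Two of the value mappings in the table are simple enough to sketch. The functions below are hypothetical stand-ins for `mapScore` and the relative branch of `mapAfter`: they assume `50%` means 0.5, that a bare number is already on a 0...1 scale, and that a relative span is an integer followed by one of `d`, `w`, `m`, `y`. The week/month/year snapping mentioned above is not modeled.

```swift
/// "50%" -> 0.5, "0.7" -> 0.7, clamped to 0...1; nil when the value is not a number.
func mapScoreSketch(_ value: String) -> Double? {
    let isPercent = value.hasSuffix("%")
    let digits = isPercent ? String(value.dropLast()) : value
    guard let x = Double(digits), x.isFinite else { return nil }
    return min(max(isPercent ? x / 100 : x, 0), 1)
}

enum RelativeUnit: Character { case day = "d", week = "w", month = "m", year = "y" }

/// "7d" -> (7, .day), "2w" -> (2, .week); nil for anything that is not <positive int><unit>.
func parseRelativeSketch(_ value: String) -> (count: Int, unit: RelativeUnit)? {
    guard let last = value.last,
          let unit = RelativeUnit(rawValue: last),
          let count = Int(String(value.dropLast())),
          count > 0 else { return nil }
    return (count, unit)
}
```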
## VectorStore: exact brute-force cosine over a bf16 matrix
The semantic channel ends at `VectorStore.search`. Embeddings are L2-normalized, so cosine similarity is just a dot product, and the store computes every dot product exactly, no approximate index. SQLite is the durable source of truth, but for scoring the store mirrors every vector into one contiguous in-memory buffer, `flat16`, holding `count * dim` `UInt16` values. These are `bf16` bits (2 bytes per dimension), which halves residency and disk versus `fp32` with negligible recall loss on normalized vectors. Crucially, the bytes are *reinterpreted* as `bf16` for the GPU, not converted at score time.
```swift
// Sources/OmniKit/VectorStore.swift (fp32 <-> bf16, round-to-nearest-even)
@inline(__always) static func toBF16(_ x: Float) -> UInt16 {
    let b = x.bitPattern
    return UInt16(truncatingIfNeeded: (b &+ 0x7FFF &+ ((b >> 16) & 1)) >> 16)
}
@inline(__always) static func fromBF16(_ x: UInt16) -> Float { Float(bitPattern: UInt32(x) << 16) }
```
Sources: [Sources/OmniKit/VectorStore.swift:84-124](Sources/OmniKit/VectorStore.swift#L84-L124)
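A quick round trip shows what the conversion costs. The two functions below are free-function copies of the helpers quoted above; the residency arithmetic uses the ~420K-row figure mentioned on this page and an assumed embedding width of 1024, which is not stated here.

```swift
@inline(__always) func toBF16(_ x: Float) -> UInt16 {
    let b = x.bitPattern
    return UInt16(truncatingIfNeeded: (b &+ 0x7FFF &+ ((b >> 16) & 1)) >> 16)
}
@inline(__always) func fromBF16(_ x: UInt16) -> Float { Float(bitPattern: UInt32(x) << 16) }

// bf16 keeps 8 significant bits, so round-to-nearest error is at most 2^-8 relative.
let samples: [Float] = [1.0, 0.5, 0.1, -0.3333333, 0.7071068]
var worstRelativeError: Float = 0
for x in samples {
    let y = fromBF16(toBF16(x))
    worstRelativeError = max(worstRelativeError, abs(y - x) / abs(x))
}
print("worst relative error:", worstRelativeError)

// Residency: rows * dim * bytes-per-element. `dim = 1024` is an assumption.
let rowCount = 420_000, dim = 1024
let bf16Bytes = rowCount * dim * 2   // 860_160_000 bytes
let fp32Bytes = rowCount * dim * 4   // exactly twice the bf16 figure
print("bf16:", bf16Bytes, "fp32:", fp32Bytes)
```

Under those assumed sizes the bf16 matrix lands near 0.86 GB, in line with the "0.8 GB-class" base described on this page.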
### Base + delta: why scoring is two matmuls
A naive design would rebuild the GPU score matrix on every query. Instead the resident matrix is split so that ordinary indexing inserts do not force a recopy. `mlxBase` is an MLX-owned copy of rows `[0, baseRows)` (MLX copies the bytes at construction, so it is independent of `flat16`'s storage even as that buffer reallocates during indexing). Rows appended past `baseRows` are the "delta" and are scored per query with one small extra matmul. The 0.8 GB-class base is rebuilt only on a structural change (delete / reload) or once the delta grows past `foldThreshold` (50,000 rows); a plain indexing append just extends the delta.
```text
flat16: contiguous bf16, length = count * dim (row i = rows[i])
row 0 ┐
... ├─ mlxBase (MLX-owned, rows [0, baseRows)) ──matmul──┐
baseRows-1 ┘ ├─► scores[count]
baseRows ┐ │
... ├─ delta (rows [baseRows, count), <= foldThreshold) ──matmul┘
count-1 ┘
```
At query time the store builds `qv = MLXArray(query,[dim,1]).asType(.bfloat16)`, runs the base matmul and (if there is a delta) the delta matmul, fuses them into one GPU sync with a single `MLX.eval`, and reads both back into a single `Float` scores array. Base + delta covers all rows, so the result is identical to a full rebuild.
```swift
// Sources/OmniKit/VectorStore.swift (search core)
let qv = MLXArray(query, [dim, 1]).asType(.bfloat16)
let baseScore = MLX.matmul(mlxBase!, qv)
if n > baseRows {
    // delta = rows [baseRows, n), built via bytesNoCopy then copied into an owned MLXArray
    let ds = /* matmul(MLXArray(delta bytes, [deltaCount, dim], .bfloat16), qv) */
    MLX.eval(baseScore, ds)  // one fused GPU sync for both matmuls
    // concatenate baseScore + ds into scores[n]
}
```
Search runs under the store's serial `queue` (the same lock as mutations). The header comment records that two alternatives were measured and rejected: routing the matmul through the engine's priority gate, and taking an off-lock snapshot. Each hurt latency or memory, so search stays under the lock; the real wins are base+delta and the numeric reducer below.
Sources: [Sources/OmniKit/VectorStore.swift:97-115](Sources/OmniKit/VectorStore.swift#L97-L115), [Sources/OmniKit/VectorStore.swift:476-523](Sources/OmniKit/VectorStore.swift#L476-L523), [Sources/OmniKit/VectorStore.swift:636-649](Sources/OmniKit/VectorStore.swift#L636-L649)
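The claim that base + delta equals a full rebuild can be checked on the CPU with plain arrays. This sketch deliberately avoids MLX and uses `Float` rather than bf16; `scoreRows` and `shouldFold` are hypothetical helpers, and only the row split and the `foldThreshold` value of 50,000 come from the text.

```swift
/// Dot every row in `rows` of a row-major matrix (`flat`, width `dim`) with `q`.
func scoreRows(_ flat: [Float], rows: Range<Int>, dim: Int, query q: [Float]) -> [Float] {
    rows.map { r -> Float in
        var acc: Float = 0
        for d in 0..<dim { acc += flat[r * dim + d] * q[d] }
        return acc
    }
}

let dim = 2
let flat: [Float] = [1, 0,   0, 1,   0.6, 0.8,   -1, 0]  // 4 L2-normalized rows
let count = 4
let baseRows = 3                                         // row 3 was appended later
let q: [Float] = [0.6, 0.8]

let baseScores  = scoreRows(flat, rows: 0..<baseRows, dim: dim, query: q)
let deltaScores = scoreRows(flat, rows: baseRows..<count, dim: dim, query: q)
let split = baseScores + deltaScores                     // what a query sees
let full  = scoreRows(flat, rows: 0..<count, dim: dim, query: q)

// Fold the delta into the base once it exceeds the threshold.
let foldThreshold = 50_000
func shouldFold(count: Int, baseRows: Int) -> Bool { count - baseRows > foldThreshold }
```

Each row's score depends only on that row and the query, so concatenating the two partial results reproduces the full pass; the split only changes which buffer has to be re-uploaded after an append.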
### reduceTopK: per-file best chunk via a bounded min-heap
The index stores one vector per *chunk*, but results are per *file*. `reduceTopK` collapses `N` chunk scores into the top-K files in two passes, and it is engineered to avoid two costs that dominated at ~420K rows: path-string hashing and a full sort of `String`-bearing structs.
1. **Per-file max.** Each row carries a dense `Int32` `fileID` (a flat-array lookup, not a path hash). The reducer keeps `bestScore` / `bestRow` per file. The hot case (no `kind`/`since` filter) runs over primitive buffers via unsafe pointers, never touching `rows[i]`, so there is no ARC traffic from copying the row's three `String`s.
2. **Top-K.** Instead of materializing a `SearchHit` for every file and sorting, it keeps a size-K min-heap (parallel `heapScore` / `heapRow` arrays). A file that cannot beat the current K-th is skipped, path-based filters (`folderPrefix` / `ext`) are applied to each file's winner via `filter.accepts`, and only the K survivors are turned into `SearchHit`s, then ordered by descending score.
```swift
// Sources/OmniKit/VectorStore.swift (top-K insertion against a size-K min-heap)
if heapScore.count >= topK && s <= heapScore[0] { continue } // can't beat the current K-th
let r = rows[Int(ri)]
if !filter.accepts(path: r.path, kind: r.kind, modified: r.modified) { continue }
if heapScore.count < topK {
    heapScore.append(s); heapRow.append(ri); siftUp(heapScore.count - 1)
} else if s > heapScore[0] {
    heapScore[0] = s; heapRow[0] = ri; siftDown(0)
}
```
A string-keyed `reduceTopKReference` is kept verbatim, not in production, purely as a differential-test oracle for `reduceTopK`.
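The two passes can be sketched end to end on plain arrays. This is an illustrative reimplementation under simplifying assumptions, not the production `reduceTopK`: it ignores `SearchFilter`, uses a single array of tuples instead of parallel `heapScore` / `heapRow` buffers, and `Hit` and `reduceTopKSketch` are hypothetical names.

```swift
struct Hit: Equatable { let fileID: Int32; let row: Int; let score: Float }

func reduceTopKSketch(scores: [Float], fileID: [Int32], fileCount: Int, topK: Int) -> [Hit] {
    precondition(scores.count == fileID.count)
    guard topK > 0 else { return [] }

    // Pass 1: best chunk per file, indexed by dense fileID (array lookup, no hashing).
    var bestScore = [Float](repeating: -.infinity, count: fileCount)
    var bestRow = [Int](repeating: -1, count: fileCount)
    for i in scores.indices {
        let f = Int(fileID[i])
        if scores[i] > bestScore[f] { bestScore[f] = scores[i]; bestRow[f] = i }
    }

    // Pass 2: size-K min-heap; heap[0] is always the current K-th best score.
    var heap: [(score: Float, row: Int)] = []
    func siftUp(_ start: Int) {
        var i = start
        while i > 0 {
            let p = (i - 1) / 2
            if heap[i].score >= heap[p].score { break }
            heap.swapAt(i, p); i = p
        }
    }
    func siftDown(_ start: Int) {
        var i = start
        while true {
            let l = 2 * i + 1, r = l + 1
            var m = i
            if l < heap.count && heap[l].score < heap[m].score { m = l }
            if r < heap.count && heap[r].score < heap[m].score { m = r }
            if m == i { break }
            heap.swapAt(i, m); i = m
        }
    }
    for f in 0..<fileCount where bestRow[f] >= 0 {
        let s = bestScore[f]
        if heap.count < topK {
            heap.append((score: s, row: bestRow[f])); siftUp(heap.count - 1)
        } else if s > heap[0].score {
            heap[0] = (score: s, row: bestRow[f]); siftDown(0)
        }
    }
    // Only the K survivors are sorted and materialized.
    return heap.sorted { $0.score > $1.score }
        .map { Hit(fileID: fileID[$0.row], row: $0.row, score: $0.score) }
}
```

With `N` chunks, `F` files, and `K` results, this shape costs `O(N)` for the per-file max plus `O(F log K)` for selection, and the final sort touches only `K` elements.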
### SearchFilter: what the store filters on
`SearchFilter` carries `kinds`, `folderPrefix`, `ext`, and `since`. Its `accepts` is path-boundary aware for folders (`path == f || path.hasPrefix(f + "/")`) and matches `ext` against a lowercased suffix. Score thresholding is intentionally *not* in `SearchFilter`: the comment notes the view fetches unfiltered-by-score so it can offer "show all".
Sources: [Sources/OmniKit/VectorStore.swift:65-82](Sources/OmniKit/VectorStore.swift#L65-L82), [Sources/OmniKit/VectorStore.swift:526-634](Sources/OmniKit/VectorStore.swift#L526-L634)
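The path-boundary check is worth seeing in isolation, because a plain `hasPrefix` would let `~/Docs2` match a filter on `~/Docs`. The sketch below covers only the two path-based fields; `SearchFilterSketch` is a hypothetical type, the folder expression is the one quoted above, and the exact `ext` comparison (a `"." + ext` suffix on the lowercased path) is an assumption.

```swift
struct SearchFilterSketch {
    var folderPrefix: String? = nil
    var ext: String? = nil  // assumed lowercased, without the leading dot

    func accepts(path: String) -> Bool {
        // Boundary-aware: the folder itself, or anything strictly beneath it.
        if let f = folderPrefix, !(path == f || path.hasPrefix(f + "/")) { return false }
        // Assumed form of the suffix test.
        if let e = ext, !path.lowercased().hasSuffix("." + e) { return false }
        return true
    }
}
```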
## Post-filtering: minScore and sortOrder
Two qualifiers, `score` and `sort`, are applied after the store returns. `AppModel.search` calls `store.search(..., topK: 60)` into `rawResults`; `recomputeResults` then filters by the relevance threshold and re-orders. This is memoized so the frequent indexing updates never re-filter the visible list.
```swift
// App/AppModel.swift
private func recomputeResults() {
    let above = rawResults.filter { Self.relevance($0.score) >= minScore }
    hiddenByThreshold = rawResults.count - above.count
    switch sortOrder {
    case .relevance:    results = above
    case .name:         results = above.sorted { /* localized lastPathComponent compare */ }
    case .dateModified: results = above.sorted { $0.modified > $1.modified }
    }
}
```
`relevance` clamps the raw cosine into `0...1`; the default `minScore` is `0.0` (show everything, let the user raise the bar). The view renders each hit's score as a whole-percent string (`String(format: "%.0f%%", ...)`).
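Sources: [App/AppModel.swift:98-103](App/AppModel.swift#L98-L103), [App/AppModel.swift:591-605](App/AppModel.swift#L591-L605), [App/AppModel.swift:1254-1308](App/AppModel.swift#L1254-L1308), [App/ResultsList.swift:370](App/ResultsList.swift#L370)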
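As a small illustration of the clamp-then-format step, the sketch below assumes the percent label is the clamped relevance multiplied by 100. `relevanceSketch` and `percentLabel` are hypothetical names; only the `0...1` clamp and the `"%.0f%%"` format string come from the text.

```swift
import Foundation

/// Clamp a raw cosine score into 0...1.
func relevanceSketch(_ cosine: Float) -> Double {
    Double(min(max(cosine, 0), 1))
}

/// Whole-percent label; scaling by 100 is an assumption about the view code.
func percentLabel(_ cosine: Float) -> String {
    String(format: "%.0f%%", relevanceSketch(cosine) * 100)
}
```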
## Why this shape
The recurring decision across this path is **keep the two channels disjoint and push work to its cheapest place**. The parser is pure so its rules are testable in isolation and identical wherever they run. Qualifiers that change the candidate set (`type`/`ext`/`in`/`date`/`after`) ride into the store as a `SearchFilter` and prune inside the hot loop; qualifiers that only threshold or re-order what is already found (`score`/`sort`) stay in `AppModel.recomputeResults`, so raising a threshold or changing the order never re-embeds or re-scores. The scoring itself stays exact: a resident `bf16` matrix split into a stable GPU base plus a small per-query delta means an indexing append costs a delta matmul rather than a full rebuild, and a dense-`fileID` min-heap turns the per-file top-K into `O(F log K)` over primitive buffers. The throughline is provider-neutral and portable: the parser and store know nothing about which embedding model produced the vectors, only that they are L2-normalized and share `dim`, so the same search path works for any backend that emits compatible embeddings.
Sources: [Sources/OmniKit/SearchQueryParser.swift:17-27](Sources/OmniKit/SearchQueryParser.swift#L17-L27), [Sources/OmniKit/VectorStore.swift:65-82](Sources/OmniKit/VectorStore.swift#L65-L82), [Sources/OmniKit/VectorStore.swift:476-523](Sources/OmniKit/VectorStore.swift#L476-L523)
---
## 06. The SwiftUI App: AppModel, Views, and Window Commands
> The app shell that drives the engine: the @main scene and menu-bar command groups, the AppModel state object that owns indexing and search, the main content layout (sidebar, results list, Quick Look, settings), and how onboarding and updater hooks are wired into launch.
- Page Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/06-the-swiftui-app-appmodel-views-and-window-commands.md
- Generated: 2026-06-08T13:33:00.181Z
### Source Files
- `App/OmniApp.swift`
- `App/AppModel.swift`
- `App/ContentView.swift`
- `App/Sidebar.swift`
- `App/ResultsList.swift`
- `App/SettingsView.swift`
- `App/OnboardingView.swift`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [App/OmniApp.swift](App/OmniApp.swift)
- [App/AppModel.swift](App/AppModel.swift)
- [App/ContentView.swift](App/ContentView.swift)
- [App/Sidebar.swift](App/Sidebar.swift)
- [App/ResultsList.swift](App/ResultsList.swift)
- [App/SettingsView.swift](App/SettingsView.swift)
- [App/OnboardingView.swift](App/OnboardingView.swift)
- [App/Updater.swift](App/Updater.swift)
- [App/QuickLook.swift](App/QuickLook.swift)
</details>
# The SwiftUI App: AppModel, Views, and Window Commands
`OmniKit` is the engine (encoders, vector store, indexer). The `App` target is the thin SwiftUI shell that drives it: one window, one observable state object, and a set of views that render whatever that state currently says. If you are new to the repo, this is the layer to read first — it shows *what the app does* before you dig into *how embeddings are computed*. Everything user-visible flows through one type, `AppModel`, which owns the engine, the vector store, the indexer, search, history, settings, and the in-process serving layer. The views are deliberately dumb: they read `AppModel` and call its methods.
Two ideas anchor the whole shell. First, **the menu bar is the source of truth for actions** — keyboard shortcuts live on `CommandGroup` buttons, and the toolbar/context menus are click targets that name the same shortcuts (so a shortcut works even when its toolbar button is hidden). Second, **`AppModel.phase` drives the entire detail pane** — a four-state machine (`loadingModel`, `noModel`, `ready`, `failed`) decides whether you see a spinner, onboarding, the search UI, or an error.
## Ownership map: who holds what
The app has exactly one long-lived state object. `OmniApp` constructs it once with `@State`, injects it into both scenes via `.environment(model)`, and every view reaches it with `@Environment(AppModel.self)`. `AppModel` in turn privately owns the `OmniKit` types — views never touch the engine directly.
```text
App target (SwiftUI shell) OmniKit (engine)
┌────────────────────────────────────────┐ ┌──────────────────────┐
│ OmniApp @main │ │ │
│ ├─ Window "main" → ContentView │ │ │
│ ├─ .commands {CommandGroups} │ │ │
│ ├─ Settings → SettingsView │ │ │
│ └─ .task → Updater.checkOnLaunchIfDue │ │ │
│ │ .environment(model) │ │ │
│ ▼ │ owns │ │
│ AppModel @MainActor @Observable ─────┼────────▶│ OmniEngine │
│ phase / query / fileQuery │ (private)│ VectorStore │
│ results / selection / previewURL │◀────────┤ Indexer / FSWatcher │
│ roots / settings / searchHistory │ │ ServingController │
│ ▲ read via @Environment │ └──────────────────────┘
│ ┌────────┴───────────────────────┐ │
│ ContentView (NavigationSplitView)│ │
│ ├─ Sidebar (folders + history) │ │
│ ├─ detail: switch on phase │ │
│ │ ├─ OnboardingView │ │
│ │ ├─ EngineFailedView │ │
│ │ └─ ResultsList / empty │ │
│ └─ .searchable toolbar field │ │
└────────────────────────────────────────┘
```
Sources: [App/OmniApp.swift:4-16](App/OmniApp.swift), [App/AppModel.swift:93-95](App/AppModel.swift), [App/ContentView.swift:5-13](App/ContentView.swift)
## The `@main` scene and launch hooks
`OmniApp` is the entry point. Its `body` declares two scenes: a single `Window` (not a `WindowGroup` — Omni is a one-window app) hosting `ContentView`, and a `Settings` scene hosting `SettingsView`. Both receive the shared model through `.environment(model)`. The window enforces a minimum content size and a default 1000×660.
The launch-time wiring is compact:
- `.task { Updater.checkOnLaunchIfDue() }` fires the silent, once-per-day update check when the window appears.
- `AppModel()` runs its `init`, which loads persisted state from `UserDefaults` and then kicks `Task { await bootstrap() }` to load the model and index.
```swift
// App/OmniApp.swift
@main
struct OmniApp: App {
    @State private var model = AppModel()
    var body: some Scene {
        Window("Omni", id: "main") {
            ContentView()
                .environment(model)
                .frame(minWidth: 820, minHeight: 520)
                .task { Updater.checkOnLaunchIfDue() }  // silent once-a-day check
        }
        .defaultSize(width: 1000, height: 660)
        .windowResizability(.contentMinSize)
        .commands { /* command groups */ }
        Settings { SettingsView().environment(model) }
    }
}
```
Sources: [App/OmniApp.swift:4-94](App/OmniApp.swift), [App/AppModel.swift:357-370](App/AppModel.swift)
## Menu-bar command groups
The `.commands { ... }` block is where Omni declares its keyboard interactions and menu-bar entries. Each `CommandGroup` is positioned relative to a system anchor, and most buttons gate themselves on `AppModel` computed properties (`hasSelection`, `hasActiveSearch`, `canIndex`, `isIndexing`). Because the menu bar *owns* the shortcuts, an action like Bookmark (`⌘D`) works even when its toolbar star is hidden — the comments in the file call this out explicitly.
| Command group anchor | Entries | Shortcut | Gated on |
|---|---|---|---|
| `replacing: .appInfo` | About Omni, Check for Updates…, Run Profiling… | — | profiling disabled while running / `!canIndex` |
| `after: .newItem` | Open / Quick Look / Reveal in Finder | `⌘O` / `⌘Y` / `⇧⌘R` | `hasSelection` |
| `after: .newItem` | Bookmark / Remove Bookmark Search | `⌘D` | `hasActiveSearch` |
| `after: .sidebar` | View as Gallery / List; Sort By | `⌘1` / `⌘2` | — |
| `after: .toolbar` | Index / Update / Resume Indexing | `⇧⌘I` | disabled when `isIndexing \|\| !canIndex` |
| `after: .toolbar` | Pause Indexing | — | disabled when `!isIndexing` |
| `after: .textEditing` | Find (focus search field) | `⌘F` | — |
| `replacing: .help` | Omni Keyboard Shortcuts | `⌘/` | — |
Two details worth noting. The View options are added `after: .sidebar` — i.e. they extend the *system* View menu that `NavigationSplitView` already provides (Show Sidebar / Full Screen), rather than creating a second "View" menu. And `Find` (`⌘F`) is implemented by hand: `.searchable` does not bind `⌘F`, so the command walks `NSApp.keyWindow` to find the `NSSearchToolbarItem` and makes its field first responder. The Help command opens a retained, reused `NSWindow` (`ShortcutsView`) so reopening is instant.
Sources: [App/OmniApp.swift:17-111](App/OmniApp.swift)
## `AppModel`: the state object
`AppModel` is a `@MainActor @Observable final class`. It is large because it is the single coordination point for the whole app; the table below groups its responsibilities. SwiftUI re-renders automatically when observed properties change, and almost all heavy work (embedding, store search, stats) is dispatched off the main actor with `Task.detached`, hopping back only to assign small results.
| Area | Representative state / methods |
|---|---|
| Lifecycle | `phase` (`loadingModel`/`noModel`/`ready`/`failed`), `bootstrap()`, `retryBootstrap()` |
| Query | `rawQuery` (literal box text), `query` (semantic remainder), `fileQuery`, `applyParsedQuery()`, `literalQuery` |
| Results | `rawResults` (filtered) → `results` (thresholded + sorted), `recomputeResults()`, `selection`, `previewURL` |
| Search | `search()`, `setFileQuery()`, query-vector LRU cache (`queryEmbedCache`) |
| Indexing | `indexState`, `progress`, `startIndexing()`, `pauseIndexing()`, FSEvents `watcher`, `pausedRoots` |
| Roots | `roots`, `addRoot()`, `removeRoot()`, `canonicalizeRoots()` |
| History | `searchHistory`, `historyGroups`, `recordCurrentSearchToHistory()`, `toggleBookmarkCurrentSearch()` |
| Settings | `settings`, perf knobs (`maxImageDimension`, `maxMemoryGB`, …), persisted via `UserDefaults` |
| Serving | `serving` (`ServingController`), attached during `bootstrap()` |
A small but important design choice: the search box binds to `rawQuery` (exactly what the user typed, qualifiers and all), and `query` is the *derived* semantic text after `applyParsedQuery` strips `key:value` qualifiers (`type:`, `ext:`, `in:`, `date:`, `after:`, `score:`, `sort:`). The "box owns only what it mentions" rule means a filter the box previously set but no longer names is cleared, while a filter set from the toolbar menu is left alone (`stringOwnedFilters`).
Sources: [App/AppModel.swift:93-355](App/AppModel.swift), [App/AppModel.swift:779-847](App/AppModel.swift), [App/AppModel.swift:589-619](App/AppModel.swift)
### The phase state machine
`AppModel.Phase` is what `ContentView.detail` switches on, so it effectively chooses the whole right-hand screen. `bootstrap()` loads the store and engine concurrently (`async let`), and the path through the machine depends on whether a model is found and whether loading succeeds.
```mermaid
stateDiagram-v2
[*] --> loadingModel: AppModel.init → bootstrap()
loadingModel --> noModel: resolvedModelDir() == nil
loadingModel --> ready: store + engine loaded
loadingModel --> failed: VectorStore / OmniEngine throws
noModel --> loadingModel: downloadModel() / setModelDir()
failed --> loadingModel: retryBootstrap() / Choose Model Folder
ready --> loadingModel: switchVariant() / setDatabaseDir()
note right of ready
detail shows ResultsList or an
empty-state CenteredStatus;
canIndex → startIndexing()
end note
note right of noModel
detail shows OnboardingView
end note
```
On entering `ready`, `bootstrap()` hands the live engine and store to the serving layer (`serving.attach(...)`), restarts the FSEvents watcher, optionally compacts a sparse index, and — if `canIndex` — kicks a background index pass so the index silently catches up on every launch.
Sources: [App/AppModel.swift:96-105](App/AppModel.swift), [App/AppModel.swift:993-1044](App/AppModel.swift), [App/ContentView.swift:80-91](App/ContentView.swift)
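The bootstrap edges of the diagram reduce to a small mapping. The sketch below is a simplified model, assuming payload-free cases; the real `AppModel.Phase` may carry associated values (for example an error on `failed`), and `BootstrapOutcome`, `phaseAfterBootstrap`, and `detailViewName` are hypothetical names.

```swift
enum Phase: Equatable { case loadingModel, noModel, ready, failed }

// Hypothetical summary of what bootstrap() found.
enum BootstrapOutcome { case modelMissing, loaded, threw }

/// The three edges leaving `loadingModel` in the state diagram.
func phaseAfterBootstrap(_ outcome: BootstrapOutcome) -> Phase {
    switch outcome {
    case .modelMissing: return .noModel   // resolvedModelDir() == nil
    case .loaded:       return .ready     // store + engine loaded
    case .threw:        return .failed    // VectorStore / OmniEngine threw
    }
}

/// What the detail pane shows per phase, as names only.
func detailViewName(for phase: Phase) -> String {
    switch phase {
    case .loadingModel: return "spinner"
    case .noModel:      return "OnboardingView"
    case .ready:        return "ResultsList or empty state"
    case .failed:       return "EngineFailedView"
    }
}
```

Because both switches are exhaustive, adding a fifth phase would fail to compile until every screen decision is updated, which is the main benefit of driving the pane from a single enum.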
### How a search runs
Typing is debounced in `ContentView` (180 ms), then `AppModel.search()` does the work. It is token-guarded (`searchToken`) so stale results are discarded, and it embeds off the main actor. A query-side embedding cache lets repeated / history / bookmarked queries skip the GPU entirely — instant, and crucially GPU-free while indexing runs.
```mermaid
sequenceDiagram
participant U as User
participant CV as ContentView
participant AM as AppModel
participant E as OmniEngine
participant S as VectorStore
U->>CV: type in .searchable box
CV->>AM: applyParsedQuery(raw) → set query/filters
CV->>CV: scheduleSearch() debounce 180ms
CV->>AM: search()
alt query vector cached
AM->>S: store.search(cached, filter, topK: 60)
else embed needed
AM->>E: embedQuery(q) (off-actor, high priority)
E-->>AM: vector
AM->>S: store.search(vec, filter, topK: 60)
end
S-->>AM: [SearchHit] → rawResults (didSet recomputeResults)
AM-->>CV: results (thresholded + sorted) re-render
```
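The token guard can be illustrated in isolation. The sketch below is an assumption about the general pattern, not the body of `AppModel.search()`: only the name `searchToken` comes from the text above, while `SearchSession`, `begin()`, and `finish(token:results:)` are hypothetical.

```swift
// Hypothetical stand-in showing how a monotonically increasing token
// lets a slow, older search be ignored when it finally completes.
final class SearchSession {
    private(set) var searchToken = 0
    private(set) var rawResults: [String] = []

    /// Call when a new search starts; returns the token it must present later.
    func begin() -> Int {
        searchToken += 1
        return searchToken
    }

    /// Call when a search finishes. Results are stored only if no newer
    /// search has started in the meantime.
    @discardableResult
    func finish(token: Int, results: [String]) -> Bool {
        guard token == searchToken else { return false } // stale, drop it
        rawResults = results
        return true
    }
}
```

With this shape, a keystroke that starts a second search invalidates the first one without any cancellation plumbing: the first search simply fails the token comparison when it returns.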
File-as-query (`setFileQuery`, drag-and-drop, or "Find Similar") follows the same shape but calls `engine.embedFileQuery` first — any modality works because the embedding space is shared. Results feed `rawResults`, whose `didSet` runs `recomputeResults()` to apply the relevance threshold (`minScore`) and `sortOrder`, producing the `results` the views render.
Sources: [App/ContentView.swift:18-37](App/ContentView.swift), [App/ContentView.swift:320-326](App/ContentView.swift), [App/AppModel.swift:1254-1319](App/AppModel.swift), [App/AppModel.swift:1230-1247](App/AppModel.swift)
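The `rawResults` to `results` step described above is a filter followed by a sort. A minimal sketch, assuming a simplified hit type and two sort orders; `Hit`, `SortOrder`, and `ResultsModel` are illustrative names, and the default `minScore` is an arbitrary placeholder rather than the app's value.

```swift
// Simplified stand-in for a search hit; the real SearchHit has more fields.
struct Hit: Equatable {
    let path: String
    let score: Float
    let modified: Double   // seconds since some epoch, larger is newer
}

enum SortOrder { case relevance, newest }

final class ResultsModel {
    var minScore: Float = 0.3              // placeholder threshold
    var sortOrder: SortOrder = .relevance
    private(set) var results: [Hit] = []

    // Assigning raw hits re-derives what the views render.
    var rawResults: [Hit] = [] {
        didSet { recomputeResults() }
    }

    /// Applies the relevance threshold, then the chosen ordering.
    func recomputeResults() {
        let kept = rawResults.filter { $0.score >= minScore }
        switch sortOrder {
        case .relevance: results = kept.sorted { $0.score > $1.score }
        case .newest:    results = kept.sorted { $0.modified > $1.modified }
        }
    }
}
```

Keeping the unfiltered hits around means a threshold or sort change can be applied by calling `recomputeResults()` again, with no new embedding or store query.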
## The content layout
`ContentView` renders a `NavigationSplitView`: `Sidebar` on the left, and a `detail` that switches on `phase`. The whole split is wrapped in `.searchable` only when `showsSearch` is true (`phase == .ready && !roots.isEmpty`) — progressive disclosure, so the search field is *hidden* (not dimmed) during loading and onboarding. The toolbar is also progressive: search-by-file and the filter/sort/view controls appear only once there are results to act on, with the filter menu kept reachable whenever a filter is active so a filter that hides everything can still be cleared.
The `ready` content is a `VStack` of: an optional `FileQueryChip` or `QualifierBar`, then either `ResultsList` (when there are results) or an `emptyState`. The empty state is itself a small decision tree of `CenteredStatus` cards — "Add a folder", "couldn't search by that file", obsolete-index ("Switch to … or reindex"), the calm idle/searching prompt, threshold-hidden matches, and active-filter dead ends. The content view is also a Finder drop target: dropping a supported file anywhere starts a search by that file.
Sources: [App/ContentView.swift:25-164](App/ContentView.swift), [App/ContentView.swift:187-276](App/ContentView.swift)
### Sidebar: folders and history
`Sidebar` is a single `List` driven by a `SidebarSelection` enum (`.folder(URL)` or `.history(String)`) so folders and saved searches share native source-list behavior (focus highlight, arrow keys, Delete). The Folders section shows a per-folder file count, a pause glyph for paused folders, or a `CloudSyncPie` — an iCloud-Drive-style pie that fills with real indexing progress (or an indeterminate spinner for a brief reconcile). History rows are grouped (`historyGroups`: Bookmarks, then Today / Yesterday / Previous 7 Days / Earlier); selecting one re-runs that search with its saved filters via `runHistoryQuery`. Right-click menus handle pause/resume, reveal, bookmark, and remove; folders also accept dragged-in directories from Finder.
Sources: [App/Sidebar.swift:7-142](App/Sidebar.swift), [App/Sidebar.swift:195-243](App/Sidebar.swift)
### Results list, gallery, and Quick Look
`ResultsList` renders either a list (`LazyVStack` of `ResultRow`) or a gallery (`LazyVGrid` of `ResultGridItem`), chosen by `model.viewMode`. It deliberately does *not* use `List(selection:)` — a plain scroll lets click, double-click, right-click, and arrow-key navigation behave identically in both modes, all driven through `AppModel.selection` and `moveSelection(rowDelta:)`. Text rows are expandable: toggling one loads ranked matching passages (`model.passages(for:)`, computed off-actor) into a `PassagesView`.
Quick Look is wired two ways: `.quickLookPreview(...)` binds to `model.previewURL`, and an invisible `QuickLookKeyMonitor` installs an `NSEvent` local monitor so the space bar toggles the preview regardless of which subview holds focus (a focus-based `.onKeyPress` is swallowed by the list). While the panel is open, the monitor also routes arrow keys to `moveSelection`, and `selection.didSet` keeps `previewURL` on the newly selected row — so arrowing updates the live preview, Finder-style. Space is left untouched while editing the search field.
Sources: [App/ResultsList.swift:6-93](App/ResultsList.swift), [App/ResultsList.swift:149-171](App/ResultsList.swift), [App/QuickLook.swift:9-56](App/QuickLook.swift), [App/AppModel.swift:124-163](App/AppModel.swift)
### Settings
The `Settings` scene is a six-tab `TabView`: Indexing, Content, Performance, Storage, History, and Serving. The pane sizes to the selected tab (`.fixedSize` vertically) rather than forcing one height. The Indexing tab is the live home for index status and the manual Index / Reindex / Pause controls, reading `model.indexState`, `model.progress`, and the smoothed `filesPerSec` / `tokensPerSec` throughput.
Sources: [App/SettingsView.swift:5-90](App/SettingsView.swift)
## Onboarding and the updater
**Onboarding** is not a separate flow — it is the `noModel` phase. When `resolvedModelDir()` finds no complete model, `detail` renders `OnboardingView`, which offers two on-device model downloads (Nano ~1.9 GB, Small ~3.1 GB) plus a "Choose Model Folder…" escape hatch. `downloadModel(_:)` streams progress into `downloadFraction`/`downloadLabel`, and on completion calls `setModelDir`, which flips `phase` back to `loadingModel` and re-runs `bootstrap()`. The same view emphasizes the privacy stance: everything runs on-device.
**The updater** is a dependency-free in-app updater (no Sparkle). The launch hook `Updater.checkOnLaunchIfDue()` checks a JSON manifest at most once per 24 hours and stays silent unless a newer build exists; the menu's "Check for Updates…" calls `Updater.check(userInitiated: true)`, which also reports "up to date" and errors. When an update is accepted, it downloads the versioned DMG with progress, verifies its MD5, mounts and stages the app (`ditto` + a `codesign --verify --deep --strict` gate), then a detached shell helper waits for the app to quit, re-verifies the signature, replaces the bundle in place, and relaunches. If anything fails, it falls back to saving the DMG to Downloads and opening it for a manual drag-install.
```text
launch ─▶ .task → checkOnLaunchIfDue()      menu ─▶ check(userInitiated: true)
              │ (>24h since last)                    │
              ▼                                      ▼
        fetch latest.json ──▶ isNewer? ──no──▶ (silent) / "up to date"
                                 │ yes
                                 ▼
  download DMG ─▶ verify MD5 ─▶ mount+stage (ditto, codesign --verify)
        │                              │ failure
        ▼                              ▼
  detached helper: wait → re-verify →  fallback(): save DMG to
  rm old → ditto new → relaunch        Downloads, open installer
```
Sources: [App/OnboardingView.swift:5-79](App/OnboardingView.swift), [App/AppModel.swift:960-991](App/AppModel.swift), [App/Updater.swift:9-119](App/Updater.swift), [App/Updater.swift:170-216](App/Updater.swift)
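The two decisions at the top of the updater flow (is a check due, and is the manifest build newer) are easy to sketch. Both functions below are hypothetical: the source only states the 24-hour limit and the existence of an `isNewer` step, so the dotted-version comparison is an assumption about how such a check could work, not a description of `Updater.swift`.

```swift
import Foundation

/// True when no check has been recorded yet, or at least `interval`
/// seconds (24 hours by default) have passed since the last one.
func isCheckDue(lastCheck: Date?, now: Date = Date(),
                interval: TimeInterval = 24 * 60 * 60) -> Bool {
    guard let last = lastCheck else { return true }
    return now.timeIntervalSince(last) >= interval
}

/// Assumed comparison: numeric, component by component, so that
/// "1.10.0" is newer than "1.9.3". Missing components count as 0.
func isNewer(_ remote: String, than local: String) -> Bool {
    let r = remote.split(separator: ".").map { Int($0) ?? 0 }
    let l = local.split(separator: ".").map { Int($0) ?? 0 }
    for i in 0..<max(r.count, l.count) {
        let a = i < r.count ? r[i] : 0
        let b = i < l.count ? l[i] : 0
        if a != b { return a > b }
    }
    return false
}
```

A plain string comparison would rank "1.9.3" above "1.10.0", which is why the sketch compares integers per component.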
## Summary
The SwiftUI shell is intentionally small and centralized. `OmniApp` (`@main`) declares one window plus a settings scene, registers the menu-bar command groups that own every keyboard shortcut, and wires the once-a-day update check into launch. `AppModel` is the one `@Observable` object that owns the engine, store, indexer, serving layer, search, history, and settings — and its `phase` state machine decides whether the user sees loading, onboarding, an error, or the live search UI. The views (`ContentView`, `Sidebar`, `ResultsList`, `SettingsView`, `OnboardingView`) are thin readers of that state, with heavy work pushed off the main actor and shortcuts/toolbar/context-menus kept consistent because they all route back through the same `AppModel` methods. To extend the app, the reliable pattern is: add state and a method to `AppModel`, then expose it from a view and (if it deserves a shortcut) a `CommandGroup` button.
This page describes the `App` target only; the embedding, vector store, and indexer it drives live in `OmniKit` and are documented separately.
---
## 07. Local Embedding Server: OpenAI / Cohere / Gemini-Compatible APIs
> A subsystem the README never mentions: an in-app HTTP server that exposes the engine as drop-in embedding APIs. Covers the Router and auth gate, the single ServingBackend seam onto OmniEngine + VectorStore, the per-provider SchemaAdapters (/v1/embeddings, /v1/embed, /v2/embed, Gemini :embedContent, /v1/search), and the controller/tab/log that manage it.
- Page Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/07-local-embedding-server-openai-cohere-gemini-compatible-apis.md
- Generated: 2026-06-08T13:33:23.835Z
### Source Files
- `App/Serving/Router.swift`
- `App/Serving/ServingBackend.swift`
- `App/Serving/SchemaAdapters.swift`
- `App/Serving/HTTPServer.swift`
- `App/Serving/ServingController.swift`
- `App/Serving/ServingTab.swift`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [App/Serving/Router.swift](App/Serving/Router.swift)
- [App/Serving/ServingBackend.swift](App/Serving/ServingBackend.swift)
- [App/Serving/SchemaAdapters.swift](App/Serving/SchemaAdapters.swift)
- [App/Serving/HTTPServer.swift](App/Serving/HTTPServer.swift)
- [App/Serving/HTTPMessage.swift](App/Serving/HTTPMessage.swift)
- [App/Serving/ServingController.swift](App/Serving/ServingController.swift)
- [App/Serving/ServingTab.swift](App/Serving/ServingTab.swift)
- [App/Serving/ServingLog.swift](App/Serving/ServingLog.swift)
- [App/AppModel.swift](App/AppModel.swift)
- [Sources/OmniKit/VectorStore.swift](Sources/OmniKit/VectorStore.swift)
</details>
# Local Embedding Server: OpenAI / Cohere / Gemini-Compatible APIs
Omni ships an HTTP server inside the app. The README pitches the project as "no server, no cloud" (that line refers to the *embedding runtime* never being a Python service), so a first-time reader will not learn from the README that the app can also turn itself into a drop-in embeddings endpoint for OpenAI, Jina, Cohere, and Gemini SDKs, plus a native search route. Everything under `App/Serving/` implements that subsystem, and it is wired in through a single `attach()` call from `AppModel`.
This page is the orientation map for that directory. It covers the layering (transport → routing → adapters → backend seam → engine), the auth/scope security model, the per-provider request and response shapes, and the SwiftUI controller and tab that start, persist, and observe the server. The design goal worth keeping in mind while reading: the serving layer adds **no new locks and no new vector math**. It is a thin, stateless translation membrane in front of the existing `OmniEngine` and `VectorStore`.
Sources: [README.md:19](README.md#L19), [App/AppModel.swift:351-355](App/AppModel.swift#L351-L355), [App/AppModel.swift:1013-1017](App/AppModel.swift#L1013-L1017)
## Where to start reading
Read the files in dependency order, bottom-up:
| Layer | File | Responsibility |
| --- | --- | --- |
| Transport | `HTTPServer.swift` | Network.framework listener, connection loop, keep-alive, body cap. Knows nothing about embeddings. |
| Wire types | `HTTPMessage.swift` | `HTTPRequest`/`HTTPResponse` value types + a hand-rolled HTTP/1.1 parser (`HTTPParse`). |
| Routing | `Router.swift` | Method+path → adapter dispatch, plus the auth gate and per-provider 401/404 envelopes. |
| Translation | `SchemaAdapters.swift` | Stateless per-provider enums that parse JSON and emit provider-shaped JSON. |
| Engine seam | `ServingBackend.swift` | The only protocol that touches `OmniKit`; `EngineServingBackend` wraps engine + store. |
| Control | `ServingController.swift` | `@MainActor @Observable` lifecycle, persistence, auth snapshot, log coalescing. |
| UI | `ServingTab.swift`, `ServingLog.swift` | Settings tab and the live request log row/model. |
The cleanest mental model: `HTTPServer` and `HTTPMessage` are pure transport; `Router` + `SchemaAdapters` are pure translation; `ServingBackend` is the *one* seam onto the engine; `ServingController` + `ServingTab` are the main-actor control plane.
## System architecture
The diagram below shows the ownership boundaries. Note the actor boundary: the controller and UI live on the `@MainActor`, while every request is serviced off it — on the server's own `DispatchQueue` and in a detached `Task` — with only `LogEntry` marshalled back.
```mermaid
flowchart TB
subgraph MainActor["@MainActor control plane"]
Tab["ServingTab (SwiftUI)"]
Ctrl["ServingController\n(@Observable, persists omni.serving.*)"]
AppModel["AppModel.attach(engine, store, modelName)"]
end
subgraph OffActor["Off-actor request path (DispatchQueue + detached Task)"]
Srv["HTTPServer\n(NWListener, keep-alive, 8MB cap)"]
Parse["HTTPParse / HTTPRequest / HTTPResponse"]
Router["Router\n(auth gate + dispatch)"]
subgraph Adapters["SchemaAdapters (stateless enums)"]
OAI["OpenAIJinaAdapter"]
Coh["CohereAdapter (v1/v2)"]
Gem["GeminiAdapter"]
Search["SearchAdapter"]
Health["HealthAdapter"]
end
Backend["EngineServingBackend\n(ServingBackend seam, @unchecked Sendable)"]
end
subgraph Engine["OmniKit (thread-safe)"]
OmniEngine["OmniEngine\nembedTextBatch / embedQuery"]
Store["VectorStore\nsearch()"]
end
AppModel -->|attach| Ctrl
Tab -->|binds enabled/scope/port/token| Ctrl
Ctrl -->|builds Router + auth closure, starts| Srv
Srv --> Parse --> Router
Router --> OAI & Coh & Gem & Search & Health
OAI & Coh & Gem & Search --> Backend
Backend --> OmniEngine
Backend --> Store
Srv -. "LogEntry (coalesced)" .-> Ctrl
```
Sources: [App/Serving/HTTPServer.swift:11-35](App/Serving/HTTPServer.swift#L11-L35), [App/Serving/Router.swift:5-54](App/Serving/Router.swift#L5-L54), [App/Serving/ServingController.swift:95-141](App/Serving/ServingController.swift#L95-L141), [App/Serving/ServingBackend.swift:6-52](App/Serving/ServingBackend.swift#L6-L52)
## The transport: HTTPServer + HTTPParse
`HTTPServer` is a hand-rolled HTTP/1.1 server on `Network.framework`. It always binds an `NWListener` on all interfaces (the reliable path; `requiredLocalEndpoint` with a port throws `EINVAL`). For `local` scope it therefore enforces loopback **per connection** rather than at bind time, cancelling any peer whose endpoint is not `127.0.0.1`/`::1`/`localhost`. Bind failures such as a busy port arrive asynchronously through `onFailure`, which the controller maps to a `portInUse`/`failed` state.
Connection callbacks all fire on a single serial `DispatchQueue` (`omni.serving.http`), so there is no actor hop on the hot path. Per request, the server hands the parsed `HTTPRequest` to a detached `Task` that awaits the async handler, then hops back to the queue to write the response and continue the keep-alive loop. The parser (`HTTPParse.tryParse`) handles pipelining (it drains residual bytes already buffered), enforces an 8 MB body cap (`413`), and explicitly rejects chunked transfer encoding — only `Content-Length` framing is supported.
```swift
// App/Serving/HTTPServer.swift:176-188 — service off the main actor, then log
Task { [weak self] in
guard let self else { return }
let resp = await self.handler(req)
let ms = Double(DispatchTime.now().uptimeNanoseconds - started.uptimeNanoseconds) / 1_000_000.0
let entry = LogEntry(time: Date(), method: req.method, path: req.routePath,
status: resp.status, ms: ms, client: client)
self.onLog(entry)
// ... hop back to queue to write + continue ...
}
```
`HTTPResponse.json(_:status:)` is the single response constructor used everywhere; `serialize(keepAlive:)` adds framing headers (`Content-Length`, `Connection`, `Date`) and a small reason-phrase table.
Sources: [App/Serving/HTTPServer.swift:39-112](App/Serving/HTTPServer.swift#L39-L112), [App/Serving/HTTPServer.swift:116-209](App/Serving/HTTPServer.swift#L116-L209), [App/Serving/HTTPMessage.swift:45-105](App/Serving/HTTPMessage.swift#L45-L105), [App/Serving/HTTPMessage.swift:132-199](App/Serving/HTTPMessage.swift#L132-L199)
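The framing rules above (wait for a full header block, honor `Content-Length`, cap the body, refuse chunked encoding, leave pipelined bytes in the buffer) can be sketched as one function. This is an illustrative reimplementation, not `HTTPParse.tryParse`: the `ParseOutcome` type, the function name, and the status codes used for chunked or malformed requests (501 and 400) are assumptions; only the 8 MB cap and the 413 come from the text.

```swift
import Foundation

enum ParseOutcome: Equatable {
    case needMoreBytes
    case request(method: String, path: String, body: Data, consumed: Int)
    case reject(status: Int)
}

/// Tries to parse one request from the front of `buffer`.
/// `consumed` tells the caller how many bytes to drop; anything after
/// that is a pipelined follow-up request and stays buffered.
func tryParseSketch(_ buffer: Data, bodyCap: Int = 8 * 1024 * 1024) -> ParseOutcome {
    let separator = Data("\r\n\r\n".utf8)
    guard let headEnd = buffer.range(of: separator) else { return .needMoreBytes }
    guard let head = String(data: buffer[buffer.startIndex..<headEnd.lowerBound],
                            encoding: .utf8) else { return .reject(status: 400) }

    let lines = head.components(separatedBy: "\r\n")
    let requestLine = lines[0].split(separator: " ")
    guard requestLine.count >= 2 else { return .reject(status: 400) }

    var contentLength = 0
    for line in lines.dropFirst() {
        guard let colon = line.firstIndex(of: ":") else { continue }
        let name = line[..<colon].lowercased()
        let value = line[line.index(after: colon)...].trimmingCharacters(in: .whitespaces)
        if name == "transfer-encoding", value.lowercased().contains("chunked") {
            return .reject(status: 501)   // assumed status; chunked is unsupported
        }
        if name == "content-length" {
            guard let n = Int(value), n >= 0 else { return .reject(status: 400) }
            contentLength = n
        }
    }
    if contentLength > bodyCap { return .reject(status: 413) }

    let bodyStart = headEnd.upperBound
    guard buffer.endIndex - bodyStart >= contentLength else { return .needMoreBytes }
    let body = buffer.subdata(in: bodyStart..<(bodyStart + contentLength))
    let consumed = (bodyStart - buffer.startIndex) + contentLength
    return .request(method: String(requestLine[0]), path: String(requestLine[1]),
                    body: body, consumed: consumed)
}
```

Rejecting an oversized `Content-Length` before any body bytes arrive is what keeps the cap cheap: the server never has to buffer 8 MB just to discover the request is too large.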
## The Router and the auth gate
`Router` is a `Sendable` value type so it can be captured by the `@Sendable` handler closure. It carries the backend and a pre-snapshotted `auth` closure (no main-actor state). `handle()` does three things in order:
1. **Liveness first, pre-auth.** `GET /health` and `GET /v1/models` are always open.
2. **Auth gate.** If `auth(req)` fails, it returns a 401 whose JSON envelope is shaped by path prefix — Gemini gets `{error:{status:"UNAUTHENTICATED"}}`, Cohere gets `{message:"invalid api token"}`, everything else gets the OpenAI `{error:{code:"invalid_api_key"}}` shape.
3. **Dispatch.** A `(method, route)` switch covers the fixed paths; Gemini is matched separately because its action lives after a `:` in `/v1beta/models/{model}:embedContent`.
```swift
// App/Serving/Router.swift:24-51 — dispatch table
switch (req.method, route) {
case ("POST", "/v1/embeddings"): return OpenAIJinaAdapter.handle(req, backend)
case ("POST", "/v1/embed"): return CohereAdapter.handle(req, backend, v2: false)
case ("POST", "/v2/embed"): return CohereAdapter.handle(req, backend, v2: true)
case ("POST", "/v1/search"): return SearchAdapter.handle(req, backend)
default: break
}
// Gemini: /v1beta/models/{model}:embedContent | :batchEmbedContents
```
The auth closure itself is built in `ServingController.startServer()` as a snapshot, so a token rotation requires a restart. It accepts the token via three transports to match each provider's SDK convention: `Authorization: Bearer`, the Gemini `x-goog-api-key` header, or a `?key=` query parameter.
```swift
// App/Serving/ServingController.swift:97-109 — token only when public AND set
let requireToken = isPublic && !token.isEmpty
let auth: @Sendable (HTTPRequest) -> Bool = { req in
guard requireToken else { return true }
if req.bearer == token { return true }
if req.googApiKey == token { return true }
if req.query["key"] == token { return true }
return false
}
```
Sources: [App/Serving/Router.swift:11-79](App/Serving/Router.swift#L11-L79), [App/Serving/ServingController.swift:95-109](App/Serving/ServingController.swift#L95-L109), [App/Serving/HTTPMessage.swift:30-43](App/Serving/HTTPMessage.swift#L30-L43)
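Two pieces of the router are worth seeing in isolation: choosing the 401 envelope from the path, and splitting the Gemini route at its colon. The sketch below assumes specific prefixes (`/v1beta/` for Gemini, the two `/embed` paths for Cohere); the source only says the envelope is "shaped by path prefix", so those conditions and both function names are hypothetical.

```swift
/// Picks the 401 JSON body for a path, mirroring the three envelopes
/// listed above. The exact prefixes tested here are assumptions.
func unauthorizedBody(forPath path: String) -> [String: Any] {
    if path.hasPrefix("/v1beta/") {
        return ["error": ["status": "UNAUTHENTICATED"]]      // Gemini shape
    }
    if path == "/v1/embed" || path == "/v2/embed" {
        return ["message": "invalid api token"]              // Cohere shape
    }
    return ["error": ["code": "invalid_api_key"]]            // OpenAI shape
}

/// Splits "/v1beta/models/{model}:{action}" into its two parts.
/// Returns nil for any other path or an unknown action.
func geminiRoute(_ path: String) -> (model: String, action: String)? {
    let prefix = "/v1beta/models/"
    guard path.hasPrefix(prefix) else { return nil }
    let rest = path.dropFirst(prefix.count)
    guard let colon = rest.lastIndex(of: ":") else { return nil }
    let model = String(rest[..<colon])
    let action = String(rest[rest.index(after: colon)...])
    guard !model.isEmpty,
          action == "embedContent" || action == "batchEmbedContents" else { return nil }
    return (model, action)
}
```

Matching the error shape to the provider matters because client SDKs typically parse the error body before surfacing a message, and an unexpected shape tends to produce a confusing decode failure instead of a clear "bad key" error.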
## The ServingBackend seam
`ServingBackend` is the only place the serving layer touches `OmniKit`. It is deliberately tiny — `dim`, `modelName`, `embedBatch(_:query:)`, `search(_:topK:filter:)` — so adapters never reach into the engine directly. `EngineServingBackend` is the production conformer; its `@unchecked Sendable` is justified in-source because it adds no mutable state and every member it calls is documented thread-safe (the engine's `NSCondition` run-gate, the store's serial queue), letting it be called straight from the connection's detached `Task`.
Two behaviors matter for callers:
- **Priority routing.** `query == true` routes through the engine's high-priority query path (`OmniInputType.query`); otherwise the low-priority passage/indexing path. Each adapter decides this from its provider's own field (see table below).
- **Batch splitting.** Large client batches are chunked into groups of `groupCap = 48` to match the indexer's forward-pass width, so serving never exceeds the engine's batch expectations. Output order matches input order.
```swift
// App/Serving/ServingBackend.swift:33-46 — split, embed, preserve order
let type: OmniInputType = query ? .query : .passage
while i < texts.count {
let end = min(i + groupCap, texts.count)
out.append(contentsOf: engine.embedTextBatch(Array(texts[i..<end]), as: type))
i = end
}
```
Sources: [App/Serving/ServingBackend.swift:6-52](App/Serving/ServingBackend.swift#L6-L52)
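The excerpt above omits its surrounding declarations, so here is a self-contained version of the same loop. It is a sketch: `embedInGroups` and the `embedGroup` closure are hypothetical names, with the closure standing in for the engine call so the splitting logic can be exercised without a model.

```swift
/// Embeds `texts` in groups of at most `groupCap`, preserving input order.
/// `embedGroup` stands in for the engine call and is expected to return
/// one vector per input string, in the same order.
func embedInGroups(_ texts: [String], groupCap: Int = 48,
                   embedGroup: ([String]) -> [[Float]]) -> [[Float]] {
    precondition(groupCap > 0, "groupCap must be positive")
    var out: [[Float]] = []
    out.reserveCapacity(texts.count)
    var i = 0
    while i < texts.count {
        let end = min(i + groupCap, texts.count)
        out.append(contentsOf: embedGroup(Array(texts[i..<end])))
        i = end
    }
    return out
}
```

For a 100-item request this produces groups of 48, 48, and 4, and because each group's vectors are appended in sequence, `out[k]` always corresponds to `texts[k]`.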
## The per-provider SchemaAdapters
All adapters are stateless `enum`s. They parse with `JSONSerialization`, call the backend, and emit provider-shaped JSON. A shared invariant is stated at the top of the file: the engine emits fixed **1024-d, L2-normalized** float vectors; adapters **never truncate, requantize, or fabricate** vectors. Only the `usage`/`billed_units` token counts are an acknowledged whitespace heuristic (`tokenEstimate`).
| Endpoint | Method | Adapter | Provider shape | Query-path signal |
| --- | --- | --- | --- | --- |
| `/v1/embeddings` | POST | `OpenAIJinaAdapter` | OpenAI `{object:"list", data:[{embedding}]}` + Jina | `task == "query"` or `task` ends with `.query` |
| `/v1/embed` | POST | `CohereAdapter(v2:false)` | Cohere v1; bare list unless `embedding_types` given | `input_type == "search_query"` |
| `/v2/embed` | POST | `CohereAdapter(v2:true)` | Cohere v2; always `{float:[...]}`; `input_type` required | `input_type == "search_query"` |
| `/v1beta/models/{m}:embedContent` | POST | `GeminiAdapter(batch:false)` | `{embedding:{values}}` | `taskType` ∈ {`RETRIEVAL_QUERY`,`QUESTION_ANSWERING`,`CODE_RETRIEVAL_QUERY`} |
| `/v1beta/models/{m}:batchEmbedContents` | POST | `GeminiAdapter(batch:true)` | `{embeddings:[{values}]}` | any request's `taskType` is a query type |
| `/v1/search` | POST | `SearchAdapter` | `{query, results:[{path,score,snippet,kind,modified}]}` | n/a (always high-priority query embed) |
| `/health`, `/v1/models` | GET | `HealthAdapter` | status / OpenAI model list | open, pre-auth |
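The "Query-path signal" column reduces to a handful of predicates. The functions below restate the table rows as code; their names are hypothetical and the adapters may structure this differently, but the string values are the ones listed in the table.

```swift
/// OpenAI/Jina: `task` is "query" or ends with ".query".
func isQueryTask(_ task: String?) -> Bool {
    guard let task = task else { return false }
    return task == "query" || task.hasSuffix(".query")
}

/// Cohere v1/v2: only "search_query" selects the query path.
func isCohereQuery(inputType: String?) -> Bool {
    return inputType == "search_query"
}

/// Gemini: three task types count as queries.
func isGeminiQuery(taskType: String?) -> Bool {
    guard let taskType = taskType else { return false }
    return ["RETRIEVAL_QUERY", "QUESTION_ANSWERING", "CODE_RETRIEVAL_QUERY"]
        .contains(taskType)
}

/// Gemini batch: the whole batch is a query if any request is.
func isGeminiBatchQuery(taskTypes: [String?]) -> Bool {
    return taskTypes.contains { isGeminiQuery(taskType: $0) }
}
```

A missing field falls through to `false` in every case, which matches the table's implication that the passage path is the default.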
Provider-specific quirks worth knowing before you debug a 400:
- **OpenAI/Jina** accept `input` as a string, `[String]`, or `[{text:...}]`, and emit base64 little-endian Float32 when either OpenAI's `encoding_format` or Jina's `embedding_type` requests `base64`.
- **Cohere** only produces `float`; any other `embedding_types` value is a 400 (`unsupported embedding_type`). v2 requires `input_type`. Texts come from `texts` or the v4 multimodal `inputs[].text`.
- **Gemini** accepts `outputDimensionality` only if it equals the engine's `dim` (the engine never truncates); otherwise `INVALID_ARGUMENT`. Text is the joined `content.parts[].text`.
- **Search** clamps `top_k` to `[1, 200]`, builds a `SearchFilter` from `filters` (`kinds`, `folder`, `ext`, `since`), clamps scores to `[0,1]`, and maps `SearchHit` fields straight through.
```swift
// App/Serving/SchemaAdapters.swift:119-136 — Cohere refuses to fake quantized vectors
if let bad = requestedTypes.first(where: { $0 != "float" }) {
return cohereError("unsupported embedding_type: \(bad)")
}
// v2 always emits the object form; v1 emits bare list unless embedding_types given.
let embeddings: Any = (v2 || !requestedTypes.isEmpty) ? ["float": floatRows] : floatRows
```
Sources: [App/Serving/SchemaAdapters.swift:1-104](App/Serving/SchemaAdapters.swift#L1-L104), [App/Serving/SchemaAdapters.swift:106-234](App/Serving/SchemaAdapters.swift#L106-L234), [App/Serving/SchemaAdapters.swift:236-289](App/Serving/SchemaAdapters.swift#L236-L289), [Sources/OmniKit/VectorStore.swift:36-71](Sources/OmniKit/VectorStore.swift#L36-L71)
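The base64 option mentioned for OpenAI/Jina packs each vector as raw little-endian Float32 bytes before encoding. A minimal sketch of that encoding and its inverse follows; the function names are hypothetical and this is not the adapter's code, but the byte layout is the one named above, so a client can use the decoder to read such a response.

```swift
import Foundation

/// Packs floats as little-endian IEEE-754 Float32 and base64-encodes them.
func base64LittleEndianFloat32(_ vector: [Float]) -> String {
    var bytes = Data(capacity: vector.count * 4)
    for value in vector {
        let bits = value.bitPattern
        // Emit the least significant byte first, independent of host endianness.
        for k in 0..<4 {
            bytes.append(UInt8((bits >> (8 * k)) & 0xFF))
        }
    }
    return bytes.base64EncodedString()
}

/// Inverse of the function above; returns nil on malformed input.
func decodeBase64Float32(_ text: String) -> [Float]? {
    guard let data = Data(base64Encoded: text), data.count % 4 == 0 else { return nil }
    var out: [Float] = []
    out.reserveCapacity(data.count / 4)
    for offset in stride(from: 0, to: data.count, by: 4) {
        var bits: UInt32 = 0
        for k in 0..<4 {
            bits |= UInt32(data[data.startIndex + offset + k]) << (8 * k)
        }
        out.append(Float(bitPattern: bits))
    }
    return out
}
```

For example, `[1.0]` has the bit pattern `0x3F800000`, which becomes the bytes `00 00 80 3F` and encodes to `"AACAPw=="`. A 1024-d vector is 4096 bytes, or 5464 base64 characters, noticeably smaller than the same vector as JSON decimals.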
## Request lifecycle
The sequence below traces one POST through every boundary, including the off-actor service and the coalesced log hop back to the main actor.
```mermaid
sequenceDiagram
participant C as Client (SDK/curl)
participant S as HTTPServer (queue)
participant P as HTTPParse
participant R as Router
participant A as SchemaAdapter
participant B as EngineServingBackend
participant E as OmniEngine / VectorStore
participant Ctrl as ServingController (@MainActor)
C->>S: TCP + HTTP/1.1 request
S->>P: tryParse(buffer)
P-->>S: HTTPRequest (or "need more bytes")
S->>R: await handler(req) (detached Task, off main actor)
alt /health or /v1/models (GET)
R-->>S: open response, pre-auth
else authorized
R->>A: dispatch by (method, path)
A->>B: embedBatch(query:) / search(...)
B->>E: embedTextBatch / embedQuery / store.search
E-->>A: 1024-d L2 vectors / hits
A-->>R: provider-shaped JSON
else unauthorized
R-->>S: 401 (provider-shaped envelope)
end
R-->>S: HTTPResponse
S->>Ctrl: onLog(LogEntry) (Task @MainActor, coalesced)
S->>C: serialize + write; keep-alive loop
```
Sources: [App/Serving/HTTPServer.swift:116-209](App/Serving/HTTPServer.swift#L116-L209), [App/Serving/Router.swift:11-54](App/Serving/Router.swift#L11-L54), [App/Serving/ServingController.swift:112-116](App/Serving/ServingController.swift#L112-L116)
## ServingController: lifecycle and state
`ServingController` is the single instance `AppModel` owns. It is `@MainActor @Observable`; the SwiftUI tab binds only to its documented surface. Persisted settings (`enabled`, `scope`, `port`, `bearerToken`) live in `UserDefaults` under `omni.serving.*`, and each `didSet` calls `persist()` plus a reconcile/restart — gated by an `isLoading` flag so loading saved values doesn't clobber a half-loaded snapshot.
The engine and store are never owned by the controller; `AppModel` hands them in via `attach(engine:store:modelName:)`, which builds an `EngineServingBackend`, then either restarts a running server against the new backend (model swap) or reconciles (auto-start if previously enabled). Changing `scope` or `port` while running triggers `restart()`; rotating the token applies only on the next start.
```mermaid
stateDiagram-v2
[*] --> stopped
stopped --> running: enabled && backend attached (startServer)
running --> stopped: disabled / detach (stopServer)
running --> running: scope/port change (restart)
stopped --> portInUse: bind EADDRINUSE
stopped --> failed: listener error (onFailure)
portInUse --> running: retry after enable
failed --> running: retry after enable
running --> portInUse: runtime bind failure
running --> failed: runtime listener failure
```
State transitions are driven by `reconcile()` (start/stop based on `enabled` + backend presence) and by `HTTPServer.onFailure`, which flips the controller to `portInUse`/`failed`, sets `isRunning = false`, and disables serving. The live log is capped at 200 entries, newest-first, with request/error counters incremented in `ingest()` — coalesced to one main-actor invalidation per runloop tick.
Sources: [App/Serving/ServingController.swift:14-93](App/Serving/ServingController.swift#L14-L93), [App/Serving/ServingController.swift:143-185](App/Serving/ServingController.swift#L143-L185), [App/Serving/ServingLog.swift:5-19](App/Serving/ServingLog.swift#L5-L19)
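The capped, newest-first log with counters can be sketched as a small value type. This is an illustration under assumptions: `RequestLog` is a hypothetical name, the `LogEntry` here is a reduced stand-in, and treating any status of 400 or above as an error is a guess at how the error counter is defined.

```swift
import Foundation

// Reduced stand-in for the real LogEntry (which also records the client).
struct LogEntry {
    let time: Date
    let method: String
    let path: String
    let status: Int
    let ms: Double
}

struct RequestLog {
    static let capacity = 200
    private(set) var entries: [LogEntry] = []   // index 0 is the newest
    private(set) var requestCount = 0
    private(set) var errorCount = 0

    /// Ingests a batch in one mutation, so observers are invalidated once
    /// per batch rather than once per request.
    mutating func ingest(_ batch: [LogEntry]) {
        for entry in batch {
            requestCount += 1
            if entry.status >= 400 { errorCount += 1 }   // assumed error rule
            entries.insert(entry, at: 0)
        }
        if entries.count > RequestLog.capacity {
            entries.removeLast(entries.count - RequestLog.capacity)
        }
    }
}
```

The counters keep growing after old rows are trimmed, so the totals shown in the UI can exceed the 200 visible entries.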
## ServingTab: the control surface
`ServingTab` is the `Settings > Serving` form. It exposes the on/off toggle, a scope picker ("This Mac only" vs "Local network"), a port field clamped to `1...65535`, and a bearer-token field with show/hide, copy, and a "Generate New" button that produces a URL-safe base64 token from 24 random bytes obtained via `SecRandomCopyBytes`. A status dot reflects the controller's `State`, and the bound address is shown and selectable when running.
The tab also renders **ready-to-run `curl` examples** for the selected schema — including the right auth header (`Authorization: Bearer` for most, `x-goog-api-key` for Gemini) — and a live request log (`LogRow`: time, method, path, color-coded status, latency). The examples are the most useful piece of in-app documentation for this otherwise-undocumented subsystem:
```swift
// App/Serving/ServingTab.swift:208-217 — schema-specific example bodies
case .openai: "curl \(base)/v1/embeddings ... -d '{\"model\":\"omni\",\"input\":[\"your text\"]}'"
case .jina: "... -d '{...\"input\":\"your text\",\"task\":\"retrieval.query\"}'"
case .cohere: "curl \(base)/v2/embed ... -d '{...\"input_type\":\"search_document\",\"embedding_types\":[\"float\"]}'"
case .gemini: "curl \(base)/v1beta/models/omni:embedContent ... -d '{\"content\":{\"parts\":[{\"text\":\"your text\"}]}}'"
```
Sources: [App/Serving/ServingTab.swift:26-149](App/Serving/ServingTab.swift#L26-L149), [App/Serving/ServingTab.swift:193-291](App/Serving/ServingTab.swift#L193-L291)
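The token format described above (24 random bytes, URL-safe base64) can be sketched without the Security framework. Note the substitution: the app uses `SecRandomCopyBytes`, whereas this sketch draws bytes from Swift's default system random number generator so that it runs anywhere; `makeToken` is a hypothetical name.

```swift
import Foundation

/// Produces a URL-safe base64 token from `byteCount` random bytes.
/// Swapped-in randomness source: Swift's system generator, not
/// SecRandomCopyBytes as in the app.
func makeToken(byteCount: Int = 24) -> String {
    let bytes = (0..<byteCount).map { _ in UInt8.random(in: 0...255) }
    return Data(bytes).base64EncodedString()
        .replacingOccurrences(of: "+", with: "-")   // URL-safe alphabet
        .replacingOccurrences(of: "/", with: "_")
        .replacingOccurrences(of: "=", with: "")    // drop padding if any
}
```

Because 24 is a multiple of 3, the base64 form is exactly 32 characters with no padding, and the URL-safe alphabet means the token can be pasted into a `?key=` query parameter without escaping.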
## Security model and provider neutrality
The security posture is intentionally minimal and local-first: on `local` scope the server is loopback-only (enforced per connection) and **never requires a token**; a token is enforced only on `public` (LAN) scope and only when non-empty. The UI footer makes this explicit ("Local network reaches other devices, so set a token"). There is no TLS, no rate limiting, and the token check is a plain equality compare — appropriate for a single-user desktop tool on a trusted LAN, and worth flagging if anyone proposes exposing it more widely.
A useful framing for a Grok-Wiki or BYOC/BYOK integration: this subsystem is *provider-neutral by construction*. It impersonates four vendors' wire formats over the same local engine, so any client SDK can point its base URL at `http://127.0.0.1:<port>` without code changes and without depending on a hosted model service. The neutrality lives entirely in `SchemaAdapters`; adding a fifth provider means one more stateless enum behind the `ServingBackend` seam, not a change to the engine or transport. (This page was synthesized directly from repository source — no `docs/solutions/` notes or `STRATEGY.md` were present to draw on, and there are currently no automated tests under `App/Serving/`, so the curl examples in `ServingTab` are the practical contract.)
Sources: [App/Serving/HTTPServer.swift:43-86](App/Serving/HTTPServer.swift#L43-L86), [App/Serving/HTTPServer.swift:95-100](App/Serving/HTTPServer.swift#L95-L100), [App/Serving/ServingController.swift:97-133](App/Serving/ServingController.swift#L97-L133), [App/Serving/ServingTab.swift:134-140](App/Serving/ServingTab.swift#L134-L140)
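From the client side, "pointing a base URL at the local server" comes down to sending an OpenAI-shaped body and reading `data[].embedding` back. The sketch below covers only the JSON on each end, under the assumption that the request and response carry at least the fields shown in the table and curl example; the type and function names are hypothetical, and real responses may include further fields that this decoder ignores.

```swift
import Foundation

// Request body in the shape used by the in-app curl example.
struct EmbeddingsRequest: Encodable {
    let model: String
    let input: [String]
}

// Only the fields this sketch needs; unknown keys are ignored by Codable.
struct EmbeddingsResponse: Decodable {
    struct Item: Decodable { let embedding: [Float] }
    let data: [Item]
}

/// Builds the JSON body for POST /v1/embeddings.
func encodeRequest(texts: [String], model: String = "omni") throws -> Data {
    return try JSONEncoder().encode(EmbeddingsRequest(model: model, input: texts))
}

/// Extracts the vectors from an OpenAI-shaped response body.
func decodeVectors(_ body: Data) throws -> [[Float]] {
    return try JSONDecoder().decode(EmbeddingsResponse.self, from: body)
        .data.map { $0.embedding }
}
```

On `local` scope no `Authorization` header is needed; on LAN scope with a token set, the same body is sent with `Authorization: Bearer <token>`.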
---
## 08. Verifying the Engine and Where to Go Next
> The closing page: how numeric parity is proven rather than assumed (omni-verify against Python-generated fixtures, cosine >= 0.999 with matching token ids, image/video/audio matching the upstream model.py), how the test suite and fixture generators are run, and a short map of what to read next after the first 30 minutes.
- Page Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/pages/08-verifying-the-engine-and-where-to-go-next.md
- Generated: 2026-06-08T13:33:13.962Z
### Source Files
- `Sources/omni-verify/main.swift`
- `Tools/gen_fixtures.py`
- `Tests/OmniKitTests/TextEncoderTests.swift`
- `Tests/OmniKitTests/VectorStoreTests.swift`
- `Scripts/run-tests.sh`
- `Makefile`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [Sources/omni-verify/main.swift](Sources/omni-verify/main.swift)
- [Tools/gen_fixtures.py](Tools/gen_fixtures.py)
- [Tests/OmniKitTests/TextEncoderTests.swift](Tests/OmniKitTests/TextEncoderTests.swift)
- [Tests/OmniKitTests/VectorStoreTests.swift](Tests/OmniKitTests/VectorStoreTests.swift)
- [Tests/OmniKitTests/VisionEncoderTests.swift](Tests/OmniKitTests/VisionEncoderTests.swift)
- [Tests/OmniKitTests/AVEncoderTests.swift](Tests/OmniKitTests/AVEncoderTests.swift)
- [Scripts/run-tests.sh](Scripts/run-tests.sh)
- [Makefile](Makefile)
</details>
# Verifying the Engine and Where to Go Next
Omni reimplements a Python embedding model (`jinaai/jina-embeddings-v5-omni-small-mlx`) as an in-process MLX-Swift engine. The central risk in any port is silent numeric drift: a vector that is *close enough to look right* but ranks files differently from the reference. Omni's answer is to never assume parity and instead **measure it against fixtures the original Python model produced** — exact token-id equality plus cosine similarity above a hard threshold, modality by modality. This page explains how that proof works, how to run it, and what to read once parity is no longer a mystery.
The mental model is two independent implementations meeting at a shared, version-controlled artifact. The Python reference writes embeddings and token ids to `Fixtures/`; the Swift port loads the *same* original Hugging Face weights, merges the same retrieval LoRA at load time, and must reproduce those numbers. If it cannot, the verifier and the test suite both exit non-zero.
## The parity loop: reference, artifact, port
```mermaid
flowchart LR
subgraph REF["Reference side (Python, run in mls venv)"]
GEN["Tools/gen_fixtures.py<br/>model.encode(query/passage)"]
end
subgraph ART["Shared artifacts (checked in)"]
TXT["Fixtures/text_fixtures.json<br/>token ids + embeddings"]
IMG["image_ref / video_ref / audio_ref<br/>.safetensors"]
META["meta.json<br/>dim, prefixes, lora r/alpha"]
end
subgraph PORT["Port side (Swift, MLX)"]
WS["WeightStore<br/>load HF safetensors + merge LoRA"]
ENC["OmniTextEncoder / Vision / Audio towers"]
VER["omni-verify (default)"]
TST["OmniKitTests (XCTest)"]
end
GEN --> TXT
GEN --> META
TXT --> VER
TXT --> TST
IMG --> TST
WS --> ENC --> VER
ENC --> TST
VER -->|"cos >= 0.999 + token ids exact"| GATE{"exit 0 / fail"}
TST -->|"XCTAssertGreaterThanOrEqual"| GATE
```
The two sides share weights but not code. `gen_fixtures.py` runs inside the Python `mls` venv, calls `model.encode(...)` with `task_type="retrieval.query"` and `"retrieval.passage"`, and records both the embeddings and the token ids the reference actually fed (`"Query: " + t` / `"Document: " + t`). The Swift side loads the same snapshot through `WeightStore` and must reproduce the result.
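To make the shared artifact concrete, here is a minimal sketch of how a fixture record might be decoded on the Swift side. The four snake_case field names are the ones the verifier reads from each record; the `text` field, the top-level JSON array, the `TextFixture` and `loadFixtures` names, and the sample values are assumptions made for illustration, not the repository's actual schema.

```swift
import Foundation

// Hypothetical shape of one record in Fixtures/text_fixtures.json.
// The four snake_case names match what omni-verify reads; the `text`
// field and the top-level array are assumptions for this sketch.
struct TextFixture: Decodable {
    let text: String
    let query_token_ids: [Int]
    let passage_token_ids: [Int]
    let query_embedding: [Float]
    let passage_embedding: [Float]
}

/// Decode fixture records from raw JSON bytes.
func loadFixtures(from data: Data) throws -> [TextFixture] {
    try JSONDecoder().decode([TextFixture].self, from: data)
}

// Tiny inline sample (made-up values) so the sketch runs without the real file.
let sampleJSON = """
[{"text": "a",
  "query_token_ids": [1, 2, 3],
  "passage_token_ids": [1, 4, 3],
  "query_embedding": [0.6, 0.8],
  "passage_embedding": [0.8, 0.6]}]
""".data(using: .utf8)!

let fixtures = try loadFixtures(from: sampleJSON)
```

Because the query and passage prefixes differ, each record carries two token-id lists and two embeddings for the same probe string, which is why the gate checks both sides independently.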
Sources: [Tools/gen_fixtures.py:42-98](Tools/gen_fixtures.py), [Sources/omni-verify/main.swift:898-906](Sources/omni-verify/main.swift)
## Text parity: token ids exact, cosine >= 0.999
The default mode of `omni-verify` (invoked as `omni-verify <modelDir> <fixturesJson>` with no benchmark subcommand) is the text parity gate. It loads the encoder once, then for every fixture record it checks two things and tracks the worst case across all records:
- **Token ids must match exactly.** `encoder.tokenIds(text, .query) == record.query_token_ids` (and the passage equivalent). A single divergent id fails the run — tokenization is treated as a correctness invariant, not a fuzzy match.
- **Embeddings must hit cosine >= 0.999** against the reference query and passage vectors.
```swift
// Sources/omni-verify/main.swift
let qTokMatch = qIds == r.query_token_ids
let pTokMatch = pIds == r.passage_token_ids
...
let cq = cosine(q, r.query_embedding)
let cp = cosine(p, r.passage_embedding)
let flag = (cq >= 0.999 && cp >= 0.999 && qTokMatch && pTokMatch) ? "ok " : "BAD"
...
exit(worstQ >= 0.999 && worstP >= 0.999 && tokOK ? 0 : 1)
```
The probe strings are deliberately diverse — short, long, German, Chinese, a Python snippet, punctuation, and the single character `"a"` — so the gate exercises edge cases of tokenization and pooling rather than one happy-path sentence.
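The excerpt above calls `cosine(...)` and tracks a worst case without showing either. The following is a minimal sketch of both pieces under stated assumptions: this `cosine` is a stand-in written for illustration (the real helper in `main.swift` may differ in types and accumulation), and `RecordResult` and `gatePasses` are hypothetical names.

```swift
/// Cosine similarity, accumulated in Double so the check itself adds
/// little rounding error. Returns 0 when either vector has zero norm.
func cosine(_ a: [Float], _ b: [Float]) -> Double {
    precondition(a.count == b.count, "dimension mismatch")
    var dot = 0.0, na = 0.0, nb = 0.0
    for i in 0..<a.count {
        let x = Double(a[i]), y = Double(b[i])
        dot += x * y
        na += x * x
        nb += y * y
    }
    let denom = (na * nb).squareRoot()
    return denom > 0 ? dot / denom : 0
}

// Hypothetical per-record outcome: two cosines plus the token-id check.
struct RecordResult {
    let cosQuery: Double
    let cosPassage: Double
    let tokensMatch: Bool
}

/// The gate passes only if the *worst* record clears the threshold and
/// every record's token ids matched. An empty result list fails.
func gatePasses(_ results: [RecordResult], threshold: Double = 0.999) -> Bool {
    let worstQ = results.map(\.cosQuery).min() ?? 0
    let worstP = results.map(\.cosPassage).min() ?? 0
    let tokOK = results.allSatisfy(\.tokensMatch)
    return worstQ >= threshold && worstP >= threshold && tokOK
}
```

Gating on the minimum rather than the mean is the point of tracking the worst case: one bad probe string cannot hide behind many good ones.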
Sources: [Sources/omni-verify/main.swift:908-929](Sources/omni-verify/main.swift), [Tools/gen_fixtures.py:30-39](Tools/gen_fixtures.py)
### The same gate, run as a test
`TextEncoderTests` asserts the identical contract through XCTest, with one critical detail: it pins the encoder to the **fp32** compute path before building weights, because the fixtures are fp32 while the shipping app defaults to bf16.
```swift
// Tests/OmniKitTests/TextEncoderTests.swift — setUp()
setenv("OMNI_BF16_COMPUTE", "0", 1)
setenv("OMNI_BACKBONE_BF16", "0", 1)
```
Three text tests guard distinct claims: `testTextEmbeddingsMatchReference` (the small model, cos >= 0.999), `testNanoEmbeddingsMatchReference` (the 768-dim nano variant against its own fixtures, with an explicit dimension check to catch a wrong-model-loaded mistake), and `testBatchedMatchesSingle` (right-padded batched encoding must equal per-string encoding *and* the reference, so padding never perturbs a vector). Each test `XCTSkip`s cleanly when the model snapshot is not staged locally, so the suite stays green on machines without the weights.
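Why padding could perturb a vector at all is easiest to see in the pooling step. The sketch below is a toy illustration, not the encoder's code: it assumes last-token pooling followed by L2 normalization (as the file map on this page describes the text tower), and `lastTokenPool` and `l2Normalize` are hypothetical helpers operating on plain arrays rather than MLX tensors. It covers only the pooling index, not attention masking.

```swift
/// L2-normalize a vector (returned unchanged if its norm is zero).
func l2Normalize(_ v: [Float]) -> [Float] {
    let norm = v.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    return norm > 0 ? v.map { $0 / norm } : v
}

/// Pool the hidden state of the last *real* token, ignoring right padding.
/// `hidden` is [paddedLength][dim]; `length` is the unpadded token count.
func lastTokenPool(_ hidden: [[Float]], length: Int) -> [Float] {
    precondition(length >= 1 && length <= hidden.count, "length out of range")
    return l2Normalize(hidden[length - 1])
}

// Toy hidden states: the same two-token sequence alone and inside a
// batch that was right-padded to four positions.
let single: [[Float]] = [[1, 0], [3, 4]]
let padded: [[Float]] = single + [[9, 9], [9, 9]]  // two pad rows

let fromSingle = lastTokenPool(single, length: 2)
let fromBatch = lastTokenPool(padded, length: 2)
```

Indexing by the true length rather than the padded length is what keeps `fromBatch` identical to `fromSingle`; pooling `padded.last` instead would silently return a pad position.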
Sources: [Tests/OmniKitTests/TextEncoderTests.swift:17-23](Tests/OmniKitTests/TextEncoderTests.swift), [Tests/OmniKitTests/TextEncoderTests.swift:67-123](Tests/OmniKitTests/TextEncoderTests.swift)
## Image, video, and audio parity against `model.py`
Non-text modalities are verified the same way, against `.safetensors` reference dumps from the upstream model. The key technique is to feed the Swift towers the **reference's exact preprocessed inputs** (`pixel_values`, `grid_thw`, mel frames) so a parity test isolates the tower + injection + pooling math from preprocessing/resize differences. A tower-level gate is strict (cos >= 0.999); an end-to-end gate that includes Omni's own resize/decode path is looser (cos >= 0.90), reflecting that pixel resampling legitimately differs.
| Modality | What is compared | Gate | Where |
|---|---|---|---|
| Text query + passage | cosine vs reference embedding, exact token ids | cos >= 0.999, ids identical | `omni-verify` default; `TextEncoderTests` |
| Text (nano, dim 768) | same, against nano fixtures | cos >= 0.999 | `TextEncoderTests.testNanoEmbeddingsMatchReference` |
| Batched text | batched vs single vs reference | cos >= 0.999 | `TextEncoderTests.testBatchedMatchesSingle` |
| Image tower (same `pixel_values`) | cosine vs `encode_image` | cos >= 0.999 | `VisionEncoderTests` |
| Image end-to-end (resize path) | cosine vs reference | cos >= 0.90 | `VisionEncoderTests` |
| Video tower (same input) | cosine vs `encode_video` | cos >= 0.999 | `AVEncoderTests` |
| Audio tower (same mel) | cosine vs `encode_audio` | cos >= 0.999 | `AVEncoderTests` |
| Audio single-vs-batched | clip-isolation under batching | cos >= 0.99999 | `AVEncoderTests.testAudioBatchParity` |
The `0.99999` batch gate is stronger than the reference gate on purpose. Batched media forwards use block-diagonal (`cu_seqlens`) attention, and this gate proves that one image or clip never leaks into another's vector: a batched forward must agree with one-at-a-time encoding to five nines, even though the reference comparison only requires three.
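As a conceptual sketch of what block-diagonal attention means, the function below expands a `cu_seqlens`-style array of cumulative lengths into a dense boolean mask. This is an illustration only: `blockDiagonalMask` is a hypothetical helper, and the real towers may never materialize such a mask (variable-length attention kernels typically consume the cumulative lengths directly).

```swift
/// Build a boolean attention mask from cumulative sequence lengths.
/// `cuSeqlens` = [0, n0, n0 + n1, ...]; mask[i][j] is true only when
/// tokens i and j belong to the same segment (the same image or clip).
func blockDiagonalMask(cuSeqlens: [Int]) -> [[Bool]] {
    precondition(cuSeqlens.first == 0, "cu_seqlens must start at 0")
    let total = cuSeqlens.last ?? 0
    // Map every token position to the index of the segment that owns it.
    var segment = [Int](repeating: 0, count: total)
    for s in 0..<(cuSeqlens.count - 1) {
        precondition(cuSeqlens[s] <= cuSeqlens[s + 1], "must be non-decreasing")
        for t in cuSeqlens[s]..<cuSeqlens[s + 1] { segment[t] = s }
    }
    return (0..<total).map { i in
        (0..<total).map { j in segment[i] == segment[j] }
    }
}

// Two items packed into one batch: 2 tokens, then 3 tokens.
let mask = blockDiagonalMask(cuSeqlens: [0, 2, 5])
```

Every `false` entry is a pair of tokens from different items that must not attend to each other; a single-vs-batched cosine gate is an end-to-end check that this isolation actually holds.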
Sources: [Tests/OmniKitTests/VisionEncoderTests.swift:46-69](Tests/OmniKitTests/VisionEncoderTests.swift), [Tests/OmniKitTests/AVEncoderTests.swift:42-99](Tests/OmniKitTests/AVEncoderTests.swift), [Tests/OmniKitTests/AVEncoderTests.swift:107-155](Tests/OmniKitTests/AVEncoderTests.swift)
## Parity vs. benchmarks: two different questions
`omni-verify` is one binary with many subcommands dispatched off `args[1]`. The modes fall into two kinds that should not be confused: **parity/quality** modes answer "is the output correct?" and exit non-zero on failure; **benchmark** modes answer "how fast / how much memory?" and never gate correctness. The sections above cover the first kind.
| Mode | Question it answers |
|---|---|
| *(default)* `<modelDir> <fixtures.json>` | Text token-id + cosine parity (the gate) |
| `imgbatchparity` | Image single-vs-batched (cos >= 0.99999) + reference gate (cos >= 0.999) |
| `audiocheck` | Audio path returns a finite, L2-normalized vector |
| `retrieve` / `xmodal` | Retrieval *quality*: top-1 accuracy + MRR, and text->image cross-modal search |
| `levercheck` | Optional perf levers (`OMNI_ASYNC_EVAL`, `OMNI_COMPILE_BLOCK`) are output-neutral |
| `bench`, `searchbench`, `concbench`, `concbench2`, `storemem`, `loadbench`, `crawlbench`, `indexbench`, `mediabench`, `audiobench` | Throughput, latency, memory, concurrency — performance only |
Note that `retrieve` and `xmodal` measure something parity cannot: a port can be numerically faithful yet still retrieve poorly if the *model* is weak, so these report accuracy on labelled corpora and confusable "hard" clusters as a separate signal.
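For readers unfamiliar with the two quality metrics, here is a sketch using their standard definitions. How `omni-verify` computes them internally is not shown on this page, so the function names and the `ranks` input format are assumptions.

```swift
/// `ranks[i]` is the 1-based rank at which the correct item appeared for
/// query i, or nil if it was not retrieved at all.
func top1Accuracy(_ ranks: [Int?]) -> Double {
    guard !ranks.isEmpty else { return 0 }
    let hits = ranks.filter { $0 == 1 }.count
    return Double(hits) / Double(ranks.count)
}

/// Mean reciprocal rank: the average of 1/rank, counting misses as 0.
func meanReciprocalRank(_ ranks: [Int?]) -> Double {
    guard !ranks.isEmpty else { return 0 }
    let total = ranks.reduce(0.0) { sum, rank in
        sum + (rank.map { 1.0 / Double($0) } ?? 0)
    }
    return total / Double(ranks.count)
}

// Four queries: two answered at rank 1, one at rank 2, one missed.
let ranks: [Int?] = [1, 2, nil, 1]
let accuracy = top1Accuracy(ranks)      // 2 of 4 queries hit at rank 1
let mrr = meanReciprocalRank(ranks)     // (1 + 1/2 + 0 + 1) / 4
```

Top-1 accuracy only rewards a perfect first hit, while MRR also gives partial credit for near misses, so the two together show whether failures are close calls or complete misses.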
Sources: [Sources/omni-verify/main.swift:459-543](Sources/omni-verify/main.swift), [Sources/omni-verify/main.swift:631-718](Sources/omni-verify/main.swift), [Sources/omni-verify/main.swift:831-873](Sources/omni-verify/main.swift)
## The store path: tested without the GPU
Not everything needs MLX. `VectorStoreTests` exercises insert, search ranking, per-file chunk ranking, delete compaction, in-place replace, and reload-from-disk using only SQLite + Accelerate, with orthonormal basis vectors so a dot product *equals* cosine and expected scores are exactly `1.0` or `0.0`. This makes the store's contiguous-`flat`-buffer bookkeeping (does a deleted row stay aligned? does reopen rebuild identically?) verifiable cheaply and deterministically, independent of the embedding engine.
```swift
// Tests/OmniKitTests/VectorStoreTests.swift
let hits = store.search(basis(1), topK: 10)
XCTAssertEqual(hits.first?.path, "/b.txt")
XCTAssertEqual(hits.first?.score ?? 0, 1.0, accuracy: 1e-6)
```
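The reason orthonormal vectors give exact expectations can be shown without the store at all. In the sketch below, `basis`, `dot`, `rows`, and `search` are simplified in-memory stand-ins written for illustration (the real test helper and `VectorStore.search` live in the repository and are backed by SQLite); only the idea of unit basis vectors comes from the test described above.

```swift
/// Unit vector with a single 1.0 at `index`. The default `dim` is an
/// arbitrary choice for this sketch.
func basis(_ index: Int, dim: Int = 4) -> [Float] {
    var v = [Float](repeating: 0, count: dim)
    v[index] = 1
    return v
}

/// Plain dot product. For unit-length inputs this *is* the cosine.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

// A toy in-memory "store": one orthonormal vector per file.
let rows: [(path: String, vector: [Float])] = [
    ("/a.txt", basis(0)), ("/b.txt", basis(1)), ("/c.txt", basis(2)),
]

/// Brute-force search: score every row, sort descending, keep topK.
func search(_ query: [Float], topK: Int) -> [(path: String, score: Float)] {
    let scored = rows.map { (path: $0.path, score: dot(query, $0.vector)) }
    return Array(scored.sorted { $0.score > $1.score }.prefix(topK))
}

let hits = search(basis(1), topK: 2)
```

Since distinct basis vectors are orthogonal and each has unit norm, every score is exactly `1.0` (same axis) or `0.0` (different axis), so any off-by-one in row bookkeeping shows up as a wrong path rather than a slightly different number.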
Sources: [Tests/OmniKitTests/VectorStoreTests.swift:14-89](Tests/OmniKitTests/VectorStoreTests.swift)
## Running it
The `Makefile` is the front door. `make fixtures` regenerates the Python references and copies `text_fixtures.json` into the test resources; `make test` runs the bundle (optionally filtered with `ONLY=`); `make app` generates and builds the Xcode project.
| Command | Effect |
|---|---|
| `make fixtures` | Run `gen_fixtures.py`, copy fixtures into `Tests/OmniKitTests/Resources/` |
| `make test` | Build + run `OmniKitTests` via `Scripts/run-tests.sh` |
| `make test ONLY=OmniKitTests.VectorStoreTests` | Run a single test class |
| `swift run omni-verify <modelDir> Fixtures/text_fixtures.json` | The text parity gate as a CLI |
`Scripts/run-tests.sh` exists because plain `xcodebuild test` fails in this project for two separate reasons, both documented in the script's header. First, `swift-tokenizers` ships its Rust FFI as an SE-0482 static-library artifact bundle that `xcodebuild` will not expose (build error `Cannot find type 'RustBuffer'`). Second, the resulting SPM test bundle is unsigned, so the test runner refuses to load it. The script applies the same global module-map / static-lib overrides the app build uses, builds *for testing*, ad-hoc signs the `.xctest` bundle, and runs it directly with `xcrun xctest`. While it runs, it moves `Omni.xcodeproj` aside so the project does not shadow the SwiftPM package, and it restores the project on any exit.
Sources: [Makefile:18-24](Makefile), [Scripts/run-tests.sh:7-47](Scripts/run-tests.sh)
### A note on portability
The verification design is model-agnostic by construction: the gate compares against whatever `OMNI_MODEL_DIR` / `OMNI_NANO_MODEL_DIR` point at, and the same fixture-and-cosine harness already covers both the small (dim 1024) and nano (dim 768) variants. Swapping in a different model snapshot means regenerating fixtures from that model's own reference code and rerunning — no part of the proof is tied to a hosted service or a single weights provider.
Sources: [Tests/OmniKitTests/TextEncoderTests.swift:36-44](Tests/OmniKitTests/TextEncoderTests.swift), [Tools/gen_fixtures.py:23-26](Tools/gen_fixtures.py)
## Where to go next after the first 30 minutes
Once you trust the numbers, read the engine that produces them, then the pipeline that uses them.
```text
Engine internals (Sources/OmniKit/)
WeightStore.swift load HF safetensors, merge retrieval LoRA, fp32 upcast <- starts the parity chain
OmniTextEncoder.swift Qwen3 text tower, last-token pool + L2
OmniVisionTower.swift image/video ViT + merger, block-diagonal attention
OmniAudioTower.swift mel -> audio tower; OmniAudioPreprocess.swift (STFT)
OmniConfig.swift dims, loraScale, prefixes (mirrors Fixtures/meta.json)
OmniEngine.swift the serialized facade the app + benchmarks drive
Index + search (Sources/OmniKit/)
FileCrawler.swift / FileExtractor.swift crawl + text extraction
Indexer.swift crawl -> decode -> batched embed -> store
VectorStore.swift SQLite + contiguous flat buffer + cosine search
SearchQueryParser.swift query syntax
Fixtures + generators
Tools/gen_{image,audio,video}_fixtures.py per-modality reference dumps
Fixtures/*.safetensors, meta.json the artifacts the gates load
App surface (App/)
OmniApp.swift / ContentView.swift / ResultsList.swift SwiftUI shell
Serving/HTTPServer.swift, Router.swift local serving endpoints
```
A productive path: `WeightStore.swift` (the load + LoRA merge that everything downstream depends on), then `OmniTextEncoder.swift` (where the verified text vector is actually produced), then `OmniEngine.swift` and `Indexer.swift` to see how those vectors reach `VectorStore.swift` and on-screen results. The fixtures and `meta.json` are the Rosetta stone between the Python reference and the Swift config — keep them open while reading the towers.
Sources: [Sources/omni-verify/main.swift:902-905](Sources/omni-verify/main.swift), [Sources/omni-verify/main.swift:549-555](Sources/omni-verify/main.swift)
## Summary
Omni proves its port rather than trusting it. A Python reference writes token ids and embeddings to checked-in fixtures, the Swift engine loads the same weights and merges the same LoRA, and `omni-verify` plus the `OmniKitTests` suite fail loudly unless token ids match exactly and cosine clears 0.999 for text. The image, video, and audio towers face the same 0.999 gate against the reference, and the batch-isolation checks listed above (image and audio) require 0.99999. The store is tested separately with orthonormal vectors and no GPU, and the whole suite runs through a purpose-built script that works around the `swift-tokenizers` Rust-artifact and code-signing constraints. With the gate understood, the natural next reading order is `WeightStore` → `OmniTextEncoder` → `OmniEngine`/`Indexer` → `VectorStore`: the first two are the chain the verifier exercises, and the rest carry those verified vectors to on-screen results.
---