# Search Path: Query Qualifiers and bf16 Matmul Scoring

> How a search-box string becomes results: the dependency-free SearchQueryParser that splits semantic text from key:value qualifiers (type, ext, in, date, after, score, sort), then exact brute-force cosine over the resident bf16 matrix split into a GPU-resident base plus a small delta, with top-K from a bounded min-heap and post-filtering.

- Repository: hanxiao/omni-macos
- GitHub: https://github.com/hanxiao/omni-macos
- Human wiki: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05
- Complete Markdown: https://grok-wiki.com/public/wiki/hanxiao-omni-macos-7817a5cffe05/llms-full.txt

## Source Files

- `Sources/OmniKit/SearchQueryParser.swift`
- `Sources/OmniKit/VectorStore.swift`
- `Sources/OmniKit/OmniEngine.swift`
- `App/AppModel.swift`
- `App/ResultsList.swift`
- `Tests/OmniKitTests/SearchQueryParserTests.swift`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [Sources/OmniKit/SearchQueryParser.swift](Sources/OmniKit/SearchQueryParser.swift)
- [Sources/OmniKit/VectorStore.swift](Sources/OmniKit/VectorStore.swift)
- [Sources/OmniKit/OmniEngine.swift](Sources/OmniKit/OmniEngine.swift)
- [App/AppModel.swift](App/AppModel.swift)
- [App/ResultsList.swift](App/ResultsList.swift)
- [Tests/OmniKitTests/SearchQueryParserTests.swift](Tests/OmniKitTests/SearchQueryParserTests.swift)
</details>


This page follows a single search-box string all the way to a sorted result list. Two things happen to that string, and they are deliberately kept on separate channels. The free-text part becomes a *semantic* query: it is embedded into a vector and scored against every indexed chunk by exact brute-force cosine. The structured `key:value` parts (`type:image`, `in:~/Documents`, `after:30d`, `score:50%`, `sort:date`) become *qualifiers*: they tune filters and post-processing without ever touching the embedding. The split is done by `SearchQueryParser`, a pure, dependency-free parser, and the scoring is done by `VectorStore`, which keeps every embedding resident as a `bf16` matrix and scores a query with one or two GPU matmuls.

If you are new to the codebase, read these three in order: `SearchQueryParser.parse` (how text splits into two channels), `AppModel.applyParsedQuery` (how qualifiers become filter state), and `VectorStore.search` / `reduceTopK` (how scoring and top-K selection work). Everything else on this page is detail around those three.

## The two-channel pipeline

A query string is never interpreted by one component end to end. It crosses three ownership boundaries: a pure parser in `OmniKit`, the `AppModel` view-model in the app target, and the `VectorStore` back in `OmniKit`. The semantic channel and the qualifier channel are disjoint by construction and only rejoin at the very end, where post-filters (`minScore`, `sortOrder`) re-rank the hits the store returned.

```mermaid
flowchart TB
  raw["raw search-box string"]
  subgraph parser["SearchQueryParser (OmniKit, pure)"]
    parse["parse(raw)"]
    parsed["ParsedQuery: semanticText + qualifiers"]
  end
  subgraph model["AppModel (App target)"]
    apply["applyParsedQuery: qualifiers -> filter state"]
    filt["currentFilter() -> SearchFilter (kinds/folder/ext/since)"]
    embed["embedQuery(semanticText) -> [Float]"]
    recompute["recomputeResults (minScore + sortOrder)"]
  end
  subgraph store["VectorStore (OmniKit)"]
    matmul["base matmul + delta matmul (bf16, GPU)"]
    reduce["reduceTopK (per-file min-heap)"]
  end
  view["ResultsList (SwiftUI)"]

  raw --> parse --> parsed
  parsed -->|semanticText| embed
  parsed -->|qualifiers| apply
  apply --> filt
  apply -->|minScore, sortOrder| recompute
  embed --> matmul
  filt --> reduce
  matmul --> reduce --> recompute --> view
```

Sources: [Sources/OmniKit/SearchQueryParser.swift:62-107](), [App/AppModel.swift:779-921](), [Sources/OmniKit/VectorStore.swift:485-523]()

## SearchQueryParser: splitting text from qualifiers

`SearchQueryParser` is an `enum` with only static methods, intentionally pure and dependency-free so it is unit-testable and can be shared across layers. Its output is a `ParsedQuery` carrying a `semanticText` string and an array of `Qualifier { key, value, negated }`. The grammar follows the lexical-qualifier convention of GitHub / Gmail / Spotlight: `key:value`, quoted values, multi-value (`type:image,video`), and negation (`-type:audio`).

The make-or-break rule is **colon adjacency plus a whitelist**. A run is a qualifier only if (a) its key canonicalizes to a known qualifier, and (b) a value follows the colon with no whitespace on either side. That single condition is what keeps prose like `notes about type: theory`, a time like `12:30`, a ratio like `3:1`, and a URL like `http://x` out of the filter channel.

```swift
// Sources/OmniKit/SearchQueryParser.swift
// Qualifier candidate: `word:value`, key whitelisted, value present, no space around `:`.
if i < n, s[i] == ":", let canon = canonicalKey(word), i + 1 < n, !s[i + 1].isWhitespace {
    i += 1   // consume ':'
    // ... read a quoted "..." value or a bare run to the next whitespace ...
    if !value.isEmpty {
        quals.append(.init(key: canon, value: value, negated: negated))
        continue
    }
}
// Not a qualifier: take the whole run (incl. leading '-' and inner ':') as one free-text span.
```

Anything that is not a qualifier is appended verbatim as a free-text span; `semanticText` is those spans joined in original order with single spaces. The two channels never overlap.
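The adjacency-plus-whitelist rule fits in a few lines of Swift. This is a simplified, hypothetical re-creation, not Omni's parser. `sketchParse`, `QualifierSketch`, `ParsedQuerySketch`, and the `aliases` table are illustrative names. The sketch also splits on whitespace instead of scanning character by character, and it omits quoted values, so `in:"~/Documents/Project X"` is out of scope here.

```swift
// Hypothetical sketch of the colon-adjacency + whitelist rule; not Omni's actual parser.
struct QualifierSketch: Equatable {
    let key: String
    let value: String
    let negated: Bool
}

struct ParsedQuerySketch: Equatable {
    var semanticText: String
    var qualifiers: [QualifierSketch]
}

// Whitelist that also folds aliases onto canonical keys (mirrors the alias table on this page).
let aliases: [String: String] = [
    "type": "type", "kind": "type",
    "ext": "ext", "extension": "ext",
    "in": "in", "folder": "in", "path": "in",
    "date": "date",
    "after": "after", "since": "after",
    "score": "score", "relevance": "score", "min": "score",
    "sort": "sort",
]

func sketchParse(_ raw: String) -> ParsedQuerySketch {
    var quals: [QualifierSketch] = []
    var spans: [String] = []
    // Splitting on whitespace enforces "no whitespace around ':'" for free.
    for run in raw.split(whereSeparator: { $0.isWhitespace }) {
        var body = run
        let negated = body.hasPrefix("-")
        if negated { body = body.dropFirst() }
        // Key must be whitelisted (case-insensitive) and a non-empty value must follow ':'.
        if let colon = body.firstIndex(of: ":"),
           let canon = aliases[body[..<colon].lowercased()] {
            let value = body[body.index(after: colon)...]
            if !value.isEmpty {
                quals.append(QualifierSketch(key: canon, value: String(value), negated: negated))
                continue
            }
        }
        // Not a qualifier: keep the whole run, including any leading '-' and inner ':'.
        spans.append(String(run))
    }
    return ParsedQuerySketch(semanticText: spans.joined(separator: " "), qualifiers: quals)
}
```

Because the value keeps its original case while only the key is lowercased, `sketchParse("TYPE:Image")` yields a `type` qualifier with value `Image`, matching the behavior described for the real parser.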

### Canonical keys and aliases

`canonicalKey` is the whitelist. It also folds aliases onto a canonical key, so the rest of the system only sees seven keys.

| User types | Canonical key | Meaning |
|------------|---------------|---------|
| `type`, `kind` | `type` | file category (image/video/audio/text) |
| `ext`, `extension` | `ext` | file extension (no dot) |
| `in`, `folder`, `path` | `in` | restrict to a folder |
| `date` | `date` | named date bucket |
| `after`, `since` | `after` | relative/absolute "newer than" |
| `score`, `relevance`, `min` | `score` | minimum relevance threshold |
| `sort` | `sort` | result ordering |

Keys are matched case-insensitively (the word is lowercased before lookup), but the value's case is preserved (`TYPE:Image` -> `type=Image`). Quoted values may contain spaces and support `\"` / `\\` escapes, which is how `in:"~/Documents/Project X"` survives as a single qualifier.

Sources: [Sources/OmniKit/SearchQueryParser.swift:5-107]()
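Reading a quoted value with `\"` and `\\` escapes can be sketched as a small scanner over characters. This is a hypothetical helper, not Omni's code. `readQuotedValue` is an illustrative name, and returning `nil` for an unterminated quote is an assumption about how a mid-typing quote might be treated.

```swift
// Hypothetical quoted-value reader. `start` must point at the opening quote.
// Returns the unescaped value and the index just past the closing quote,
// or nil when the quote is never closed (assumed behavior).
func readQuotedValue(_ chars: [Character], from start: Int) -> (value: String, next: Int)? {
    precondition(chars[start] == "\"", "start must point at an opening quote")
    var i = start + 1
    var out = ""
    while i < chars.count {
        let c = chars[i]
        if c == "\\", i + 1 < chars.count, chars[i + 1] == "\"" || chars[i + 1] == "\\" {
            out.append(chars[i + 1])   // \" -> "   and   \\ -> \
            i += 2
        } else if c == "\"" {
            return (out, i + 1)        // closing quote ends the value
        } else {
            out.append(c)              // spaces are kept verbatim inside quotes
            i += 1
        }
    }
    return nil
}
```

For `in:"~/Documents/Project X"`, calling this at the index of the opening quote returns `~/Documents/Project X` as one value, which is why the space inside the folder name does not split the qualifier.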

### What the tests pin down

The parser's correctness gate is explicit because a parser that turns `12:30` into a filter is worse than no parser. The tests lock in the disjoint-channel behavior and every edge case of the adjacency rule:

| Input | qualifiers | semanticText |
|-------|-----------|--------------|
| `sunset photos type:image after:30d` | `type=image`, `after=30d` | `sunset photos` |
| `notes about type: theory` | (none) | `notes about type: theory` |
| `meeting at 12:30 ratio 3:1 see http://example.com` | (none) | unchanged |
| `color:red running shoes` | (none, unknown key) | `color:red running shoes` |
| `type:image,video beach` | `type=image,video` | `beach` |
| `meeting notes -type:audio` | `type=audio` (negated) | `meeting notes` |
| `budget type:` | (none, no value yet) | `budget type:` |
| `-france report` | (none) | `-france report` |

Note the last two: a trailing `type:` with no value (mid-typing) stays prose, and a bare leading `-` stays literal because Omni has no full-text exclusion.

Sources: [Tests/OmniKitTests/SearchQueryParserTests.swift:11-104]()

### Inline tinting mirrors, never drives, the filter

For cosmetic highlighting of qualifier tokens in the AppKit text field, the parser also exposes `qualifierNSRanges`, a regex-based pass that returns UTF-16 `NSRange`s. Its regex is compiled once, because compiling costs far more than matching and this pass runs on the main thread on every keystroke, and it is intentionally kept separate from `parse`. The regex mirrors `parse`'s notion of a qualifier, so any drift between the two can only mis-tint a token; it can never mis-filter, because filtering always goes through `parse`.

Sources: [Sources/OmniKit/SearchQueryParser.swift:43-60]()
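A tint pass of this kind might look like the following hypothetical sketch. The pattern is an assumption built from the alias table above: whitelisted key, optional leading `-`, and a quoted or non-whitespace value directly after the colon. It is not the regex `qualifierNSRanges` actually uses, and `qualifierTintRegex` / `tintRanges` are illustrative names. The regex is compiled once into a global, and matches come back as UTF-16 `NSRange`s that AppKit can consume directly.

```swift
import Foundation

// Hypothetical tint regex, compiled once at first use rather than per keystroke.
let qualifierTintRegex: NSRegularExpression = {
    let keys = "type|kind|ext|extension|in|folder|path|date|after|since|score|relevance|min|sort"
    // (?<!\S): token starts at a whitespace boundary; (?i:...): case-insensitive key;
    // value is either a quoted string with backslash escapes or a non-space run.
    let pattern = #"(?<!\S)-?(?i:"# + keys + #"):(?:"(?:[^"\\]|\\.)*"|\S+)"#
    return try! NSRegularExpression(pattern: pattern)
}()

// Returns UTF-16 ranges of qualifier-looking tokens, for display tinting only.
func tintRanges(in text: String) -> [NSRange] {
    let whole = NSRange(text.startIndex..., in: text)
    return qualifierTintRegex.matches(in: text, range: whole).map { $0.range }
}
```

Since the result only drives colors, a mismatch between this pattern and `parse` would at worst highlight the wrong span; it would never change which filters are applied.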

## From qualifiers to filter state: AppModel.applyParsedQuery

`SearchQueryParser` decides *what* the qualifiers are; `AppModel.applyParsedQuery` decides what they *do*. It re-parses the raw box on every user edit and maps each qualifier onto a filter dimension. The governing principle is "the box owns only what it mentions": a filter the box previously set but no longer names is reset, while a filter set through the toolbar menu is left untouched (tracked via `stringOwnedFilters`).
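The "box owns only what it mentions" rule is easiest to see as set bookkeeping. In this hypothetical sketch, `FilterState`, `FilterKey`, `applyBox`, and `setFromToolbar` are illustrative stand-ins, and `boxOwned` plays the role of `stringOwnedFilters`. Only two filter dimensions are modeled, and the assumption that a toolbar change releases the box's claim on that filter is ours, not taken from the source.

```swift
// Hypothetical model of box ownership; not AppModel's actual API.
enum FilterKey: Hashable { case ext, folder }

struct FilterState {
    var ext: String? = nil
    var folder: String? = nil
    var boxOwned: Set<FilterKey> = []   // stand-in for stringOwnedFilters

    // Called on every edit with the qualifiers the box currently names.
    mutating func applyBox(_ mentioned: [FilterKey: String]) {
        // Release filters the box set earlier but no longer names.
        for key in boxOwned.subtracting(mentioned.keys) {
            switch key {
            case .ext: ext = nil
            case .folder: folder = nil
            }
        }
        // Apply what the box names now.
        for (key, value) in mentioned {
            switch key {
            case .ext: ext = value
            case .folder: folder = value
            }
        }
        boxOwned = Set(mentioned.keys)
    }

    // Assumed: a toolbar change sets the filter and the box stops owning it.
    mutating func setFromToolbar(folder newFolder: String?) {
        folder = newFolder
        boxOwned.remove(.folder)
    }
}
```

Deleting `ext:pdf` from the box clears `ext`, while a folder chosen from the toolbar survives every box edit that does not mention `in:`.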

| Qualifier | AppModel target | Mapping helper / notes |
|-----------|-----------------|------------------------|
| `type` | `filterKinds` | `mapKind` aliases (`photos`->image, etc.); `,`-split for multi-value; negated -> exclude set; `-type:x` becomes "everything but x" |
| `ext` | `filterExt` | strips a leading dot |
| `in` | `filterFolder` | `resolveFolder` expands `~` |
| `date` | `dateRange` | `DateRange(rawValue:)` |
| `after` | `dateRange` | `mapAfter`: named bucket or relative `7d/2w/3m/1y`, snapped to week/month/year |
| `score` | `minScore` | `mapScore`: `50%` or `0..1`, clamped |
| `sort` | `sortOrder` | `mapSort`: relevance/name/dateModified |

Two of these never reach the vector store. `currentFilter()` builds a `SearchFilter` from only `kinds`, `folderPrefix`, `ext`, and `since`; `score` and `sort` are deliberately excluded and applied afterward (see [Post-filtering](#post-filtering-minscore-and-sortorder)). A "literal mode" toggle short-circuits the whole mapping: it releases every box-owned filter and embeds the raw string as-is.

Sources: [App/AppModel.swift:779-921]()
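Two of the value mappers can be sketched directly from the table above. `mapScoreSketch` and `relativeCutoff` are hypothetical stand-ins for `mapScore` and the relative half of `mapAfter`. Clamping a bare number like `50` to `1` and ignoring the week/month/year snapping and named buckets are simplifying assumptions, not the source's behavior.

```swift
import Foundation

// Hypothetical `score` mapping: "50%" or a 0...1 fraction, clamped into 0...1.
func mapScoreSketch(_ raw: String) -> Double? {
    let s = raw.trimmingCharacters(in: .whitespaces)
    let value: Double?
    if s.hasSuffix("%") {
        value = Double(String(s.dropLast())).map { $0 / 100 }
    } else {
        value = Double(s)
    }
    return value.map { min(max($0, 0), 1) }
}

// Hypothetical relative `after` mapping: "7d" / "2w" / "3m" / "1y" -> cutoff date.
// Snapping to week/month/year boundaries is omitted in this sketch.
func relativeCutoff(_ raw: String, now: Date = Date(),
                    calendar: Calendar = Calendar(identifier: .gregorian)) -> Date? {
    guard let unit = raw.last, let n = Int(raw.dropLast()), n > 0 else { return nil }
    let component: Calendar.Component
    switch unit {
    case "d": component = .day
    case "w": component = .weekOfYear
    case "m": component = .month
    case "y": component = .year
    default: return nil
    }
    return calendar.date(byAdding: component, value: -n, to: now)
}
```

Both return optionals so a malformed value (for example `after:soon`) can simply be ignored rather than producing a bogus filter.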

## VectorStore: exact brute-force cosine over a bf16 matrix

The semantic channel ends at `VectorStore.search`. Embeddings are L2-normalized, so cosine similarity is just a dot product, and the store computes every dot product exactly, no approximate index. SQLite is the durable source of truth, but for scoring the store mirrors every vector into one contiguous in-memory buffer, `flat16`, holding `count * dim` `UInt16` values. These are `bf16` bits (2 bytes per dimension), which halves residency and disk versus `fp32` with negligible recall loss on normalized vectors. Crucially, the bytes are *reinterpreted* as `bf16` for the GPU, not converted at score time.

```swift
// Sources/OmniKit/VectorStore.swift  (fp32 <-> bf16, round-to-nearest-even)
@inline(__always) static func toBF16(_ x: Float) -> UInt16 {
    let b = x.bitPattern
    return UInt16(truncatingIfNeeded: (b &+ 0x7FFF &+ ((b >> 16) & 1)) >> 16)
}
@inline(__always) static func fromBF16(_ x: UInt16) -> Float { Float(bitPattern: UInt32(x) << 16) }
```

Sources: [Sources/OmniKit/VectorStore.swift:84-124]()
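To make "cosine is just a dot product" concrete, here is a hypothetical CPU reference for what the base matmul computes over `bf16` rows. It repeats the round-to-nearest-even conversion shown above so the block is self-contained. `l2Normalize` and `referenceScores` are illustrative helpers, not part of `VectorStore`, which does this work on the GPU through MLX.

```swift
// fp32 -> bf16 bits (round-to-nearest-even) and back, as in the excerpt above.
func toBF16(_ x: Float) -> UInt16 {
    let b = x.bitPattern
    return UInt16(truncatingIfNeeded: (b &+ 0x7FFF &+ ((b >> 16) & 1)) >> 16)
}
func fromBF16(_ x: UInt16) -> Float { Float(bitPattern: UInt32(x) << 16) }

// Illustrative: scale a vector to unit length so dot product == cosine.
func l2Normalize(_ v: [Float]) -> [Float] {
    let norm = v.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    return norm > 0 ? v.map { $0 / norm } : v
}

// Illustrative: `flat` holds count * dim bf16 values, row-major (like flat16).
// Returns one score per row: the dot product with an fp32 query.
func referenceScores(flat: [UInt16], dim: Int, query: [Float]) -> [Float] {
    precondition(query.count == dim && flat.count % dim == 0)
    let count = flat.count / dim
    return (0..<count).map { row in
        var dot: Float = 0
        for j in 0..<dim { dot += fromBF16(flat[row * dim + j]) * query[j] }
        return dot   // cosine similarity, because both sides are L2-normalized
    }
}
```

Scoring `[0.6, 0.8]` against itself and against `[0.8, 0.6]` gives roughly `1.0` and `0.96`; bf16's 8-bit significand only perturbs these values in the third decimal place.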

### Base + delta: why scoring is two matmuls

A naive design would rebuild the GPU score matrix on every query. Instead the resident matrix is split so that ordinary indexing inserts do not force a recopy. `mlxBase` is an MLX-owned copy of rows `[0, baseRows)` (MLX copies the bytes at construction, so it is independent of `flat16`'s storage even as that buffer reallocates during indexing). Rows appended past `baseRows` are the "delta" and are scored per query with one small extra matmul. The 0.8 GB-class base is rebuilt only on a structural change (delete / reload) or once the delta grows past `foldThreshold` (50,000 rows); a plain indexing append just extends the delta.

```text
flat16: contiguous bf16, length = count * dim   (row i = rows[i])

 row 0          ┐
   ...          ├─ mlxBase  (MLX-owned, rows [0, baseRows))  ──matmul──┐
 baseRows-1     ┘                                                      ├─► scores[count]
 baseRows       ┐                                                      │
   ...          ├─ delta  (rows [baseRows, count), <= foldThreshold) ──matmul┘
 count-1        ┘
```
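The rebuild policy described above reduces to a little bookkeeping. This hypothetical sketch tracks only row counts. `ResidentLayout`, `append`, `structuralChange`, and `rebuilds` are illustrative names, while `baseRows` and `foldThreshold` follow the page. The point of the sketch is that a plain append only widens `deltaRows`, and the expensive base copy happens only on a structural change or when the delta passes the threshold.

```swift
// Hypothetical base + delta bookkeeping; no MLX, only row counts.
struct ResidentLayout {
    let foldThreshold: Int
    private(set) var baseRows = 0
    private(set) var count = 0
    private(set) var rebuilds = 0   // how many times the GPU base would be recopied

    init(foldThreshold: Int) { self.foldThreshold = foldThreshold }

    // Rows scored by the small per-query delta matmul.
    var deltaRows: Range<Int> { baseRows..<count }

    // Plain indexing append: extend the delta; fold once it grows past the threshold.
    mutating func append(_ n: Int) {
        count += n
        if deltaRows.count > foldThreshold { rebuildBase() }
    }

    // Delete / reload: row positions may change, so the base is rebuilt.
    mutating func structuralChange(newCount: Int) {
        count = newCount
        rebuildBase()
    }

    private mutating func rebuildBase() {
        baseRows = count   // the real store would copy rows [0, count) into a new MLX array here
        rebuilds += 1
    }
}
```

With `foldThreshold: 50_000`, two appends of 30,000 rows after a 400,000-row load cost one extra rebuild instead of two.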

At query time the store builds `qv = MLXArray(query,[dim,1]).asType(.bfloat16)`, runs the base matmul and (if there is a delta) the delta matmul, fuses them into one GPU sync with a single `MLX.eval`, and reads both back into a single `Float` scores array. Base + delta covers all rows, so the result is identical to a full rebuild.

```swift
// Sources/OmniKit/VectorStore.swift  (search core)
let qv = MLXArray(query, [dim, 1]).asType(.bfloat16)
let baseScore = MLX.matmul(mlxBase!, qv)
if n > baseRows {
    // delta = rows [baseRows, n), built via bytesNoCopy then copied into an owned MLXArray
    let ds = /* matmul(MLXArray(delta bytes, [deltaCount, dim], .bfloat16), qv) */
    MLX.eval(baseScore, ds)   // one fused GPU sync for both matmuls
    // concatenate baseScore + ds into scores[n]
}
```

Search runs under the store's serial `queue`, the same lock that guards mutations. The header comment records that two alternatives were measured, routing the matmul through the engine's priority gate and taking an off-lock snapshot, and that both *hurt* latency or memory; search therefore stays under the lock, and the real wins come from base+delta and the numeric reducer below.

Sources: [Sources/OmniKit/VectorStore.swift:97-115](), [Sources/OmniKit/VectorStore.swift:476-523](), [Sources/OmniKit/VectorStore.swift:636-649]()
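Running search and mutations on one serial queue can be sketched in miniature. `TinyStore` and its methods are hypothetical and stand in for `VectorStore` only to show the locking shape. Every read and write goes through `queue.sync`, so a search can never observe a half-applied append.

```swift
import Foundation

// Hypothetical miniature: one serial queue guards both mutation and scoring.
final class TinyStore {
    private let queue = DispatchQueue(label: "tinystore.serial")   // serial by default
    private var rows: [[Float]] = []

    func append(_ v: [Float]) {
        queue.sync { rows.append(v) }
    }

    // Scoring reads `rows` on the same queue as mutations.
    func search(_ q: [Float]) -> [Float] {
        queue.sync {
            rows.map { row in
                zip(row, q).reduce(Float(0)) { acc, pair in acc + pair.0 * pair.1 }
            }
        }
    }

    var count: Int { queue.sync { rows.count } }
}
```

Concurrent callers simply wait their turn. Because the base+delta split keeps each locked search cheap, that wait stays short.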

### reduceTopK: per-file best chunk via a bounded min-heap

The index stores one vector per *chunk*, but results are per *file*. `reduceTopK` collapses `N` chunk scores into the top-K files in two passes, and it is engineered to avoid two costs that dominated at ~420K rows: path-string hashing and a full sort of `String`-bearing structs.

1. **Per-file max.** Each row carries a dense `Int32` `fileID` (a flat-array lookup, not a path hash). The reducer keeps `bestScore` / `bestRow` per file. The hot case (no `kind`/`since` filter) runs over primitive buffers via unsafe pointers, never touching `rows[i]`, so there is no ARC traffic from copying the row's three `String`s.
2. **Top-K.** Instead of materializing a `SearchHit` for every file and sorting, it keeps a size-K min-heap (parallel `heapScore` / `heapRow` arrays). A file that cannot beat the current K-th is skipped, path-based filters (`folderPrefix` / `ext`) are applied to each file's winner via `filter.accepts`, and only the K survivors are turned into `SearchHit`s, then ordered by descending score.

```swift
// Sources/OmniKit/VectorStore.swift  (top-K insertion against a size-K min-heap)
if heapScore.count >= topK && s <= heapScore[0] { continue }   // can't beat the current K-th
let r = rows[Int(ri)]
if !filter.accepts(path: r.path, kind: r.kind, modified: r.modified) { continue }
if heapScore.count < topK {
    heapScore.append(s); heapRow.append(ri); siftUp(heapScore.count - 1)
} else if s > heapScore[0] {
    heapScore[0] = s; heapRow[0] = ri; siftDown(0)
}
```

A string-keyed `reduceTopKReference` is kept verbatim but is not used in production; it exists purely as a differential-test oracle for `reduceTopK`.
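The two passes can be sketched over plain arrays. In this hypothetical `reduceTopKSketch`, `scores` and `fileID` are per-row arrays, `accepts(row)` stands in for `filter.accepts` on each file's winning row, and ordinary array indexing replaces the unsafe-pointer loop of the real reducer. The structure follows the description above: a dense per-file max, then a size-K min-heap with an early "can't beat the K-th" skip.

```swift
// Hypothetical two-pass reducer: per-file best chunk, then top-K via a min-heap.
func reduceTopKSketch(scores: [Float], fileID: [Int32], fileCount: Int,
                      topK: Int, accepts: (Int) -> Bool) -> [(row: Int, score: Float)] {
    precondition(scores.count == fileID.count)
    guard topK > 0 else { return [] }
    // Pass 1: best score/row per file, indexed by dense fileID (no path hashing).
    var bestScore = [Float](repeating: -.infinity, count: fileCount)
    var bestRow = [Int](repeating: -1, count: fileCount)
    for i in scores.indices {
        let f = Int(fileID[i])
        if scores[i] > bestScore[f] { bestScore[f] = scores[i]; bestRow[f] = i }
    }
    // Pass 2: parallel-array min-heap; heapScore[0] is the current K-th best.
    var heapScore: [Float] = []
    var heapRow: [Int] = []
    func swapBoth(_ a: Int, _ b: Int) { heapScore.swapAt(a, b); heapRow.swapAt(a, b) }
    func siftUp(_ start: Int) {
        var c = start
        while c > 0 {
            let p = (c - 1) / 2
            if heapScore[c] >= heapScore[p] { break }
            swapBoth(c, p); c = p
        }
    }
    func siftDown(_ start: Int) {
        var p = start
        while true {
            let l = 2 * p + 1, r = l + 1
            var m = p
            if l < heapScore.count && heapScore[l] < heapScore[m] { m = l }
            if r < heapScore.count && heapScore[r] < heapScore[m] { m = r }
            if m == p { return }
            swapBoth(p, m); p = m
        }
    }
    for f in 0..<fileCount where bestRow[f] >= 0 {
        let s = bestScore[f], ri = bestRow[f]
        if heapScore.count >= topK && s <= heapScore[0] { continue }   // can't beat the K-th
        if !accepts(ri) { continue }                                   // filter only the file's winner
        if heapScore.count < topK {
            heapScore.append(s); heapRow.append(ri); siftUp(heapScore.count - 1)
        } else {
            heapScore[0] = s; heapRow[0] = ri; siftDown(0)
        }
    }
    // Only the K survivors are materialized, then ordered by descending score.
    var out: [(row: Int, score: Float)] = []
    for k in heapRow.indices { out.append((row: heapRow[k], score: heapScore[k])) }
    out.sort { $0.score > $1.score }
    return out
}
```

A differential test in the spirit of `reduceTopKReference` would compare this against a plain "group by file, sort everything, take K" implementation on random inputs.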

### SearchFilter: what the store filters on

`SearchFilter` carries `kinds`, `folderPrefix`, `ext`, and `since`. Its `accepts` is path-boundary aware for folders (`path == f || path.hasPrefix(f + "/")`, so a folder `/a/Docs` does not match `/a/Docs2/x`) and matches `ext` against a lowercased suffix. Score thresholding is intentionally *not* part of `SearchFilter`: a source comment notes that the view fetches results unfiltered by score so it can offer "show all".

Sources: [Sources/OmniKit/VectorStore.swift:65-82](), [Sources/OmniKit/VectorStore.swift:526-634]()
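The matching rules can be sketched as a small value type. `SearchFilterSketch` is hypothetical. The field names follow the page, but representing `kind` as a `String`, and matching `ext` with a leading dot against the lowercased path, are assumptions.

```swift
import Foundation

// Hypothetical re-creation of SearchFilter's accept rules.
struct SearchFilterSketch {
    var kinds: Set<String>? = nil
    var folderPrefix: String? = nil
    var ext: String? = nil
    var since: Date? = nil

    func accepts(path: String, kind: String, modified: Date) -> Bool {
        if let k = kinds, !k.contains(kind) { return false }
        // Path-boundary aware: the folder itself or something strictly inside it.
        if let f = folderPrefix, !(path == f || path.hasPrefix(f + "/")) { return false }
        // Case-insensitive extension match (the leading dot is an assumption).
        if let e = ext, !path.lowercased().hasSuffix("." + e.lowercased()) { return false }
        if let s = since, modified < s { return false }
        return true
    }
}
```

Note there is no score field here: thresholding stays in the view, so a "show all" toggle never has to query the store again.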

## Post-filtering: minScore and sortOrder

Two qualifiers, `score` and `sort`, are applied after the store returns. `AppModel.search` calls `store.search(..., topK: 60)` and stores the hits in `rawResults`; `recomputeResults` then filters them by the relevance threshold and re-orders them. The filtered list is memoized, so the frequent indexing updates never re-filter the visible list.

```swift
// App/AppModel.swift
private func recomputeResults() {
    let above = rawResults.filter { Self.relevance($0.score) >= minScore }
    hiddenByThreshold = rawResults.count - above.count
    switch sortOrder {
    case .relevance:    results = above
    case .name:         results = above.sorted { /* localized lastPathComponent compare */ }
    case .dateModified: results = above.sorted { $0.modified > $1.modified }
    }
}
```

`relevance` clamps the raw cosine into `0...1`; the default `minScore` is `0.0` (show everything, let the user raise the bar). The view renders each hit's score as a whole-percent string (`String(format: "%.0f%%", ...)`).

Sources: [App/AppModel.swift:98-103](), [App/AppModel.swift:591-605](), [App/AppModel.swift:1254-1308](), [App/ResultsList.swift:370]()
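The post-filter step, with the `/* localized lastPathComponent compare */` elision filled in by one plausible choice, can be sketched as a pure function. `Hit`, `SortOrderSketch`, `postFilter`, and `percentLabel` are hypothetical. `name` stands in for the file's last path component, `localizedStandardCompare` is our choice of localized comparison, and passing `relevance * 100` to the `"%.0f%%"` format is an assumption about the elided argument.

```swift
import Foundation

// Hypothetical hit with only the fields this sketch needs.
struct Hit { let name: String; let score: Float; let modified: Date }
enum SortOrderSketch { case relevance, name, dateModified }

// Clamp raw cosine into 0...1.
func relevance(_ cosine: Float) -> Double { Double(min(max(cosine, 0), 1)) }

func postFilter(_ raw: [Hit], minScore: Double, sort: SortOrderSketch) -> (shown: [Hit], hidden: Int) {
    let above = raw.filter { relevance($0.score) >= minScore }
    let shown: [Hit]
    switch sort {
    case .relevance:    shown = above   // the store already returns descending score
    case .name:         shown = above.sorted { $0.name.localizedStandardCompare($1.name) == .orderedAscending }
    case .dateModified: shown = above.sorted { $0.modified > $1.modified }
    }
    return (shown, raw.count - above.count)   // hidden count backs a "show all" affordance
}

// Whole-percent label (assumed argument: relevance scaled to 0...100).
func percentLabel(_ cosine: Float) -> String {
    String(format: "%.0f%%", relevance(cosine) * 100)
}
```

Because this runs only over the 60 hits already in memory, raising `score:` or switching `sort:` never re-embeds the query or touches the matrix.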

## Why this shape

The recurring decision across this path is **keep the two channels disjoint and push work to its cheapest place**. The parser is pure so its rules are testable in isolation and identical wherever they run. Qualifiers that change the candidate set (`type`/`ext`/`in`/`date`/`after`) ride into the store as a `SearchFilter` and prune inside the hot loop; qualifiers that only re-rank what is already found (`score`/`sort`) stay in the view, so raising a threshold or changing the order never re-embeds or re-scores. The scoring itself stays exact: a resident `bf16` matrix split into a stable GPU base plus a tiny per-query delta means an indexing append costs a delta matmul rather than a full rebuild, and a dense-`fileID` min-heap turns the per-file top-K into `O(F log K)` over primitive buffers. The throughline is provider-neutral and portable: the parser and store know nothing about which embedding model produced the vectors, only that they are L2-normalized and share `dim`, so the same search path works for any backend that emits compatible embeddings.

Sources: [Sources/OmniKit/SearchQueryParser.swift:17-27](), [Sources/OmniKit/VectorStore.swift:65-82](), [Sources/OmniKit/VectorStore.swift:476-523]()
