# File-Based Storage & Git-Backed History

> How Cabinet stores all knowledge as markdown files on disk with no traditional database: path conventions (path-utils.ts), page I/O with gray-matter front-matter (page-io.ts), the virtual file tree builder (tree-builder.ts), low-level filesystem operations (fs-operations.ts), and the simple-git integration that auto-commits every save and powers the full diff viewer (git-service.ts). Covers cabinet discovery, multi-cabinet support, and the data-dir layout.

- Repository: hilash/cabinet
- GitHub: https://github.com/hilash/cabinet
- Human wiki: https://grok-wiki.com/public/wiki/hilash-cabinet-73c70f449a59
- Complete Markdown: https://grok-wiki.com/public/wiki/hilash-cabinet-73c70f449a59/llms-full.txt

## Source Files

- `src/lib/storage/path-utils.ts`
- `src/lib/storage/page-io.ts`
- `src/lib/storage/tree-builder.ts`
- `src/lib/storage/fs-operations.ts`
- `src/lib/git/git-service.ts`
- `src/lib/cabinets/discovery.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [src/lib/storage/path-utils.ts](src/lib/storage/path-utils.ts)
- [src/lib/storage/page-io.ts](src/lib/storage/page-io.ts)
- [src/lib/storage/tree-builder.ts](src/lib/storage/tree-builder.ts)
- [src/lib/storage/fs-operations.ts](src/lib/storage/fs-operations.ts)
- [src/lib/storage/order-store.ts](src/lib/storage/order-store.ts)
- [src/lib/git/git-service.ts](src/lib/git/git-service.ts)
- [src/lib/cabinets/discovery.ts](src/lib/cabinets/discovery.ts)
- [src/lib/cabinets/files.ts](src/lib/cabinets/files.ts)
- [src/lib/runtime/runtime-config.ts](src/lib/runtime/runtime-config.ts)
</details>


Cabinet uses no traditional database. Every knowledge page is a Markdown file on disk, with structured metadata carried as YAML front-matter inside the same file. This architecture gives users transparent ownership of their data, enables standard text tooling (grep, diff, editors) to work on the knowledge base, and makes full-history versioning a natural fit for Git.

This page explains how the storage layer is structured: where data lives (`path-utils.ts`), how individual pages are read and written (`page-io.ts`), how the entire tree is turned into a virtual file tree for the sidebar (`tree-builder.ts`), what low-level I/O primitives underpin everything (`fs-operations.ts`), how cabinets are discovered across the data directory (`discovery.ts`), and how every save is automatically committed to a local Git repository (`git-service.ts`).

---

## Data Directory Layout

All user content lives under a single root called `DATA_DIR`. Its location is taken from the first of the following sources that applies:

| Priority | Source | Value |
|----------|--------|-------|
| 1 | `CABINET_DATA_DIR` env var | Any absolute path |
| 2 | `.cabinet-install.json` config file | `json.dataDir` field |
| 3 | Electron default (macOS / Windows) | `~/Documents/Cabinet` |
| 4 | Electron default (Linux) | `~/Cabinet` |
| 5 | Source-run fallback | `<project-root>/data` |

Sources: [src/lib/runtime/runtime-config.ts:56-72]()
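
The priority order in the table can be read as a simple fall-through. The following is a minimal sketch, not the code in `runtime-config.ts`: the `DataDirInputs` shape and the `resolveDataDirSketch` name are hypothetical, and it assumes every non-Linux Electron platform uses the `~/Documents/Cabinet` default.

```ts
import * as path from "node:path";

// Hypothetical input bag; the real module gathers these values itself.
interface DataDirInputs {
  envDataDir?: string;           // value of CABINET_DATA_DIR, if set
  installConfigDataDir?: string; // `dataDir` from .cabinet-install.json, if present
  isElectron: boolean;
  platformName: string;          // e.g. process.platform
  homeDir: string;
  projectRoot: string;
}

function resolveDataDirSketch(inputs: DataDirInputs): string {
  // 1. An explicit environment override wins.
  if (inputs.envDataDir) return inputs.envDataDir;
  // 2. Then the install config file.
  if (inputs.installConfigDataDir) return inputs.installConfigDataDir;
  // 3-4. Electron defaults depend on the platform (assumption: non-Linux uses Documents).
  if (inputs.isElectron) {
    return inputs.platformName === "linux"
      ? path.join(inputs.homeDir, "Cabinet")
      : path.join(inputs.homeDir, "Documents", "Cabinet");
  }
  // 5. Running from source falls back to <project-root>/data.
  return path.join(inputs.projectRoot, "data");
}
```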

Inside `DATA_DIR`, the layout consists entirely of user-controlled directories and files. Cabinet also maintains a small hidden directory, `.cabinet-state/`, at the data root for runtime metadata such as install state and update status; it is never shown in the knowledge base UI.

```text
DATA_DIR/
├── .cabinet-state/        ← internal runtime state (hidden)
│   ├── install.json
│   └── update-status.json
├── my-cabinet/            ← a named cabinet (contains .cabinet manifest)
│   ├── .cabinet           ← marks this dir as a cabinet
│   ├── index.md           ← cabinet root page
│   ├── notes/
│   │   ├── index.md
│   │   └── meeting-notes/
│   │       └── index.md
│   └── .cabinet-order.yaml   ← sibling ordering sidecar
└── another-page/
    └── index.md
```

---

## Path Conventions (`path-utils.ts`)

The storage layer operates with two path namespaces:

- **Virtual path**: a slash-separated string relative to `DATA_DIR`, e.g. `my-cabinet/notes/meeting-2024`. This is what the UI and API use.
- **Filesystem path**: the absolute on-disk path. All operations translate from virtual → absolute before any I/O.

`resolveContentPath(virtualPath)` performs this translation and enforces a **path traversal guard**: if the resolved absolute path does not begin with `DATA_DIR`, an error is thrown immediately.

```ts
// src/lib/storage/path-utils.ts
export function resolveContentPath(virtualPath: string): string {
  const resolved = path.resolve(DATA_DIR, virtualPath);
  if (!resolved.startsWith(DATA_DIR)) {
    throw new Error("Path traversal detected");
  }
  return resolved;
}
```

The reverse direction — from an absolute filesystem path back to a virtual path — is handled by `virtualPathFromFs(fsPath)`, which strips the `DATA_DIR` prefix and the leading slash.
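
A bare `startsWith` prefix check also accepts sibling directories that share the prefix (for example, `/srv/data-backup` begins with `/srv/data`). A stricter variant can compare on path segments via `path.relative`. The sketch below illustrates that idea together with the reverse mapping; `resolveInsideRoot` and `toVirtualPath` are hypothetical names, and this is not the repository's implementation.

```ts
import * as path from "node:path";

// Resolve a virtual path under `root`, rejecting anything that escapes it.
function resolveInsideRoot(root: string, virtualPath: string): string {
  const resolved = path.resolve(root, virtualPath);
  const rel = path.relative(root, resolved);
  // Escapes show up as a leading ".." segment, or (on Windows) as an
  // absolute path when the target sits on another drive.
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error("Path traversal detected");
  }
  return resolved;
}

// Map an absolute path back to a slash-separated virtual path.
function toVirtualPath(root: string, fsPath: string): string {
  return path.relative(root, fsPath).split(path.sep).join("/");
}
```

Comparing segments rather than raw string prefixes means a name such as `..notes` is still allowed, while `../data-backup` is rejected.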

`isHiddenEntry(name)` controls which filesystem entries are invisible to the UI. It hides dot-files and the standard build/dependency directories (`node_modules`, `dist`, `build`, etc.).
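
As a rough illustration of that rule, a hidden-entry check can be a dot-prefix test plus a set lookup. The set below contains only the directory names mentioned above; the full list in `path-utils.ts` is not reproduced here, and `isHiddenEntrySketch` is a hypothetical name.

```ts
// Only the names listed in the text; the real ignore list is longer.
const IGNORED_DIRECTORIES = new Set<string>(["node_modules", "dist", "build"]);

function isHiddenEntrySketch(entryName: string): boolean {
  // Dot-files and dot-folders are always hidden from the UI.
  if (entryName.startsWith(".")) return true;
  // Build and dependency directories are hidden by exact name.
  return IGNORED_DIRECTORIES.has(entryName);
}
```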

Sources: [src/lib/storage/path-utils.ts:1-43]()

---

## Page I/O with Gray-Matter Front-Matter (`page-io.ts`)

Every knowledge page is a Markdown file whose metadata is embedded as a YAML front-matter block parsed by the `gray-matter` library. The canonical shape of a stored page file is:

```markdown
---
title: Meeting Notes
created: 2024-01-15T09:00:00.000Z
modified: 2024-01-15T09:00:00.000Z
tags: []
order: 200
---

# Meeting Notes

Body content here...
```

### Page Resolution Strategy

`readPage(virtualPath)` resolves which file to open in this order:

1. `<resolved>/index.md` — preferred, treats every page as a directory
2. `<resolved>.md` — legacy flat-file form
3. `<resolved>` directly — non-Markdown files (images, code, etc.)
4. `.cabinet-meta` sidecar in a linked directory — fallback for linked external folders with no `index.md`
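
The resolution order above can be sketched as an ordered candidate list that is probed until a file is found. This is an illustration only: `pageCandidates` and `resolvePageFileSketch` are hypothetical names, the `isFile` predicate is injected so the example stays free of disk access, and placing the `.cabinet-meta` sidecar directly inside the linked directory is an assumption.

```ts
import * as path from "node:path";

// Candidate files for a resolved page path, in priority order.
function pageCandidates(resolved: string): string[] {
  return [
    path.join(resolved, "index.md"),      // 1. directory-style page
    `${resolved}.md`,                     // 2. legacy flat file
    resolved,                             // 3. non-Markdown file as-is
    path.join(resolved, ".cabinet-meta"), // 4. sidecar in a linked directory (assumed location)
  ];
}

// Return the first candidate that exists as a regular file, if any.
function resolvePageFileSketch(
  resolved: string,
  isFile: (candidate: string) => boolean,
): string | undefined {
  return pageCandidates(resolved).find(isFile);
}
```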

### Front-Matter Fields

| Field | Type | Purpose |
|-------|------|---------|
| `title` | string | Display name |
| `created` | ISO string | Creation timestamp |
| `modified` | ISO string | Auto-updated on every write |
| `tags` | string[] | Tagging / filtering |
| `icon` | string? | Emoji or icon code for sidebar |
| `order` | number? | Sort position among siblings |
| `dir` | `"rtl"` \| undefined | Text direction override |
| `google` | object? | Google Drive metadata for synced docs |

### RTL Auto-Detection

`writePage()` contains a heuristic that auto-sets `dir: "rtl"` when it detects that the document content is predominantly Hebrew (Unicode block U+0590–U+05FF). It samples the first 600 characters and sets RTL if Hebrew letters exceed 50% of all letter characters. This fires only when the caller does not explicitly set `dir`, so it never overrides a manual setting.

Sources: [src/lib/storage/page-io.ts:95-115](), [src/lib/storage/page-io.ts:29-75]()
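
The RTL heuristic described above can be approximated in a few lines. The sketch below is an illustration under stated assumptions, not the code in `page-io.ts`: the name `inferDirSketch` is hypothetical, and "letter" is taken to mean any character in Unicode category `L`.

```ts
const SAMPLE_LENGTH = 600;   // characters inspected, per the description above
const HEBREW_THRESHOLD = 0.5; // share of letters that must be Hebrew

function inferDirSketch(text: string): "rtl" | undefined {
  const sample = text.slice(0, SAMPLE_LENGTH);
  // All letter characters in the sample (assumption: Unicode category L).
  const letters: string[] = sample.match(/\p{L}/gu) ?? [];
  if (letters.length === 0) return undefined;
  // Letters that fall inside the Hebrew block U+0590-U+05FF.
  const hebrewLetters = letters.filter((ch) => /[\u0590-\u05FF]/.test(ch));
  return hebrewLetters.length / letters.length > HEBREW_THRESHOLD ? "rtl" : undefined;
}
```

A caller would apply the result only when `dir` was not supplied, so a manual setting is left untouched.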

### Page Lifecycle Operations

| Function | Action |
|----------|--------|
| `readPage(vPath)` | Read and parse a page; returns `PageData` with `content` and `frontmatter` |
| `writePage(vPath, content, fm)` | Serialize and overwrite; auto-updates `modified` |
| `createPage(vPath, title)` | Scaffold `index.md` with `defaultFrontmatter`; appends order at end of siblings |
| `deletePage(vPath)` | `rm -rf` or `unlink` for symlinks; removes `.cabinet-meta` from symlink targets |
| `movePage(from, toParent, opts)` | `fs.rename`; falls back to copy+delete on EXDEV (cross-device); updates order sidecar |
| `renamePage(vPath, newName)` | Renames directory, updates `title` in front-matter, rewrites wiki-link references across the whole cabinet |

---

## Virtual File Tree Builder (`tree-builder.ts`)

The sidebar and navigation are powered by `buildTree()`, which performs a recursive walk of `DATA_DIR` and returns a `TreeNode[]` tree.

### File Type Classification

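`classifyFile(ext)` maps file extensions to `TreeNode["type"]` values. The table lists every recognized type, including the directory-based ones (`directory`, `cabinet`, `website` / `app`), which depend on a directory's contents rather than on an extension: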

| Type | Extensions (representative) |
|------|----------------------------|
| `file` | `.md` (non-index) |
| `directory` | (bare directory) |
| `cabinet` | directory with `.cabinet` manifest |
| `code` | `.ts`, `.js`, `.py`, `.go`, `.rs`, … |
| `image` | `.png`, `.jpg`, `.svg`, `.webp`, … |
| `video` | `.mp4`, `.webm`, `.mov` |
| `audio` | `.mp3`, `.wav`, `.ogg` |
| `pdf` | `.pdf` |
| `csv` | `.csv` |
| `notebook` | `.ipynb` |
| `docx` / `xlsx` / `pptx` | Office formats |
| `mermaid` | `.mermaid`, `.mmd` |
| `website` / `app` | dirs with `index.html` |
| `unknown` | archives, design files, legacy Office |

Files whose extension matches none of these sets (including the `unknown` set) are silently omitted from the tree.

### Directory Metadata Resolution

For a directory node, `buildTree` reads front-matter in this order:
1. `<dir>/index.md` frontmatter — preferred
2. `.cabinet-meta` YAML in a linked (symlinked) directory — fallback for externally linked folders

Cabinet directories are identified by the presence of a `.cabinet` manifest file (the `CABINET_MANIFEST_FILE` constant from `src/lib/cabinets/files.ts`).

### Ordering

Sibling ordering respects two storage locations:
- **`order` field in front-matter** — used for `.md`-backed pages
- **`.cabinet-order.yaml` sidecar** — used for non-Markdown files (images, code, PDFs) that cannot carry YAML front-matter

The tree sorts nodes by `order` ascending, then alphabetically by title as a tiebreaker.
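
The sort rule can be expressed as a two-key comparator. In this sketch the `OrderedNode` shape and `compareSiblings` name are hypothetical, and placing nodes without an `order` value after all ordered nodes is an assumption the text does not confirm.

```ts
// Minimal stand-in for the fields of TreeNode that matter for sorting.
interface OrderedNode {
  title: string;
  order?: number;
}

function compareSiblings(a: OrderedNode, b: OrderedNode): number {
  // Assumption: a missing order sorts after every explicit order.
  const orderA = a.order ?? Number.POSITIVE_INFINITY;
  const orderB = b.order ?? Number.POSITIVE_INFINITY;
  if (orderA !== orderB) return orderA < orderB ? -1 : 1;
  // Tiebreaker: alphabetical by title.
  return a.title.localeCompare(b.title);
}
```

Usage is a plain `nodes.sort(compareSiblings)` on each sibling list.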

### TTL Cache

Because `buildTree` may walk thousands of files and is called from several features (sidebar, search, auto-link, composer), its result is held in a 5-second TTL cache (`createTtlCache`). After a save, a stale tree is served for at most 5 seconds before the entry expires and the next call rebuilds it. Callers that need a change reflected immediately can use `invalidateTreeCache()`.

Sources: [src/lib/storage/tree-builder.ts:118-160](), [src/lib/storage/tree-builder.ts:221-232]()
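
To make the caching behavior concrete, here is a minimal single-value TTL cache. It is a sketch, not the project's `createTtlCache`: the signature is assumed, the loader is synchronous for brevity (the real tree build is presumably async), and the injectable `now` clock exists only to make expiry easy to test.

```ts
interface TtlCacheSketch<T> {
  get(): T;
  invalidate(): void;
}

function createTtlCacheSketch<T>(
  load: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): TtlCacheSketch<T> {
  let entry: { value: T; expiresAt: number } | undefined;
  return {
    get(): T {
      const currentTime = now();
      // Reload when nothing is cached or the cached value has expired.
      if (!entry || currentTime >= entry.expiresAt) {
        entry = { value: load(), expiresAt: currentTime + ttlMs };
      }
      return entry.value;
    },
    // Drop the cached value so the next get() reloads immediately.
    invalidate(): void {
      entry = undefined;
    },
  };
}
```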

---

## Low-Level Filesystem Operations (`fs-operations.ts`)

All I/O in the storage layer routes through a small set of functions in `fs-operations.ts`. These are thin wrappers around Node's `fs/promises` API.

| Function | Description |
|----------|-------------|
| `readFileContent(absPath)` | UTF-8 file read |
| `writeFileContent(absPath, content)` | UTF-8 file write |
| `writeFileAtomic(absPath, content)` | Crash-safe write via temp file + `rename()` |
| `deleteFileOrDir(absPath)` | `rm -rf` |
| `listDirectory(absPath)` | `readdir` with symlink resolution via `stat` |
| `unlinkSymlink(absPath)` | Remove `.cabinet-meta` from symlink target, then `unlink` the symlink |
| `ensureDirectory(absPath)` | `mkdir -p` |
| `fileExists(absPath)` | Boolean existence check via `access()` |

### Crash-Safe Writes

`writeFileAtomic` is used for metadata files whose corruption would silently break features. It writes content to a sibling temp file (`<path>.tmp-<pid>-<random>`), then calls `fs.rename()` to move it over the target. Because the temp file sits in the same directory, the rename stays on one filesystem, where it is atomic on POSIX systems. A crash before the rename leaves the old file intact (plus an orphaned temp file); a crash after it leaves the new file. The target is never half-written.

Sources: [src/lib/storage/fs-operations.ts:16-34]()
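
The temp-file-plus-rename pattern looks roughly like the following. This is a sketch of the general technique rather than the repository's `writeFileAtomic`: the function name, the random suffix format, and the cleanup on failure are assumptions.

```ts
import { promises as fs } from "node:fs";
import { randomBytes } from "node:crypto";

async function writeFileAtomicSketch(absPath: string, content: string): Promise<void> {
  // Sibling temp file, so the rename never crosses a filesystem boundary.
  const tmpPath = `${absPath}.tmp-${process.pid}-${randomBytes(6).toString("hex")}`;
  try {
    await fs.writeFile(tmpPath, content, "utf8");
    // rename() replaces the target in one step; readers see old or new, never a mix.
    await fs.rename(tmpPath, absPath);
  } catch (err) {
    // Best-effort cleanup of the temp file, then surface the original error.
    await fs.rm(tmpPath, { force: true });
    throw err;
  }
}
```

Note that this does not call `fsync`, so durability across power loss depends on the filesystem; the sketch only guards against torn writes.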

---

## Sibling Ordering (`order-store.ts`)

Order values are managed by `order-store.ts`, which provides an abstraction layer that automatically picks the right storage location (front-matter vs. sidecar) for each entry type.

- The constant `ORDER_GAP = 100` is used as the default gap between adjacent sibling order values, leaving room for insertions without needing to renumber the full list.
- `computeInsertOrder(parentVirtualPath, prevName, nextName)` calculates the correct midpoint integer when inserting between two siblings. If no gap remains between them, it renumbers all siblings first.
- `appendOrder(parentVirtualPath)` returns `max(existing orders) + ORDER_GAP` for new entries appended at the end.
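
The gap-based scheme can be illustrated with plain numbers. The helpers below are hypothetical and operate on order values directly instead of sibling names; treating a missing previous sibling as order `0` and signalling "no gap left" with `null` are assumptions made for this sketch.

```ts
const ORDER_GAP = 100;

// Order for an item inserted between two neighbours, or null when the
// neighbours are adjacent integers and the list must be renumbered first.
function midpointOrder(prev: number | undefined, next: number | undefined): number | null {
  const lower = prev ?? 0; // assumption: inserting at the head starts from 0
  if (next === undefined) return lower + ORDER_GAP; // append at the end
  const mid = Math.floor((lower + next) / 2);
  return mid > lower && mid < next ? mid : null;
}

// Fresh, evenly spaced order values for `count` siblings.
function renumberOrders(count: number): number[] {
  return Array.from({ length: count }, (_, index) => (index + 1) * ORDER_GAP);
}
```

With a gap of 100, roughly six or seven successive insertions can land in the same slot before a renumber is needed.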

---

## Cabinet Discovery (`discovery.ts`)

A **cabinet** is any directory inside `DATA_DIR` that contains a `.cabinet` manifest file. Cabinet discovery is a recursive walk:

```ts
// src/lib/cabinets/discovery.ts
async function walkCabinets(dir: string, results: string[]): Promise<void> {
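  // `entries` holds the children of `dir`; the listing and hidden-entry filtering are omitted from this excerpt.
  for (const entry of entries) {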
    const childDir = path.join(dir, entry.name);
    if (fs.existsSync(path.join(childDir, CABINET_MANIFEST_FILE))) {
      results.push(cabinetPathFromFs(childDir));
    }
    await walkCabinets(childDir, results);  // recurse into all dirs
  }
}
```

`discoverCabinetPaths()` is the async public API, protected by a 10-second TTL cache. It always includes `ROOT_CABINET_PATH` (the data root itself) as the first entry, so the root cabinet is always present even with no subdirectory cabinets. A sync variant `discoverCabinetPathsSync()` is available for startup paths where async I/O is not practical.

Hidden directories (dot-folders and the ignored set from `isHiddenEntry`) are skipped during the walk, so `.cabinet-state` and similar internal directories never appear as cabinets.

Sources: [src/lib/cabinets/discovery.ts:10-35](), [src/lib/cabinets/discovery.ts:55-65]()

---

## Git-Backed History (`git-service.ts`)

### Initialization

`getGit()` lazily initializes a `simple-git` instance pointed at `DATA_DIR`. On first call it checks whether `DATA_DIR/.git` exists. If it does not, it runs `git init` and configures a fixed committer identity (`Cabinet <kb@cabinet.dev>`). The result is cached in a module-level variable so subsequent calls avoid the filesystem check.

Sources: [src/lib/git/git-service.ts:7-22]()

### Auto-Commit with Debouncing

Every page save triggers `autoCommit(pagePath, action)`. To avoid creating one Git commit per save during rapid editing, the function debounces via a 5-second timer. If another save arrives within the window, the previous timer is cleared and the window resets. When the timer fires, it stages all changes (`git add .`) and commits with the message `<action> <pagePath>`, for example `Update my-cabinet/notes/meeting`. Because a single timer is shared and everything is staged, one commit can contain edits to several pages while its message names only the most recent one.

```ts
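// src/lib/git/git-service.ts
// `commitTimer` is module-level state shared by all calls; its declaration is not shown in this excerpt.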
export async function autoCommit(pagePath: string, action: "Update" | "Add" | "Delete") {
  if (commitTimer) clearTimeout(commitTimer);
  commitTimer = setTimeout(async () => {
    const g = await getGit();
    await g.add(".");
    const status = await g.status();
    if (status.staged.length === 0 && status.modified.length === 0) return;
    await g.commit(`${action} ${pagePath}`);
  }, 5000);
}
```

Sources: [src/lib/git/git-service.ts:27-42]()
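
The debounce pattern can be isolated from Git entirely, which makes the "last save wins" behavior easier to see. In this sketch the commit step is an injected callback, and `createDebouncedCommitter` and its `flush()` method are hypothetical additions that do not appear in `git-service.ts`.

```ts
type CommitFn = (message: string) => Promise<void>;
type CommitAction = "Update" | "Add" | "Delete";

function createDebouncedCommitter(commit: CommitFn, delayMs = 5000) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pendingMessage: string | undefined;

  function run(): void {
    timer = undefined;
    if (pendingMessage === undefined) return;
    const message = pendingMessage;
    pendingMessage = undefined;
    // Errors are logged rather than thrown, since nobody awaits the timer.
    commit(message).catch((err) => console.error("auto-commit failed:", err));
  }

  return {
    // Each call restarts the window; only the latest message survives.
    schedule(pagePath: string, action: CommitAction): void {
      if (timer) clearTimeout(timer);
      pendingMessage = `${action} ${pagePath}`;
      timer = setTimeout(run, delayMs);
    },
    // Commit immediately, e.g. before shutdown (hypothetical helper).
    flush(): void {
      if (timer) clearTimeout(timer);
      run();
    },
  };
}
```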

### Page History and Diff Viewer

`getPageHistory(virtualPath)` queries `git log` for a specific file, trying three candidate paths (directory `index.md`, `.md` suffix, bare path) until one returns results. It returns up to 50 `GitLogEntry` objects, each carrying `hash`, `date`, `message`, and `author`.

`getDiff(hash)` returns the textual diff for a given commit using `git diff <hash>~1 <hash>`. For the first commit in the repository (no parent), it falls back to `git diff <hash>`, which compares that commit against the current working tree rather than against an empty tree.

### Point-in-Time Restore

`restoreFileFromCommit(hash, filePath)` performs a two-step restore:
1. `git checkout <hash> -- <filePath>` — reverts the working tree file to its state at that commit
2. `git add <filePath>` + `git commit` — records the restoration as a new commit, preserving history linearity

### Status and Uncommitted Changes

`getStatus()` surfaces uncommitted files to the UI's status bar. It filters out Cabinet-internal writes (`.cabinet-state/`, `.next/`, `node_modules/`, `.cabinet-cache/`) using a list of regex patterns, so runtime state changes never appear as pending user edits. The result is capped at 50 files; a `truncated` flag lets the UI show a "+N more" indicator.

Sources: [src/lib/git/git-service.ts:135-175]()
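
The filter-then-cap step can be sketched as a pure function over the list of changed paths. The regular expressions below are illustrative guesses covering the four directories named above, not the patterns in `git-service.ts`, and `summarizeStatusSketch` with its `total` field is hypothetical.

```ts
// Assumed patterns for Cabinet-internal paths (relative to the repo root).
const INTERNAL_PATH_PATTERNS: RegExp[] = [
  /^\.cabinet-state\//,
  /^\.next\//,
  /(^|\/)node_modules\//,
  /^\.cabinet-cache\//,
];
const MAX_STATUS_FILES = 50;

interface StatusSummary {
  files: string[];    // at most MAX_STATUS_FILES user-visible paths
  total: number;      // user-visible changes before capping
  truncated: boolean; // true when some paths were cut off
}

function summarizeStatusSketch(changedPaths: string[]): StatusSummary {
  // Drop internal writes so they never show up as pending user edits.
  const visible = changedPaths.filter(
    (changedPath) => !INTERNAL_PATH_PATTERNS.some((pattern) => pattern.test(changedPath)),
  );
  return {
    files: visible.slice(0, MAX_STATUS_FILES),
    total: visible.length,
    truncated: visible.length > MAX_STATUS_FILES,
  };
}
```

A UI could derive the "+N more" label from `total - files.length`.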

---

## Data Flow: Save to Committed History

The sequence below shows what happens from the moment a user saves a page to a stable Git commit:

```mermaid
sequenceDiagram
    participant UI as Browser UI
    participant API as Next.js API Route
    participant pageIO as page-io.ts
    participant fsOps as fs-operations.ts
    participant git as git-service.ts

    UI->>API: POST /api/pages/[path] (content + frontmatter)
    API->>pageIO: writePage(virtualPath, content, fm)
    pageIO->>pageIO: resolveContentPath() + path traversal check
    pageIO->>pageIO: inferDirFromText() → auto RTL?
    pageIO->>pageIO: matter.stringify(content, fm)
    pageIO->>fsOps: writeFileContent(absPath, output)
    fsOps-->>pageIO: written
    API->>git: autoCommit(pagePath, "Update")
    git-->>API: timer scheduled (returns before the commit)
    Note over git: 5-second debounce timer
    git->>git: g.add(".") + g.status()
    git->>git: g.commit("Update <pagePath>")
```

---

## Summary

Cabinet's file-based storage is deliberately simple: a page is normally a directory containing an `index.md` file (the legacy flat `.md` form is still readable), with YAML front-matter for structured metadata and Markdown for body content. Path safety is enforced at the boundary between virtual and absolute paths in `resolveContentPath`. The `buildTree` function converts the raw filesystem into a typed node tree for the sidebar, using a 5-second TTL cache to avoid re-walking the disk on every request. Git is initialized automatically in `DATA_DIR`, and writes are debounced into a commit about 5 seconds after the last save, providing a full, browsable change history with no additional infrastructure.

Sources: [src/lib/storage/path-utils.ts:15-19](), [src/lib/storage/tree-builder.ts:221-240](), [src/lib/git/git-service.ts:7-42]()
