# Troubleshooting

> Diagnose server lock state, run `ok diagnose health` and `ok diagnose bundle`, file bug reports, repair stale skills/MCP configs, and recover from crash leftovers with `ok clean`.

- Repository: sashimikun/open-knowledge
- GitHub: https://github.com/sashimikun/open-knowledge
- Human docs: https://grok-wiki.com/public/docs/sashimikun-open-knowledge-5c45105c876e
- Complete Markdown: https://grok-wiki.com/public/docs/sashimikun-open-knowledge-5c45105c876e/llms-full.txt

## Source Files

- `packages/cli/src/commands/diagnose.ts`
- `packages/cli/src/commands/diagnose-health.ts`
- `packages/cli/src/commands/bug-report.ts`
- `packages/cli/src/commands/clean.ts`
- `packages/cli/src/commands/repair-skills.ts`
- `packages/cli/src/commands/status.ts`
- `packages/cli/src/commands/lock-state.ts`

---

---
title: "Troubleshooting"
description: "Diagnose server lock state, run `ok diagnose health` and `ok diagnose bundle`, file bug reports, repair stale skills/MCP configs, and recover from crash leftovers with `ok clean`."
---

The `ok` CLI exposes a layered diagnostics stack: lock inspection (`ok status`), environment probes (`ok diagnose health`), project-local support bundles (`ok diagnose bundle`), global bug-report zips (`ok bug-report`), and crash recovery (`ok clean`). Skills and MCP editor entries are refreshed automatically on `ok start`; `ok repair-skills` forces an explicit skill sweep when agents stop seeing bundled guidance.

## Diagnosis workflow

<Steps>
<Step title="Check lock state">

From the project root, inspect server and UI lockfiles:

```bash
ok status
```

Use `--json` for structured output. Each lock resolves to one of five states (see [Lock states](#lock-states)).

</Step>

<Step title="Run health checks">

Probe prerequisites, config, content directory, shadow repo, and macOS signing:

```bash
ok diagnose health
```

Add `--verbose` for per-check detail. Target a single check with `--check server-lock`. Exit code is `0` when no checks fail; warnings do not fail the command.

</Step>

<Step title="Recover stale locks">

If `ok status` reports `stale` or `corrupt`, prune only dead or unparseable lockfiles:

```bash
ok clean
```

`ok clean` never removes locks held by live processes.

</Step>

<Step title="Restart and re-wire">

Stop live servers if needed, then start again (MCP config and skill repair sweeps run automatically):

```bash
ok stop
ok start
```

Force a skill refresh without a full server boot:

```bash
ok repair-skills
```

</Step>

<Step title="Capture a support bundle">

For GitHub issues, generate a redacted global bundle:

```bash
ok bug-report --no-reveal
```

For project-local telemetry and server state, use `ok diagnose bundle` (see [Support bundles](#support-bundles)).

</Step>
</Steps>

## Lock states

Open Knowledge records one lock per project under `.ok/local/` for the collaboration server (`server.lock`) and the editor UI (`ui.lock`). `inspectLock` classifies each file:

| State | Meaning | Typical action |
| --- | --- | --- |
| `missing` | No lock file | Safe to run `ok start` |
| `alive` | PID exists on this host | Server or UI is running; use `ok stop` if you need the port |
| `dead-pid` | Lock references a process that no longer exists | Run `ok clean` |
| `corrupt` | JSON missing or invalid PID | Run `ok clean` |
| `foreign-host` | Lock hostname differs from local machine | Delete the lock file manually when the remote process is confirmed gone |

<RequestExample>

```bash
ok status
```

</RequestExample>

<ResponseExample>

```text
server  alive  pid=48291 port=3847 started=2026-06-25T14:02:11.000Z
ui      not running
```

</ResponseExample>

When `ok status` reports stale or corrupt locks, the rendered text includes an explicit `ok clean` hint. The `server-lock` health check surfaces the same conditions with remediation lines.

<Warning>
`ok clean` only deletes locks in `dead-pid` or `corrupt` state. Live locks are never touched. If `ok start` fails because another process holds the lock, stop that process with `ok stop` (or `ok stop all`) before cleaning.
</Warning>

## `ok diagnose health`

Runs eight named checks in sequence (5 s timeout each):

| Check | Validates |
| --- | --- |
| `git` | `git` on PATH and minimum version |
| `bun` | `bun --version` probe |
| `config-yaml` | `.ok/config.yml` parses and resolves `content.dir` |
| `content-dir` | Content root exists, is a directory, and is writable |
| `server-lock` | Server lock is not held by a live local process |
| `shadow-repo` | Shadow gitdir resolves and has `HEAD` |
| `shadow-health` | Loose objects, packfiles, WIP refs, dead agent chains |
| `macos-codesig` | Desktop app bundle signing (skipped off macOS or in dev mode) |

<ParamField body="--json" type="boolean">
Emit one NDJSON `CheckResult` object per line.
</ParamField>

<ParamField body="--verbose" type="boolean">
Include the `detail` field and probe context in human-readable output.
</ParamField>

<ParamField body="--check" type="string">
Run a single named check. Valid values: `git`, `bun`, `config-yaml`, `content-dir`, `server-lock`, `shadow-repo`, `shadow-health`, `macos-codesig`.
</ParamField>

<ParamField body="--quiet" type="boolean">
Suppress output; return exit code only.
</ParamField>

Each result has `status` (`pass`, `warn`, or `fail`), `summary`, and optional `remediation` and `detail`. Unknown `--check` values exit with code `2`.

<Info>
The `server-lock` check fails (not warns) when a live local process holds the lock — use `ok stop` rather than `ok clean`. Stale and corrupt locks produce warnings with `ok clean` remediation.
</Info>

## Support bundles

Two commands produce zip archives for different audiences. Choose based on what you need to share.

### `ok diagnose bundle`

Project-scoped bundle under `.ok/local/diagnostics/` by default. Collects local telemetry sinks, server logs, lock state, shadow-repo head, and optional live-process diagnostics.

<ParamField body="--pid" type="number">
Include `ok diagnose process <pid>` output under `process/` in the bundle.
</ParamField>

<ParamField body="--out" type="string">
Write the zip to a custom path (parent directory must exist).
</ParamField>

<ParamField body="--yes" type="boolean">
Skip the interactive `Write bundle? [y/N]` prompt.
</ParamField>

<ParamField body="--redact" type="boolean">
Hash `doc.name` telemetry attributes and replace the content-dir prefix with `<CONTENT_DIR>`. Writes a sidecar `.docnames.json` map alongside the zip.
</ParamField>

Default output path:

```text
<contentDir>/.ok/local/diagnostics/bundle-<timestamp>.zip
```

Bundle layout (staged before zip):

| Path | Contents |
| --- | --- |
| `telemetry/spans-current.jsonl` | Current span sink |
| `telemetry/spans-prev.jsonl` | Rotated span sink |
| `logs/server-current.jsonl` | Current server log sink |
| `logs/server-prev.jsonl` | Rotated server log sink |
| `state/server.lock` | Lock metadata when present |
| `state/agent-presence.json` | Live agent presence (server running only) |
| `state/shadow-head.txt` | Recent shadow-repo commits |
| `state/last-spawn-error.log` | Last server spawn failure |
| `state/runtime.json` | OK version, Node, platform, desktop env |
| `state/server-status.txt` | `running` or `not-running (<reason>)` |
| `process/` | Optional process diagnose output (`--pid`) |
| `manifest.json` | Schema version 1 manifest with file inventory |

Local telemetry is controlled by `telemetry.localSink` in project config (see [Configuration reference](/configuration-reference)). If the server is not running, the summary notes that live state is unavailable.

### `ok diagnose process`

Captures deep runtime data for a single hung Node process (typically the collaboration server PID from `ok status`):

```bash
ok diagnose process <pid>
```

<ParamField body="--cpu-profile" type="number" default="15">
CPU profile duration in seconds (Node inspector required).
</ParamField>

<ParamField body="--output" type="string">
Output directory. Defaults to `<contentDir>/.ok/local/diagnostics/process-<pid>-<timestamp>` when the PID matches a known server lock, otherwise `ok-diagnose-<pid>-<timestamp>` in the current directory.
</ParamField>

<ParamField body="--no-inspector" type="boolean">
Collect metadata and `lsof` only; skip CPU profiling.
</ParamField>

<Warning>
Review bundle contents before sharing. `lsof.txt`, `cpu.cpuprofile`, and `stacks.txt` may contain private paths and source file names. `process-stats.jsonl` (CPU/MEM/RSS only) is safer to share.
</Warning>

### `ok bug-report`

Global diagnostic zip written to `~/.ok/bug-reports/<timestamp>-bugreport.zip`. Intended for GitHub issues.

```bash
ok bug-report --no-reveal
```

<ParamField body="--no-reveal" type="boolean">
Skip revealing the bundle in Finder (preferred for agent-driven capture).
</ParamField>

Collects:

| Path | Source |
| --- | --- |
| `logs/` | NDJSON files from `~/.ok/logs/` (filtered to current project when `.ok/config.yml` is present) |
| `lockdir/` | `.ok/local/server.lock`, `last-spawn-error.log` |
| `local-logs/` | `.ok/local/logs/server-current.jsonl`, `server-prev.jsonl` |
| `sysinfo.json` | Platform, Node/Bun versions, OK package version, memory |
| `MANIFEST.json` | File inventory and redaction audit (`disciplineVersion: 1.0.0`) |
| `README.md` | Privacy summary |

All text content passes through auto-redaction (home paths, GitHub PATs, AWS keys, Anthropic/OpenAI tokens, bearer headers, JWTs, URL credentials). The command prints the zip path to stdout and reports redaction counts on stderr when lines were scrubbed.

## Repair skills and MCP configs

Editor integration drifts when CLI versions change or manual edits overwrite bundled entries. Open Knowledge repairs wiring on every `ok start`:

1. **MCP configs** — `repairMcpConfigs` migrates stale entries for Claude Code, Cursor, and Codex (user-global and project-local scopes). Outcomes: `no-entry`, `canonical`, `repaired`, or `write-failed`.
2. **Skills** — `repairSkills` refreshes project skills (`.claude/skills/open-knowledge`, `.cursor/skills/open-knowledge`, `.agents/skills/open-knowledge`) and user-global discovery skills (`~/.agents/skills/open-knowledge-discovery` plus per-host copies).

Force an explicit skill sweep without starting the server:

```bash
ok repair-skills
```

Initial MCP registration happens during `ok init`. If agents still cannot reach Open Knowledge after `ok start`, re-run `ok init` in the project root or verify editor MCP config files contain the `# ok-mcp-v1` sentinel.

<Tip>
Set `OK_RECLAIM_DISABLE=1` to skip automatic MCP, launch.json, and skill repair sweeps during `ok start`. Use this in test environments or when you manage editor configs manually.
</Tip>

## Common failure modes

<AccordionGroup>
<Accordion title="`ok start` says server lock held">

Run `ok status`. If the server is `alive`, another instance is running — use `ok stop` or `ok ps` to identify it. If `stale` or `corrupt`, run `ok clean` then `ok start`.

</Accordion>

<Accordion title="Editor MCP tools do not respond">

1. Confirm the server is alive: `ok status`
2. Run `ok diagnose health --check server-lock`
3. Run `ok start` (triggers MCP config repair) or `ok repair-skills`
4. Re-verify with an `exec` tool prompt (see [Wire agent editors](/wire-agent-editors))

</Accordion>

<Accordion title="Shadow repo or history warnings">

`shadow-repo` and `shadow-health` warnings often clear after `ok start` initializes packing and journal cleanup. Persistent warnings suggest inspecting server logs in `.ok/local/logs/`.

</Accordion>

<Accordion title="macOS desktop app quarantine">

The `macos-codesig` check fails when the app runs translocated from Downloads. Drag Open Knowledge.app to `/Applications/` and relaunch.

</Accordion>

<Accordion title="Spawn errors after crash">

Check `.ok/local/last-spawn-error.log`. Run `ok clean`, then `ok start`. Include `ok bug-report` output when filing an issue.

</Accordion>
</AccordionGroup>

```text
Symptom                    → First command        → If unresolved
─────────────────────────────────────────────────────────────────
Port in use / lock held    → ok status            → ok stop / ok clean
Prereq / config failure    → ok diagnose health   → fix reported check
Agent integration broken   → ok repair-skills     → ok start (MCP repair)
Need logs for GitHub       → ok bug-report        → attach zip to issue
Need project telemetry     → ok diagnose bundle   → --redact if sharing
Hung server process        → ok diagnose process  → include in bundle --pid
```

## Related pages

<CardGroup>
<Card title="CLI reference" href="/cli-reference">
Full `ok` subcommand list including `start`, `stop`, `ps`, `status`, and `clean`.
</Card>
<Card title="Collaboration server" href="/collaboration-server">
Lock lifecycle, server-free MCP reads, and protocol boundaries.
</Card>
<Card title="Wire agent editors" href="/wire-agent-editors">
MCP registration, bundled skills, and `exec` verification prompts.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
`telemetry.localSink` and other keys that affect diagnostic collection.
</Card>
</CardGroup>
