# Relay Server & Agent Communication Protocol

> The relay daemon (`src/relay/`) that runs on remote SSH hosts and bridges the Orca Electron app to filesystem access, PTY sessions, git operations, and workspace-session state. Covers the framed JSON-RPC wire protocol, the reconnection handshake, handler dispatch, plugin overlay sandboxing, and the shared protocol envelope types.

- Repository: stablyai/orca
- GitHub: https://github.com/stablyai/orca
- Human wiki: https://grok-wiki.com/public/wiki/stablyai-orca-47ffb1f68457
- Complete Markdown: https://grok-wiki.com/public/wiki/stablyai-orca-47ffb1f68457/llms-full.txt

## Source Files

- `src/relay/relay.ts`
- `src/relay/protocol.ts`
- `src/relay/dispatcher.ts`
- `src/relay/pty-handler.ts`
- `src/relay/fs-handler.ts`
- `src/relay/git-handler.ts`
- `src/relay/workspace-session-handler.ts`
- `src/relay/plugin-overlay.ts`
- `src/shared/runtime-rpc-envelope.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [src/relay/relay.ts](src/relay/relay.ts)
- [src/relay/protocol.ts](src/relay/protocol.ts)
- [src/relay/dispatcher.ts](src/relay/dispatcher.ts)
- [src/relay/pty-handler.ts](src/relay/pty-handler.ts)
- [src/relay/fs-handler.ts](src/relay/fs-handler.ts)
- [src/relay/git-handler.ts](src/relay/git-handler.ts)
- [src/relay/workspace-session-handler.ts](src/relay/workspace-session-handler.ts)
- [src/relay/plugin-overlay.ts](src/relay/plugin-overlay.ts)
- [src/relay/relay-handshake.ts](src/relay/relay-handshake.ts)
- [src/relay/agent-hook-server.ts](src/relay/agent-hook-server.ts)
- [src/shared/runtime-rpc-envelope.ts](src/shared/runtime-rpc-envelope.ts)
</details>

The Orca relay is a self-contained Node.js daemon deployed to remote hosts over SSH: the app copies `relay.js` with SCP and launches it on an SSH exec channel. It is the single transport bridge between the Electron main process and every feature that runs on the remote machine: PTY sessions, filesystem access, git operations, workspace-session state, and agent-status hooks. All communication uses a binary-framed JSON-RPC 2.0 protocol, carried first on the exec channel's stdin/stdout. The relay also listens on a Unix domain socket (`relay.sock`) so that, after an app restart, a `relay.js --connect` bridge can reattach to the same daemon while its PTY sessions are still alive.

Understanding the relay is essential for contributors working on SSH workspaces, agent integration, or any feature with a "remote" code path. The relay is versioned independently from the Orca Electron app, and the protocol is built around that: a version handshake rejects mismatched `--connect` bridges, a grace period keeps PTYs alive across reconnection, and a multi-client dispatcher fans out to every Orca instance attached to a synced remote workspace.

---

## Architecture Overview

```text
┌───────────────────────────────────────────────────────────┐
│  Orca Electron (main process / SSH channel)               │
│  SshChannelMultiplexer ↔ framed JSON-RPC                  │
└────────────────┬──────────────────────────────────────────┘
                 │ stdin / stdout  (initial SSH exec channel)
                 │ OR Unix domain socket (--connect bridge)
┌────────────────▼──────────────────────────────────────────┐
│  relay.ts  —  daemon entry point                          │
│  ┌───────────────────────────────────────────────────┐    │
│  │  RelayDispatcher  (protocol.ts framing)            │    │
│  │  ┌──────────┐ ┌─────────┐ ┌──────┐ ┌──────────┐  │    │
│  │  │PtyHandler│ │FsHandler│ │GitH. │ │WsSession │  │    │
│  │  └──────────┘ └─────────┘ └──────┘ └──────────┘  │    │
│  │  ┌───────────────────┐  ┌────────────────────────┐│    │
│  │  │PluginOverlayMgr   │  │RelayAgentHookServer    ││    │
│  │  │(~/.orca-relay/)   │  │(loopback HTTP :0)      ││    │
│  │  └───────────────────┘  └────────────────────────┘│    │
│  └───────────────────────────────────────────────────┘    │
│  Unix domain socket  (relay.sock)  ← --connect bridge     │
└───────────────────────────────────────────────────────────┘
                 │
          agent CLIs in PTYs (Pi, OMP, OpenCode)
          post hook events → loopback HTTP → agent.hook JSON-RPC
```

---

## Wire Protocol

### Binary Frame Format

Every message on the wire, in both directions, is wrapped in a fixed 13-byte header defined in `src/relay/protocol.ts`.

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0 | 1 | `type` | `1` = Regular JSON-RPC, `2` = Handshake, `9` = KeepAlive |
| 1 | 4 | `id` | Outgoing sequence number (uint32BE) |
| 5 | 4 | `ack` | Highest received sequence number from peer (uint32BE) |
| 9 | 4 | `length` | Payload byte count (uint32BE) |
| 13 | N | `payload` | UTF-8 JSON: a JSON-RPC 2.0 message (Regular) or a handshake object (Handshake); empty for KeepAlive |

Maximum payload size is 16 MB (`MAX_MESSAGE_SIZE = 16 * 1024 * 1024`). Oversized frames are discarded rather than raising an exception so the decoder stays synchronized with the stream.

Sources: [src/relay/protocol.ts:14-18](), [src/relay/protocol.ts:78-102]()
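
A minimal TypeScript sketch of the framing, assuming only the layout in the table above. `encodeFrame` and `SketchFrameDecoder` are hypothetical names, not the exports of `protocol.ts`. The decoder buffers partial input, emits complete frames, and consumes but drops oversized frames so it stays aligned with the stream:

```typescript
import { Buffer } from 'node:buffer'

// Header layout from the table above: type(1) | id(4) | ack(4) | length(4)
const HEADER_SIZE = 13
const MAX_MESSAGE_SIZE = 16 * 1024 * 1024

const MessageType = { Regular: 1, Handshake: 2, KeepAlive: 9 } as const
type MessageType = (typeof MessageType)[keyof typeof MessageType]

interface Frame {
  type: MessageType
  id: number
  ack: number
  payload: Buffer
}

function encodeFrame(type: MessageType, id: number, ack: number, payload: Buffer = Buffer.alloc(0)): Buffer {
  const header = Buffer.alloc(HEADER_SIZE)
  header.writeUInt8(type, 0)
  header.writeUInt32BE(id, 1)
  header.writeUInt32BE(ack, 5)
  header.writeUInt32BE(payload.length, 9)
  return Buffer.concat([header, payload])
}

// Incremental decoder: buffers partial input and emits only complete frames.
class SketchFrameDecoder {
  private pending: Buffer = Buffer.alloc(0)

  feed(chunk: Buffer): Frame[] {
    this.pending = Buffer.concat([this.pending, chunk])
    const frames: Frame[] = []
    while (this.pending.length >= HEADER_SIZE) {
      const length = this.pending.readUInt32BE(9)
      if (this.pending.length < HEADER_SIZE + length) break // wait for the rest
      const frame: Frame = {
        type: this.pending.readUInt8(0) as MessageType,
        id: this.pending.readUInt32BE(1),
        ack: this.pending.readUInt32BE(5),
        payload: this.pending.subarray(HEADER_SIZE, HEADER_SIZE + length),
      }
      this.pending = this.pending.subarray(HEADER_SIZE + length)
      // Oversized frames are consumed but not emitted, so the stream stays aligned.
      if (length <= MAX_MESSAGE_SIZE) frames.push(frame)
    }
    return frames
  }
}
```

Because `length` is read before the payload is sliced, a chunk boundary anywhere in the header or payload just leaves bytes in `pending` until the next `feed()` call.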

### JSON-RPC 2.0 Envelope

Regular frames carry a standard JSON-RPC 2.0 object:

```typescript
// Illustrative type names; the shapes follow JSON-RPC 2.0.

// Request (client → relay)
type RpcRequest = { jsonrpc: '2.0'; id: number; method: string; params?: Record<string, unknown> }

// Response (relay → client): carries either `result` or `error`, never both
type RpcSuccess = { jsonrpc: '2.0'; id: number; result: unknown }
type RpcFailure = { jsonrpc: '2.0'; id: number; error: { code: number; message: string; data?: unknown } }

// Notification (either direction, no id)
type RpcNotification = { jsonrpc: '2.0'; method: string; params?: Record<string, unknown> }
```

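Cancellation uses an `rpc.cancel` notification carrying `{ id: <request-id> }`. This method is not defined by the JSON-RPC 2.0 specification itself; the dispatcher handles it by aborting the `AbortController` of the matching in-flight request.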

Sources: [src/relay/protocol.ts:59-75](), [src/relay/dispatcher.ts:185-191]()
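
Routing can be sketched by looking at which fields are present. This assumes `rpc.cancel` is matched before ordinary notifications; `classify`, `inFlight`, and `handleCancel` are illustrative names, not the dispatcher's actual API:

```typescript
// Loose shape for routing; the precise request/response types are listed above.
interface RpcMessage {
  jsonrpc: '2.0'
  id?: number
  method?: string
  params?: Record<string, unknown>
}

type MessageKind = 'cancel' | 'request' | 'notification' | 'other'

function classify(msg: RpcMessage): MessageKind {
  if (msg.method === 'rpc.cancel') return 'cancel'
  if (msg.method === undefined) return 'other' // no method: not routed to a handler
  return msg.id !== undefined ? 'request' : 'notification'
}

// One AbortController per in-flight request, keyed by request id.
const inFlight = new Map<number, AbortController>()

function handleCancel(msg: RpcMessage): void {
  const target = msg.params?.['id']
  if (typeof target === 'number') inFlight.get(target)?.abort()
}
```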

### KeepAlive

The dispatcher emits a `KeepAlive` frame (type `9`, empty payload) on a 5-second interval (`KEEPALIVE_SEND_MS = 5_000`) for every attached client. The timer is `unref()`-ed so it does not prevent natural process exit.

Sources: [src/relay/protocol.ts:27](), [src/relay/dispatcher.ts:232-252]()
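
A sketch of the keepalive loop under stated assumptions: the frame reuses the 13-byte header with type `9` and a zero length, carries the client's current ack, and (an assumption, since the text does not say) writes `0` in the `id` field rather than consuming a sequence number. `KeepaliveTarget`, `keepaliveFrame`, and `startKeepalive` are hypothetical:

```typescript
import { Buffer } from 'node:buffer'
import { setInterval } from 'node:timers'

const KEEPALIVE_SEND_MS = 5_000
const HEADER_SIZE = 13
const KEEPALIVE_TYPE = 9

// Hypothetical per-client view: a write callback and the last peer seq seen.
interface KeepaliveTarget {
  write: (data: Buffer) => void
  highestReceived: number
  closed: boolean
}

function keepaliveFrame(ack: number): Buffer {
  const frame = Buffer.alloc(HEADER_SIZE) // zero-filled: id 0 and length 0 (empty payload)
  frame.writeUInt8(KEEPALIVE_TYPE, 0)
  frame.writeUInt32BE(ack, 5)
  return frame
}

function startKeepalive(targets: Iterable<KeepaliveTarget>): NodeJS.Timeout {
  const timer = setInterval(() => {
    for (const t of targets) if (!t.closed) t.write(keepaliveFrame(t.highestReceived))
  }, KEEPALIVE_SEND_MS)
  timer.unref() // never keep the process alive just to send keepalives
  return timer
}
```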

### File Streaming Constants

Large file reads use a chunked streaming subprotocol:

| Constant | Value |
|----------|-------|
| `STREAM_CHUNK_SIZE` | 256 KB |
| `MAX_CONCURRENT_STREAMS` | 16 |
| `RelayErrorCode.TooManyStreams` | -33006 |
| `RelayErrorCode.StreamProtocolError` | -33007 |

Sources: [src/relay/protocol.ts:35-42]()
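
The limits above can be illustrated with a chunking generator and a small admission check. For brevity this sketch chunks an in-memory buffer; a real reader would pull 256 KB at a time from a file handle. `streamChunks` and `StreamSlots` are hypothetical names:

```typescript
const STREAM_CHUNK_SIZE = 256 * 1024
const MAX_CONCURRENT_STREAMS = 16
const TOO_MANY_STREAMS = -33006 // RelayErrorCode.TooManyStreams

// Split a file's bytes into stream chunks; the last chunk may be shorter.
function* streamChunks(data: Uint8Array, size: number = STREAM_CHUNK_SIZE): Generator<Uint8Array> {
  for (let offset = 0; offset < data.length; offset += size) {
    yield data.subarray(offset, offset + size)
  }
}

// Admission control: refuse a new stream once the concurrency cap is reached.
class StreamSlots {
  private readonly active = new Set<string>()

  open(streamId: string): { ok: true } | { ok: false; code: number } {
    if (this.active.size >= MAX_CONCURRENT_STREAMS) return { ok: false, code: TOO_MANY_STREAMS }
    this.active.add(streamId)
    return { ok: true }
  }

  close(streamId: string): void {
    this.active.delete(streamId) // on completion, error, or fs.cancelStream
  }
}
```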

### Runtime RPC Envelope

`src/shared/runtime-rpc-envelope.ts` defines a second, higher-level envelope that runtime clients (CLI, desktop) use to wrap agent subprocess responses. It is distinct from the relay wire frame, is validated with Zod, and carries a `runtimeId` in `_meta` for routing:

```typescript
type RuntimeRpcSuccess<TResult> = { id: string; ok: true; result: TResult; _meta: { runtimeId: string } }
type RuntimeRpcFailure = { id: string; ok: false; error: { code: string; message: string }; _meta?: { runtimeId: string | null } }
type RuntimeRpcKeepaliveFrame = { _keepalive: true }
```

Sources: [src/shared/runtime-rpc-envelope.ts:4-70]()
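
Once a frame has passed schema validation, a client can narrow the union with ordinary TypeScript checks. The three types are copied from above; `RuntimeFrame` and `unwrap` are illustrative, and the Zod validation step itself is not shown:

```typescript
type RuntimeRpcSuccess<TResult> = { id: string; ok: true; result: TResult; _meta: { runtimeId: string } }
type RuntimeRpcFailure = { id: string; ok: false; error: { code: string; message: string }; _meta?: { runtimeId: string | null } }
type RuntimeRpcKeepaliveFrame = { _keepalive: true }

type RuntimeFrame<TResult> = RuntimeRpcSuccess<TResult> | RuntimeRpcFailure | RuntimeRpcKeepaliveFrame

// Returns the result, null for a keepalive (nothing to deliver), or throws on failure.
function unwrap<TResult>(frame: RuntimeFrame<TResult>): TResult | null {
  if ('_keepalive' in frame) return null
  if (frame.ok) return frame.result
  throw new Error(`${frame.error.code}: ${frame.error.message}`)
}
```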

---

## Startup & Handshake

### Normal Mode

1. The Electron app deploys `relay.js` via SCP and launches it with `ssh exec`.
2. The relay starts a Unix domain socket server at `relay.sock` (next to `relay.js`). It sets `umask(0o177)` before `listen()`, so the socket is created with mode `0o600` and never exists with looser permissions.
3. After all async setup (socket server + agent hook server bound), the relay writes `RELAY_SENTINEL` (`ORCA-RELAY v0.1.0 READY\n`) to stdout.
4. Once the client sees the sentinel, it begins sending framed JSON-RPC over stdin.

Sources: [src/relay/protocol.ts:6](), [src/relay/relay.ts:398-400](), [src/relay/relay.ts:340-351]()
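
On the client side, step 4 amounts to buffering stdout until the sentinel appears and treating everything after it as framed data. This sketch assumes bytes before the sentinel (for example, output from shell startup files) can be discarded; `SentinelWatcher` is a hypothetical name:

```typescript
import { Buffer } from 'node:buffer'

const RELAY_SENTINEL = Buffer.from('ORCA-RELAY v0.1.0 READY\n', 'utf8')

// Hypothetical helper: hides pre-sentinel noise, then passes framed bytes through.
class SentinelWatcher {
  private buffered: Buffer = Buffer.alloc(0)
  private ready = false

  /** Returns the bytes that follow the sentinel, or null while still waiting. */
  feed(chunk: Buffer): Buffer | null {
    if (this.ready) return chunk
    this.buffered = Buffer.concat([this.buffered, chunk]) // the sentinel may span chunks
    const idx = this.buffered.indexOf(RELAY_SENTINEL)
    if (idx === -1) return null
    this.ready = true
    const rest = this.buffered.subarray(idx + RELAY_SENTINEL.length)
    this.buffered = Buffer.alloc(0)
    return rest
  }
}
```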

### --connect Bridge Mode (Reconnection)

When the Orca app restarts after a disconnect, the live relay is still running in its grace period with PTYs intact. A new SSH exec channel runs `relay.js --connect`, which:

1. Connects to the existing daemon's `relay.sock` Unix domain socket.
2. Performs a **version handshake** (Handshake frame type) before any JSON-RPC frames are sent.
3. On success, writes `RELAY_SENTINEL` to the new channel's stdout and pipes `stdin ↔ socket`.

The handshake uses a dedicated frame type (`MessageType.Handshake = 2`) carrying JSON:

```typescript
// Bridge → Daemon
{ type: 'orca-relay-handshake', version: string }

// Daemon → Bridge (accept)
{ type: 'orca-relay-handshake-ok', version: string }

// Daemon → Bridge (reject)
{ type: 'orca-relay-handshake-mismatch', expected: string, got: string }
```

A version mismatch causes the bridge to exit with code **42** (`EXIT_CODE_VERSION_MISMATCH`), which the Electron client maps to a non-retryable `RelayVersionMismatchError`.

Sources: [src/relay/relay-handshake.ts:20-24](), [src/relay/relay-handshake.ts:80-120](), [src/relay/relay.ts:118-164]()
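
The accept/reject decision can be sketched as a pure function on each side. The message shapes and exit code come from this section; exact string equality as the version test, and `expected` carrying the daemon's version, are assumptions, and the function names are hypothetical:

```typescript
type HandshakeRequest = { type: 'orca-relay-handshake'; version: string }
type HandshakeReply =
  | { type: 'orca-relay-handshake-ok'; version: string }
  | { type: 'orca-relay-handshake-mismatch'; expected: string; got: string }

const EXIT_CODE_VERSION_MISMATCH = 42

// Daemon side: accept only a bridge built from the same relay version (assumed exact match).
function answerHandshake(daemonVersion: string, req: HandshakeRequest): HandshakeReply {
  if (req.version === daemonVersion) return { type: 'orca-relay-handshake-ok', version: daemonVersion }
  return { type: 'orca-relay-handshake-mismatch', expected: daemonVersion, got: req.version }
}

// Bridge side: null means "write RELAY_SENTINEL and start piping"; a number is the exit code.
function bridgeExitCodeFor(reply: HandshakeReply): number | null {
  return reply.type === 'orca-relay-handshake-ok' ? null : EXIT_CODE_VERSION_MISMATCH
}
```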

### Grace Period

When stdin closes (the SSH channel drops), the relay does **not** exit immediately. It starts a grace timer (default `DEFAULT_SSH_RELAY_GRACE_PERIOD_SECONDS`) during which all PTYs stay alive and `relay.sock` stays open. The timer is canceled as soon as a `--connect` client connects. The host can shorten or eliminate the grace window with the method named by the `SSH_RELAY_CONFIGURE_GRACE_TIME_METHOD` constant, sent as either a request or a notification.

`SIGHUP` is explicitly ignored rather than left to Node's default handler, which exits, so the relay survives the signal the OS sends when the SSH session drops.

Sources: [src/relay/relay.ts:86-90](), [src/relay/relay.ts:402-415](), [src/relay/relay.ts:451-458]()
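
A sketch of the grace lifecycle described above: start on stdin close, cancel on `--connect`, reconfigure on host request. `GraceTimer` and its methods are hypothetical, and restarting a running countdown when the duration changes is an assumption about how reconfiguration behaves:

```typescript
import { clearTimeout, setTimeout } from 'node:timers'

// Hypothetical grace controller; method names are illustrative.
class GraceTimer {
  private timer: NodeJS.Timeout | null = null
  private graceMs: number
  private readonly onExpire: () => void

  constructor(graceSeconds: number, onExpire: () => void) {
    this.graceMs = graceSeconds * 1000
    this.onExpire = onExpire
  }

  /** stdin closed: keep PTYs alive for graceMs, then shut down. */
  start(): void {
    this.cancel()
    if (this.graceMs <= 0) {
      this.onExpire()
      return
    }
    this.timer = setTimeout(() => {
      this.timer = null
      this.onExpire()
    }, this.graceMs)
  }

  /** A --connect client was accepted: stop the countdown. */
  cancel(): void {
    if (this.timer !== null) clearTimeout(this.timer)
    this.timer = null
  }

  /** Host request, e.g. 0 before system sleep; restarts a running countdown. */
  configure(graceSeconds: number): void {
    this.graceMs = graceSeconds * 1000
    if (this.timer !== null) this.start()
  }

  get pending(): boolean {
    return this.timer !== null
  }
}

// Ignore SIGHUP: Node's default handler would exit when the SSH session drops.
process.on('SIGHUP', () => {})
```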

### Stale Socket Detection

On startup, if `relay.sock` already exists (`EADDRINUSE`), the relay probes the path with a short `connect()` call (timeout `STALE_SOCKET_PROBE_TIMEOUT_MS`, 500 ms). If the probe is refused (`ECONNREFUSED`), the socket is stale (left by a crashed relay), so the relay unlinks it and retries `listen()`. If the probe connects, a live daemon owns the socket and startup fails cleanly.

Sources: [src/relay/relay.ts:253-325]()
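
The probe can be sketched with `net.connect()` on the socket path. Only `ECONNREFUSED` is treated as proof of staleness here; how the real relay treats a timeout or other errors is not described above, so this sketch reports them as `'unknown'`. `probeSocket` and `classifyProbeError` are hypothetical:

```typescript
import * as net from 'node:net'

const STALE_SOCKET_PROBE_TIMEOUT_MS = 500

type ProbeResult = 'live' | 'stale' | 'unknown'

// Only a refused connection proves nobody is listening on the path.
function classifyProbeError(code: string | undefined): ProbeResult {
  return code === 'ECONNREFUSED' ? 'stale' : 'unknown'
}

function probeSocket(socketPath: string): Promise<ProbeResult> {
  return new Promise((resolve) => {
    const sock = net.connect(socketPath)
    const timer = setTimeout(() => {
      sock.destroy()
      resolve('unknown')
    }, STALE_SOCKET_PROBE_TIMEOUT_MS)
    sock.once('connect', () => {
      clearTimeout(timer)
      sock.destroy() // a live daemon answered; don't hold its socket open
      resolve('live')
    })
    sock.once('error', (err: Error) => {
      clearTimeout(timer)
      resolve(classifyProbeError((err as NodeJS.ErrnoException).code))
    })
  })
}
```

On `'stale'`, the caller would unlink the path and retry `listen()`; on `'live'`, it would abort startup.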

---

## Dispatcher (`RelayDispatcher`)

`RelayDispatcher` is the central hub that receives raw binary data from one or more clients, decodes frames, routes JSON-RPC messages to handlers, and multiplexes outgoing frames back to each client.

```mermaid
sequenceDiagram
    participant Client as Orca (SSH/socket)
    participant Dispatcher as RelayDispatcher
    participant Handler as MethodHandler

    Client->>Dispatcher: feed(chunk) [framed JSON-RPC]
    Dispatcher->>Dispatcher: FrameDecoder → handleFrame
    alt Request
        Dispatcher->>Handler: handler(params, context)
        Handler-->>Dispatcher: result / Error
        Dispatcher-->>Client: JSON-RPC Response frame
    else Notification
        Dispatcher->>Handler: handler(params, context)
    else rpc.cancel
        Dispatcher->>Dispatcher: AbortController.abort()
    end
    Dispatcher->>Client: keepalive frame (every 5s)
```

### Multi-Client Support

A `RelayClient` record tracks per-client state: `FrameDecoder`, write callback, outgoing sequence counter, highest received sequence (for acks), a generation counter, and a `closed` flag.

- **Primary client**: the initial stdin/stdout transport. Its write callback is replaced (via `setWrite`) when a `--connect` socket client takes over.
- **Secondary clients**: attached via `attachClient()` for synced-workspace scenarios where multiple Orca instances share one relay.

Sequence counters and decoder state are reset on reconnect, so the new client's multiplexer starts fresh at seq=1 without triggering false-positive timeout alarms on the client side.

Sources: [src/relay/dispatcher.ts:36-55](), [src/relay/dispatcher.ts:62-88]()
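
A hypothetical shape for the per-client record and the transport swap. Field names mirror the list above but are not taken from `dispatcher.ts`, and bumping `generation` inside `setWrite` is an assumption that links this to the stale detection in the next section. Decoder reset is omitted:

```typescript
import { Buffer } from 'node:buffer'

type Write = (data: Buffer) => void

// Hypothetical per-client record; field names mirror the list above.
interface SketchRelayClient {
  write: Write
  nextSeq: number // outgoing sequence counter
  highestReceived: number // echoed to the peer as `ack`
  generation: number // bumped whenever the transport is replaced
  closed: boolean
}

function createClient(write: Write): SketchRelayClient {
  return { write, nextSeq: 1, highestReceived: 0, generation: 0, closed: false }
}

// A --connect socket takes over the primary client's transport.
function setWrite(client: SketchRelayClient, write: Write): void {
  client.write = write
  client.nextSeq = 1 // the new multiplexer expects to start at seq=1
  client.highestReceived = 0
  client.generation += 1 // requests started on the old transport become stale
}
```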

### Request Lifecycle & Stale Detection

Each in-flight request gets an `AbortController`. The `RequestContext` passed to handlers exposes:

- `isStale()` — returns `true` if the client disconnected mid-flight (generation mismatch, client removed, or signal aborted).
- `signal` — an `AbortSignal` usable in `fetch`/`fs` calls.

Handlers check `context.isStale()` after any `await` and drop responses to disconnected clients. Mutating work (e.g., a `pty.spawn` that completed while the client was reconnecting) is torn down immediately.

Sources: [src/relay/dispatcher.ts:162-200]()
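
A sketch of how `isStale()` can combine the three conditions, plus the handler pattern of re-checking after an `await`. `makeContext`, `spawnHandler`, and the `kill()` callback are hypothetical:

```typescript
// Hypothetical minimal client view used for staleness checks.
interface ClientRef {
  generation: number
}

interface RequestContext {
  signal: AbortSignal
  isStale(): boolean
}

function makeContext(client: ClientRef, attached: Set<ClientRef>): { ctx: RequestContext; controller: AbortController } {
  const controller = new AbortController()
  const startGeneration = client.generation // captured when the request arrives
  const ctx: RequestContext = {
    signal: controller.signal,
    isStale: () =>
      controller.signal.aborted || !attached.has(client) || client.generation !== startGeneration,
  }
  return { ctx, controller }
}

interface SpawnedPty {
  id: number
  kill(): void
}

// Handler pattern: re-check after every await and undo mutating work if stale.
async function spawnHandler(ctx: RequestContext, spawn: () => Promise<SpawnedPty>): Promise<{ id: number } | undefined> {
  const pty = await spawn()
  if (ctx.isStale()) {
    pty.kill() // nobody is left to receive the id, so don't leak the process
    return undefined
  }
  return { id: pty.id }
}
```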

---

## Handler Domains

### PtyHandler — `pty.*`

`PtyHandler` wraps `node-pty` (loaded lazily; absent module returns a clean error) and registers the following RPC surface:

| Method | Type | Description |
|--------|------|-------------|
| `pty.spawn` | Request | Spawn a login shell, returns `{ id }` |
| `pty.attach` | Request | Attach to existing PTY; returns buffered replay (last 100 KB) |
| `pty.shutdown` | Request | SIGTERM + 5 s SIGKILL fallback, or immediate SIGKILL |
| `pty.sendSignal` | Request | Send an allowlisted POSIX signal |
| `pty.getCwd` | Request | Resolve current working directory via `/proc`/`lsof` |
| `pty.resize` | Notification | Resize PTY columns/rows |
| `pty.data` | Notification | Client → relay: write data to PTY |
| `pty.getDefaultShell` | Request | Resolve default login shell |
| `pty.serialize` / `pty.revive` | Request | Persist PTY state across relay cold restarts |
| `pty.listProcesses` | Request | List active PTYs with cwd and title |

**Spawn env composition**: `process.env` is merged with the renderer-supplied env, then each registered `PtyEnvAugmenter` is applied in order. Augmenters inject `ORCA_AGENT_HOOK_*` coordinates (from the agent hook server) and plugin overlay paths (`OPENCODE_CONFIG_DIR`, `PI_CODING_AGENT_DIR`). Augmenter values override renderer-supplied env, ensuring remote paths win over local `userData` paths.

**Grace timer**: the PTY handler owns the grace countdown. It is started when stdin closes and canceled when any client sends data.

Sources: [src/relay/pty-handler.ts:106-140](), [src/relay/pty-handler.ts:155-205](), [src/relay/pty-handler.ts:261-340]()
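
The precedence rule reduces to successive object spreads. This sketch assumes an augmenter is a function from a pane id to an env fragment; that signature, and `composeSpawnEnv`, are hypothetical. In practice the first argument would be `process.env`:

```typescript
type Env = Record<string, string | undefined>

// Hypothetical augmenter signature: given the pane id, return env overrides.
type PtyEnvAugmenter = (paneId: string) => Env

function composeSpawnEnv(baseEnv: Env, rendererEnv: Env, augmenters: readonly PtyEnvAugmenter[], paneId: string): Env {
  let env: Env = { ...baseEnv, ...rendererEnv } // renderer values override process.env
  for (const augment of augmenters) {
    env = { ...env, ...augment(paneId) } // applied in order; augmenters win over the renderer
  }
  return env
}
```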

### FsHandler — `fs.*`

`FsHandler` exposes a full filesystem API over the relay channel:

| Method | Description |
|--------|-------------|
| `fs.readDir` | Directory listing with symlink-target type resolution |
| `fs.readFile` | Full file read (UTF-8) |
| `fs.readFileStream` | Chunked streaming read (256 KB chunks, max 16 concurrent) |
| `fs.writeFile` | Atomic write (rejects directories) |
| `fs.stat` / `fs.lstat` | File metadata; `stat` follows symlinks |
| `fs.createFile` / `fs.createDir` / `fs.createDirNoClobber` | Create operations |
| `fs.deletePath` | Delete file or directory (recursive flag required for dirs) |
| `fs.rename` / `fs.copy` | Move and copy |
| `fs.realpath` | Resolve symlink target |
| `fs.search` | Full-text search via `rg` (ripgrep), fallback to `git grep` |
| `fs.listFiles` | File listing via `rg --files` → `git ls-files` → readdir fallback |
| `fs.watch` / `fs.unwatch` | Filesystem watch via `@parcel/watcher`; emits `fs.changed` notifications |
| `fs.workspaceSpaceScan` | Workspace Space directory scan |
| `fs.cancelStream` | Abort an in-progress streaming read |

File watches are tracked per client: when a client disconnects, its watch subscriptions are released automatically via `onClientDetached`. Maximum concurrent watches: 20.

All paths pass through `expandTilde()` before any fs operation.

Sources: [src/relay/fs-handler.ts:66-95](), [src/relay/fs-handler.ts:157-230]()
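
Per-client watch bookkeeping can be sketched as a map of maps. The 20-watch limit is treated here as relay-wide (the text does not say whether it applies per client), and the `unsubscribe` callback stands in for disposing a watcher subscription; `WatchRegistry` is a hypothetical name:

```typescript
const MAX_WATCHES = 20 // assumed to be a relay-wide cap

type Unsubscribe = () => void

class WatchRegistry {
  private readonly byClient = new Map<number, Map<string, Unsubscribe>>()
  private total = 0

  /** fs.watch: returns false when the cap is reached. */
  add(clientId: number, dir: string, unsubscribe: Unsubscribe): boolean {
    const watches = this.byClient.get(clientId) ?? new Map<string, Unsubscribe>()
    if (watches.has(dir)) return true // already watched by this client
    if (this.total >= MAX_WATCHES) return false
    watches.set(dir, unsubscribe)
    this.byClient.set(clientId, watches)
    this.total++
    return true
  }

  /** fs.unwatch */
  remove(clientId: number, dir: string): void {
    const watches = this.byClient.get(clientId)
    const unsubscribe = watches?.get(dir)
    if (!watches || !unsubscribe) return
    unsubscribe()
    watches.delete(dir)
    this.total--
  }

  /** onClientDetached: release everything this client subscribed to. */
  onClientDetached(clientId: number): void {
    const watches = this.byClient.get(clientId)
    if (!watches) return
    for (const unsubscribe of watches.values()) unsubscribe()
    this.total -= watches.size
    this.byClient.delete(clientId)
  }

  get size(): number {
    return this.total
  }
}
```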

### GitHandler — `git.*`

`GitHandler` shells out to `git` for all operations. Key RPC methods:

| Method | Description |
|--------|-------------|
| `git.status` | Working tree status |
| `git.checkIgnored` | Check paths against `.gitignore` |
| `git.history` | Commit log |
| `git.commit` | Create commit (used by worktree ops) |
| `git.diff` | File diff |
| `git.stage` / `git.unstage` / `git.bulkStage` | Staging area management |
| `git.discard` / `git.bulkDiscard` | Discard working-tree changes |
| `git.conflictOperation` | Conflict detection |
| `git.branchCompare` / `git.commitCompare` | Branch/commit diff |
| `git.upstreamStatus` | Upstream divergence |
| `git.fetch` / `git.push` / `git.pull` | Remote operations |
| `git.rebaseFromBase` | Rebase workflow |
| `git.listWorktrees` / `git.addWorktree` / `git.removeWorktree` | Git worktree management |
| `git.exec` | Validated arbitrary `git` exec (args validated via `validateGitExecArgs`) |
| `git.isGitRepo` | Check whether a path is inside a git repo |

Max git output buffer: 10 MB. Bulk stage/unstage operations are chunked at 100 paths.

Sources: [src/relay/git-handler.ts:59-88](), [src/relay/git-handler.ts:44-46]()
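
Chunking bulk operations keeps each `git` command line bounded. This sketch assumes staging is done with `git add -- <paths>` per batch; `chunkPaths` and `bulkStageArgv` are hypothetical. Each argv would then run through something like `execFile('git', argv, { maxBuffer: 10 * 1024 * 1024 })` to honor the 10 MB output cap:

```typescript
const BULK_CHUNK_SIZE = 100

function chunkPaths(paths: readonly string[], size: number = BULK_CHUNK_SIZE): string[][] {
  const batches: string[][] = []
  for (let i = 0; i < paths.length; i += size) batches.push(paths.slice(i, i + size))
  return batches
}

// One argv per batch; `--` keeps a path that starts with "-" from being parsed as an option.
function bulkStageArgv(paths: readonly string[]): string[][] {
  return chunkPaths(paths).map((batch) => ['add', '--', ...batch])
}
```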

### WorkspaceSessionHandler — `workspace.*`

Persists shared workspace state for synced remote workspaces. Snapshots are written as JSON to `~/.orca/sessions/<namespace>.json` using atomic rename (write to `.tmp`, then `rename()`), with `0o700` directory permissions and `0o600` file permissions.

| Method | Description |
|--------|-------------|
| `workspace.get` | Read the current session snapshot for a namespace |
| `workspace.patch` | Apply a `replace-session` patch; emits `workspace.changed` notification to all clients |
| `workspace.presence` | Heartbeat for connected clients; expires stale entries after 45 s |

The `patch` method includes optimistic concurrency: it compares `baseRevision` to the stored revision and returns `{ ok: false, reason: 'stale-revision', snapshot }` on mismatch.

The session shape tracks `activeRepoId`, `activeWorktreeId`, `activeTabId`, `tabsByWorktree`, and `terminalLayoutsByTabId`.

Sources: [src/relay/workspace-session-handler.ts:39-75](), [src/relay/workspace-session-handler.ts:107-160]()
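
The revision check and the atomic write can be sketched independently. Incrementing the revision by one on each accepted patch is an assumption, and `applyReplaceSession`, `sessionsDir`, and `writeSnapshotAtomic` are hypothetical names:

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

interface SessionSnapshot {
  revision: number
  session: Record<string, unknown> // activeRepoId, tabsByWorktree, ...
}

type PatchResult =
  | { ok: true; snapshot: SessionSnapshot }
  | { ok: false; reason: 'stale-revision'; snapshot: SessionSnapshot }

// Optimistic concurrency: the writer must have seen the latest revision.
function applyReplaceSession(current: SessionSnapshot, baseRevision: number, session: Record<string, unknown>): PatchResult {
  if (baseRevision !== current.revision) return { ok: false, reason: 'stale-revision', snapshot: current }
  return { ok: true, snapshot: { revision: current.revision + 1, session } }
}

function sessionsDir(home: string = os.homedir()): string {
  return path.join(home, '.orca', 'sessions')
}

// Write to a temp file, then rename over the target so readers never see a partial file.
function writeSnapshotAtomic(dir: string, namespace: string, snapshot: SessionSnapshot): string {
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 })
  const target = path.join(dir, `${namespace}.json`)
  const tmp = `${target}.tmp`
  fs.writeFileSync(tmp, JSON.stringify(snapshot), { mode: 0o600 })
  fs.renameSync(tmp, target) // atomic on POSIX when both paths are on the same filesystem
  return target
}
```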

---

## Plugin Overlay Sandboxing

### Problem

Agent CLIs (OpenCode, Pi, OMP) read configuration from paths like `OPENCODE_CONFIG_DIR` or `PI_CODING_AGENT_DIR`. In local Orca sessions these paths point to Electron `userData`. In SSH sessions the remote host has no `userData`, so the relay must materialize equivalent overlay directories on the remote filesystem.

### Solution: `PluginOverlayManager`

On receipt of `agent_hook.installPlugins` (sent by Orca at session-ready), the relay stores the plugin source bodies in memory:

- `opencodePluginSource` → `orca-opencode-status.js`
- `piExtensionSource` → Pi-flavored `orca-agent-status.ts`
- `ompExtensionSource` → OMP-flavored `orca-agent-status.ts`

Source bodies are sent over the wire (not bundled with the relay binary) because Orca and the relay are versioned independently and plugin source changes with every agent event addition. A 256 KB cap per source body is enforced (`assertPluginSourceUnderByteCap`).

On each `pty.spawn`, the `PtyEnvAugmenter` calls `materializeOpenCode(id, sourceDir)` or `materializePi(id, sourceDir, kind)`, which:

1. Removes any prior overlay for this id (`safeRemoveOverlay`).
2. Creates `~/.orca-relay/<type>-overlays/<sha256(id)[0:32]>/`.
3. Mirrors the user's existing config dir into the overlay (skipping the Orca-owned plugin file).
4. Writes the fresh plugin source into the overlay's `plugins/` or `extensions/` directory.

The resulting path is injected as `OPENCODE_CONFIG_DIR` or `PI_CODING_AGENT_DIR` into the PTY env, overriding any renderer-supplied value.

On PTY exit, `clearOverlay(id)` removes the overlay directories for all known roots in one pass.

```text
~/.orca-relay/
├── opencode-overlays/<sha256(paneKey)>/
│   ├── <user's config mirrored>
│   └── plugins/
│       └── orca-opencode-status.js   ← written fresh on each spawn
├── pi-overlays/<sha256(paneKey)>/
│   ├── <user's ~/.pi/agent mirrored>
│   └── extensions/
│       └── orca-agent-status.ts
├── omp-overlays/<sha256(paneKey)>/
│   └── extensions/
│       └── orca-agent-status.ts
└── agent-hooks/
    └── <endpoint env file>           ← hook server binding info
```

Sources: [src/relay/plugin-overlay.ts:1-43](), [src/relay/plugin-overlay.ts:190-255]()
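
The overlay path derivation can be sketched with Node's `crypto` module. One effect of hashing is that the directory name stays filesystem-safe and fixed-length whatever the pane id contains. Mapping OMP overlays to `PI_CODING_AGENT_DIR` is an assumption, and `overlayDir` and `overlayEnv` are hypothetical:

```typescript
import { createHash } from 'node:crypto'
import * as os from 'node:os'
import * as path from 'node:path'

type OverlayKind = 'opencode' | 'pi' | 'omp'

// ~/.orca-relay/<kind>-overlays/<first 32 hex chars of sha256(paneId)>
function overlayDir(kind: OverlayKind, paneId: string, home: string = os.homedir()): string {
  const digest = createHash('sha256').update(paneId, 'utf8').digest('hex').slice(0, 32)
  return path.join(home, '.orca-relay', `${kind}-overlays`, digest)
}

// Assumed mapping from overlay kind to the variable the agent CLI reads.
function overlayEnv(kind: OverlayKind, dir: string): Record<string, string> {
  return kind === 'opencode' ? { OPENCODE_CONFIG_DIR: dir } : { PI_CODING_AGENT_DIR: dir }
}
```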

---

## Agent Hook Server

The relay hosts a loopback HTTP server (`RelayAgentHookServer`) on `127.0.0.1:0` so agent CLIs running inside relay-spawned PTYs can POST hook events without leaving the host. Bind happens before `RELAY_SENTINEL` is written to ensure every PTY spawned immediately after the sentinel sees the correct `ORCA_AGENT_HOOK_*` env coordinates.

Parsed hook payloads are forwarded as `agent.hook` JSON-RPC notifications over the existing SSH channel:

```
agent CLI → POST /hook/<source> (loopback HTTP)
          → RelayAgentHookServer
          → dispatcher.notify('agent.hook', envelope)
          → SSH channel → Orca main process
```

The server caches the last status per `paneKey`. After a `--connect` reconnect, Orca issues `agent_hook.requestReplay` to re-receive cached entries, closing the race where a status notification was sent during reconnection.

Sources: [src/relay/relay.ts:208-255](), [src/relay/relay.ts:279-310]()
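
A sketch of the last-status cache and replay. `HookEnvelope` is a hypothetical shape, and replaying by re-sending `agent.hook` notifications (rather than returning the entries in the request's result) is an assumption:

```typescript
// Hypothetical envelope; only paneKey matters for caching.
interface HookEnvelope {
  paneKey: string
  source: string
  payload: unknown
}

type Notify = (method: string, params: HookEnvelope) => void

class HookStatusCache {
  private readonly lastByPane = new Map<string, HookEnvelope>()
  private readonly notify: Notify

  constructor(notify: Notify) {
    this.notify = notify
  }

  /** A parsed POST /hook/<source>: remember it, then forward it. */
  receive(envelope: HookEnvelope): void {
    this.lastByPane.set(envelope.paneKey, envelope) // newest status per pane wins
    this.notify('agent.hook', envelope)
  }

  /** agent_hook.requestReplay: resend every cached status; returns how many. */
  replay(): number {
    for (const envelope of this.lastByPane.values()) this.notify('agent.hook', envelope)
    return this.lastByPane.size
  }
}
```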

---

## Session & Utility Requests

Beyond the four primary handler domains, `relay.ts` registers a small set of session-level requests directly on the dispatcher:

| Method | Description |
|--------|-------------|
| `session.registerRoot` | No-op (kept for protocol back-compat with older clients) |
| `session.resolveHome` | Expand `~` to `homedir()` on the remote host |
| `SSH_RELAY_CONFIGURE_GRACE_TIME_METHOD` | Method named by this constant; adjusts the PTY grace period (e.g., set to 0 before system sleep) |
| `relay.status` | Health check: PID, uptime, memory, active PTY count, socket state, grace state |
| `agent_hook.installPlugins` | Cache plugin source bodies for overlay materialization |
| `agent_hook.requestReplay` | Replay cached per-pane hook payloads after reconnection |

Sources: [src/relay/relay.ts:198-215](), [src/relay/relay.ts:219-230](), [src/relay/relay.ts:248-262]()
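
`session.resolveHome` and the `expandTilde()` call used by `FsHandler` both come down to replacing a leading `~` with the remote home directory. A minimal sketch, assuming only `~` and `~/...` are expanded and `~user` forms are left unchanged; this `expandTilde` is a hypothetical stand-in, not the relay's own function:

```typescript
import * as os from 'node:os'
import * as path from 'node:path'

// Hypothetical stand-in for expandTilde(): only "~" and "~/..." are expanded.
function expandTilde(p: string, home: string = os.homedir()): string {
  if (p === '~') return home
  if (p.startsWith('~/')) return path.join(home, p.slice(2))
  return p // absolute, relative, and "~user" paths pass through unchanged
}
```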

---

## Reconnection Flow

```mermaid
sequenceDiagram
    participant Orca as Orca (new launch)
    participant Bridge as relay.js --connect
    participant Daemon as relay daemon (grace)

    Orca->>Bridge: SSH exec relay.js --connect
    Bridge->>Daemon: connect(relay.sock)
    Bridge->>Daemon: Handshake frame {type, version}
    Daemon-->>Bridge: handshake-ok (or mismatch → exit 42)
    Bridge->>Orca: write RELAY_SENTINEL to stdout
    note over Daemon: cancelGrace("socket client accepted")
    Orca->>Bridge: JSON-RPC frames (stdin)
    Bridge->>Daemon: forwarded via socket
    Daemon-->>Bridge: JSON-RPC frames (socket)
    Bridge->>Orca: forwarded to stdout
    Orca->>Daemon: agent_hook.requestReplay
    Daemon-->>Orca: replay cached pane statuses
```

Sources: [src/relay/relay.ts:118-164](), [src/relay/relay-handshake.ts:65-95](), [src/relay/relay.ts:357-390]()

---

## Summary

The relay is a purpose-built, Electron-free Node.js process that bridges an SSH exec channel to a full suite of remote capabilities. Its binary frame protocol (13-byte header, JSON-RPC 2.0 payload, sequence and ack tracking, keepalives) provides reliable ordering and reconnection safety. The dispatcher's multi-client model, generation-based stale detection, and per-request `AbortController` let live PTY state survive app restarts safely. Plugin overlay sandboxing ensures that agent CLIs running in remote PTYs receive the correct status-reporting extensions even when the relay and the Orca app are deployed at different versions. The workspace session handler adds multi-client session persistence on the same JSON-RPC channel, so a single connection carries every remote capability Orca needs.
