# Browser automation

> Automate Orca's embedded browser: accessibility snapshots, interaction commands, tabs/profiles, capture/intercept/storage, and `orca exec` passthrough to agent-browser.

- Repository: stablyai/orca
- GitHub: https://github.com/stablyai/orca
- Human docs: https://grok-wiki.com/public/docs/stablyai-orca-2036d532bf1c
- Complete Markdown: https://grok-wiki.com/public/docs/stablyai-orca-2036d532bf1c/llms-full.txt

## Source Files

- `src/cli/specs/browser-basic.ts`
- `src/cli/specs/browser-advanced.ts`
- `src/main/browser/browser-manager.ts`
- `src/main/browser/agent-browser-bridge.ts`
- `src/cli/handlers/browser-interact.ts`
- `src/cli/handlers/browser-tab.ts`
- `package.json`

---

---
title: "Browser automation"
description: "Automate Orca's embedded browser: accessibility snapshots, interaction commands, tabs/profiles, capture/intercept/storage, and `orca exec` passthrough to agent-browser."
---

The `orca` CLI drives Orca's embedded Electron `<webview>` guests through runtime RPC methods (`browser.*`). Each open browser page gets a per-tab `agent-browser` session backed by a localhost CDP WebSocket proxy (`CdpWsProxy`) that exposes only that guest to automation, while `BrowserManager` owns guest registration, navigation policy, and paintability leases for hidden tabs.

## Architecture

```mermaid
sequenceDiagram
  participant CLI as orca CLI
  participant RPC as Runtime RPC
  participant RT as OrcaRuntimeBrowser
  participant Bridge as AgentBrowserBridge
  participant Proxy as CdpWsProxy
  participant AB as agent-browser binary
  participant Guest as webview guest

  CLI->>RPC: browser.snapshot / browser.click / ...
  RPC->>RT: browserSnapshot(params)
  RT->>Bridge: snapshot(worktreeId, browserPageId)
  Bridge->>Bridge: enqueue per orca-tab-{pageId}
  Bridge->>Proxy: ensureSession + --cdp port
  Bridge->>AB: exec --session orca-tab-{id} snapshot --json
  AB->>Proxy: CDP over ws://127.0.0.1:PORT
  Proxy->>Guest: debugger commands
  AB-->>Bridge: JSON stdout
  Bridge-->>CLI: BrowserSnapshotResult
```

| Layer | Responsibility |
| --- | --- |
| CLI handlers (`src/cli/handlers/browser-*.ts`) | Parse flags, resolve worktree/page target, call RPC, format human or `--json` output |
| `OrcaRuntimeBrowser` | Resolve worktree/page to live `webContents`, tab create/close/profile IPC |
| `AgentBrowserBridge` | Per-tab command queue, session lifecycle, `agent-browser` subprocess execution |
| `CdpWsProxy` | Local HTTP `/json` + WebSocket CDP bridge scoped to one guest |
| `BrowserManager` | Guest registration maps, anti-detection injection, `acquireAutomationVisibility`, downloads/popups |

<Note>
Browser automation requires a running Orca desktop app (or paired runtime) with at least one registered browser page. Commands do not launch a headless browser on their own.
</Note>

## Prerequisites

<Steps>
<Step title="Start Orca and open a browser tab">
Create or focus a worktree, open the browser pane, and load a page (or run `orca tab create --url <url>`).
</Step>
<Step title="Verify connectivity">
From the worktree directory (or with explicit selectors):

```bash
orca status
orca tab list --json
```
</Step>
<Step title="Prefer JSON for agents">
Pass `--json` on every command so output is structured and stable for scripting.
</Step>
</Steps>

On Linux, the public command may be `orca-ide` instead of `orca` (same RPC surface).

## Snapshot–interact loop

Automation centers on accessibility snapshots that assign short element refs (`@e1`, `@e3`, …). Interact by ref, then re-snapshot after navigation or DOM changes.

```bash
orca goto --url https://example.com --json
orca snapshot --json
orca click --element @e3 --json
orca snapshot --json
```

<ParamField body="snapshot" type="BrowserSnapshotResult">
Returns `browserPageId`, `url`, `title`, a text `snapshot` (accessibility tree), and `refs: { ref, role, name }[]`.
</ParamField>

| Ref rule | Behavior |
| --- | --- |
| Assigned at snapshot time | Only refs from the latest snapshot on that tab are valid |
| Tab scope | Refs from one `browserPageId` are invalid on another tab |
| Navigation | Full navigation invalidates refs → `browser_stale_ref` |
| Tab switch | `orca tab switch` changes active tab and invalidates refs on the new target until re-snapshot |

## Command targeting

### Worktree scope

By default, browser commands see only tabs belonging to the **current worktree** (resolved from cwd). Tab indices in `orca tab list` and `orca tab switch --index` are relative to that filtered list.

<ParamField body="--worktree" type="selector">
`active` / `current` (cwd worktree), `id:<uuid>`, `path:<dir>`, `branch:<name>`, or `all` for cross-worktree tab listing.
</ParamField>

```bash
orca snapshot --json                    # current worktree, active tab
orca snapshot --worktree all --json     # all worktrees (tab list only)
orca tab list --worktree id:wt-abc --json
```

If no live tab exists in scope, RPC returns `browser_no_tab`.

### Stable page identity

For concurrent CLI processes or scripts, pin a tab with `--page <browserPageId>` from `orca tab list --json` instead of relying on ambient active-tab state.

```bash
orca tab list --json
orca snapshot --page page-7f2a --json
orca click --page page-7f2a --element @e2 --json
```

When `--page` is set, `--worktree` is optional validation/scoping; without `--page`, commands use the active tab in the resolved worktree.

## Navigation and observation

| Command | RPC method | Notes |
| --- | --- | --- |
| `orca goto --url <url>` | `browser.goto` | 60s CLI RPC timeout (network idle) |
| `orca back` / `forward` / `reload` | `browser.back` / `browser.forward` / `browser.reload` | `reload` uses Electron `webContents.reload()` to avoid process-swap session loss |
| `orca snapshot` | `browser.snapshot` | Fresh refs; bypasses stale-ref guard |
| `orca screenshot [--format png\|jpeg]` | `browser.screenshot` | Viewport; base64 in JSON |
| `orca full-screenshot` | `browser.fullPageScreenshot` | Full page via direct CDP capture |
| `orca pdf` | `browser.pdf` | `webContents.printToPDF()` (not agent-browser CDP) |
| `orca eval --expression <js>` | `browser.eval` | Page-context JavaScript |
| `orca wait …` | `browser.wait` | See wait modes below |

### Wait modes

```bash
orca wait --timeout 1000 --json
orca wait --selector ".results" [--state hidden] [--timeout 30000] --json
orca wait --text "Dashboard" --timeout 30000 --json
orca wait --url "/dashboard" --timeout 30000 --json
orca wait --load networkidle --timeout 60000 --json
orca wait --fn "document.querySelector('.ok')" --timeout 30000 --json
```

`browser.wait` uses a 60s default CLI RPC timeout. Condition waits pass a bridge-level timeout (`timeout + 1s` grace) so missing selectors surface as `browser_timeout` instead of hanging until the generic RPC deadline.

## Interaction commands

Refs are passed as `--element <ref>` (e.g. `@e5`).

| Command | RPC | Purpose |
| --- | --- | --- |
| `click`, `dblclick`, `hover` | `browser.click`, `browser.dblclick`, `browser.hover` | Pointer actions |
| `fill --value` | `browser.fill` | Clear + set value (bridge uses focus + JS setter for webview compatibility) |
| `type --input` | `browser.type` | Type at current focus |
| `select --value` | `browser.select` | Dropdown |
| `check` / `uncheck` | `browser.check` | Checkbox/radio |
| `scroll --direction up\|down [--amount]` | `browser.scroll` | Viewport scroll |
| `scrollintoview --element` | `browser.scrollIntoView` | Element into view |
| `focus`, `clear`, `select-all` | `browser.focus`, `browser.clear`, `browser.selectAll` | Focus and text selection |
| `keypress --key` | `browser.keypress` | Named keys (Enter, Tab, ArrowDown, …) |
| `drag --from --to` | `browser.drag` | Drag between refs |
| `upload --files path,…` | `browser.upload` | File input |
| `get --what` / `is --what` | `browser.get`, `browser.is` | Property and state queries |
| `find --locator --value --action` | `browser.find` | Semantic locators (role, text, label, …) |
| `inserttext --text` | `browser.keyboardInsertText` | Text without key events |
| `mouse move\|down\|up\|wheel` | `browser.mouseMove`, etc. | Low-level mouse; `mouse click` uses direct CDP on the guest |
| `download`, `highlight` | `browser.download`, `browser.highlight` | File download trigger, visual highlight |

## Tab and profile management

### Tabs

| Command | RPC | Behavior |
| --- | --- | --- |
| `orca tab list [--show-profile]` | `browser.tabList` | Lists tabs; `tab list` is side-effect free (does not change active tab) |
| `orca tab current` | `browser.tabCurrent` | Active tab for worktree scope |
| `orca tab show --page <id>` | `browser.tabShow` | Metadata for one page |
| `orca tab switch (--index n \| --page id) [--focus]` | `browser.tabSwitch` | Updates per-worktree active tab; `--focus` surfaces browser UI when already on that worktree |
| `orca tab create [--url] [--profile]` | `browser.tabCreate` | 60s RPC timeout |
| `orca tab close [--index]` | `browser.tabClose` | Destroys agent-browser session for that page |

`AgentBrowserBridge` names sessions `orca-tab-{browserPageId}`, serializes commands per session, and destroys sessions on tab close or Chromium process swap (intercept patterns may be restored on next init).

### Session profiles

Profiles isolate cookies/storage partitions for browser tabs.

```bash
orca tab profile list --json
orca tab profile create --label "Staging" --scope isolated --json
orca tab profile set --page <id> --profile <profileId> --json
orca tab profile show --page <id> --json
orca tab profile use-default --page <id> --json
orca tab profile clone --profile <id> [--page <id>] --json
orca tab profile delete --profile <id> --json
```

<ParamField body="--scope" type="isolated | imported">
`isolated` (default) creates a fresh partition; `imported` is for profiles that import browser state (registry may reject invalid scopes).
</ParamField>

## Cookies, storage, and emulation

| Area | CLI examples | RPC prefix |
| --- | --- | --- |
| Cookies | `cookie get`, `cookie set --name --value`, `cookie delete --name` | `browser.cookie*` via agent-browser `cookies` |
| Viewport | `viewport --width --height [--scale] [--mobile]` | `browser.setViewport` (CDP `Emulation.setDeviceMetricsOverride`; `--mobile` applied in bridge) |
| Geolocation | `geolocation --latitude --longitude` | `browser.setGeolocation` |
| Device / network | `set device`, `set offline`, `set headers`, `set credentials`, `set media` | `browser.set*` |
| Storage | `storage local get\|set\|clear`, `storage session …` | `browser.storage.local.*`, `browser.storage.session.*` |
| Clipboard | `clipboard read`, `clipboard write --text` | `browser.clipboard*` |
| Dialogs | `dialog accept [--text]`, `dialog dismiss` | `browser.dialog*` |

## Capture, intercept, and network tooling

```bash
orca capture start --json
orca capture stop --json
orca console [--limit n] --json
orca network [--limit n] --json
```

| Command | Implementation note |
| --- | --- |
| `intercept enable [--patterns glob,...]` | Maps to agent-browser `network route <pattern>`; first pattern used; state restored after session recreate |
| `intercept disable` | `network unroute` |
| `intercept list` | `network requests` (paused/intercepted listing) |

<Warning>
Per-request `intercept continue` / `intercept block` are **not** implemented yet. agent-browser currently supports URL-pattern routing only, not per-request decisions.
</Warning>

`capture start` / `stop` wrap agent-browser HAR capture (`network har start|stop`). Console and network log commands read captured buffers.

## `orca exec` passthrough

`orca exec --command "<agent-browser subcommand>"` forwards arbitrary agent-browser syntax through `browser.exec`. Orca injects `--session` and `--cdp` itself; user-supplied `--cdp` and `--session` flags are stripped to prevent target hijacking.

```bash
orca exec --command 'get text @e1' --json
orca exec --command 'find role button click --name "Submit"' --json
orca exec --command 'frame main' --json
```

Use `exec` for keyboard helpers (`keyboard type`, `keydown`/`keyup`), frame switching, semantic `find`, and agent-browser features without a dedicated top-level `orca` subcommand. Prefer first-class `orca click`, `orca fill`, etc. when available—they share the same queue and error translation.

## Hidden tabs and screenshots

Inactive browser panes are not painted in the renderer (`display:none`). Before most commands, `BrowserManager.acquireAutomationVisibility` acquires a renderer lease so the target guest can paint **without** switching the user's visible Orca tab. Screenshot paths additionally serialize captures globally and wait ~300–500ms after the lease for compositor settle.

Background throttling is disabled on guests so CDP screenshots work when Orca is not the focused app.

## Error codes

| Code | Typical cause | Recovery |
| --- | --- | --- |
| `browser_no_tab` | No registered/live tab in worktree | `tab create` or open browser UI |
| `browser_tab_not_found` | Bad `--page` or index out of range | `tab list --json` |
| `browser_tab_closed` | Tab closed while command queued/running | Re-create tab |
| `browser_stale_ref` | Ref from old snapshot/navigation | `snapshot` again |
| `browser_timeout` | Condition wait exceeded | Increase `--timeout`, fix selector |
| `browser_error` | Generic agent-browser/CDP failure | Read message; retry after `reload` |

Stale-ref detection maps agent-browser messages matching `unknown ref`, `ref not found`, etc. to `browser_stale_ref`.

<RequestExample>

```bash
orca snapshot --json
```

</RequestExample>

<ResponseExample>

```json
{
  "browserPageId": "page-abc",
  "url": "https://example.com/",
  "title": "Example",
  "snapshot": "- button \"More\" [ref=@e1]\n  ...",
  "refs": [{ "ref": "@e1", "role": "button", "name": "More" }]
}
```

</ResponseExample>

## Agent workflow checklist

1. `orca tab list --json` — discover `browserPageId` values when multiple tabs or agents run in parallel.
2. `orca snapshot --page <id> --json` — capture refs after every navigation or major UI change.
3. Interact with `--page` pinned; avoid bare active-tab assumptions in concurrent scripts.
4. Prefer `orca wait --text` / `--url` / `--load` over blind `wait --timeout`.
5. On `browser_stale_ref`, re-snapshot; on `browser_no_tab`, create or switch tabs explicitly.
6. Use `orca exec` only for agent-browser features without a dedicated Orca wrapper.

<Info>
The bundled `agent-browser` package (see `package.json`, currently `~0.24.1`) ships a platform-specific native binary under `resources/` in production builds or `node_modules/agent-browser/bin/` in dev. Bridge command timeout is 90s per subprocess call; three consecutive timeouts destroy and recreate the session.
</Info>

## Related pages

<CardGroup>
<Card title="CLI browser and computer reference" href="/cli-browser-computer-reference">
Full command signatures for browser and `computer` automation.
</Card>
<Card title="CLI scripting workflow" href="/cli-scripting-workflow">
End-to-end agent automation with terminals, worktrees, and checkpoints.
</Card>
<Card title="Selectors and JSON output" href="/selectors-json-output">
`--worktree`, `--page`, `--json`, and selector grammar.
</Card>
<Card title="Runtime environments" href="/runtime-environments">
Target a remote Orca runtime with `--environment` or `--pairing-code`.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Runtime reachability, selector ambiguity, and timeout failures.
</Card>
</CardGroup>
