# In-App Browser Pane Automation

> The WKWebView sidecar experience for authenticated browser panes, omnibar behavior, find-in-page scripts, devtools affordances, and socket-driven browser automation.

- Repository: manaflow-ai/cmux
- GitHub: https://github.com/manaflow-ai/cmux
- Human wiki: https://grok-wiki.com/public/wiki/manaflow-ai-cmux-5a511656cb1a
- Complete Markdown: https://grok-wiki.com/public/wiki/manaflow-ai-cmux-5a511656cb1a/llms-full.txt

## Source Files

- `Sources/Panels/BrowserPanel.swift`
- `Sources/Panels/BrowserPanelView.swift`
- `Sources/BrowserWindowPortal.swift`
- `Sources/Find/BrowserFindJavaScript.swift`
- `Sources/Find/BrowserSearchOverlay.swift`
- `cmuxTests/BrowserPanelTests.swift`
- `cmuxTests/BrowserImportMappingTests.swift`
- `tests_v2/test_cli_browser_console_errors_text.py`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [Sources/Panels/BrowserPanel.swift](Sources/Panels/BrowserPanel.swift)
- [Sources/Panels/BrowserPanelView.swift](Sources/Panels/BrowserPanelView.swift)
- [Sources/BrowserWindowPortal.swift](Sources/BrowserWindowPortal.swift)
- [Sources/Find/BrowserFindJavaScript.swift](Sources/Find/BrowserFindJavaScript.swift)
- [Sources/Find/BrowserSearchOverlay.swift](Sources/Find/BrowserSearchOverlay.swift)
- [Sources/TerminalController.swift](Sources/TerminalController.swift)
- [CLI/cmux.swift](CLI/cmux.swift)
- [Sources/Panels/BrowserAutomation.swift](Sources/Panels/BrowserAutomation.swift)
- [cmuxTests/BrowserPanelTests.swift](cmuxTests/BrowserPanelTests.swift)
- [cmuxTests/BrowserImportMappingTests.swift](cmuxTests/BrowserImportMappingTests.swift)
- [tests_v2/test_cli_browser_console_errors_text.py](tests_v2/test_cli_browser_console_errors_text.py)
- [docs/agent-browser-port-spec.md](docs/agent-browser-port-spec.md)
- [skills/cmux-browser/SKILL.md](skills/cmux-browser/SKILL.md)
</details>


cmux’s in-app browser is a `WKWebView` surface embedded into the same pane/tab system as terminal surfaces. The interesting part is not just page rendering: the browser pane preserves authenticated state, exposes an omnibar with history/search suggestions, hosts find-in-page UI above WebKit, keeps DevTools usable through split/layout churn, and exposes a socket/CLI automation API for agents.

This page follows the Compound Engineering wiki-shape guidance as a portable writing profile. No `STRATEGY.md` or `docs/solutions/**` was found in this checkout, so the repository code, tests, and local docs are the source of truth.

## Mental Model

```text
SwiftUI BrowserPanelView
  - toolbar, omnibar, profile/theme/devtools buttons
  - lightweight portal anchor

BrowserPanel
  - WKWebView owner, profile/data store, navigation, find, devtools intent
  - console/error/dialog bootstrap scripts

WindowBrowserPortal
  - per-window AppKit sidecar host
  - reparents WKWebView above SwiftUI
  - hosts find bar and omnibar suggestions where WebKit will not cover them

TerminalController + CLI/cmux.swift
  - socket methods and CLI commands
  - browser.open_split, navigate, wait, snapshot, eval, click, cookies, storage, tabs, logs
```

Sources: [Sources/Panels/BrowserPanelView.swift:406-460](), [Sources/Panels/BrowserPanel.swift:2306-2328](), [Sources/BrowserWindowPortal.swift:3091-3150](), [Sources/TerminalController.swift:3888-3970]()

## Authenticated Browser Pane State

Browser panes are profile-backed. `BrowserProfileStore` owns named profiles, tracks last-used profile ID, creates per-profile `WKWebsiteDataStore` instances, and stores per-profile history files outside the default profile. The built-in default profile uses `WKWebsiteDataStore.default()`, while custom profiles use `WKWebsiteDataStore(forIdentifier:)`; that is the key persistence boundary for cookies, local storage, and repeated sign-in flows.

`BrowserPanel` also uses a shared `WKProcessPool`, configures persistent website data, enables JavaScript, and injects cmux bootstrap scripts at document start. Navigation delegates record visits into the bound profile history store only when the callback belongs to the current web view instance, which matters when profiles switch or WebKit processes are replaced.

Sources: [Sources/Panels/BrowserPanel.swift:371-520](), [Sources/Panels/BrowserPanel.swift:3353-3438](), [Sources/Panels/BrowserPanel.swift:3440-3502](), [cmuxTests/BrowserPanelTests.swift:270-313]()
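The stale-callback guard described above can be modeled in a few lines. This is a Python sketch of the idea only, not the Swift implementation: `Panel`, `WebViewStub`, `switch_profile`, and `did_finish_navigation` are hypothetical names, and it assumes a profile switch replaces the web view instance.

```python
from dataclasses import dataclass, field


class WebViewStub:
    """Stand-in for a web view; only object identity matters in this sketch."""


@dataclass
class Panel:
    """Hypothetical model of a browser panel bound to one profile's history."""
    profile_id: str
    web_view: WebViewStub = field(default_factory=WebViewStub)
    history: dict = field(default_factory=dict)  # profile_id -> list of URLs

    def switch_profile(self, profile_id: str) -> WebViewStub:
        """Bind a new profile and replace the web view; return the old view."""
        old = self.web_view
        self.profile_id = profile_id
        self.web_view = WebViewStub()
        return old

    def did_finish_navigation(self, sender: WebViewStub, url: str) -> bool:
        """Record the visit only when the callback came from the current view."""
        if sender is not self.web_view:
            return False  # stale callback from a replaced instance: ignore it
        self.history.setdefault(self.profile_id, []).append(url)
        return True


panel = Panel(profile_id="default")
old_view = panel.switch_profile("work")
panel.did_finish_navigation(old_view, "https://stale.example")        # ignored
panel.did_finish_navigation(panel.web_view, "https://fresh.example")  # recorded
```

Without the identity check, the late callback from the old view would write a visit into the newly bound profile's history.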

A notable authentication affordance is explicit default handling for `URLAuthenticationChallenge`. The comment calls out TLS client certificate and MDM/SSO flows: implementing this delegate avoids WebKit’s default rejection path and lets system URL loading, keychain identities, root CAs, and SSO extensions participate.

Sources: [Sources/Panels/BrowserPanel.swift:7602-7618]()

## Omnibar And Navigation Behavior

The omnibar is both an address bar and a search box. Search engines are modeled as `BrowserSearchEngine` cases with query URL builders for Google, DuckDuckGo, Bing, Kagi, and Startpage. `navigateSmart(_:)` first asks `resolveNavigableURL(from:)`; if the input is not URL-like, it falls back to the configured search engine. The URL resolver has explicit localhost handling because `URL(string: "localhost:3777")` would otherwise parse `localhost` as a scheme.

Sources: [Sources/Panels/BrowserPanel.swift:91-157](), [Sources/Panels/BrowserPanel.swift:5029-5044](), [Sources/Panels/BrowserPanel.swift:5339-5374]()
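The resolve-then-search fallback can be illustrated with a small Python sketch. It borrows the names `navigateSmart` and `resolveNavigableURL` from the text, but the regular expressions, the `http`/`https` defaults, and the two search templates are assumptions made for this example rather than the repository's actual rules.

```python
import re
from typing import Optional
from urllib.parse import quote_plus

# Illustrative query templates; the real builders live in BrowserSearchEngine.
SEARCH_TEMPLATES = {
    "duckduckgo": "https://duckduckgo.com/?q={q}",
    "google": "https://www.google.com/search?q={q}",
}

_HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")
_LOCALHOST = re.compile(r"^localhost(:\d+)?(/.*)?$", re.IGNORECASE)
_DOTTED_HOST = re.compile(r"^[^\s/]+\.[^\s/]+(/.*)?$")


def resolve_navigable_url(text: str) -> Optional[str]:
    """Return a URL string if the input looks navigable, else None."""
    s = text.strip()
    if not s or " " in s:
        return None
    if _HAS_SCHEME.match(s):
        return s
    # Handle localhost before any generic parsing, so "localhost:3777"
    # is never misread as scheme "localhost" with path "3777".
    if _LOCALHOST.match(s):
        return "http://" + s
    if _DOTTED_HOST.match(s):
        return "https://" + s
    return None


def navigate_smart(text: str, engine: str = "duckduckgo") -> str:
    """Resolve to a URL, or fall back to a search query URL."""
    url = resolve_navigable_url(text)
    if url is not None:
        return url
    return SEARCH_TEMPLATES[engine].format(q=quote_plus(text.strip()))
```

The ordering is the point of the sketch: the localhost check runs before anything that could interpret the text before the colon as a scheme.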

The view layer treats the omnibar as stateful UI, not as a thin text field. It tracks buffer text, selection, IME marked text, inline completion, local and remote suggestion loading, and whether suggestions should be rendered in SwiftUI or the AppKit portal. Suggestions combine recent history, history matches, open-tab matches, stale remote suggestions, and URL resolution; remote search suggestions are skipped for empty, single-character, URL-like, or disabled-suggestion cases.

Sources: [Sources/Panels/BrowserPanelView.swift:417-491](), [Sources/Panels/BrowserPanelView.swift:553-599](), [Sources/Panels/BrowserPanelView.swift:1377-1412](), [Sources/Panels/BrowserPanelView.swift:2273-2407]()
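The gating rules for remote suggestions reduce to a short predicate. The following Python sketch encodes the four skip cases listed above; the function name and the URL-likeness heuristic are assumptions for illustration and are likely simpler than what the view layer actually checks.

```python
import re

# Rough URL-likeness heuristic (an assumption): explicit scheme,
# localhost with optional port, or a dotted host at the start.
_URL_LIKE = re.compile(
    r"^([a-zA-Z][a-zA-Z0-9+.-]*://|localhost(:\d+)?(/|$)|[^\s/]+\.[^\s/]+)"
)


def should_fetch_remote_suggestions(query: str, suggestions_enabled: bool = True) -> bool:
    """Return True only when a remote suggestion request is worthwhile."""
    text = query.strip()
    if not suggestions_enabled:
        return False  # user disabled search suggestions
    if len(text) <= 1:
        return False  # empty or single-character input
    if _URL_LIKE.match(text):
        return False  # input looks like an address, not a search
    return True
```

Skipping these cases avoids sending partial addresses to a search provider and avoids requests whose results would be noise.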

The toolbar around the omnibar gives browser-pane workflows first-class controls: back, forward, reload/stop, screenshot-to-clipboard, React Grab injection, DevTools, profile selection, theme selection, and import hints. Profile and theme controls are popovers backed by `BrowserProfileStore` and `BrowserThemeMode`.

Sources: [Sources/Panels/BrowserPanelView.swift:1056-1193](), [Sources/Panels/BrowserPanelView.swift:1195-1370]()

## AppKit Portal Sidecar

`BrowserPanelView` intentionally keeps find UI out of normal SwiftUI when a live WebKit view is mounted: the comment states that rendering find UI in SwiftUI can put it behind the portal-hosted `WKWebView`. `WindowBrowserPortal` is the sidecar that solves this. It binds a `WKWebView` to a SwiftUI anchor, reparents it into a per-window AppKit host, preserves visibility and z-priority, and exposes registry methods for hiding, discarding, refreshing, updating overlays, and locating the web view under a window point.

Sources: [Sources/Panels/BrowserPanelView.swift:709-735](), [Sources/BrowserWindowPortal.swift:3091-3188](), [Sources/BrowserWindowPortal.swift:4031-4195]()

The portal is also where overlay UI survives WebKit layering. `WindowBrowserSlotView` can host the search overlay and omnibar suggestions as AppKit-hosted SwiftUI views. Its omnibar suggestions hosting view only hit-tests inside the popup frame, so the overlay can sit above WebKit without stealing unrelated pointer events.

Sources: [Sources/BrowserWindowPortal.swift:1271-1331](), [Sources/BrowserWindowPortal.swift:1469-1490](), [Sources/BrowserWindowPortal.swift:1623-1692](), [Sources/BrowserWindowPortal.swift:1707-1725]()
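The frame-limited hit-testing idea is easy to show in isolation. This Python sketch is a geometric model only: `Rect` and `hit_test` are hypothetical names, and the real code works with AppKit views rather than string labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        """Half-open containment test on both axes."""
        return (
            self.x <= px < self.x + self.width
            and self.y <= py < self.y + self.height
        )


def hit_test(popup_frame: Optional[Rect], px: float, py: float) -> str:
    """Route a pointer event to the overlay only inside the popup frame."""
    if popup_frame is not None and popup_frame.contains(px, py):
        return "suggestions"
    return "webview"  # everything else falls through to the page


popup = Rect(x=100, y=40, width=300, height=120)
```

Because the overlay claims only the popup rectangle, clicks elsewhere in the pane still reach the page even though the hosting view sits above WebKit.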

DevTools are part of this sidecar problem. Attached Web Inspector can mutate the moved `WKWebView`’s frame, so the slot view pins plain web views to bounds but preserves WebKit-managed split frames when companion WebKit subviews exist.

Sources: [Sources/BrowserWindowPortal.swift:1811-1872]()

## Find In Page

Find-in-page has three cooperating pieces:

| Piece | Responsibility |
|---|---|
| `BrowserSearchState` | Observable `needle`, selected match, and total count |
| `BrowserSearchOverlay` | Draggable find UI with Return, Shift+Return, Escape, and buttons |
| `BrowserFindJavaScript` | DOM mutation scripts that mark matches and restore text nodes |

`BrowserPanel.startFind()` creates or reuses search state, suppresses address-bar focus, posts focus notifications more than once to survive portal mount races, and replays the search after navigation. The actual search scripts run through `webView.evaluateJavaScript`.

Sources: [Sources/Panels/BrowserPanel.swift:2306-2316](), [Sources/Panels/BrowserPanel.swift:2800-2835](), [Sources/Panels/BrowserPanel.swift:6235-6352]()

The JavaScript uses `TreeWalker` over visible text nodes, skips script/style/template/iframe/SVG content, wraps matches in `<mark class="__cmux-find">`, scrolls the current match into view, and returns JSON with `{total,current}`. Next/previous scripts cycle through `window.__cmuxFindMatches`; clear restores text nodes and removes the injected style element.

Sources: [Sources/Find/BrowserFindJavaScript.swift:3-104](), [Sources/Find/BrowserFindJavaScript.swift:106-172](), [Sources/Find/BrowserFindJavaScript.swift:176-206]()
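The match collection and next/previous cycling can be modeled without a DOM. The Python sketch below treats a page as a list of text-node strings; case-insensitive matching, non-overlapping matches, wrap-around cycling, and a zero-based `current` index are all assumptions, since the text only states that the scripts return `{total,current}`.

```python
import json


def find_matches(text_nodes, needle):
    """Return (node_index, offset) for each non-overlapping occurrence."""
    matches = []
    if not needle:
        return matches
    lowered = needle.lower()
    for index, text in enumerate(text_nodes):
        haystack = text.lower()
        start = haystack.find(lowered)
        while start != -1:
            matches.append((index, start))
            start = haystack.find(lowered, start + len(lowered))
    return matches


def step(current, total, delta):
    """Move to the next (+1) or previous (-1) match, wrapping at the ends."""
    if total == 0:
        return 0
    return (current + delta) % total


def result_payload(total, current):
    """Serialize the result in the {total,current} shape described above."""
    return json.dumps({"total": total, "current": current}, separators=(",", ":"))


nodes = ["Find me, find you", "nothing", "FIND"]
matches = find_matches(nodes, "find")
```

In the real scripts each match would be wrapped in a `<mark class="__cmux-find">` element; here a `(node_index, offset)` pair stands in for that wrapper.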

The overlay is compact and keyboard-driven: the field shows match counts, Return moves to the next match, Shift+Return moves to the previous one, Escape closes the overlay, and the user can drag it so that it snaps to the nearest corner.

Sources: [Sources/Find/BrowserSearchOverlay.swift:21-95](), [Sources/Find/BrowserSearchOverlay.swift:116-181](), [Sources/Find/BrowserSearchOverlay.swift:267-293]()
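Snapping to the nearest corner comes down to a quadrant test. The Python sketch below assumes a top-left origin and uses the overlay's center point; the function name and the corner labels are hypothetical.

```python
def nearest_corner(center_x: float, center_y: float, width: float, height: float) -> str:
    """Return the container corner closest to the overlay's center point.

    Assumes a top-left origin, with y growing downward.
    """
    horizontal = "left" if center_x < width / 2 else "right"
    vertical = "top" if center_y < height / 2 else "bottom"
    return f"{vertical}-{horizontal}"
```

Choosing a corner by quadrant gives the same answer as comparing Euclidean distances to all four corners, because each axis can be decided independently.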

## DevTools Affordances

DevTools are enabled at the WebKit configuration level with `developerExtrasEnabled`, and `WKWebView.isInspectable` is set on macOS 13.3+. The toolbar button calls `panel.toggleDeveloperTools()`, beeping if the operation is not handled.

Sources: [Sources/Panels/BrowserPanel.swift:3353-3374](), [Sources/Panels/BrowserPanel.swift:3388-3390](), [Sources/Panels/BrowserPanelView.swift:1178-1193](), [Sources/Panels/BrowserPanelView.swift:2061-2068]()

The implementation tracks user intent separately from current WebKit visibility. `preferredDeveloperToolsVisible`, presentation mode, detached-window grace periods, retry work items, and visibility-loss checks let cmux restore DevTools after split/layout attachment churn while still respecting manual close gestures. There is also a `showDeveloperToolsConsole()` helper that tries known private WebKit inspector selectors for opening the console tab.

Sources: [Sources/Panels/BrowserPanel.swift:2881-2915](), [Sources/Panels/BrowserPanel.swift:5891-5941](), [Sources/Panels/BrowserPanel.swift:5959-6139](), [Sources/Panels/BrowserPanel.swift:6141-6176]()
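The split between user intent and observed visibility can be shown with a minimal state model. This Python sketch is an assumption-laden simplification: `DevToolsIntent` and its methods are hypothetical, and it leaves out the grace periods, retry work items, and presentation modes mentioned above.

```python
from dataclasses import dataclass


@dataclass
class DevToolsIntent:
    preferred_visible: bool = False  # what the user last asked for
    actually_visible: bool = False   # what the inspector currently reports

    def toggle(self) -> None:
        """User pressed the DevTools button: update intent and visibility."""
        self.preferred_visible = not self.preferred_visible
        self.actually_visible = self.preferred_visible

    def visibility_lost(self, user_closed: bool) -> None:
        """Inspector disappeared, either by layout churn or a manual close."""
        self.actually_visible = False
        if user_closed:
            self.preferred_visible = False  # respect the manual close

    def needs_restore(self) -> bool:
        """True when intent says visible but the inspector is gone."""
        return self.preferred_visible and not self.actually_visible

    def restore(self) -> None:
        """Reopen the inspector only if the user still wants it."""
        if self.needs_restore():
            self.actually_visible = True


state = DevToolsIntent()
state.toggle()                            # user opens DevTools
state.visibility_lost(user_closed=False)  # split/layout churn hides it
```

At this point `needs_restore()` is true, so a restore pass would reopen the inspector; had the user closed it manually, the intent flag would be cleared and nothing would be restored.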

## Socket-Driven Browser Automation

The v2 socket surface is broad enough for agent loops. `TerminalController` routes browser commands including opening, navigation, URL reads, snapshots, evaluation, waits, input actions, screenshots, get/is queries, locator families, frames, dialogs, downloads, cookies, storage, tabs, console/errors, state, scripts, styles, and several unsupported-emulation families.

Sources: [Sources/TerminalController.swift:3560-3668](), [Sources/TerminalController.swift:3888-3970]()

Core commands resolve a workspace and browser surface, then execute against the `BrowserPanel.webView`. `browser.open_split` creates or reuses a right-side browser pane; `browser.navigate` calls `navigateSmart`; `browser.eval` runs arbitrary JavaScript with normalized return payloads; `browser.wait` supports selector, URL substring, text substring, load state, and JavaScript function conditions.

Sources: [Sources/TerminalController.swift:11587-11728](), [Sources/TerminalController.swift:11995-12013](), [Sources/TerminalController.swift:12296-12389]()
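A wait command of this kind is usually a poll-until-deadline loop over a condition. The Python sketch below shows that generic pattern with an injectable clock; it is not the `browser.wait` implementation, and `wait_until`, its parameters, and the simulated page state are hypothetical.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated page whose URL changes on the third poll.
state = {"url": "about:blank", "polls": 0}


def url_contains_dashboard() -> bool:
    state["polls"] += 1
    if state["polls"] >= 3:
        state["url"] = "https://example.test/dashboard"
    return "dashboard" in state["url"]


ok = wait_until(url_contains_dashboard, timeout=1.0, interval=0.001)
```

Each condition type named above (selector, URL substring, text substring, load state, JavaScript function) would plug in as a different `condition` callable while the loop stays the same.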

Interactive automation gets stable element refs from snapshots and finder methods. The finder path computes a CSS path for matched elements, allocates an `@e` element reference, and returns both selector and ref. Role and text finders are implemented in JavaScript against page DOM and ARIA-ish metadata rather than a browser-provider service.

Sources: [Sources/TerminalController.swift:13081-13165](), [Sources/TerminalController.swift:13167-13255]()
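The relationship between selectors and element refs can be sketched as a small lookup table. In this Python model the `@e` prefix comes from the text, while the numeric suffix, the reuse of a ref for a repeated selector, and the `RefTable` class itself are assumptions for illustration.

```python
class RefTable:
    """Hypothetical mapping between short element refs and CSS selectors."""

    def __init__(self) -> None:
        self._next_id = 1
        self._selector_by_ref = {}
        self._ref_by_selector = {}

    def allocate(self, selector: str) -> dict:
        """Return both the selector and a ref, reusing refs for known selectors."""
        ref = self._ref_by_selector.get(selector)
        if ref is None:
            ref = f"@e{self._next_id}"
            self._next_id += 1
            self._selector_by_ref[ref] = selector
            self._ref_by_selector[selector] = ref
        return {"selector": selector, "ref": ref}

    def resolve(self, target: str) -> str:
        """Turn a ref back into a selector; pass plain selectors through."""
        if target.startswith("@e"):
            return self._selector_by_ref[target]  # KeyError if the ref is unknown
        return target


refs = RefTable()
first = refs.allocate("form > button:nth-of-type(1)")
second = refs.allocate("nav > a:nth-of-type(2)")
```

An agent can then pass `first["ref"]` to a later action instead of repeating the full CSS path, and an unknown ref fails loudly rather than matching the wrong element.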

Stateful automation can inspect and mutate WebKit cookies, local storage, session storage, and browser tabs. Console and error commands read arrays installed by `BrowserPanel.telemetryHookBootstrapScriptSource`.

Sources: [Sources/Panels/BrowserPanel.swift:2335-2389](), [Sources/TerminalController.swift:14124-14360](), [Sources/TerminalController.swift:14362-14616]()

## CLI Shape And Agent Workflow

The CLI maps `cmux browser ...` onto the socket API and keeps legacy browser aliases routed to the v2 surface. The help text documents an agent-friendly loop: open, navigate, wait, snapshot, act, and optionally request `--snapshot-after`.

Sources: [CLI/cmux.swift:4458-4491](), [CLI/cmux.swift:9705-9762](), [CLI/cmux.swift:13229-13296](), [CLI/cmux.swift:29598-29634]()

The local `cmux-browser` skill file gives the same practical workflow: target a surface, verify URL, wait, snapshot with `--interactive`, act with refs, then re-snapshot after DOM or navigation changes. This is portable guidance: it is a file-backed repository skill, not a dependency on a hosted model provider.

Sources: [skills/cmux-browser/SKILL.md:10-29](), [skills/cmux-browser/SKILL.md:31-56](), [skills/cmux-browser/SKILL.md:81-90]()

## Tests And Gaps Worth Watching

Tests cover both in-process model behavior and socket/CLI behavior. `BrowserPanelTests` includes profile-isolation regression coverage for stale navigation callbacks, address-bar focus request semantics, and browser file-picker bridge behavior. `BrowserImportMappingTests` verifies import plan defaults, separate-profile mapping, duplicate-name handling, hint presentation, and profile creation/reuse behavior. The CLI regression test proves text-mode `browser console` and `browser errors` print captured entries rather than collapsing to `OK`.

Sources: [cmuxTests/BrowserPanelTests.swift:91-210](), [cmuxTests/BrowserPanelTests.swift:270-360](), [cmuxTests/BrowserImportMappingTests.swift:9-93](), [cmuxTests/BrowserImportMappingTests.swift:147-190](), [tests_v2/test_cli_browser_console_errors_text.py:75-145]()

The agent-browser port spec says the target is an LLM-friendly browser API with stable handles, public v2 terminology around `surface`, and meaningful parity with `agent-browser` where `WKWebView` supports it. The repository skill documents current WKWebView limits: viewport emulation, offline emulation, trace/screencast recording, network route interception, and low-level raw input injection are called out as unsupported.

Sources: [docs/agent-browser-port-spec.md:8-15](), [docs/agent-browser-port-spec.md:24-35](), [docs/agent-browser-port-spec.md:189-220](), [skills/cmux-browser/SKILL.md:114-123]()

## Productization Notes

The feature is BYOC/BYOK friendly because the automation boundary is local: WebKit, AppKit, socket methods, CLI commands, and repository skill files. Agents can call `cmux browser` or socket methods regardless of model provider. A Grok-Wiki integration should present this as portable browser-pane knowledge sourced from files, repositories, or cataloged skills, not as a proprietary connector flow.

The strongest reusable ideas are the portal-hosted WebKit sidecar, profile-backed authenticated surfaces, and snapshot/ref-driven CLI loop. Those three patterns let cmux act like an app-native browser for humans while still exposing deterministic automation hooks for agents.

Sources: [Sources/BrowserWindowPortal.swift:4031-4140](), [Sources/Panels/BrowserPanel.swift:510-520](), [CLI/cmux.swift:29598-29634]()
