# The Browser Driver

> The browser layer is the remote-control box: Rust owns CDP connections and browser lifecycle, while Python helper code runs page scripts, collects artifacts, and reports browser events without forcing one model vendor.

- Repository: browser-use/terminal
- GitHub: https://github.com/browser-use/terminal
- Human wiki: https://grok-wiki.com/public/wiki/browser-use-terminal-686510dbe50c
- Complete Markdown: https://grok-wiki.com/public/wiki/browser-use-terminal-686510dbe50c/llms-full.txt

## Source Files

- `crates/browser-use-browser/src/lib.rs`
- `crates/browser-use-browser/src/browser_script_helpers.py`
- `crates/browser-use-python-worker/src/lib.rs`
- `python/llm_browser_worker/worker.py`
- `prompts/browser-tool-description.md`
- `prompts/browser-script-tool-description.md`
- `prompts/interaction-skills/connection.md`
- `prompts/interaction-skills/profile-sync.md`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [crates/browser-use-browser/src/lib.rs](crates/browser-use-browser/src/lib.rs)
- [crates/browser-use-browser/src/browser_script_helpers.py](crates/browser-use-browser/src/browser_script_helpers.py)
- [crates/browser-use-python-worker/src/lib.rs](crates/browser-use-python-worker/src/lib.rs)
- [python/llm_browser_worker/worker.py](python/llm_browser_worker/worker.py)
- [prompts/browser-tool-description.md](prompts/browser-tool-description.md)
- [prompts/browser-script-tool-description.md](prompts/browser-script-tool-description.md)
- [prompts/interaction-skills/connection.md](prompts/interaction-skills/connection.md)
- [prompts/interaction-skills/profile-sync.md](prompts/interaction-skills/profile-sync.md)
- [crates/browser-use-core/src/lib.rs](crates/browser-use-core/src/lib.rs)
- [crates/browser-use-browser/Cargo.toml](crates/browser-use-browser/Cargo.toml)
- [crates/browser-use-python-worker/Cargo.toml](crates/browser-use-python-worker/Cargo.toml)
</details>

The browser driver is the remote-control box for web work in `browser-use/terminal`. Rust keeps the durable browser state: which browser is connected, which CDP websocket is open, which tab target is current, and which recovery actions are safe. Python is used for the small scripts that inspect pages, click, type, collect screenshots, and return artifacts.

A simple way to picture it: Rust holds the remote; Python presses the buttons for one task. This matters because a model can change, a user can bring their own browser, and a deployment can bring its own CDP endpoint without rewriting the page-interaction API.

## The Main Split

The repository names the split directly: `browser` controls connection, lifecycle, and debug state; `browser_script` runs Python page interaction through the Rust-held CDP connection. The Rust output types also show the boundary: browser commands return command content plus browser events, while browser scripts return text, artifacts, images, and browser events.

Sources: [crates/browser-use-browser/src/lib.rs:1-6](), [crates/browser-use-browser/src/lib.rs:33-54](), [prompts/browser-tool-description.md:23-30](), [prompts/browser-script-tool-description.md:1-15]()

```text
model/tool call
    |
    | browser cmd                browser_script code
    v                           v
Rust browser session       fresh Python process
CDP websocket              helper functions
target/session ids         page JS, clicks, screenshots
browser lifecycle          artifacts/images/output
```

### What Rust Owns

Rust stores the session-level control state in `BrowserSession`: mode, owner, endpoint, CDP connection, current target id, current session id, connection generation, managed process handle, remote browser id, live URL, last errors, and logs. Each `BrowserSession` lives in an in-process registry keyed by session id.

Sources: [crates/browser-use-browser/src/lib.rs:146-192](), [crates/browser-use-browser/src/lib.rs:194-218]()

### What Python Owns

In the Rust `browser_script` path, Python is fresh per call. Rust builds a Python prelude, injects helper code, runs the user script, auto-collects newly written files, and emits one JSON result marker back to Rust. The prompt repeats the same model-facing rule: Python variables do not persist across `browser_script` calls, while browser/CDP state persists in Rust.

Sources: [crates/browser-use-browser/src/lib.rs:220-344](), [crates/browser-use-browser/src/lib.rs:2782-2974](), [prompts/browser-script-tool-description.md:7-16]()
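
The fresh-process rule can be illustrated with a minimal sketch. It assumes a made-up marker string (`@@RESULT@@`) and a made-up convention that the script leaves its answer in a variable named `result`; neither is taken from the repository, and the real prelude is considerably richer.

```python
import json
import subprocess
import sys

# Hypothetical marker; the real marker string is not shown on this page.
RESULT_MARKER = "@@RESULT@@"


def run_fresh_script(user_code: str, timeout_s: float = 30.0) -> dict:
    """Run user_code in a brand-new interpreter and parse one marked JSON line."""
    prelude = "import json\n"
    # The epilogue prints exactly one marker line carrying the JSON result.
    epilogue = (
        "\nprint(" + repr(RESULT_MARKER)
        + " + json.dumps({'ok': True, 'text': str(globals().get('result', ''))}))\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", prelude + user_code + epilogue],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    # Scan from the end so ordinary prints from the script are ignored.
    for line in reversed(proc.stdout.splitlines()):
        if line.startswith(RESULT_MARKER):
            return json.loads(line[len(RESULT_MARKER):])
    # No marker means the script failed before reaching the epilogue.
    return {"ok": False, "text": proc.stderr.strip()}


first = run_fresh_script("counter = 41\nresult = counter + 1")
# A second call starts from scratch: `counter` from the first call is gone.
second = run_fresh_script("result = 'counter' in globals()")
```

Because every call spawns a new interpreter, the second script cannot see `counter`. Anything that must survive between calls has to live on the Rust side or in files.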

## Browser Modes And Ownership

The driver supports more than one browser source. This is the key BYOC shape: the user can attach to an existing local Chromium-family browser, let Rust launch a managed Chromium, connect to an external CDP endpoint, or start a Browser Use cloud browser. Ownership decides what Rust may safely stop or restart.

Sources: [crates/browser-use-browser/src/lib.rs:56-92](), [crates/browser-use-browser/src/lib.rs:438-468](), [crates/browser-use-browser/src/lib.rs:630-686]()

| Mode | How it connects | Owner | Safe actions |
|---|---|---:|---|
| `local` | Finds a running browser exposing `DevToolsActivePort` or a known local CDP port | External | Rust attaches, but does not kill the user browser |
| `managed` | Launches Chromium with a temp or explicit automation profile | Rust | Rust can stop or restart it |
| `remote-cdp` | Connects to an external DevTools HTTP URL or websocket | External | Rust reconnects to the endpoint, but does not own the browser |
| `remote-cloud` | Starts Browser Use cloud through API and connects to its CDP URL | Rust | Rust can stop the cloud browser it created |

`browser status --json` exposes this ownership back to the agent, including `safety.can_restart_browser`, `safety.can_close_browser`, and `safety.can_stop_remote`. `browser runtime ownership --json` gives a more direct safe-action view before stopping anything.

Sources: [crates/browser-use-browser/src/lib.rs:630-686](), [prompts/browser-tool-description.md:59-72](), [prompts/browser-tool-description.md:80-87]()
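
One way to read the table is as a pure function from mode to safe actions. The sketch below is an illustration only: the three flag names come from the `browser status --json` description above, but the exact rule for each flag is an assumption derived from the table, not from the Rust code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Safety:
    can_restart_browser: bool
    can_close_browser: bool
    can_stop_remote: bool


# Owner per mode, copied from the table above.
OWNER_BY_MODE = {
    "local": "external",
    "managed": "rust",
    "remote-cdp": "external",
    "remote-cloud": "rust",
}


def safety_for(mode: str) -> Safety:
    """Assumed mapping: only Rust-owned browsers get destructive actions."""
    owned = OWNER_BY_MODE[mode] == "rust"
    return Safety(
        can_restart_browser=owned and mode == "managed",
        can_close_browser=owned and mode == "managed",
        can_stop_remote=owned and mode == "remote-cloud",
    )


local_safety = safety_for("local")
managed_safety = safety_for("managed")
cloud_safety = safety_for("remote-cloud")
```

Keeping the decision in one function means a caller never has to reason about process handles or API keys to know whether stopping a browser is allowed.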

## CDP Is The Wire

CDP, the Chrome DevTools Protocol, is the wire between the terminal and the browser. Rust opens a websocket, sends messages with incrementing ids, optionally includes a CDP `sessionId`, and waits for the matching response. If a CDP call fails, Rust records the error, classifies it, clears the live connection, and remembers the last target/session ids for recovery.

Sources: [crates/browser-use-browser/src/lib.rs:1183-1201](), [crates/browser-use-browser/src/lib.rs:1280-1324](), [crates/browser-use-browser/src/lib.rs:1350-1395]()

```rust
// crates/browser-use-browser/src/lib.rs (abridged excerpt)
fn call(&mut self, method: &str, session_id: Option<&str>, params: Value) -> Result<Value> {
    // Every request gets a fresh, incrementing id.
    let id = self.next_id;
    self.next_id += 1;
    let mut message = json!({ "id": id, "method": method, "params": params });
    // The CDP sessionId is optional and only added when the caller supplies one.
    if let Some(session_id) = session_id {
        message["sessionId"] = Value::String(session_id.to_string());
    }
    self.socket.send(Message::Text(serde_json::to_string(&message)?))?;
    // Elided in this excerpt: the loop that waits until the response with the same id arrives.
}
```
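
The same request/response matching can be sketched in Python without a real socket. This is not the driver's code: the `CdpFramer` class, the `match_response` function, and the starting id of 1 are illustrative assumptions. The frame shapes (`id` plus `result` or `error` for replies, `method` without `id` for events) follow the CDP wire format.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


class CdpFramer:
    """Builds CDP request frames with incrementing ids (hypothetical helper)."""

    def __init__(self) -> None:
        self.next_id = 1  # starting value is an assumption

    def build(
        self,
        method: str,
        params: Optional[Dict[str, Any]] = None,
        session_id: Optional[str] = None,
    ) -> Tuple[int, str]:
        msg_id = self.next_id
        self.next_id += 1
        message: Dict[str, Any] = {"id": msg_id, "method": method, "params": params or {}}
        if session_id is not None:
            message["sessionId"] = session_id
        return msg_id, json.dumps(message)


def match_response(msg_id: int, incoming: Iterable[str]) -> Dict[str, Any]:
    """Skip events and unrelated replies until the frame with msg_id arrives."""
    for raw in incoming:
        frame = json.loads(raw)
        if frame.get("id") != msg_id:
            continue  # an event, or a reply to a different call
        if "error" in frame:
            raise RuntimeError(frame["error"].get("message", "CDP error"))
        return frame.get("result", {})
    raise ConnectionError("socket closed before the matching response arrived")


framer = CdpFramer()
msg_id, wire = framer.build("Page.navigate", {"url": "https://example.com"}, session_id="S1")
frames = [
    json.dumps({"method": "Page.frameStartedLoading", "params": {}}),  # event, no id
    json.dumps({"id": msg_id, "result": {"frameId": "F1"}}),
]
result = match_response(msg_id, frames)
```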

The browser-script bridge is deliberately narrow. Python sends JSON requests like `{"kind":"cdp","method":"Page.navigate"}` to a localhost bridge. Rust temporarily removes the session from the registry while handling the bridge request, runs the CDP call, then puts the session back. That prevents two page scripts from mutating the same CDP session at the same instant.

Sources: [crates/browser-use-browser/src/lib.rs:2637-2692](), [crates/browser-use-browser/src/lib.rs:2694-2721](), [crates/browser-use-browser/src/lib.rs:2723-2780]()
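
The take-then-put-back pattern described above can be sketched with a dictionary and a context manager. Everything here is hypothetical naming, and one behavior is assumed: this sketch rejects a second checkout while the session is out, whereas the Rust implementation may handle that case differently.

```python
import threading
from contextlib import contextmanager


class SessionRegistry:
    """Hypothetical registry: a session is absent while one request uses it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def insert(self, session_id, session):
        with self._lock:
            self._sessions[session_id] = session

    @contextmanager
    def checked_out(self, session_id):
        # Remove the session so no other request can touch it meanwhile.
        with self._lock:
            session = self._sessions.pop(session_id, None)
        if session is None:
            raise LookupError(f"session {session_id!r} is missing or busy")
        try:
            yield session
        finally:
            # Always put the session back, even if the CDP call failed.
            with self._lock:
                self._sessions[session_id] = session


registry = SessionRegistry()
registry.insert("s1", {"calls": 0})
nested_error = None
with registry.checked_out("s1") as session:
    session["calls"] += 1
    try:
        with registry.checked_out("s1"):
            pass
    except LookupError as exc:
        nested_error = str(exc)

with registry.checked_out("s1") as session:
    final_calls = session["calls"]
```

The design choice is that exclusivity comes from absence rather than from holding a lock for the whole CDP call, so the registry lock is only held for the brief remove and reinsert steps.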

## Tabs, Targets, And The Invisible-Tab Problem

Chrome exposes tabs and some internal surfaces as CDP targets. The driver tries to attach to a real page target first. If no real page exists, Rust creates an `about:blank` tab and attaches to that. This avoids a common failure where automation accidentally attaches to `chrome://omnibox-popup.top-chrome/`, a tiny invisible page target.

Sources: [crates/browser-use-browser/src/lib.rs:1203-1225](), [crates/browser-use-browser/src/lib.rs:2978-2988](), [prompts/interaction-skills/connection.md:3-15]()
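
A minimal sketch of the "real page first" rule, operating on the `targetInfos` shape that `Target.getTargets` returns. The list of internal URL prefixes is an assumption for illustration; the driver's actual filter may differ.

```python
from typing import List, Optional

# Assumed set of internal URL schemes to skip; not taken from the repository.
INTERNAL_PREFIXES = ("chrome://", "chrome-extension://", "devtools://")


def is_real_page(target: dict) -> bool:
    """A real page is a `page` target whose URL is not a browser-internal surface."""
    url = target.get("url", "")
    return target.get("type") == "page" and not url.startswith(INTERNAL_PREFIXES)


def pick_target(targets: List[dict]) -> Optional[str]:
    """Return the first real page target id, or None if a blank tab is needed."""
    for target in targets:
        if is_real_page(target):
            return target["targetId"]
    return None  # the caller would create an about:blank tab and attach to it


targets = [
    {"targetId": "T1", "type": "page", "url": "chrome://omnibox-popup.top-chrome/"},
    {"targetId": "T2", "type": "page", "url": "https://example.com/"},
    {"targetId": "T3", "type": "service_worker", "url": "https://example.com/sw.js"},
]
chosen = pick_target(targets)
fallback = pick_target(targets[:1])
```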

Python helpers make tab work explicit:

```python
# crates/browser-use-browser/src/browser_script_helpers.py
tabs = list_tabs(include_chrome=False)
tab = ensure_real_tab()
switch_tab(tab["target_id"])
goto_url("https://example.com")
```

`list_tabs()` reads `Target.getTargets`, `switch_tab()` activates and attaches to a target, and `new_tab()` creates a blank target and only then navigates, which avoids attach/load races.

Sources: [crates/browser-use-browser/src/browser_script_helpers.py:219-264](), [crates/browser-use-browser/src/browser_script_helpers.py:267-285](), [prompts/interaction-skills/connection.md:34-41]()
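
The blank-first ordering can be sketched against a fake `cdp` callable that records calls. `new_tab_sketch` and `fake_cdp` are hypothetical names, and session routing is left out because, per this page, Rust tracks the current session. The CDP method names themselves are real.

```python
from typing import Any, Callable, Dict, List, Tuple

CdpFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def new_tab_sketch(cdp: CdpFn, url: str) -> str:
    """Create a blank tab, attach to it, and only then navigate."""
    target_id = cdp("Target.createTarget", {"url": "about:blank"})["targetId"]
    cdp("Target.attachToTarget", {"targetId": target_id, "flatten": True})
    # Navigation happens after attach, so no load events are missed.
    cdp("Page.navigate", {"url": url})
    return target_id


calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_cdp(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the injected helper; returns canned responses."""
    calls.append((method, params))
    if method == "Target.createTarget":
        return {"targetId": "T1"}
    if method == "Target.attachToTarget":
        return {"sessionId": "S1"}
    return {}


opened = new_tab_sketch(fake_cdp, "https://example.com")
```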

## Page Interaction Helpers

The helper layer turns CDP into simple page actions. `cdp()` is still available as the source of truth, but normal work uses helpers such as `js()`, `goto_url()`, `page_info()`, `wait_for_element()`, `screenshot()`, `click_at_xy()`, `fill_input()`, `press_key()`, `scroll()`, and `upload_file()`.

Sources: [crates/browser-use-browser/src/browser_script_helpers.py:21-45](), [crates/browser-use-browser/src/browser_script_helpers.py:167-202](), [crates/browser-use-browser/src/browser_script_helpers.py:299-356](), [crates/browser-use-browser/src/browser_script_helpers.py:369-541](), [prompts/browser-script-tool-description.md:17-59]()
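
As the table below notes, `js()` runs `Runtime.evaluate`. A hedged sketch of what such a wrapper has to do with the request and the response follows; `evaluate_params` and `unwrap_evaluate` are hypothetical names, and the choice of `returnByValue` and `awaitPromise` is an assumption about sensible defaults rather than a statement about the real helper.

```python
from typing import Any, Dict


def evaluate_params(expression: str) -> Dict[str, Any]:
    """Build Runtime.evaluate params that ask for a plain JSON value."""
    return {"expression": expression, "returnByValue": True, "awaitPromise": True}


def unwrap_evaluate(response: Dict[str, Any]) -> Any:
    """Turn a Runtime.evaluate response into a Python value or an exception."""
    details = response.get("exceptionDetails")
    if details:
        raise RuntimeError(details.get("text", "JavaScript exception"))
    # `undefined` results carry no `value` key, which maps to None here.
    return response.get("result", {}).get("value")


params = evaluate_params("document.title")
title = unwrap_evaluate({"result": {"type": "string", "value": "Example Domain"}})
nothing = unwrap_evaluate({"result": {"type": "undefined"}})
```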

| Helper group | Examples | What it really does |
|---|---|---|
| Raw protocol | `cdp()`, `cdp_batch()` | Sends CDP methods through Rust |
| DOM and JS | `js()`, `page_info()` | Runs `Runtime.evaluate` and returns JSON-like values |
| Navigation | `goto_url()`, `new_tab()`, `switch_tab()` | Uses `Page.navigate` and `Target.*` |
| Input | `click_at_xy()`, `type_text()`, `press_key()`, `fill_input()` | Dispatches CDP mouse and keyboard events |
| Evidence | `screenshot()`, `screenshot_clip()`, `copy_artifact()` | Writes files/images into the artifact result |

## Artifacts, Images, And Events

The Rust `browser_script` path scans the artifact directory and output directory before and after Python runs. New files are automatically reported as artifacts, and screenshots can be emitted as image records. This lets page scripts produce evidence without depending on a particular model provider.

Sources: [crates/browser-use-browser/src/lib.rs:2782-2825](), [crates/browser-use-browser/src/lib.rs:2858-2915](), [crates/browser-use-browser/src/lib.rs:2961-2973]()
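
The before/after scan amounts to a set difference over directory listings. The sketch below shows the idea with a temporary directory; the function names and the image-suffix list are illustrative assumptions, not the Rust implementation.

```python
import tempfile
from pathlib import Path
from typing import List, Set

# Assumed suffixes that count as images for this sketch.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".webp")


def snapshot(directory: Path) -> Set[str]:
    """Record every file currently under the directory, as relative paths."""
    return {str(p.relative_to(directory)) for p in directory.rglob("*") if p.is_file()}


def new_files(before: Set[str], after: Set[str]) -> List[str]:
    """Files present after the script ran that were not there before."""
    return sorted(after - before)


with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp)
    (artifact_dir / "old.txt").write_text("existing")
    before = snapshot(artifact_dir)

    # Stand-in for the user script writing evidence files.
    (artifact_dir / "shot.png").write_bytes(b"\x89PNG")
    (artifact_dir / "report.json").write_text("{}")

    created = new_files(before, snapshot(artifact_dir))
    images = [name for name in created if name.endswith(IMAGE_SUFFIXES)]
```

Because collection is driven by the filesystem, a script only has to write a file for it to be reported; no provider-specific upload call is involved.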

There is also a longer-lived Python worker for the general Python tool surface. It sends JSON requests over stdin/stdout, streams host-helper events before the final response, copies artifacts into `files` or `images`, and records browser events such as `browser.state`, `browser.connected`, `browser.reconnected`, and `browser.target_changed`.

Sources: [crates/browser-use-python-worker/src/lib.rs:298-372](), [python/llm_browser_worker/worker.py:593-641](), [python/llm_browser_worker/worker.py:685-772](), [python/llm_browser_worker/worker.py:1495-1537](), [python/llm_browser_worker/worker.py:1570-1613]()
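
The "events first, then one final response" pattern can be sketched as a reader over line-delimited JSON. The frame fields used here (`type`, `id`, `name`, `ok`) are invented for illustration; the worker's actual message schema is not documented on this page.

```python
import io
import json
from typing import Any, Dict, List, TextIO, Tuple


def read_until_response(
    stream: TextIO, request_id: int
) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
    """Collect streamed events until the response for request_id arrives."""
    events: List[Dict[str, Any]] = []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        frame = json.loads(line)
        if frame.get("type") == "event":
            events.append(frame)
        elif frame.get("type") == "response" and frame.get("id") == request_id:
            return events, frame
    raise EOFError("worker exited before sending a response")


# Stand-in for the worker's stdout; the event names come from the text above.
fake_stdout = io.StringIO(
    '{"type": "event", "name": "browser.connected"}\n'
    '{"type": "event", "name": "browser.target_changed"}\n'
    '{"type": "response", "id": 7, "ok": true, "files": [], "images": []}\n'
)
events, response = read_until_response(fake_stdout, 7)
```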

The core runtime records those events into the session store. Browser command events, browser-script response events, Python worker output, images, artifacts, and browser events all pass through explicit record functions rather than being hidden in transcript text.

Sources: [crates/browser-use-core/src/lib.rs:2928-2987](), [crates/browser-use-core/src/lib.rs:3263-3315](), [crates/browser-use-core/src/lib.rs:5109-5150]()

## Profiles And Cookies

Local profile handling is intentionally conservative. The browser tool can list local Chromium-family profiles with Rust filesystem discovery, inspect a selected profile through CDP, and return cookie domain/count/expiry summaries. It does not return raw cookie values by default.

Sources: [crates/browser-use-browser/src/lib.rs:1838-1876](), [crates/browser-use-browser/src/lib.rs:1878-1934](), [prompts/browser-tool-description.md:50-58]()
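
A summary that never exposes cookie values could look like the sketch below. It assumes cookie records shaped like CDP cookie objects (`domain`, `value`, `expires` in epoch seconds, with a non-positive `expires` for session cookies). The output field names are made up for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List


def summarize_cookies(cookies: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Aggregate per-domain counts and expiry without copying any cookie value."""
    by_domain: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"count": 0, "latest_expiry": None, "session_cookies": 0}
    )
    for cookie in cookies:
        entry = by_domain[cookie["domain"].lstrip(".")]
        entry["count"] += 1
        expires = cookie.get("expires", -1)
        if expires is None or expires <= 0:
            entry["session_cookies"] += 1  # no persistent expiry
        elif entry["latest_expiry"] is None or expires > entry["latest_expiry"]:
            entry["latest_expiry"] = expires
    return dict(by_domain)


cookies = [
    {"domain": ".example.com", "name": "sid", "value": "secret-token", "expires": -1},
    {"domain": "example.com", "name": "pref", "value": "dark", "expires": 2000000000},
    {"domain": "other.test", "name": "a", "value": "b", "expires": 1900000000},
]
summary = summarize_cookies(cookies)
```

The `value` field is never read, so the summary cannot leak a secret even if it is logged or returned to a model.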

Cloud profile sync from local Chrome is not part of the current terminal release. The documented flow is: use local Chrome by attaching to an already-open browser, or use an existing Browser Use cloud profile by id/name. Do not assume local-to-cloud cookie copying works.

Sources: [prompts/interaction-skills/profile-sync.md:1-13](), [prompts/interaction-skills/profile-sync.md:14-26]()

## Recovery Model

Recovery is explicit because browser automation can go stale in several different ways. A websocket can drop, a target can close, a session id can change, or a user-owned browser can stop exposing CDP. The driver reports `next_step` values and safe actions instead of silently switching tabs, relaunching browsers, or killing external Chrome.

Sources: [crates/browser-use-browser/src/lib.rs:688-708](), [crates/browser-use-browser/src/lib.rs:1009-1074](), [crates/browser-use-browser/src/lib.rs:1096-1181](), [prompts/browser-tool-description.md:74-87]()
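
The "report, do not act" idea can be sketched as a lookup that only returns advice. The state names and recovery paths are taken from the table below; the function name, the report keys other than `next_step`, and the fallback message are assumptions for illustration.

```python
from typing import Dict

# State -> suggested recovery path, copied from the recovery table.
NEXT_STEP: Dict[str, str] = {
    "not-configured": "connect local, managed, or remote",
    "stale-port": "reopen Chrome/profile, then reconnect",
    "browser-closed": "reopen Chrome/profile, then reconnect",
    "websocket-dropped": "browser recover reconnect-websocket",
    "target-gone": "list/switch tabs or open a new tab",
}


def recovery_report(state: str, owner: str) -> Dict[str, object]:
    """Describe what the agent could do next; never performs the action itself."""
    return {
        "state": state,
        "next_step": NEXT_STEP.get(state, "run `browser status --json` and inspect it"),
        # Restarting is only ever offered for browsers Rust launched itself.
        "may_restart_browser": owner == "rust",
    }


dropped = recovery_report("websocket-dropped", owner="external")
unknown = recovery_report("something-new", owner="rust")
```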

| Symptom | Likely state | Recovery path |
|---|---|---|
| No endpoint configured | `not-configured` | Connect local, managed, or remote |
| CDP port is stale | `stale-port` / `browser-closed` | Reopen Chrome/profile, then reconnect |
| Websocket dropped | `websocket-dropped` | `browser recover reconnect-websocket` |
| Current target disappeared | `target-gone` | List/switch tabs or open a new tab |
| Session id changed | `browser.reconnected` | Treat old JS object ids as stale |

Tests lock down this behavior: status includes recovery fields, browser events are transition-based rather than heartbeat spam, recovery without a configured endpoint fails without side effects, script timeouts become tool failures, and the bridge handles large JSON responses.

Sources: [crates/browser-use-browser/src/lib.rs:3136-3192](), [crates/browser-use-browser/src/lib.rs:3216-3277](), [crates/browser-use-browser/src/lib.rs:3279-3315]()
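
"Transition-based rather than heartbeat spam" means an event is emitted only when the observed state differs from the previous one. A minimal sketch, with a hypothetical `StateEvents` class, an assumed event payload shape, and `connected` used as an illustrative state name:

```python
from typing import Dict, List, Optional


class StateEvents:
    """Emits a browser.state event only when the state actually changes."""

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def observe(self, state: str) -> Optional[Dict[str, Optional[str]]]:
        if state == self._last:
            return None  # repeated poll with no change: stay silent
        event = {"kind": "browser.state", "from": self._last, "to": state}
        self._last = state
        return event


tracker = StateEvents()
polls = ["connected", "connected", "websocket-dropped", "websocket-dropped", "connected"]
emitted: List[dict] = [e for e in map(tracker.observe, polls) if e is not None]
```

Five polls produce three events here, one per transition, which keeps the session store readable when status is checked often.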

## Provider-Neutral Shape

The browser driver is not tied to a model vendor. The browser crate depends on general Rust libraries for JSON, HTTP, websockets, temp files, and opening local URLs; it does not depend on a model SDK. The Python worker crate likewise depends on process/JSON support, not a model provider.

Sources: [crates/browser-use-browser/Cargo.toml:7-15](), [crates/browser-use-python-worker/Cargo.toml:7-12]()

At the core runtime level, provider/model configuration is recorded separately from browser worker startup. The worker gets browser-mode and environment settings, while model provider information is appended as its own `model.config` event. That separation is what keeps BYOC/BYOK practical: bring your own browser or CDP endpoint, bring your own model/provider key elsewhere, and keep the browser interface the same.

Sources: [crates/browser-use-core/src/lib.rs:760-790](), [crates/browser-use-core/src/lib.rs:3132-3189]()

Remote Browser Use cloud is an optional browser backend, not a required model provider. It needs `BROWSER_USE_API_KEY` only for cloud browsers/profiles; local, managed, and external remote-CDP modes remain available without that cloud browser path.

Sources: [crates/browser-use-browser/src/lib.rs:893-958](), [crates/browser-use-browser/src/lib.rs:2600-2627](), [prompts/browser-tool-description.md:66-72]()

## Practical Mental Checklist

When adding or debugging browser work, ask these questions in order:

1. Is this lifecycle work or page work? Use `browser` for lifecycle, `browser_script` for page interaction.
2. Who owns the browser? Only Rust-owned managed/cloud browsers can be stopped or restarted by Rust.
3. Is the current target real and visible? If not, use `list_tabs()`, `ensure_real_tab()`, or `switch_tab()`.
4. Did the websocket, target, or session change? Treat old object ids as stale after reconnect/target-change events.
5. Is the evidence saved? Use screenshots, artifacts, and `audit_artifact()` when the result needs verification.

Sources: [prompts/browser-tool-description.md:1-4](), [prompts/browser-tool-description.md:132](), [prompts/browser-script-tool-description.md:61-72](), [python/llm_browser_worker/worker.py:988-1051](), [python/llm_browser_worker/worker.py:1053-1339]()

## Summary

The browser driver works because it keeps ownership clear. Rust holds browser lifecycle, CDP connection state, recovery state, and safe-action rules. Python helpers perform page-level work and package evidence as text, events, artifacts, and images. That boundary keeps the browser layer portable across local browsers, managed Chromium, external CDP endpoints, and optional cloud browsers without forcing one model vendor or one hosted execution path.

Sources: [crates/browser-use-browser/src/lib.rs:1-6](), [crates/browser-use-browser/src/lib.rs:988-1006](), [crates/browser-use-browser/src/lib.rs:2637-2780](), [prompts/browser-script-tool-description.md:7-16]()
