# The Agent Loop and Its Tools

> The agent loop is the careful helper: it asks a model provider for the next move, exposes browser and filesystem tools, records events, and stops only when a done, failure, or cancellation path is reached.

- Repository: browser-use/terminal
- GitHub: https://github.com/browser-use/terminal
- Human wiki: https://grok-wiki.com/public/wiki/browser-use-terminal-686510dbe50c
- Complete Markdown: https://grok-wiki.com/public/wiki/browser-use-terminal-686510dbe50c/llms-full.txt

## Source Files

- `crates/browser-use-core/src/lib.rs`
- `crates/browser-use-core/src/tools/mod.rs`
- `crates/browser-use-core/src/tools/command.rs`
- `crates/browser-use-core/src/tools/files.rs`
- `crates/browser-use-providers/src/lib.rs`
- `crates/browser-use-protocol/src/lib.rs`
- `prompts/browser-agent-system.md`
- `prompts/browser-script-tool-description.md`
- `prompts/python-tool-description.md`

---

This page explains the agent loop: the part of `browser-use/terminal` that asks a model provider what to do next, gives that model a controlled set of browser and local tools, records what happened, and stops only through a terminal path like `done`, failure, or cancellation.

## The Simple Mental Model

Think of the agent as a careful helper with a notebook. On each turn it rereads the notebook, asks the model for the next move, runs only the tools the repo registered, writes the result back into the notebook, and repeats. The notebook is the session event stream; the tool menu is `ToolRegistry`; the model is any implementation of `ModelProvider`.

```text
User task + prior events
        |
        v
ProviderTurn { messages, tools }
        |
        v
ModelEvent stream: text, usage, tool calls, done
        |
        v
Tool dispatch + event recording
        |
        v
done / next turn / failed / cancelled
```

Sources: `crates/browser-use-core/src/lib.rs:796-852`, `crates/browser-use-core/src/lib.rs:920-1039`, `crates/browser-use-protocol/src/lib.rs:56-148`
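The loop in the diagram can be sketched as one small Rust function. This is a minimal sketch under stated assumptions, not the crate's implementation: `Outcome`, `TurnResult`, `run_loop`, and string event names such as `session.status=running` are hypothetical simplifications, and the provider call and cancellation check are passed in as closures.

```rust
// Hypothetical, simplified types for illustration; not the crate's real API.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Done(String),
    Failed(String),
    Cancelled,
}

/// What one provider turn produced, reduced to the two cases the loop branches on.
enum TurnResult {
    FinalText(String),
    ToolCalls(Vec<String>),
}

/// Runs at most `max_turns` provider turns, appending event names to `events`.
fn run_loop(
    max_turns: usize,
    mut is_cancelled: impl FnMut() -> bool,
    mut provider_turn: impl FnMut(&[String]) -> TurnResult,
    events: &mut Vec<String>,
) -> Outcome {
    events.push("session.status=running".to_string());
    for _ in 0..max_turns {
        // Cancellation is checked before each provider call.
        if is_cancelled() {
            events.push("session.status=cancelled".to_string());
            return Outcome::Cancelled;
        }
        // The provider sees the event history (the "notebook") so far.
        match provider_turn(events.as_slice()) {
            TurnResult::FinalText(text) => {
                // No tool calls plus final text: record completion and stop.
                events.push("session.done".to_string());
                return Outcome::Done(text);
            }
            TurnResult::ToolCalls(calls) => {
                for call in calls {
                    // A real dispatcher would run the tool; this sketch only records it.
                    events.push(format!("tool.finished:{call}"));
                }
            }
        }
    }
    // Running out of turns is one of the failure paths.
    events.push("session.failed".to_string());
    Outcome::Failed("max turns exceeded".to_string())
}
```

Keeping the provider behind a closure mirrors the page's point that the loop only needs "give me the next move", not a particular vendor.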

## Sessions, Events, And Status

A session has an id, a cwd, an artifact root, a status, and timestamps. Status is intentionally small: `created`, `running`, `done`, `failed`, or `cancelled`. Only `created` and `running` are active. Events are generic records with a sequence number, a session id, an event type, and a JSON payload, so the runtime can store model deltas, tool starts, browser state, artifacts, failures, and final answers without hard-coding a separate table for each event kind.

The protocol crate also defines the shared contract for tools and model output: `ToolSpec`, `ToolCall`, `ToolResult`, image attachments, usage accounting, and `ModelEvent`. This is the language the loop, providers, UI projections, and tools all share.

Sources: `crates/browser-use-protocol/src/lib.rs:4-65`, `crates/browser-use-protocol/src/lib.rs:79-148`, `crates/browser-use-protocol/src/lib.rs:177-200`
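A compact Rust model of the status set and the generic event record makes the "one table for every event kind" idea concrete. Field names such as `payload_json`, the `EventLog` type, and the rule that sequence numbers start at 1 are assumptions for illustration; the payload is JSON, kept here as a raw string so the sketch needs no `serde_json` dependency.

```rust
/// The five session states described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionStatus {
    Created,
    Running,
    Done,
    Failed,
    Cancelled,
}

impl SessionStatus {
    /// Only `created` and `running` sessions count as active.
    fn is_active(self) -> bool {
        matches!(self, SessionStatus::Created | SessionStatus::Running)
    }

    /// Lowercase name, matching the spelling used on this page.
    fn as_str(self) -> &'static str {
        match self {
            SessionStatus::Created => "created",
            SessionStatus::Running => "running",
            SessionStatus::Done => "done",
            SessionStatus::Failed => "failed",
            SessionStatus::Cancelled => "cancelled",
        }
    }
}

/// One generic event; the JSON payload is kept as a raw string.
#[derive(Debug, Clone)]
struct Event {
    seq: u64,
    session_id: String,
    event_type: String,
    payload_json: String,
}

/// Append-only log; sequence numbers start at 1 and increase by one (an assumption).
#[derive(Debug, Default)]
struct EventLog {
    events: Vec<Event>,
}

impl EventLog {
    fn append(&mut self, session_id: &str, event_type: &str, payload_json: &str) -> u64 {
        let seq = self.events.len() as u64 + 1;
        self.events.push(Event {
            seq,
            session_id: session_id.to_string(),
            event_type: event_type.to_string(),
            payload_json: payload_json.to_string(),
        });
        seq
    }

    /// All events of one type, in sequence order.
    fn of_type<'a>(&'a self, event_type: &'a str) -> impl Iterator<Item = &'a Event> + 'a {
        self.events.iter().filter(move |e| e.event_type == event_type)
    }
}
```

Because every event shares one shape, adding a new event kind means choosing a new `event_type` string rather than adding storage.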

## Provider Neutrality: BYOC And BYOK Friendly

The loop does not call a single vendor API directly. It depends on a `ModelProvider` trait that accepts a `ProviderTurn` containing messages and tool specs, then returns model events. The core config can route to Codex, OpenAI, Anthropic, OpenRouter/OpenAI-compatible chat, fake, or none. API keys and base URLs come from store settings or environment variables, which keeps bring-your-own-key and compatible endpoint setups possible.

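```rust
// crates/browser-use-providers/src/lib.rs (excerpt)
pub trait ModelProvider {
    // Provider label; the default implementation returns "unknown".
    fn provider_name(&self) -> &'static str { "unknown" }
    // Model identifier; the default implementation also returns "unknown".
    fn model_name(&self) -> &str { "unknown" }
    // Sends one turn (messages plus tool specs) and returns the model events it produced.
    fn start_turn(&self, turn: ProviderTurn) -> Result<Vec<ModelEvent>>;
}
```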

Sources: `crates/browser-use-providers/src/lib.rs:18-45`, `crates/browser-use-core/src/lib.rs:55-70`, `crates/browser-use-core/src/lib.rs:301-337`, `crates/browser-use-core/src/lib.rs:426-455`, `crates/browser-use-core/src/lib.rs:555-570`
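The page says keys and base URLs come from store settings or environment variables but does not state which wins. The sketch below assumes a non-blank stored value takes precedence over the environment; `resolve_setting`, `resolve_from_process_env`, and the `EXAMPLE_API_KEY` name are hypothetical. Passing the environment lookup in as a closure keeps the function testable without mutating the real process environment.

```rust
/// Resolve a credential or base URL: a non-blank stored value wins, otherwise the
/// named environment variable is consulted. The precedence order is an assumption.
fn resolve_setting(
    stored: Option<&str>,
    env_name: &str,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    stored
        .map(str::trim)
        .filter(|v| !v.is_empty())
        .map(str::to_string)
        .or_else(|| env_lookup(env_name).filter(|v| !v.trim().is_empty()))
}

/// Production wiring reads the real process environment.
fn resolve_from_process_env(stored: Option<&str>, env_name: &str) -> Option<String> {
    resolve_setting(stored, env_name, |key| std::env::var(key).ok())
}
```

For example, `resolve_from_process_env(None, "EXAMPLE_API_KEY")` returns whatever that (hypothetical) variable holds, or `None` if it is unset or blank.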

### Provider Adapters

| Adapter area | What it supports |
| --- | --- |
| OpenAI Responses | Uses a configurable base URL and sends tool specs when present. |
| OpenAI-compatible chat | Defaults to OpenRouter but accepts compatible base URLs and tool calling. |
| Anthropic Messages | Supports API key and OAuth-token style credentials with a configurable base URL. |
| Fake and Scripted | Let tests or local flows produce deterministic model events without a hosted provider. |

Sources: `crates/browser-use-providers/src/lib.rs:119-202`, `crates/browser-use-providers/src/lib.rs:205-295`, `crates/browser-use-providers/src/lib.rs:298-480`, `crates/browser-use-providers/src/lib.rs:47-116`
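The Fake and Scripted row shows why the trait boundary matters: a provider can replay canned events with no network at all. This hedged sketch mirrors the shape of the `ModelProvider` trait with simplified stand-in types; `ScriptedProvider` here is an illustration rather than the crate's adapter, and a `String` error type replaces whatever `Result` alias the crate uses.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

// Simplified stand-ins for the protocol types; the real ones carry more fields.
#[derive(Debug, Clone, PartialEq)]
enum ModelEvent {
    TextDelta(String),
    ToolCall { name: String, arguments: String },
    Done,
}

struct ProviderTurn {
    messages: Vec<String>,
    tools: Vec<String>,
}

// Same shape as the trait excerpt on this page, with `String` as the error type.
trait ModelProvider {
    fn provider_name(&self) -> &'static str {
        "unknown"
    }
    fn start_turn(&self, turn: ProviderTurn) -> Result<Vec<ModelEvent>, String>;
}

/// Replays a fixed list of turns, one per `start_turn` call, with no network access.
struct ScriptedProvider {
    turns: RefCell<VecDeque<Vec<ModelEvent>>>,
}

impl ScriptedProvider {
    fn new(turns: Vec<Vec<ModelEvent>>) -> Self {
        Self { turns: RefCell::new(turns.into()) }
    }
}

impl ModelProvider for ScriptedProvider {
    fn provider_name(&self) -> &'static str {
        "scripted"
    }

    fn start_turn(&self, _turn: ProviderTurn) -> Result<Vec<ModelEvent>, String> {
        // `&self` is shared, so interior mutability tracks which turn comes next.
        self.turns
            .borrow_mut()
            .pop_front()
            .ok_or_else(|| "script exhausted".to_string())
    }
}
```

A test can script "call `list_files`, then answer with text", and the loop under test cannot tell it apart from a hosted model.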

## One Turn Of The Loop

At the start of a run, the core appends `session.status = running` and records the selected provider/model. For every turn up to `max_turns`, it checks cancellation, pulls in any external messages, normalizes and compacts context, prepares `ProviderTurn { messages, tools }`, and calls the provider with retry support.

Provider events are folded back into local state. Text deltas become `model.delta`; usage becomes `model.usage`; tool calls become `model.tool_call`. If the model produced no tool calls but did produce final text, the loop records `session.done`. If the model produced tool calls, the dispatcher runs them and appends tool-result messages for the next provider turn.

Sources: `crates/browser-use-core/src/lib.rs:778-852`, `crates/browser-use-core/src/lib.rs:920-959`, `crates/browser-use-core/src/lib.rs:967-1039`, `crates/browser-use-core/src/lib.rs:1116-1205`
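The folding rules above (text to `model.delta`, usage to `model.usage`, tool calls to `model.tool_call`, final text without tool calls to `session.done`) can be expressed as one function over a turn's events. `fold_turn`, `NextStep`, and this simplified `ModelEvent` are hypothetical; the `Continue` branch for a turn with neither text nor tool calls is an assumption the page does not describe.

```rust
// Simplified stand-in for the protocol's ModelEvent.
#[derive(Debug, Clone, PartialEq)]
enum ModelEvent {
    TextDelta(String),
    Usage { input_tokens: u64, output_tokens: u64 },
    ToolCall { id: String, name: String },
    Done,
}

/// What the loop should do after folding one provider turn.
#[derive(Debug, PartialEq)]
enum NextStep {
    /// Final text and no tool calls: the run is complete.
    Finish(String),
    /// Tool calls to dispatch; their results feed the next provider turn.
    Dispatch(Vec<(String, String)>),
    /// Neither text nor tool calls (hypothetical handling).
    Continue,
}

fn fold_turn(events: Vec<ModelEvent>, log: &mut Vec<String>) -> NextStep {
    let mut text = String::new();
    let mut calls = Vec::new();
    for event in events {
        match event {
            ModelEvent::TextDelta(delta) => {
                log.push("model.delta".to_string());
                text.push_str(&delta);
            }
            ModelEvent::Usage { .. } => log.push("model.usage".to_string()),
            ModelEvent::ToolCall { id, name } => {
                log.push("model.tool_call".to_string());
                calls.push((id, name));
            }
            ModelEvent::Done => {}
        }
    }
    // Tool calls take priority: any text alongside them is not treated as final.
    if !calls.is_empty() {
        NextStep::Dispatch(calls)
    } else if !text.trim().is_empty() {
        log.push("session.done".to_string());
        NextStep::Finish(text)
    } else {
        NextStep::Continue
    }
}
```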

## The Registered Tool Surface

`ToolRegistry::browser_agent()` is the menu the model sees. It includes local command and file tools, browser runtime control, browser page scripting, completion, planning, image viewing, patching, and helper-agent coordination. The registry exposes `browser` and `browser_script`; tests assert that the legacy `python` name is absent from the browser-agent tool specs, although the dispatcher still keeps a compatibility path for `python` calls.

| Tool group | Examples | Main job |
| --- | --- | --- |
| Completion | `done` | End the user-facing task with text or a result file. |
| Browser runtime | `browser` | Connect, start, inspect, recover, and manage browser runtime. |
| Browser interaction | `browser_script` | Run Python against the Rust-held CDP connection. |
| Local workspace | `exec_command`, `write_stdin`, `read_file`, `search_files`, `list_files`, `apply_patch`, `view_image` | Inspect or change local files and processes. |
| Coordination | `update_plan`, `spawn_agent`, `wait_agent`, `send_input`, `send_message`, `followup_task`, `list_agents`, `close_agent` | Track longer work and manage helper sessions. |

Sources: `crates/browser-use-core/src/tools/mod.rs:6-78`, `crates/browser-use-core/src/tools/mod.rs:80-208`, `crates/browser-use-core/src/tools/mod.rs:311-375`, `crates/browser-use-core/src/tools/mod.rs:416-593`, `crates/browser-use-core/src/tools/mod.rs:652-662`
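The split between what the model sees and what the dispatcher accepts can be sketched with a tiny registry. `ToolSpec` is simplified to a name, and `is_dispatchable` and `LEGACY_COMPAT_NAMES` are hypothetical names for the compatibility path; where the legacy call is actually routed is not shown here.

```rust
/// Simplified tool spec: the real `ToolSpec` also carries a description and schema.
#[derive(Debug, Clone, PartialEq)]
struct ToolSpec {
    name: &'static str,
}

/// Names accepted by the dispatcher for compatibility but never advertised to the model.
const LEGACY_COMPAT_NAMES: [&str; 1] = ["python"];

struct ToolRegistry {
    specs: Vec<ToolSpec>,
}

impl ToolRegistry {
    /// The tool names from the tool-group table on this page.
    fn browser_agent() -> Self {
        let names = [
            "done",
            "browser", "browser_script",
            "exec_command", "write_stdin", "read_file", "search_files",
            "list_files", "apply_patch", "view_image",
            "update_plan", "spawn_agent", "wait_agent", "send_input",
            "send_message", "followup_task", "list_agents", "close_agent",
        ];
        Self { specs: names.iter().map(|&name| ToolSpec { name }).collect() }
    }

    /// What the model sees: only registered specs.
    fn spec_names(&self) -> Vec<&'static str> {
        self.specs.iter().map(|s| s.name).collect()
    }

    /// What the dispatcher accepts: registered names plus legacy aliases.
    fn is_dispatchable(&self, name: &str) -> bool {
        self.specs.iter().any(|s| s.name == name)
            || LEGACY_COMPAT_NAMES.iter().any(|&legacy| legacy == name)
    }
}
```

A check in the spirit of the repository's test would assert that `spec_names()` does not contain `"python"` while `is_dispatchable("python")` still returns `true`.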

## Browser Tools: Runtime Versus Page Work

The prompts draw a hard line between browser lifecycle and page interaction. `browser` is the control plane: status, connect, setup, doctor, recovery, profiles, logs, and ownership. `browser_script` is the page/data plane: navigation, inspection, clicks, typing, screenshots, downloads, uploads, network inspection, extraction, and browser-backed verification.

The current `browser_script` contract states that each call starts a fresh Python process, so Python variables do not persist across calls; browser/CDP state persists in Rust; helpers are pre-imported; and raw CDP is the fallback when the helpers are incomplete. That keeps browser state durable while avoiding a hidden, long-lived Python object model.

Sources: `prompts/browser-agent-system.md:1-14`, `prompts/browser-agent-system.md:25-43`, `prompts/browser-script-tool-description.md:1-15`, `prompts/browser-script-tool-description.md:17-72`, `crates/browser-use-core/src/lib.rs:2928-2987`, `crates/browser-use-core/src/lib.rs:3263-3321`
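The persistence split (throwaway script scope, durable browser state) can be modeled in a few lines. This is a hypothetical Rust sketch: `ScriptHost`, `ScriptScope`, and `BrowserState` are invented names, a closure stands in for the Python script, and a `HashMap` stands in for one fresh process's variables. No Python is launched and no CDP connection is opened.

```rust
use std::collections::HashMap;

/// Hypothetical: state the Rust side keeps alive across calls (stands in for CDP state).
#[derive(Debug, Default)]
struct BrowserState {
    current_url: Option<String>,
}

/// Hypothetical per-call scope: stands in for one fresh Python process's variables.
type ScriptScope = HashMap<String, String>;

struct ScriptHost {
    browser: BrowserState,
}

impl ScriptHost {
    fn new() -> Self {
        Self { browser: BrowserState::default() }
    }

    /// Each call gets an empty scope; only `self.browser` survives the call.
    /// The scope is dropped when this function returns, like a process exiting.
    fn run<R>(&mut self, script: impl FnOnce(&mut ScriptScope, &mut BrowserState) -> R) -> R {
        let mut scope = ScriptScope::new();
        script(&mut scope, &mut self.browser)
    }
}
```

In this model, a second call cannot see a variable the first call set, but it does see the page the first call navigated to, which is the behavior the prompt asks the model to expect.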

## Local Tools And Event Recording

Local command execution can either finish immediately or return a command session id for later `write_stdin`. Both paths record `tool.started`, command-specific events, and `tool.finished`. File tools follow the same pattern through `run_file_tool`: record start, run the operation, record finish, or record `tool.failed`.

The file tools are intentionally practical: `read_file` supports line and byte limits; `search_files` prefers `rg --json` and falls back when needed; `list_files` respects ignore files; `view_image` records an image artifact; `apply_patch` parses and applies Codex-style patches while recording changed files.

Sources: `crates/browser-use-core/src/tools/command.rs:62-210`, `crates/browser-use-core/src/tools/command.rs:213-326`, `crates/browser-use-core/src/tools/files.rs:23-101`, `crates/browser-use-core/src/tools/files.rs:103-170`, `crates/browser-use-core/src/tools/files.rs:173-253`, `crates/browser-use-core/src/tools/files.rs:256-361`, `crates/browser-use-core/src/tools/files.rs:364-404`
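The start, run, then finish-or-fail pattern that `run_file_tool` follows can be sketched as a generic wrapper. The signature below is hypothetical (the real function records structured events in the store, not strings), and `read_file_limited` is an invented helper that illustrates only the line-limit idea behind `read_file`.

```rust
use std::fmt::Display;
use std::fs;
use std::io;
use std::path::Path;

/// Records a start event, runs `op`, then records either a finish or a failure event.
fn run_file_tool<T, E: Display>(
    log: &mut Vec<String>,
    tool: &str,
    op: impl FnOnce() -> Result<T, E>,
) -> Result<T, String> {
    log.push(format!("tool.started:{tool}"));
    match op() {
        Ok(value) => {
            log.push(format!("tool.finished:{tool}"));
            Ok(value)
        }
        Err(err) => {
            log.push(format!("tool.failed:{tool}"));
            Err(err.to_string())
        }
    }
}

/// Hypothetical helper: returns at most `max_lines` lines of a UTF-8 file.
fn read_file_limited(path: &Path, max_lines: usize) -> io::Result<String> {
    let text = fs::read_to_string(path)?;
    Ok(text.lines().take(max_lines).collect::<Vec<_>>().join("\n"))
}
```

Because the wrapper owns the bookkeeping, each file operation only has to return a `Result`, and every call leaves a start event plus exactly one terminal event.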

## Parallelism Is Narrow And Deliberate

The agent can run a batch of safe read-oriented tool calls in parallel, but not every tool is parallel-safe. The dispatcher groups adjacent parallel-capable calls, records `tool.batch_started`, runs each in a thread with its own store handle, records individual batch results, then records `tool.batch_finished`. Parallel eligibility is limited to file reads/search/listing and known read-only commands. Browser work, mutation, stdin, patching, planning, and helper coordination stay ordered because they touch shared state.

Sources: `crates/browser-use-core/src/lib.rs:2429-2565`, `crates/browser-use-core/src/lib.rs:2699-2724`, `prompts/browser-agent-system.md:16-20`, `crates/browser-use-core/src/tools/mod.rs:281-309`
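The batching rule can be sketched with `std::thread::scope`. The eligibility list is a hypothetical subset (file reads, search, and listing; the real rule also admits known read-only commands), the event names are plain strings, and `execute` stands in for real tool execution. The real dispatcher gives each thread its own store handle; here the threads only return strings, and per-call results are recorded after joining so the log stays in call order.

```rust
use std::thread;

#[derive(Debug, Clone)]
struct Call {
    name: &'static str,
}

/// Hypothetical eligibility rule: only the read-oriented file tools.
fn is_parallel_safe(call: &Call) -> bool {
    matches!(call.name, "read_file" | "search_files" | "list_files")
}

/// Groups adjacent parallel-safe calls; any other call becomes a group of one,
/// so ordering is preserved around browser work, mutation, stdin, and patching.
fn group_calls(calls: &[Call]) -> Vec<Vec<Call>> {
    let mut groups: Vec<Vec<Call>> = Vec::new();
    for call in calls {
        let joins_last = is_parallel_safe(call)
            && groups.last().map_or(false, |g| g.iter().all(is_parallel_safe));
        if joins_last {
            // `joins_last` implies a last group exists.
            groups.last_mut().expect("checked above").push(call.clone());
        } else {
            groups.push(vec![call.clone()]);
        }
    }
    groups
}

/// Runs one group. Multi-call groups run on scoped threads between
/// `tool.batch_started` and `tool.batch_finished`; results keep call order.
fn run_group(group: &[Call], log: &mut Vec<String>) -> Vec<String> {
    // Stand-in for real tool execution.
    let execute = |call: &Call| format!("ok:{}", call.name);
    if group.len() < 2 {
        return group.iter().map(execute).collect();
    }
    log.push("tool.batch_started".to_string());
    let results: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = group.iter().map(|call| s.spawn(move || execute(call))).collect();
        handles.into_iter().map(|h| h.join().expect("tool thread panicked")).collect()
    });
    for (call, result) in group.iter().zip(&results) {
        log.push(format!("tool.batch_result:{}:{}", call.name, result));
    }
    log.push("tool.batch_finished".to_string());
    results
}
```

With calls `read_file, list_files, apply_patch, search_files`, the grouping is `[read_file, list_files]`, `[apply_patch]`, `[search_files]`: the patch acts as a barrier, so the later search cannot run alongside reads that precede the patch.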

## Completion, Failure, And Cancellation

There are three normal ways out of the loop. `done` validates that it has either a non-empty result or a readable `result_file`, optionally records the file as an artifact, appends `session.done`, and returns a finished dispatch outcome. Failures append `session.failed`, either when the run exceeds the provider turn budget or when an unrecovered error reaches the run wrapper. Cancellation is checked before and after provider/tool work; cancelled sessions finish the run as `cancelled`.

The protocol projection code reads those events back into user-visible state. `result_from_events` prefers the latest `session.done` or helper completion, while `failure_from_events` reads the latest `session.failed`.

Sources: `crates/browser-use-core/src/lib.rs:1035-1113`, `crates/browser-use-core/src/lib.rs:1447-1467`, `crates/browser-use-core/src/lib.rs:3417-3517`, `crates/browser-use-protocol/src/lib.rs:262-327`, `crates/browser-use-protocol/src/lib.rs:419-430`
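The `done` validation rule and the "latest event wins" projections can be sketched together. `DoneOutput`, `validate_done`, and this minimal `Event` are hypothetical; preferring the text when both a result and a file are present is an assumption, "readable" is approximated by opening the file, artifact recording is omitted, and helper completions are left out of `result_from_events`.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// What a validated `done` call carries forward.
#[derive(Debug, PartialEq)]
enum DoneOutput {
    Text(String),
    File(PathBuf),
}

/// Accepts a non-empty `result` or a `result_file` that can be opened for reading.
fn validate_done(result: Option<&str>, result_file: Option<&Path>) -> Result<DoneOutput, String> {
    if let Some(text) = result.map(str::trim).filter(|t| !t.is_empty()) {
        return Ok(DoneOutput::Text(text.to_string()));
    }
    match result_file {
        Some(path) => fs::File::open(path)
            .map(|_| DoneOutput::File(path.to_path_buf()))
            .map_err(|e| format!("result_file is not readable: {e}")),
        None => Err("done needs a non-empty result or a readable result_file".to_string()),
    }
}

/// Minimal event record for the projection helpers below.
struct Event {
    event_type: String,
    payload: String,
}

/// Payload of the latest `session.done`, if any.
fn result_from_events(events: &[Event]) -> Option<&str> {
    events.iter().rev().find(|e| e.event_type == "session.done").map(|e| e.payload.as_str())
}

/// Payload of the latest `session.failed`, if any.
fn failure_from_events(events: &[Event]) -> Option<&str> {
    events.iter().rev().find(|e| e.event_type == "session.failed").map(|e| e.payload.as_str())
}
```

Scanning from the end means a later `session.done` supersedes an earlier one, which matches the page's description of reading the latest event.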

## Why This Shape Matters

The useful boundary is simple: providers decide the next move, tools execute bounded local or browser actions, and events make the whole run reconstructable. Because providers are adapters around `ModelProvider`, the architecture can stay portable across hosted APIs, compatible endpoints, stored credentials, environment-provided keys, and deterministic fake providers. The loop is not vendor-owned; it is event-owned.

Sources: `crates/browser-use-providers/src/lib.rs:18-45`, `crates/browser-use-core/src/lib.rs:301-337`, `crates/browser-use-protocol/src/lib.rs:56-148`
