# Explain It Simply — What Is RLM Code?

> RLM Code in plain language: what problem it solves, the one core idea to keep, and what you will find when you open the repo for the first time.

- Repository: SuperagenticAI/rlm-code
- GitHub: https://github.com/SuperagenticAI/rlm-code
- Human wiki: https://grok-wiki.com/public/wiki/superagenticai-rlm-code-8e144acefc91
- Complete Markdown: https://grok-wiki.com/public/wiki/superagenticai-rlm-code-8e144acefc91/llms-full.txt

## Source Files

- `README.md`
- `pyproject.toml`
- `rlm_code/main.py`
- `rlm_code/__main__.py`
- `rlm_code/commands/run_command.py`
- `rlm_code/commands/slash_commands.py`

---

<details>
<summary>Relevant source files</summary>

The following files were used as context for generating this wiki page:

- [README.md](README.md)
- [pyproject.toml](pyproject.toml)
- [rlm_code/main.py](rlm_code/main.py)
- [rlm_code/commands/run_command.py](rlm_code/commands/run_command.py)
- [rlm_code/commands/slash_commands.py](rlm_code/commands/slash_commands.py)
- [rlm_code/rlm/pure_rlm_environment.py](rlm_code/rlm/pure_rlm_environment.py)
- [rlm_code/rlm/runner.py](rlm_code/rlm/runner.py)
- [rlm_code/rlm/environments.py](rlm_code/rlm/environments.py)
- [rlm_code/rlm/benchmarks.py](rlm_code/rlm/benchmarks.py)
- [rlm_code/rlm/termination.py](rlm_code/rlm/termination.py)
- [rlm_code/rlm/trajectory.py](rlm_code/rlm/trajectory.py)
- [rlm_code/sandbox/runtimes/docker_runtime.py](rlm_code/sandbox/runtimes/docker_runtime.py)
</details>


RLM Code is a command-line research tool for running, benchmarking, and replaying LLM-powered agents that solve tasks by writing and executing code — iteration by iteration — rather than trying to process everything in one large prompt. It wraps the **Recursive Language Models (RLM)** algorithm from a 2025 research paper in an interactive terminal UI with built-in evaluation, trajectory replay, and support for any LLM provider you bring.

This page is for someone opening the repository for the first time who wants to understand the one core idea before exploring the code, and know what each major part of the project does.

---

## The Problem RLM Solves

Think about what normally happens when you ask an LLM to analyze a large document. You paste the whole thing into the prompt and hope the model holds all the details in mind simultaneously. For inputs that approach or exceed the model's context window (a long codebase, a 500-page PDF, a large JSONL trace file), this breaks down: details get dropped, the model loses focus, and token costs explode.

**RLM's answer:** don't put the document in the prompt. Instead, store it as a Python variable in a sandboxed REPL, give the model just a short description (length, a preview), and let it write code to read and process the data in manageable chunks. The model sees only what its code surfaces, builds up intermediate results in buffer variables, and finally calls `FINAL("my answer")` to terminate.

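Because only the output of the model's code enters the prompt, token usage depends on what the model chooses to read rather than on the size of the input. That is what lets the approach scale to inputs far larger than the model's context window.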
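
To make the idea concrete, here is a minimal sketch of the kind of code a model might write inside the REPL. The `context` and `buffers` names follow the description above; the sample data and the `line_batches` helper are hypothetical stand-ins, and a real run would finish by passing the result to `FINAL(...)`.

```python
# Hypothetical stand-ins for what the sandbox would normally provide.
context = "\n".join(f"line {i}: value={i * 2}" for i in range(10_000))
buffers = []


def line_batches(text, batch_size):
    """Yield lists of at most `batch_size` lines from `text`."""
    lines = text.splitlines()
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


# Inspect a short preview instead of printing the whole document.
preview = context[:40]
print(len(context), repr(preview))

# Process the data in manageable batches, keeping only small partial results.
for batch in line_batches(context, 1_000):
    batch_sum = sum(int(line.rsplit("=", 1)[1]) for line in batch)
    buffers.append(batch_sum)

answer = sum(buffers)
print(answer)  # In a real run the model would call FINAL(str(answer)) here.
```

Only the two `print` outputs would reach the model's prompt; the 10,000-line input itself never does.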

---

## The One Core Idea: A REPL Loop With a Context Variable

The system prompt injected in `rlm_code/rlm/pure_rlm_environment.py` makes the design explicit:

```
The REPL environment is initialized with:
1. A `context` variable that contains extremely important information about your query.
2. A `llm_query` function that allows you to query an LLM inside your REPL environment.
3. A `llm_query_batched` function for concurrent multi-prompt queries.
4. A `SHOW_VARS()` function ...
5. print() statements to view output and continue reasoning.
6. A `buffers` list for accumulating intermediate findings across iterations.
```

The model writes Python, the REPL runs it, the model sees the output, and the cycle repeats until it calls `FINAL(answer)`. That is the entire algorithm.
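
The cycle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's runner: `run_loop`, `scripted_model`, and the `_Done` exception are hypothetical names, the "model" is a fixed script, and plain `exec` is used with no sandboxing at all.

```python
import contextlib
import io


class _Done(Exception):
    """Hypothetical signal raised by FINAL(); real class names may differ."""

    def __init__(self, answer):
        super().__init__(answer)
        self.answer = answer


def FINAL(answer):
    raise _Done(answer)


def run_loop(propose_code, context, max_steps=6):
    """Run model-written code until FINAL() is called or steps run out."""
    namespace = {"context": context, "buffers": [], "FINAL": FINAL}
    observation = f"context is a str of length {len(context)}"
    for step in range(max_steps):
        code = propose_code(observation, step)
        stdout = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout):
                exec(code, namespace)  # NOT sandboxed; illustration only
        except _Done as done:
            return done.answer
        except Exception as exc:
            # Errors are fed back as the next observation.
            observation = f"error: {exc!r}"
            continue
        observation = stdout.getvalue()
    return None  # step limit reached without FINAL()


def scripted_model(observation, step):
    """Stand-in for an LLM: replays a fixed two-step plan."""
    plan = [
        "buffers.append(context.count('error'))\nprint(buffers)",
        "FINAL(f'{buffers[0]} errors found')",
    ]
    return plan[step]


result = run_loop(scripted_model, "ok\nerror: disk\nok\nerror: net\n")
print(result)
```

The namespace dictionary persists across iterations, which is why the value stored in `buffers` during the first step is still available in the second.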

Sources: [rlm_code/rlm/pure_rlm_environment.py:186-198]()

---

## What Happens When You Run a Task

```text
/rlm run "Summarize the key findings" context=report.pdf steps=6
```

Here is what the loop does at each step:

```text
┌──────────────────────────────────────────────────────┐
│  User task:  "Summarize the key findings"            │
│  context =  <large document stored as variable>      │
└───────────────────────┬──────────────────────────────┘
                        │
              ┌─────────▼─────────┐
              │  LLM proposes     │  ← model writes Python code
              │  next action      │    e.g. context[:5000]
              └─────────┬─────────┘
                        │
              ┌─────────▼─────────┐
              │  Sandbox REPL     │  ← Docker / local runtime
              │  executes code    │    executes the code safely
              └─────────┬─────────┘
                        │
              ┌─────────▼─────────┐
              │  Observation      │  ← output of the code is fed
              │  fed back to LLM  │    back as next prompt context
              └─────────┬─────────┘
                        │
              ┌─────────▼─────────┐
              │  Repeat or        │  ← repeat until FINAL() called
              │  FINAL(answer)    │    or step/timeout limit hit
              └───────────────────┘
```

The runner loop in `rlm_code/rlm/runner.py` drives this cycle. Each iteration emits events tagged with a `TrajectoryEventType` (e.g. `ITERATION_CODE`, `ITERATION_OUTPUT`, `LLM_RESPONSE`), which are written to a JSONL file for replay and analysis.
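
As a rough picture of what JSONL trajectory logging and replay involve, the sketch below appends one JSON object per line and reads them back in order. The `EventType` enum, the record fields (`step`, `type`, `payload`), and the helper names are assumptions made for illustration; the actual schema in `trajectory.py` may differ.

```python
import json
import tempfile
from enum import Enum
from pathlib import Path


class EventType(str, Enum):
    """Hypothetical subset mirroring the event names mentioned above."""

    LLM_RESPONSE = "llm_response"
    ITERATION_CODE = "iteration_code"
    ITERATION_OUTPUT = "iteration_output"


def log_event(path, step, event_type, payload):
    """Append a single event as one JSON line."""
    record = {"step": step, "type": event_type.value, "payload": payload}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def replay(path):
    """Yield events in the order they were logged."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)


trace = Path(tempfile.mkdtemp()) / "run.jsonl"
log_event(trace, 0, EventType.ITERATION_CODE, "print(len(context))")
log_event(trace, 0, EventType.ITERATION_OUTPUT, "48213")

events = list(replay(trace))
for event in events:
    print(event["step"], event["type"], event["payload"])
```

An append-only, line-per-event file can be read back one step at a time without loading the whole trace, which suits step-by-step replay.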

Sources: [rlm_code/rlm/runner.py:1-10](), [rlm_code/rlm/trajectory.py:33-50]()

---

## What You Find When You Open the Repo

### Top-level structure

```
rlm_code/
├── main.py             ← CLI entry point; launches Textual TUI
├── commands/           ← /run, slash_commands handlers
├── rlm/                ← Core RLM engine
│   ├── runner.py       ← Drives the context→action→exec→reward loop
│   ├── pure_rlm_environment.py  ← Paper-faithful REPL + security sandbox
│   ├── environments.py ← GenericRLMEnvironment, DSPyCodingRLMEnvironment, TraceAnalysisEnvironment
│   ├── benchmarks.py   ← Preset benchmark cases (pure_rlm_smoke, dspy_quick, …)
│   ├── benchmark_manager.py     ← Runs benches, scores, compares
│   ├── termination.py  ← FINAL(), FINAL_VAR(), SUBMIT() control-flow
│   ├── trajectory.py   ← JSONL trace logging + replay
│   ├── policies/       ← Reward, termination, action, compaction policies
│   └── frameworks/     ← DSPy, Pydantic AI, Google ADK adapters
├── ui/                 ← Textual TUI (tabs, chat input, Research view)
├── models/             ← LLM provider connectors (Anthropic, OpenAI, Gemini, Ollama)
├── sandbox/            ← Sandboxed execution backends
│   ├── superbox.py     ← Runtime selector with fallback chain
│   └── runtimes/       ← Docker, local, Monty, cloud (Daytona, E2B, Modal), Apple Container
├── harness/            ← Tool-using coding agent harness (/harness run …)
└── mcp/                ← MCP server and client for tool integration
```

Sources: [README.md:382-390](), [rlm_code/rlm/environments.py:122-145]()

---

## The Three Execution Modes

| Mode | Command | What it does |
|------|---------|-------------|
| **Pure RLM** | `/rlm run "task" env=pure_rlm` | Paper-faithful: context as variable, `llm_query()`, `FINAL()` termination |
| **DSPy Coding** | `/rlm run "task" env=dspy` | Writes DSPy modules; uses Docker REPL with verifier scoring |
| **Harness / Coding Agent** | `/harness run "task" steps=8` | Tool-using loop (like Claude Code); supports MCP, reads/writes project files |

The `pure_rlm` environment enforces strict security: `eval()`, `exec()`, `subprocess`, `os.system()`, `__import__()`, and several other builtins are statically blocked before code is run in the REPL.
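
One way to block such constructs before execution is to inspect the parsed syntax tree. The sketch below uses Python's standard `ast` module and is only an assumption about how a check like this could work; the repository may use pattern matching or a different blocklist, and `find_violations` is a hypothetical name. It is not a complete sandbox (for example, `from os import system` is not caught).

```python
import ast

# Illustrative blocklists based on the names mentioned above.
BLOCKED_CALLS = {"eval", "exec", "__import__", "globals"}
BLOCKED_MODULES = {"subprocess"}
BLOCKED_ATTRS = {("os", "system")}


def find_violations(source):
    """Return human-readable reasons to reject `source` (empty if none)."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call to {node.func.id}()")
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if (node.value.id, node.attr) in BLOCKED_ATTRS:
                problems.append(f"use of {node.value.id}.{node.attr}")
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [alias.name for alias in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module)
            for name in names:
                if name.split(".")[0] in BLOCKED_MODULES:
                    problems.append(f"import of {name}")
    return problems


print(find_violations("print(len(context))"))
print(find_violations("import os\nos.system('ls')"))
```

Because the check runs on the source text alone, rejected code is never executed, which matches the "statically blocked" behavior described above.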

Sources: [rlm_code/rlm/pure_rlm_environment.py:143-182]()

---

## The Termination Contract

The LLM must call one of these functions from within its REPL code to end a run:

```python
# Direct string or dict answer
FINAL("Here is my answer")

# Reference a REPL variable you built up
FINAL_VAR("results_buffer")

# Typed multi-field output (DSPy-style)
SUBMIT(answer="...", confidence=0.9)
```

Each call raises a Python exception (`FinalOutput` or `SubmitOutput`) that the runner catches, records, and uses as the run's final result. Because termination is an explicit call rather than something inferred from the model's prose, a run ends with an answer only when the model deliberately submits one; otherwise it continues until a step or timeout limit is hit.
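
The exception-based contract can be sketched as follows. The exception names `FinalOutput` and `SubmitOutput` come from the text above, but their constructors, the `make_termination_api` and `execute` helpers, and the returned tuples are assumptions for illustration rather than the code in `termination.py`.

```python
class FinalOutput(Exception):
    """Carries the final answer out of the executing code (assumed shape)."""

    def __init__(self, value):
        super().__init__("final")
        self.value = value


class SubmitOutput(Exception):
    """Carries typed output fields out of the executing code (assumed shape)."""

    def __init__(self, fields):
        super().__init__("submit")
        self.fields = fields


def make_termination_api(namespace):
    """Build FINAL / FINAL_VAR / SUBMIT bound to one REPL namespace."""

    def FINAL(answer):
        raise FinalOutput(answer)

    def FINAL_VAR(name):
        # Look the variable up at call time, after user code has set it.
        raise FinalOutput(namespace[name])

    def SUBMIT(**fields):
        raise SubmitOutput(fields)

    return {"FINAL": FINAL, "FINAL_VAR": FINAL_VAR, "SUBMIT": SUBMIT}


def execute(code):
    """Run one code snippet and report how it ended."""
    namespace = {}
    namespace.update(make_termination_api(namespace))
    try:
        exec(code, namespace)  # NOT sandboxed; illustration only
    except FinalOutput as done:
        return ("final", done.value)
    except SubmitOutput as done:
        return ("submit", done.fields)
    return ("continue", None)


print(execute("results_buffer = [1, 2]\nFINAL_VAR('results_buffer')"))
print(execute("SUBMIT(answer='x', confidence=0.9)"))
```

Raising an exception stops the snippet at the termination call, so any code after `FINAL(...)` in the same snippet does not run.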

Sources: [rlm_code/rlm/termination.py:17-55]()

---

## Benchmarks and Comparison

RLM Code ships with named preset benchmark packs:

| Preset | Description |
|--------|-------------|
| `pure_rlm_smoke` | 3 cases testing the paper-compliant RLM mode |
| `dspy_quick` | 3 DSPy coding loop smoke tests |
| `oolong_style` | 4 long-context benchmarks (paper-compatible) |
| `paradigm_comparison` | Side-by-side RLM vs CodeAct vs Traditional |
| `token_efficiency` | Token efficiency comparison benchmarks |

You run them with `/rlm bench preset=<name>`, compare two runs with `/rlm bench report candidate=latest baseline=previous`, and replay any individual run step-by-step with `/rlm replay <run_id>`.

Sources: [rlm_code/rlm/benchmarks.py:26-40]()

---

## Provider and Sandbox Flexibility (BYOK/BYOC)

RLM Code is provider-neutral. You connect whichever LLM you have access to:

```
/connect anthropic claude-opus-4-6
/connect openai gpt-5.3-codex
/connect gemini gemini-2.5-flash
/connect ollama llama3.2        ← free, no API key, runs locally
```

The `[llm-all]` install extra pulls the Anthropic, OpenAI, and Google client libraries; `[tui]` adds Textual for the terminal UI. Each extra is optional.

Sandbox execution is similarly flexible. The `superbox.py` runtime selector tries Docker first, then falls back through Daytona, E2B, Modal, Monty, and a local command runtime — configurable in `rlm_config.yaml`:

```yaml
sandbox:
  runtime: docker
  superbox_auto_fallback: true
  superbox_fallback_runtimes: [docker, daytona, e2b]
```
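
A fallback chain of this kind can be pictured as "try each candidate in order and take the first one that is available". The sketch below is hypothetical: `select_runtime`, `RuntimeUnavailable`, and the probe callables are illustrative names, not the API of `superbox.py`, and the probes are hard-coded instead of checking real services.

```python
class RuntimeUnavailable(Exception):
    """Hypothetical error raised when no candidate runtime can be used."""


def select_runtime(preferred, fallbacks, probes, auto_fallback=True):
    """Return the first runtime name whose availability probe succeeds."""
    candidates = [preferred]
    if auto_fallback:
        candidates += [name for name in fallbacks if name != preferred]
    for name in candidates:
        probe = probes.get(name)
        if probe is not None and probe():
            return name
    raise RuntimeUnavailable(f"no usable runtime among {candidates}")


# Hard-coded probes: pretend Docker and Daytona are down, E2B is reachable.
probes = {
    "docker": lambda: False,
    "daytona": lambda: False,
    "e2b": lambda: True,
}

chosen = select_runtime("docker", ["docker", "daytona", "e2b"], probes)
print(chosen)
```

With `auto_fallback=False`, only the preferred runtime is tried, mirroring what a setting like `superbox_auto_fallback: false` would presumably do.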

Sources: [pyproject.toml:79-93](), [README.md:60-70](), [README.md:352-363]()

---

## Safety Guardrails Built In

Two safety checks run before any model-written code executes:

1. **Directory safety check** (`rlm_code/main.py`): On startup, RLM Code refuses to run from your home directory, `~/Desktop`, `~/Documents`, `/System`, `/usr`, or other sensitive paths. This prevents an agent from accidentally scanning personal files.

2. **Code pattern scanner** (`rlm_code/rlm/pure_rlm_environment.py`): Before executing any LLM-written code in the REPL, a static scanner blocks `os.system()`, `subprocess`, `eval()`, `exec()`, `__import__()`, `globals()`, and several other escape hatches.

For cost and runtime bounds, every run accepts `steps=N timeout=S budget=B` parameters, and `/rlm abort all` cancels any active run cooperatively.
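
The directory check in the first item can be illustrated with a small path comparison. This is a sketch under assumptions: `is_unsafe_start_dir` is a hypothetical helper, the path list covers only the examples named above, treating everything under `/System` and `/usr` as unsafe is a guess, and a real check would also resolve symlinks and handle platform differences.

```python
from pathlib import PurePosixPath

# Assumed system roots, taken from the examples listed above.
SYSTEM_ROOTS = (PurePosixPath("/System"), PurePosixPath("/usr"))


def is_unsafe_start_dir(cwd, home):
    """Return True if `cwd` is a location the tool should refuse to run in."""
    cwd = PurePosixPath(cwd)
    home = PurePosixPath(home)
    # The home directory itself and well-known personal folders.
    if cwd in {home, home / "Desktop", home / "Documents"}:
        return True
    # System roots and anything beneath them.
    return any(cwd == root or root in cwd.parents for root in SYSTEM_ROOTS)


print(is_unsafe_start_dir("/home/alice", "/home/alice"))
print(is_unsafe_start_dir("/home/alice/projects/demo", "/home/alice"))
```

A dedicated project directory passes the check, which is the setup the guardrail is meant to encourage.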

Sources: [rlm_code/main.py:25-62](), [rlm_code/rlm/pure_rlm_environment.py:145-182]()

---

## Closing Summary

RLM Code is a research playground for the Recursive Language Models paradigm: instead of pasting data into a prompt, the LLM interacts with data as a Python variable through a secure REPL loop, calling `FINAL()` when done. The repository delivers this as a terminal UI application with built-in benchmarks, trajectory logging, multi-provider support, and a pluggable sandbox layer — making it practical to run experiments, compare models and approaches, and replay agent behavior step by step. The core loop lives in `rlm_code/rlm/runner.py` and `rlm_code/rlm/pure_rlm_environment.py`, which together implement the paper's context-as-variable, `llm_query()`, and `FINAL()` termination semantics.

Sources: [rlm_code/rlm/pure_rlm_environment.py:1-13]()
