# MiroFish Socratic Technical Wiki

> A first-principles map of MiroFish: how uploaded seed material becomes a knowledge graph, agent society, live simulation, and report-driven interaction. This structure is shaped by the requested bundled Compound Engineering wiki workflow, while repository code remains the implementation source of truth; no STRATEGY.md, docs/solutions, or generated wiki context was present in this checkout.

## Context Links

- [Agent index](https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/llms.txt)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9)
- [GitHub repository](https://github.com/666ghj/MiroFish)

## Repository Metadata

- Repository: 666ghj/MiroFish

- Generated: 2026-05-24T18:12:30.327Z
- Updated: 2026-05-24T18:13:22.607Z
- Runtime: Codex CLI
- Format: Socratic Exploration
- Pages: 5

## Page Index

- 01. [The First Question](https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/01-the-first-question.md) - What is the smallest version of MiroFish: a user question, some seed files, and a provider-neutral runtime that can turn them into a simulated social world?
- 02. [Why Is Uploading Files Not Enough?](https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/02-why-is-uploading-files-not-enough.md) - The graph-building page asks why raw documents must become ontology, chunks, tasks, and Zep graph memory before they can support simulation.
- 03. [How Does a Graph Become a Society?](https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/03-how-does-a-graph-become-a-society.md) - The preparation stage reframes entities as agents: which nodes are eligible, what profile fields OASIS needs, and where LLM-generated configuration becomes platform-specific behavior without hard-coding one model provider.
- 04. [Why Is Running the Simulation a Process Boundary?](https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/04-why-is-running-the-simulation-a-process-boundary.md) - The run stage asks why live simulation leaves Flask request handling: subprocesses, file-based IPC, action logs, SQLite traces, and optional graph-memory updates create the observable boundary.
- 05. [What Can You Now Ask the Sandbox?](https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/05-what-can-you-now-ask-the-sandbox.md) - The closing reframe: once graph memory, simulated traces, and agent interviews exist, the report system can reason with tools, assemble evidence section by section, and keep the UI portable across file, repository, or catalog skill sources rather than a single hosted provider.

## Source File Index

- `backend/app/__init__.py`
- `backend/app/api/graph.py`
- `backend/app/api/report.py`
- `backend/app/api/simulation.py`
- `backend/app/config.py`
- `backend/app/models/project.py`
- `backend/app/models/task.py`
- `backend/app/services/graph_builder.py`
- `backend/app/services/oasis_profile_generator.py`
- `backend/app/services/ontology_generator.py`
- `backend/app/services/report_agent.py`
- `backend/app/services/simulation_config_generator.py`
- `backend/app/services/simulation_ipc.py`
- `backend/app/services/simulation_manager.py`
- `backend/app/services/simulation_runner.py`
- `backend/app/services/text_processor.py`
- `backend/app/services/zep_entity_reader.py`
- `backend/app/services/zep_graph_memory_updater.py`
- `backend/app/services/zep_tools.py`
- `backend/app/utils/file_parser.py`
- `backend/pyproject.toml`
- `backend/run.py`
- `backend/scripts/action_logger.py`
- `backend/scripts/run_parallel_simulation.py`
- `backend/scripts/run_reddit_simulation.py`
- `backend/scripts/run_twitter_simulation.py`
- `backend/scripts/test_profile_format.py`
- `frontend/src/api/report.js`
- `frontend/src/api/simulation.js`
- `frontend/src/components/Step1GraphBuild.vue`
- `frontend/src/components/Step2EnvSetup.vue`
- `frontend/src/components/Step3Simulation.vue`
- `frontend/src/components/Step4Report.vue`
- `frontend/src/components/Step5Interaction.vue`
- `frontend/src/router/index.js`
- `frontend/src/views/InteractionView.vue`
- `package.json`
- `README.md`

---

## 01. The First Question

> What is the smallest version of MiroFish: a user question, some seed files, and a provider-neutral runtime that can turn them into a simulated social world?

- Page Markdown: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/01-the-first-question.md
- Generated: 2026-05-24T18:11:10.794Z

### Source Files

- `README.md`
- `package.json`
- `backend/pyproject.toml`
- `backend/run.py`
- `backend/app/__init__.py`
- `backend/app/config.py`
- `frontend/src/router/index.js`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [package.json](package.json)
- [backend/pyproject.toml](backend/pyproject.toml)
- [backend/run.py](backend/run.py)
- [backend/app/__init__.py](backend/app/__init__.py)
- [backend/app/config.py](backend/app/config.py)
- [backend/app/api/graph.py](backend/app/api/graph.py)
- [backend/app/api/simulation.py](backend/app/api/simulation.py)
- [backend/app/models/project.py](backend/app/models/project.py)
- [backend/app/services/simulation_manager.py](backend/app/services/simulation_manager.py)
- [backend/app/services/simulation_runner.py](backend/app/services/simulation_runner.py)
- [backend/app/utils/llm_client.py](backend/app/utils/llm_client.py)
- [backend/app/services/ontology_generator.py](backend/app/services/ontology_generator.py)
- [backend/app/services/simulation_config_generator.py](backend/app/services/simulation_config_generator.py)
- [frontend/src/router/index.js](frontend/src/router/index.js)
- [frontend/src/views/Home.vue](frontend/src/views/Home.vue)
- [frontend/src/views/MainView.vue](frontend/src/views/MainView.vue)
- [frontend/src/components/Step1GraphBuild.vue](frontend/src/components/Step1GraphBuild.vue)
- [frontend/src/components/Step2EnvSetup.vue](frontend/src/components/Step2EnvSetup.vue)
- [frontend/src/api/graph.js](frontend/src/api/graph.js)
- [frontend/src/api/simulation.js](frontend/src/api/simulation.js)
</details>

# The First Question

What is the smallest useful MiroFish? Not the full promise of prediction, reporting, and deep interaction, but the first irreducible loop: a user asks a simulation question, uploads seed files, and a backend turns that material into agents, memory, configuration, and a runnable social simulation.

This page treats that first loop as an architecture pattern. Repository code is the source of truth. The selected Compound Engineering profile shaped the page as concept plus workflow plus QA review, but no `STRATEGY.md` or `docs/solutions/**` source was present in this checkout, so no prior local strategy or solved-problem note is cited.

## What Is The Simplest Version?

The product-level answer is already small: upload seed materials and describe the prediction requirement. The README says MiroFish extracts real-world seed information, builds a parallel digital world, and returns a prediction report plus an interactive simulated world; its workflow begins with graph building, then environment setup, then simulation. Sources: [README.md:27-32](), [README.md:86-92]()

The implementation makes that smaller still. The home screen only requires two inputs before moving forward: at least one `.pdf`, `.md`, or `.txt` file, and a non-empty `simulationRequirement`. It stores that pending upload locally and routes to `/process/new`, where backend calls begin. Sources: [frontend/src/views/Home.vue:136-149](), [frontend/src/views/Home.vue:238-240](), [frontend/src/views/Home.vue:297-310](), [frontend/src/router/index.js:9-25]()

So the smallest version is:

```text
seed files + simulation question
        |
        v
ontology + graph memory
        |
        v
agent profiles + simulation_config.json
        |
        v
OASIS runner process with twitter/reddit/parallel mode
```

## Where Does The First Question Enter The System?

The first question enters as `simulation_requirement`. The frontend posts files plus that requirement to `POST /api/graph/ontology/generate`; the backend requires both a simulation requirement and uploaded files, saves them into a project, extracts text, and calls the ontology generator. Sources: [frontend/src/api/graph.js:3-18](), [backend/app/api/graph.py:120-148](), [backend/app/api/graph.py:153-222]()

The important design choice is that MiroFish does not ask the frontend to keep passing the whole prompt, documents, ontology, and graph state between screens. It creates a server-side `Project` with files, extracted text, ontology, graph id, chunk settings, and the original `simulation_requirement`. Sources: [backend/app/models/project.py:1-4](), [backend/app/models/project.py:26-73](), [backend/app/models/project.py:133-174](), [backend/app/models/project.py:274-290]()

### Minimal Input Contract

| Input | Required? | Evidence | Why it matters |
|---|---:|---|---|
| Seed files | Yes | Backend rejects missing uploads and accepts configured extensions. | Gives the world its source material. |
| Simulation question | Yes | Backend rejects empty `simulation_requirement`. | Defines what the world is being built to explore. |
| Project name | No | Defaults to `Unnamed Project`. | Labels the stored project. |
| Additional context | No | Passed to ontology generation only when present. | Lets the user bias interpretation without changing files. |

Sources: [backend/app/api/graph.py:153-173](), [backend/app/config.py:38-45](), [backend/app/api/graph.py:175-222]()

## What Has To Exist Before Agents Can Run?

A user question is not enough. The system first converts documents into a graph-backed social substrate. The `/api/graph/build` endpoint requires an existing project, generated ontology, extracted text, and a Zep API key. It then chunks text, creates a graph, sets ontology, adds text batches, waits for processing, fetches graph data, and marks the project as `graph_completed`. Sources: [backend/app/api/graph.py:260-296](), [backend/app/api/graph.py:308-363](), [backend/app/api/graph.py:389-493]()

The frontend treats this as a background task. `MainView.vue` starts graph building, polls task status every two seconds, refreshes graph data, and advances the phase when the task completes. Sources: [frontend/src/views/MainView.vue:234-253](), [frontend/src/views/MainView.vue:276-349]()

That means the first question becomes usable only after it has a graph id:

```text
Project
  project_id
  simulation_requirement
  extracted_text.txt
  ontology
  graph_id       <- required before simulation creation
```

Sources: [backend/app/models/project.py:35-50](), [backend/app/api/simulation.py:197-224]()

## Where Does Complexity Become Necessary?

Complexity starts when graph entities must become social agents. `POST /api/simulation/create` creates a `SimulationState` with project id, graph id, platform toggles, counts, generated config status, and runtime status. Sources: [backend/app/api/simulation.py:163-229](), [backend/app/services/simulation_manager.py:25-35](), [backend/app/services/simulation_manager.py:43-112](), [backend/app/services/simulation_manager.py:194-228]()

Then `POST /api/simulation/prepare` performs the expensive work asynchronously. It checks for already prepared files, retrieves the project question and extracted text, previews entity counts from the graph, creates a task, and runs preparation in a background thread. Sources: [backend/app/api/simulation.py:359-399](), [backend/app/api/simulation.py:424-490](), [backend/app/api/simulation.py:507-625]()

Inside `SimulationManager.prepare_simulation`, the system reads filtered Zep entities, generates OASIS profiles, writes Reddit JSON and Twitter CSV profile files, generates simulation parameters, writes `simulation_config.json`, and sets the simulation to `ready`. Sources: [backend/app/services/simulation_manager.py:230-302](), [backend/app/services/simulation_manager.py:329-374](), [backend/app/services/simulation_manager.py:384-443]()

### Preparation Outputs

| Output file/state | Produced by | Purpose |
|---|---|---|
| `state.json` | `SimulationManager._save_simulation_state` | Durable simulation lifecycle state. |
| `reddit_profiles.json` | `OasisProfileGenerator.save_profiles` via manager | Agent profiles for Reddit-style simulation. |
| `twitter_profiles.csv` | `OasisProfileGenerator.save_profiles` via manager | Agent profiles in OASIS Twitter CSV format. |
| `simulation_config.json` | `SimulationConfigGenerator.generate_config` | Time, event, platform, and per-agent behavior config. |

Sources: [backend/app/services/simulation_manager.py:145-155](), [backend/app/services/simulation_manager.py:361-425](), [backend/app/api/simulation.py:240-312]()

## What Makes The Runtime Provider-Neutral?

The LLM boundary is provider-neutral by API shape, not by absence of dependencies. Configuration uses `LLM_API_KEY`, `LLM_BASE_URL`, and `LLM_MODEL_NAME`; README explicitly frames this as any LLM API compatible with the OpenAI SDK format. The `LLMClient` passes the configured key and base URL into the OpenAI client, so BYOK is already represented by environment variables. Sources: [README.md:115-127](), [backend/app/config.py:30-37](), [backend/app/utils/llm_client.py:17-33]()

```python
# backend/app/utils/llm_client.py
self.api_key = api_key or Config.LLM_API_KEY
self.base_url = base_url or Config.LLM_BASE_URL
self.model = model or Config.LLM_MODEL_NAME
self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
```

Sources: [backend/app/utils/llm_client.py:17-33]()

The current implementation is not fully vendor-agnostic across every boundary. It depends on Zep Cloud for graph memory and OASIS/CAMEL for social simulation packages. For BYOC/BYOK architecture, the LLM side is portable through OpenAI-compatible configuration; the memory and simulation sides would need explicit adapter interfaces if users must bring a different graph store or simulation engine. Sources: [backend/pyproject.toml:11-35](), [backend/app/config.py:35-49](), [backend/app/api/graph.py:286-296](), [backend/app/services/simulation_runner.py:387-420]()

## How Does The Question Become A World?

Ontology generation is the first model-mediated translation. `OntologyGenerator` takes document texts, the simulation requirement, and optional context, then asks for entity and relationship types suitable for social-media public opinion simulation. It requires entity type names in English PascalCase, relationship names in English upper snake case, and returns JSON. Sources: [backend/app/services/ontology_generator.py:29-89](), [backend/app/services/ontology_generator.py:176-227](), [backend/app/services/ontology_generator.py:231-260]()

Simulation configuration is the second translation. `SimulationConfigGenerator` turns the question, document text, graph id, and entities into time settings, event settings, per-agent activity settings, and platform settings. It stores model and base URL metadata in the generated config, which is useful for auditability without hard-coding a provider into the product flow. Sources: [backend/app/services/simulation_config_generator.py:1-11](), [backend/app/services/simulation_config_generator.py:147-197](), [backend/app/services/simulation_config_generator.py:200-241](), [backend/app/services/simulation_config_generator.py:243-277]()

## How Does It Run?

The run boundary is deliberately process-based. `POST /api/simulation/start` validates `simulation_id`, platform, optional `max_rounds`, optional graph memory updates, and ready state. It then calls `SimulationRunner.start_simulation`, updates simulation status to `running`, and returns the run state. Sources: [backend/app/api/simulation.py:1451-1527](), [backend/app/api/simulation.py:1528-1627]()

`SimulationRunner` loads `simulation_config.json`, calculates total rounds from total simulated hours and minutes per round, chooses one of `run_twitter_simulation.py`, `run_reddit_simulation.py`, or `run_parallel_simulation.py`, and launches it as a subprocess in the simulation directory. It writes the main log to `simulation.log` and monitors platform-specific `actions.jsonl` files. Sources: [backend/app/services/simulation_runner.py:313-360](), [backend/app/services/simulation_runner.py:387-479](), [backend/app/services/simulation_runner.py:481-547]()

The frontend exposes the same abstraction: after preparation, it lets users keep auto-generated rounds or apply a `max_rounds` cap, then moves to the simulation run step. Sources: [frontend/src/components/Step2EnvSetup.vue:438-525](), [frontend/src/components/Step2EnvSetup.vue:694-707](), [frontend/src/components/Step2EnvSetup.vue:743-758]()

## What Would Break If An Abstraction Disappeared?

| Abstraction | If removed | Evidence |
|---|---|---|
| `ProjectManager` | The frontend would need to carry extracted text, ontology, graph id, and files across calls. | Project state is persisted server-side. |
| `TaskManager`-style async flow | Graph build and simulation preparation would block request/response paths. | Both graph build and prepare run in background threads and are polled. |
| OpenAI-compatible LLM config | Users could not bring compatible model endpoints with their own keys. | LLM key/base/model are env-driven. |
| Zep graph boundary | Entity reading, graph build, and optional graph memory updates lose their current storage substrate. | Graph APIs require `ZEP_API_KEY` and graph ids. |
| Script runner boundary | The web backend would have to embed OASIS execution directly. | Runner starts scripts as subprocesses and monitors logs. |

Sources: [backend/app/models/project.py:101-130](), [backend/app/api/graph.py:364-513](), [backend/app/api/simulation.py:490-612](), [backend/app/config.py:30-37](), [backend/app/services/simulation_runner.py:416-448]()

## The First Question, Reframed

MiroFish’s smallest architecture is not “an AI answer generator.” It is a compiler from a user’s question and seed files into a runnable social world: ontology, graph memory, agent profiles, behavior parameters, event seeds, and a process-managed simulation. The current LLM boundary supports BYOK/BYOC through OpenAI-compatible environment configuration, while graph memory and simulation execution are concrete Zep and OASIS choices that would need adapters for full vendor-agnostic substitution. Sources: [README.md:86-92](), [backend/app/config.py:30-49](), [backend/app/services/simulation_manager.py:230-248](), [backend/app/services/simulation_runner.py:196-205]()

---

## 02. Why Is Uploading Files Not Enough?

> The graph-building page asks why raw documents must become ontology, chunks, tasks, and Zep graph memory before they can support simulation.

- Page Markdown: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/02-why-is-uploading-files-not-enough.md
- Generated: 2026-05-24T18:10:49.702Z

### Source Files

- `backend/app/api/graph.py`
- `backend/app/services/ontology_generator.py`
- `backend/app/services/graph_builder.py`
- `backend/app/services/text_processor.py`
- `backend/app/utils/file_parser.py`
- `backend/app/models/project.py`
- `backend/app/models/task.py`
- `frontend/src/components/Step1GraphBuild.vue`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [backend/app/api/graph.py](backend/app/api/graph.py)
- [backend/app/services/ontology_generator.py](backend/app/services/ontology_generator.py)
- [backend/app/services/graph_builder.py](backend/app/services/graph_builder.py)
- [backend/app/services/text_processor.py](backend/app/services/text_processor.py)
- [backend/app/utils/file_parser.py](backend/app/utils/file_parser.py)
- [backend/app/models/project.py](backend/app/models/project.py)
- [backend/app/models/task.py](backend/app/models/task.py)
- [frontend/src/components/Step1GraphBuild.vue](frontend/src/components/Step1GraphBuild.vue)
- [backend/app/api/simulation.py](backend/app/api/simulation.py)
- [backend/app/services/simulation_manager.py](backend/app/services/simulation_manager.py)
- [backend/app/services/simulation_runner.py](backend/app/services/simulation_runner.py)
- [backend/app/services/zep_graph_memory_updater.py](backend/app/services/zep_graph_memory_updater.py)
- [backend/app/config.py](backend/app/config.py)
- [backend/app/utils/llm_client.py](backend/app/utils/llm_client.py)
</details>

# Why Is Uploading Files Not Enough?

Uploading files only gives MiroFish bytes and filenames. The simulation layer needs something stronger: a project with durable extracted text, an ontology that defines which actors and relationships matter, chunked episodes that can be ingested into a graph, task state that tracks long-running work, and a graph id that later simulation steps can read from or update.

The central question is simple: if the goal is social simulation, what must be true before agents can act? The code’s answer is that documents must be converted from passive source material into a structured, queryable memory substrate.

Sources: [backend/app/api/graph.py:122-247](), [backend/app/api/graph.py:260-522](), [frontend/src/components/Step1GraphBuild.vue:108-168]()

## What is the simplest version?

The simplest version would be: upload a PDF or text file, then ask a model to simulate from that text. MiroFish does not stop there because the rest of the product is not just summarization. The graph build step must preserve reusable state across endpoints, expose progress, produce graph nodes and edges, and hand a `graph_id` into simulation creation.

That is why `generate_ontology` creates a project, saves uploaded files, extracts and preprocesses text, persists the extracted text, generates ontology, and moves the project to `ONTOLOGY_GENERATED`. The upload is only the first operation in that sequence.

```python
# backend/app/api/graph.py
project = ProjectManager.create_project(name=project_name)
project.simulation_requirement = simulation_requirement
...
text = FileParser.extract_text(file_info["path"])
text = TextProcessor.preprocess_text(text)
...
ProjectManager.save_extracted_text(project.project_id, all_text)
...
project.ontology = {
    "entity_types": ontology.get("entity_types", []),
    "edge_types": ontology.get("edge_types", [])
}
project.status = ProjectStatus.ONTOLOGY_GENERATED
```

Sources: [backend/app/api/graph.py:175-235](), [backend/app/models/project.py:17-24](), [backend/app/models/project.py:133-174](), [backend/app/models/project.py:275-290]()

## What does a raw file lack?

A raw file does not answer the operational questions the simulator asks later.

| Need | Why upload alone is insufficient | Where the repo encodes it |
| --- | --- | --- |
| Text extraction | The backend accepts files, but graph building consumes text. PDF, Markdown, and text files must be parsed first. | `FileParser.extract_text` branches by extension. |
| Simulation intent | The ontology is generated from documents plus `simulation_requirement`, not documents alone. | `/ontology/generate` requires `simulation_requirement`. |
| Entity schema | Simulation needs actors that can speak or interact, not arbitrary concepts. | Ontology prompt constrains entity types. |
| Relationship schema | Graph edges need source/target-compatible relation types. | Ontology prompt and validation normalize edge definitions. |
| Persistent context | Later endpoints use `project_id`; they do not re-upload all source files. | `ProjectManager` persists project metadata and extracted text. |
| Progress and failure handling | Graph building is long-running and external-service-backed. | `TaskManager` tracks `pending`, `processing`, `completed`, and `failed`. |

Sources: [backend/app/utils/file_parser.py:61-108](), [backend/app/api/graph.py:153-173](), [backend/app/services/ontology_generator.py:29-173](), [backend/app/models/project.py:26-73](), [backend/app/models/task.py:16-53]()

## Why must documents become ontology?

What would break if ontology disappeared? The graph builder would have chunks of text, but no domain contract for what counts as an entity or edge. The ontology generator explicitly asks for social-media-simulation-friendly actors: people, companies, organizations, government bodies, media outlets, platforms, or representative groups. It rejects abstract concepts, topics, and attitudes as entity types because those cannot act as simulated social accounts.

The generated ontology is not accepted blindly. The service normalizes entity names into PascalCase, relation names into uppercase, fixes source/target references after renaming, deduplicates entity types, caps entity and edge types at ten each, and ensures fallback `Person` and `Organization` types exist.

Sources: [backend/app/services/ontology_generator.py:41-57](), [backend/app/services/ontology_generator.py:91-130](), [backend/app/services/ontology_generator.py:277-398]()

## Why must text become chunks?

A graph ingestion service cannot reliably receive one arbitrary-length project document as a single meaningful unit. MiroFish splits extracted text into overlapping chunks before sending it to the graph service. Chunking preserves local context while making each submitted unit small enough to process.

The split function also tries to cut at sentence or paragraph boundaries before falling back to raw character ranges. That means chunking is not just a transport trick; it shapes what evidence is available to graph extraction.

Sources: [backend/app/services/text_processor.py:17-34](), [backend/app/utils/file_parser.py:161-202](), [backend/app/api/graph.py:392-403]()

## Why must graph building become a task?

Graph construction calls an external graph service, submits multiple batches, waits for processing, then fetches graph data. That is too long and failure-prone to model as a simple synchronous upload response. The API creates a task, marks the project as `GRAPH_BUILDING`, starts a daemon thread, updates progress, and eventually stores completion or failure state.

The guard clauses also show why ordering matters. A project still in `CREATED` state cannot build a graph because ontology has not been generated. A project already building cannot start another build unless forced.

```python
# backend/app/api/graph.py
if project.status == ProjectStatus.CREATED:
    return jsonify({
        "success": False,
        "error": t('api.ontologyNotGenerated')
    }), 400
```

Sources: [backend/app/api/graph.py:316-337](), [backend/app/api/graph.py:364-423](), [backend/app/api/graph.py:447-522](), [backend/app/models/task.py:75-164]()

## Why must chunks become Zep graph memory?

In the current implementation, `GraphBuilderService` is the boundary between MiroFish project state and Zep standalone graphs. It creates a graph id, dynamically turns ontology definitions into Zep entity and edge models, submits chunks as text episodes, waits until episodes are processed, and reads back nodes and edges.

```text
Upload files
  -> extracted_text.txt in ProjectManager
  -> ontology: entity_types + edge_types
  -> chunked text episodes
  -> Zep graph_id
  -> simulation creation and profile/config generation
  -> optional graph memory updates during simulation run
```

The result is not just stored text. `get_graph_data` returns graph nodes with labels, summaries, attributes, and timestamps, plus edges with facts, source/target node ids and names, attributes, temporal fields, and episode references. That is the shape downstream simulation features can query and display.

Sources: [backend/app/services/graph_builder.py:193-203](), [backend/app/services/graph_builder.py:205-292](), [backend/app/services/graph_builder.py:294-345](), [backend/app/services/graph_builder.py:347-401](), [backend/app/services/graph_builder.py:426-501]()

## How does the UI expose this pipeline?

`Step1GraphBuild.vue` presents the workflow as three stages: ontology generation, graph/RAG build, and simulation creation. The first card shows generated entity and relation types and lets the user inspect descriptions, attributes, examples, and relation source/target connections. The second card reports graph node, edge, and schema-type counts. The third card only creates the simulation after both `project_id` and `graph_id` exist.

That UI is a product-facing explanation of the same backend constraint: users are not waiting for “upload” to finish; they are waiting for source material to become a simulation-ready graph.

Sources: [frontend/src/components/Step1GraphBuild.vue:4-105](), [frontend/src/components/Step1GraphBuild.vue:108-168](), [frontend/src/components/Step1GraphBuild.vue:213-228](), [frontend/src/components/Step1GraphBuild.vue:252-257]()

## Where does simulation depend on the graph?

Simulation creation requires `project_id` and either an explicit `graph_id` or a `graph_id` already saved on the project. If neither is available, the API returns `graphNotBuilt`. Once created, the simulation state stores both ids.

Later preparation uses `state.graph_id` to initialize profile generation and to pass graph context into profile generation. Configuration generation also receives the same `graph_id`, simulation requirement, original document text, and filtered entities. At run time, graph memory updates are optional, but if enabled they require `graph_id` and create a `ZepGraphMemoryUpdater`.

Sources: [backend/app/api/simulation.py:166-224](), [backend/app/services/simulation_manager.py:194-228](), [backend/app/services/simulation_manager.py:315-347](), [backend/app/services/simulation_manager.py:403-410](), [backend/app/services/simulation_runner.py:316-384]()

## What does “memory” add after the initial graph?

The initial graph is built from documents. Runtime graph memory is different: it can add simulation activities back into the graph as text. `ZepGraphMemoryUpdater` owns a queue, platform buffers for Twitter and Reddit activity, a worker thread, and sends combined activity text to the graph with `client.graph.add`.

So the graph is both seed context and, when enabled, a place to record simulated activity. Uploading files cannot provide that evolving memory loop.

Sources: [backend/app/services/zep_graph_memory_updater.py:232-269](), [backend/app/services/zep_graph_memory_updater.py:275-291](), [backend/app/services/zep_graph_memory_updater.py:407-424](), [backend/app/services/zep_graph_memory_updater.py:490-510]()

## Provider-neutral architecture note

The current code is partly BYOK-friendly: LLM access is configured through `LLM_API_KEY`, `LLM_BASE_URL`, and `LLM_MODEL_NAME`, and `LLMClient` uses an OpenAI-compatible client with injectable key, base URL, and model. That supports multiple OpenAI-compatible providers without changing the ontology generator interface.

The graph store is less abstract today: `GraphBuilderService` imports and instantiates Zep directly, and graph build endpoints require `ZEP_API_KEY`. A vendor-agnostic evolution should preserve the same product stages while putting graph operations behind an interface such as `create_graph`, `set_ontology`, `add_text_batches`, `wait_for_ingestion`, `get_graph_data`, and `add_memory_event`. That keeps the Grok-Wiki or skill-source layer portable across file, repository, or catalog inputs, and keeps BYOC/BYOK decisions at adapter/config boundaries rather than inside the upload workflow.

Sources: [backend/app/config.py:30-45](), [backend/app/config.py:67-74](), [backend/app/utils/llm_client.py:17-33](), [backend/app/services/graph_builder.py:13-19](), [backend/app/services/graph_builder.py:46-52]()

## Source-context note

The requested knowledge profile mentioned generated wiki context, solved-problem notes under `docs/solutions/**`, and `STRATEGY.md` when present. In this repository checkout, the focused source pass found the implementation files above but did not find `STRATEGY.md` or `docs/solutions/**`, so this page treats repository code as the source of truth and does not claim prior strategy or solution-note evidence.

Sources: [backend/app/api/graph.py:120-235](), [backend/app/services/graph_builder.py:40-52]()

## Summary

Uploading files is not enough because MiroFish is not building a file viewer. It is building a simulation substrate. Files become extracted text; extracted text plus simulation intent becomes ontology; ontology plus chunks becomes a graph; graph construction becomes a tracked task; and the resulting `graph_id` becomes the handle that simulation creation, profile generation, configuration generation, and optional runtime memory updates use. Without those transformations, the simulator would have documents, but not actors, relationships, progress state, or graph memory. Sources: [backend/app/api/graph.py:348-493](), [backend/app/api/simulation.py:211-224](), [backend/app/services/zep_graph_memory_updater.py:414-418]()

---

## 03. How Does a Graph Become a Society?

> The preparation stage reframes entities as agents: which nodes are eligible, what profile fields OASIS needs, and where LLM-generated configuration becomes platform-specific behavior without hard-coding one model provider.

- Page Markdown: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/03-how-does-a-graph-become-a-society.md
- Generated: 2026-05-24T18:10:56.458Z

### Source Files

- `backend/app/api/simulation.py`
- `backend/app/services/simulation_manager.py`
- `backend/app/services/zep_entity_reader.py`
- `backend/app/services/oasis_profile_generator.py`
- `backend/app/services/simulation_config_generator.py`
- `backend/scripts/test_profile_format.py`
- `frontend/src/components/Step2EnvSetup.vue`
- `frontend/src/api/simulation.js`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [backend/app/api/simulation.py](backend/app/api/simulation.py)
- [backend/app/services/simulation_manager.py](backend/app/services/simulation_manager.py)
- [backend/app/services/zep_entity_reader.py](backend/app/services/zep_entity_reader.py)
- [backend/app/services/oasis_profile_generator.py](backend/app/services/oasis_profile_generator.py)
- [backend/app/services/simulation_config_generator.py](backend/app/services/simulation_config_generator.py)
- [backend/app/config.py](backend/app/config.py)
- [backend/app/utils/llm_client.py](backend/app/utils/llm_client.py)
- [backend/scripts/test_profile_format.py](backend/scripts/test_profile_format.py)
- [frontend/src/components/Step2EnvSetup.vue](frontend/src/components/Step2EnvSetup.vue)
- [frontend/src/api/simulation.js](frontend/src/api/simulation.js)
</details>

# How Does a Graph Become a Society?

A graph becomes a society only after MiroFish answers three practical questions: which graph nodes are eligible to become agents, what profile fields OASIS needs to run them, and which generated parameters turn static profiles into platform behavior. The preparation stage is where that translation happens.

This page follows the bundled Compound Engineering wiki guidance for page shape, but repository code is the source of truth for implementation claims. No `STRATEGY.md` or `docs/solutions/**` source was found in the focused scan, so this page does not cite product strategy or solved-problem notes as evidence.

## What Problem Exists?

The source graph is not yet a runnable simulation. A raw node can be a default graph artifact, a person, an institution, or an event-adjacent concept. OASIS needs agent profile files, per-agent behavior settings, event triggers, platform knobs, and a simulation state that says the environment is ready.

MiroFish solves this by splitting preparation into an API orchestration layer, a graph reader, a profile generator, and a simulation config generator. The lifecycle starts with `POST /api/simulation/create`, continues through asynchronous `POST /api/simulation/prepare`, and becomes ready only after profile files plus `simulation_config.json` exist and state says configuration was generated.  
Sources: [backend/app/api/simulation.py:165-229](), [backend/app/api/simulation.py:240-356](), [backend/app/services/simulation_manager.py:230-248]()

```text
Zep graph nodes
  -> eligible EntityNode objects
  -> OASIS agent profiles
  -> generated activity/event/platform config
  -> ready simulation directory
```

## What Is the Simplest Version?

The simplest society builder would keep only nodes with meaningful domain labels, assign each one a username and persona, and save the files expected by the simulator. MiroFish implements exactly that baseline before adding LLM enrichment.

Eligibility is label-based: a node must have at least one label other than the generic `"Entity"` or `"Node"`. If the caller supplies `defined_entity_types`, the node must match one of those labels. When enrichment is enabled, the reader also attaches related edges and neighbor node summaries, so profile generation can see context instead of just names.  
Sources: [backend/app/services/zep_entity_reader.py:22-51](), [backend/app/services/zep_entity_reader.py:215-331]()

```python
# backend/app/services/zep_entity_reader.py
custom_labels = [l for l in labels if l not in ["Entity", "Node"]]

if not custom_labels:
    continue

if defined_entity_types:
    matching_labels = [l for l in custom_labels if l in defined_entity_types]
    if not matching_labels:
        continue
```

## Where Does Complexity Become Necessary?

Complexity appears when graph nodes need to behave differently as social actors. A student, a media outlet, and a university cannot share the same behavior profile. The preparation manager therefore makes graph reading only the first stage, then generates profiles, writes platform-specific profile files, generates simulation parameters, and marks the state `READY`.

The manager also fails early when no eligible entities are found, which is the correct boundary: no agents means no society.  
Sources: [backend/app/services/simulation_manager.py:272-303](), [backend/app/services/simulation_manager.py:304-443]()

| Stage | Owner | Output |
|---|---|---|
| Read graph | `ZepEntityReader` | `FilteredEntities` with `entities_count` and `entity_types` |
| Generate profiles | `OasisProfileGenerator` | `reddit_profiles.json` and/or `twitter_profiles.csv` |
| Generate config | `SimulationConfigGenerator` | `simulation_config.json` |
| Persist readiness | `SimulationManager` | `state.json` with `status = ready` |

## Which Profile Fields Does OASIS Need?

MiroFish’s internal profile object includes common identity fields (`user_id`, `user_name`, `name`, `bio`, `persona`), social counters, demographic/persona metadata, topic interests, and source graph references. It then converts the same internal profile into different OASIS-facing formats for Reddit and Twitter.  
Sources: [backend/app/services/oasis_profile_generator.py:29-140]()

Reddit profiles are JSON records with `user_id`, `username`, `name`, `bio`, `persona`, `karma`, `created_at`, and normalized OASIS-required demographic fields such as `age`, `gender`, `mbti`, and `country`. Twitter profiles are CSV rows with `user_id`, `name`, `username`, `user_char`, and `description`; `user_char` merges `bio` and `persona` for internal agent prompting.  
Sources: [backend/app/services/oasis_profile_generator.py:1047-1119](), [backend/app/services/oasis_profile_generator.py:1121-1193]()

The test script documents an older/expected-format check for Twitter CSV and Reddit JSON and is useful as validation context, but the current generator is more authoritative because it writes the production files used by preparation.  
Sources: [backend/scripts/test_profile_format.py:20-123](), [backend/scripts/test_profile_format.py:130-159]()

## How Does LLM Output Become Behavior?

The LLM does not directly run the simulation. It produces structured JSON that is parsed into dataclasses: time configuration, event configuration, agent activity configuration, and platform configuration. This distinction matters: generation is advisory, but the parsed config becomes platform-specific runtime input.

`SimulationConfigGenerator` builds a context from the simulation requirement, entity summaries, and original document text. It then generates time config, event config, batched agent configs, assigns initial posts to matching agent types, and finally attaches Twitter/Reddit platform configs when enabled.  
Sources: [backend/app/services/simulation_config_generator.py:147-197](), [backend/app/services/simulation_config_generator.py:243-379](), [backend/app/services/simulation_config_generator.py:381-432]()

```mermaid
flowchart LR
  subgraph API["API layer"]
    Create["/api/simulation/create"]
    Prepare["/api/simulation/prepare"]
    Status["/api/simulation/prepare/status"]
  end

  subgraph Prep["Preparation services"]
    Reader["ZepEntityReader"]
    Profiles["OasisProfileGenerator"]
    ConfigGen["SimulationConfigGenerator"]
    Manager["SimulationManager"]
  end

  subgraph Files["Simulation directory"]
    State["state.json"]
    Reddit["reddit_profiles.json"]
    Twitter["twitter_profiles.csv"]
    SimConfig["simulation_config.json"]
  end

  Create --> Manager
  Prepare --> Manager
  Status --> State
  Manager --> Reader
  Manager --> Profiles
  Manager --> ConfigGen
  Profiles --> Reddit
  Profiles --> Twitter
  ConfigGen --> SimConfig
  Manager --> State
```

Sources: [backend/app/api/simulation.py:359-625](), [backend/app/services/simulation_manager.py:329-443](), [backend/app/api/simulation.py:642-745]()

## What Makes It Platform-Specific Without Hard-Coding One Model Provider?

Platform-specific behavior is encoded as data, not as a model-provider branch. Twitter and Reddit differ in profile file format and platform configuration weights, while the LLM provider is selected through `LLM_API_KEY`, `LLM_BASE_URL`, and `LLM_MODEL_NAME`. The implementation uses the OpenAI-compatible Python client, so it is provider-neutral at the OpenAI-compatible API boundary, though not a generic SDK abstraction.  
Sources: [backend/app/config.py:30-37](), [backend/app/services/oasis_profile_generator.py:181-199](), [backend/app/services/simulation_config_generator.py:225-241](), [backend/app/utils/llm_client.py:17-33]()

| Concern | Current implementation | BYOC/BYOK implication |
|---|---|---|
| Model key | `LLM_API_KEY` from environment | User can bring their own key |
| Model endpoint | `LLM_BASE_URL` from environment | Any OpenAI-compatible endpoint can be configured |
| Model name | `LLM_MODEL_NAME` from environment | Model selection is config-driven |
| Graph source | `ZEP_API_KEY` and Zep client | Current graph memory provider is Zep-specific |
| Platform output | JSON/CSV plus config objects | OASIS behavior is file/config-driven, not provider-driven |

## How Does the Frontend See the Society Forming?

The UI treats preparation as a live process. It starts `prepareSimulation`, receives a task id and expected entity count, polls task status every two seconds, polls profiles every three seconds, and switches to config polling when the backend reaches the configuration stage. Once config exists, the UI shows agent counts, time settings, agent behavior cards, platform recommendation weights, narrative direction, hot topics, and initial posts mapped to agent ids.  
Sources: [frontend/src/api/simulation.js:11-68](), [frontend/src/components/Step2EnvSetup.vue:64-180](), [frontend/src/components/Step2EnvSetup.vue:182-345](), [frontend/src/components/Step2EnvSetup.vue:771-899](), [frontend/src/components/Step2EnvSetup.vue:955-1016]()

The frontend also computes default run rounds from generated `time_config` rather than a hard-coded value, while still letting the user override the run length before starting the simulation.  
Sources: [frontend/src/components/Step2EnvSetup.vue:438-525](), [frontend/src/components/Step2EnvSetup.vue:694-707](), [frontend/src/components/Step2EnvSetup.vue:743-758]()

## What Would Break If This Abstraction Disappeared?

If eligibility lived inside profile generation, default graph nodes could become agents accidentally. If profile generation wrote only one platform format, OASIS would lose either Reddit JSON or Twitter CSV compatibility. If LLM output were allowed to bypass parsing, invalid JSON, invalid `poster_type`, or overlarge activation counts could leak into runtime behavior.

The code guards these boundaries by filtering labels before generation, normalizing/generating profile defaults, retrying and repairing JSON responses, clamping time activation counts to available agents, and mapping event `poster_type` values to real agent ids with fallbacks.  
Sources: [backend/app/services/zep_entity_reader.py:252-331](), [backend/app/services/oasis_profile_generator.py:524-581](), [backend/app/services/simulation_config_generator.py:434-533](), [backend/app/services/simulation_config_generator.py:611-644](), [backend/app/services/simulation_config_generator.py:728-811]()

## Closing Summary

A MiroFish graph becomes a society when labeled graph entities are filtered, enriched with relationship context, transformed into OASIS-compatible profiles, paired with generated time/event/agent/platform behavior, and persisted as a ready simulation directory. The architecture is BYOK-friendly for LLMs through environment-configured OpenAI-compatible endpoints, while the current graph reader remains Zep-specific and should be treated as the provider boundary for any future BYOC graph-memory work.  
Sources: [backend/app/services/simulation_manager.py:230-443](), [backend/app/config.py:30-49]()

---

## 04. Why Is Running the Simulation a Process Boundary?

> The run stage asks why live simulation leaves Flask request handling: subprocesses, file-based IPC, action logs, SQLite traces, and optional graph-memory updates create the observable boundary.

- Page Markdown: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/04-why-is-running-the-simulation-a-process-boundary.md
- Generated: 2026-05-24T18:11:10.972Z

### Source Files

- `backend/app/services/simulation_runner.py`
- `backend/app/services/simulation_ipc.py`
- `backend/app/services/zep_graph_memory_updater.py`
- `backend/scripts/run_parallel_simulation.py`
- `backend/scripts/run_twitter_simulation.py`
- `backend/scripts/run_reddit_simulation.py`
- `backend/scripts/action_logger.py`
- `frontend/src/components/Step3Simulation.vue`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [backend/app/services/simulation_runner.py](backend/app/services/simulation_runner.py)
- [backend/app/services/simulation_ipc.py](backend/app/services/simulation_ipc.py)
- [backend/app/services/zep_graph_memory_updater.py](backend/app/services/zep_graph_memory_updater.py)
- [backend/app/api/simulation.py](backend/app/api/simulation.py)
- [backend/scripts/run_parallel_simulation.py](backend/scripts/run_parallel_simulation.py)
- [backend/scripts/run_twitter_simulation.py](backend/scripts/run_twitter_simulation.py)
- [backend/scripts/run_reddit_simulation.py](backend/scripts/run_reddit_simulation.py)
- [backend/scripts/action_logger.py](backend/scripts/action_logger.py)
- [frontend/src/components/Step3Simulation.vue](frontend/src/components/Step3Simulation.vue)
- [frontend/src/api/simulation.js](frontend/src/api/simulation.js)
</details>

# Why Is Running the Simulation a Process Boundary?

A simulation run is not just a Flask endpoint doing work. It is a handoff from request/response code into a long-lived Python worker process that owns the OASIS environments, writes observable files, accepts later commands through file IPC, and leaves traces for the API and UI to read.

The useful question is: what would break if the simulation stayed inside the Flask request? The answer is visible in the code: the run can outlive the HTTP request, expose progress through durable files, keep an environment alive for interviews after the main loop, terminate as a process group, and optionally stream actions into graph memory without coupling that work to the web server thread.

Context note: this page uses the provided Compound Engineering wiki guidance as page-shape metadata. In this checkout, no `STRATEGY.md` or `docs/solutions/**` source files were present in the focused inventory, so repository code remains the source of truth.

## What is the simplest version?

The simplest version would be: `/api/simulation/start` receives a `simulation_id`, loads `simulation_config.json`, runs the loop inline, and returns when done. The implementation chooses a different shape. The Flask route validates `simulation_id`, `platform`, `max_rounds`, `force`, and optional graph-memory settings, then delegates to `SimulationRunner.start_simulation(...)` and returns a state object that includes `runner_status`, platform flags, and `process_pid`.

That tells us the product contract is asynchronous: starting a run is not the same as completing a run. The UI also treats start as a transition into polling, not as a blocking operation. `Step3Simulation.vue` sends `platform: 'parallel'`, `force: true`, and `enable_graph_memory_update: true`, then starts status and detail polling after receiving the process id.

Sources: [backend/app/api/simulation.py:1451-1627](), [frontend/src/components/Step3Simulation.vue:382-425](), [frontend/src/api/simulation.js:79-109]()

## Where does the boundary actually sit?

The boundary sits between Flask-managed orchestration and script-managed simulation state.

`SimulationRunner.start_simulation` chooses one of three scripts: `run_twitter_simulation.py`, `run_reddit_simulation.py`, or `run_parallel_simulation.py`. It builds a command using the current Python interpreter, passes `--config`, optionally passes `--max-rounds`, sets the working directory to the simulation directory, redirects stdout/stderr to `simulation.log`, starts a new process session, stores the child PID, and launches a monitor thread.

```python
# backend/app/services/simulation_runner.py
cmd = [
    sys.executable,
    script_path,
    "--config", config_path,
]
process = subprocess.Popen(
    cmd,
    cwd=sim_dir,
    stdout=main_log_file,
    stderr=subprocess.STDOUT,
    text=True,
    encoding='utf-8',
    bufsize=1,
    env=env,
    start_new_session=True,
)
```

The request path owns validation and launch. The worker process owns the live OASIS environment and the simulation loop. The monitor thread bridges them by reading files the worker emits.

Sources: [backend/app/services/simulation_runner.py:313-479](), [backend/app/services/simulation_runner.py:482-547]()

## What crosses the boundary?

The crossing is deliberately file-shaped.

| Boundary artifact | Written by | Read by | Purpose |
|---|---|---|---|
| `simulation_config.json` | preparation flow before run | runner and worker script | input contract for the run |
| `simulation.log` | child process stdout and log manager | runner on failure, humans | main process log and failure tail |
| `run_state.json` | `SimulationRunner` monitor | API status endpoints | durable observable run state |
| `twitter/actions.jsonl` | action logger in worker | monitor/API/detail UI | platform action stream |
| `reddit/actions.jsonl` | action logger in worker | monitor/API/detail UI | platform action stream |
| `twitter_simulation.db` / `reddit_simulation.db` | OASIS environment | worker helpers and API endpoints | raw SQLite traces and post/comment data |
| `env_status.json` | IPC handler in worker | Flask IPC client/API | whether the live environment can accept commands |
| `ipc_commands/*.json` and `ipc_responses/*.json` | Flask and worker | worker and Flask | request/response IPC for interviews and close-env |

The cleanup list is also evidence of the boundary. A forced restart deletes run state, logs, action JSONL files, platform SQLite databases, and `env_status.json`, but not the original config or profile files.

Sources: [backend/app/services/simulation_runner.py:299-310](), [backend/app/services/simulation_runner.py:487-516](), [backend/app/services/simulation_runner.py:1102-1181](), [backend/scripts/action_logger.py:1-13]()

## Why are action logs separate from SQLite traces?

The worker uses SQLite as the platform trace store and JSONL as the API-facing action stream.

Inside the parallel worker, each platform creates its own OASIS environment with its own database path. After each environment step, the worker queries new rows from the SQLite `trace` table using `rowid`, normalizes action names, enriches action context, and writes selected actions through `PlatformActionLogger`. That logger appends one JSON object per line to `twitter/actions.jsonl` or `reddit/actions.jsonl`, including normal actions and event markers such as `round_start`, `round_end`, `simulation_start`, and `simulation_end`.

The monitor thread then reads the JSONL streams incrementally by file position. Event rows update progress and completion flags; action rows become `AgentAction` objects and optionally feed graph memory.

Sources: [backend/scripts/run_parallel_simulation.py:657-746](), [backend/scripts/run_parallel_simulation.py:1101-1288](), [backend/scripts/action_logger.py:22-117](), [backend/app/services/simulation_runner.py:583-691]()

## How does the system keep a finished simulation interactive?

The worker does not necessarily exit when the main loop ends. The parallel script defaults into a wait-for-commands mode, creates a `ParallelIPCHandler`, marks `env_status.json` as `alive`, and repeatedly polls command files every half second. It supports `interview`, `batch_interview`, and `close_env`.

Flask uses `SimulationIPCClient` for the other side. It writes a command JSON file with a UUID, waits for a matching response JSON file, and removes both files after success. Before sending interview commands, `SimulationRunner` checks `env_status.json`; closing the environment is a graceful command, distinct from `/stop`, which terminates the process.

Sources: [backend/scripts/run_parallel_simulation.py:1595-1634](), [backend/scripts/run_parallel_simulation.py:217-299](), [backend/scripts/run_parallel_simulation.py:560-601](), [backend/app/services/simulation_ipc.py:95-187](), [backend/app/services/simulation_runner.py:1373-1489](), [backend/app/services/simulation_runner.py:1610-1656]()

```mermaid
flowchart LR
  subgraph UI["UI"]
    Step3["Step3Simulation.vue"]
  end

  subgraph Flask["Flask API and Runner"]
    Start["/api/simulation/start"]
    Runner["SimulationRunner"]
    IPCClient["SimulationIPCClient"]
    Monitor["monitor thread"]
  end

  subgraph Worker["Simulation subprocess"]
    Parallel["run_parallel_simulation.py"]
    Twitter["Twitter OASIS env"]
    Reddit["Reddit OASIS env"]
    IPCHandler["ParallelIPCHandler"]
  end

  subgraph Files["Simulation directory"]
    Config["simulation_config.json"]
    State["run_state.json"]
    Actions["twitter/reddit/actions.jsonl"]
    DB["*_simulation.db trace tables"]
    EnvStatus["env_status.json"]
    IPCFiles["ipc_commands / ipc_responses"]
    MainLog["simulation.log"]
  end

  subgraph Optional["Optional graph memory"]
    ZepUpdater["ZepGraphMemoryUpdater"]
    Graph["graph.add text episodes"]
  end

  Step3 --> Start
  Start --> Runner
  Runner -->|Popen| Parallel
  Runner --> Config
  Parallel --> Twitter
  Parallel --> Reddit
  Parallel --> MainLog
  Twitter --> DB
  Reddit --> DB
  Parallel --> Actions
  Monitor --> Actions
  Monitor --> State
  IPCClient --> IPCFiles
  IPCHandler --> IPCFiles
  IPCHandler --> EnvStatus
  Monitor --> ZepUpdater
  ZepUpdater --> Graph
```

Sources: [backend/app/services/simulation_runner.py:387-469](), [backend/scripts/run_parallel_simulation.py:1540-1618](), [backend/app/services/simulation_ipc.py:102-187](), [backend/app/services/zep_graph_memory_updater.py:340-418]()

## What does `/stop` mean versus `close-env`?

There are two shutdown paths because there are two different problems.

`/stop` is process control. `SimulationRunner.stop_simulation` moves the run state to `STOPPING`, terminates the subprocess tree, then marks the run `STOPPED`. On Unix, the runner sends signals to the process group created by `start_new_session=True`; on Windows, it uses `taskkill` for the process tree. This is the right tool when the run itself must stop.

`close-env` is protocol control. It sends a `close_env` IPC command so a worker already in wait mode can exit cleanly. The API documentation explicitly distinguishes it from `/stop`: `/stop` forcefully terminates the process, while close-env asks the simulation to shut down the environment and exit.

Sources: [backend/app/services/simulation_runner.py:720-822](), [backend/app/api/simulation.py:2649-2708](), [backend/app/services/simulation_runner.py:1610-1656]()

## Where does graph memory fit?

Graph memory is optional and sits on the Flask-side observation path, not inside the OASIS loop itself. When `/start` asks for graph-memory updates, the API resolves a `graph_id` from simulation or project state and passes it to `SimulationRunner.start_simulation`. The runner creates a `ZepGraphMemoryUpdater` before launching the subprocess. As the monitor reads action JSONL rows, it calls `graph_updater.add_activity_from_dict(...)`.

The updater is another asynchronous boundary: it owns a queue, a daemon worker thread, per-platform buffers, batching, retry behavior, and a final flush on stop. It filters `DO_NOTHING`, converts action dictionaries into natural-language activity episodes, and sends batches with `client.graph.add(...)`.

This is provider-specific in the current implementation because the class imports `zep_cloud.client.Zep` and requires `ZEP_API_KEY`. Architecturally, however, the boundary is already portable: the simulation process emits provider-neutral action JSONL, and the graph-memory adapter consumes that observable stream. A BYOC/BYOK-friendly extension would keep `actions.jsonl` as the source event contract and swap the memory sink behind an adapter rather than coupling OASIS or Flask routes to one hosted memory provider.

Sources: [backend/app/api/simulation.py:1584-1623](), [backend/app/services/simulation_runner.py:372-385](), [backend/app/services/simulation_runner.py:603-684](), [backend/app/services/zep_graph_memory_updater.py:202-246](), [backend/app/services/zep_graph_memory_updater.py:275-308](), [backend/app/services/zep_graph_memory_updater.py:340-418](), [backend/app/config.py:30-37]()

## What does the UI observe?

The UI does not subscribe to the child process directly. It starts the run, receives the backend state, then polls two HTTP endpoints:

- `GET /api/simulation/{id}/run-status` for coarse progress, current rounds, platform completion, and action counts.
- `GET /api/simulation/{id}/run-status/detail` for the action list used by the live activity display.

The backend implements those endpoints by reading `SimulationRunner` state and action files. The Vue component separately detects per-platform round changes and final completion. That keeps the browser out of the process boundary; it sees HTTP resources, not subprocess pipes or SQLite handles.

Sources: [backend/app/api/simulation.py:1705-1752](), [backend/app/api/simulation.py:1763-1838](), [frontend/src/components/Step3Simulation.vue:492-585](), [frontend/src/api/simulation.js:95-109]()

## What would break if the boundary disappeared?

If the run stayed inside Flask request handling, several implemented behaviors lose their natural owner:

| Current behavior | Why the process boundary helps |
|---|---|
| Long simulation loop | The request returns immediately with `process_pid` while the worker continues. |
| Durable progress | `run_state.json` and JSONL files survive beyond one request handler. |
| Platform-parallel execution | The worker can `asyncio.gather(...)` Twitter and Reddit runs while Flask remains responsive. |
| Post-run interviews | The worker keeps OASIS environments alive and accepts IPC commands after the main loop. |
| Force stop | The runner can terminate a process group instead of trying to interrupt in-request Python state. |
| Failure inspection | `simulation.log` captures stdout/stderr and the runner can attach the tail to failed state. |
| Optional graph updates | The monitor can transform observed actions into graph-memory updates without changing the simulation loop. |

The code is not saying subprocesses are the only possible design. It is saying this implementation treats the simulation as a separately observable runtime with file contracts between layers.

Sources: [backend/app/services/simulation_runner.py:426-469](), [backend/app/services/simulation_runner.py:501-547](), [backend/scripts/run_parallel_simulation.py:1583-1590](), [backend/scripts/run_parallel_simulation.py:1595-1634](), [backend/app/services/simulation_runner.py:720-822]()

## Closing summary

Running the simulation is a process boundary because the live simulation has a different lifecycle than a Flask request. Flask validates, launches, monitors, serves status, and sends file IPC commands. The subprocess owns OASIS environments, SQLite trace generation, action JSONL emission, and optional wait-mode interactivity. The shared simulation directory is the contract that makes the boundary observable, restartable, and portable enough to support BYOC/BYOK-oriented adapters around model config and graph-memory sinks. Sources: [backend/app/services/simulation_runner.py:196-205](), [backend/scripts/run_parallel_simulation.py:1-26](), [backend/scripts/action_logger.py:1-13]()

---

## 05. What Can You Now Ask the Sandbox?

> The closing reframe: once graph memory, simulated traces, and agent interviews exist, the report system can reason with tools, assemble evidence section by section, and keep the UI portable across file, repository, or catalog skill sources rather than a single hosted provider.

- Page Markdown: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/pages/05-what-can-you-now-ask-the-sandbox.md
- Generated: 2026-05-24T18:12:30.323Z

### Source Files

- `backend/app/api/report.py`
- `backend/app/services/report_agent.py`
- `backend/app/services/zep_tools.py`
- `backend/app/api/simulation.py`
- `frontend/src/components/Step4Report.vue`
- `frontend/src/components/Step5Interaction.vue`
- `frontend/src/api/report.js`
- `frontend/src/views/InteractionView.vue`

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [backend/app/api/report.py](backend/app/api/report.py)
- [backend/app/services/report_agent.py](backend/app/services/report_agent.py)
- [backend/app/services/zep_tools.py](backend/app/services/zep_tools.py)
- [backend/app/api/simulation.py](backend/app/api/simulation.py)
- [backend/app/config.py](backend/app/config.py)
- [backend/app/utils/llm_client.py](backend/app/utils/llm_client.py)
- [frontend/src/components/Step4Report.vue](frontend/src/components/Step4Report.vue)
- [frontend/src/components/Step5Interaction.vue](frontend/src/components/Step5Interaction.vue)
- [frontend/src/api/report.js](frontend/src/api/report.js)
- [frontend/src/api/simulation.js](frontend/src/api/simulation.js)
- [frontend/src/views/InteractionView.vue](frontend/src/views/InteractionView.vue)
</details>

# What Can You Now Ask the Sandbox?

This page reframes the end state of MiroFish: after a project has graph memory, a completed simulation, generated report sections, and live agent profiles, the sandbox is no longer just a run log. It becomes a question-answering workbench where a report agent can inspect graph facts, reason through simulated outcomes, and ask simulated agents for first-person responses.

The repository code remains the source of truth here. The selected Compound Engineering knowledge profile shaped the page as reusable wiki knowledge, but no `STRATEGY.md` or `docs/solutions/**` files were present in the inspected repository file list, so this page does not claim prior strategy or solved-problem notes as evidence.

## What Problem Exists?

The product problem is simple: a simulation produces too much raw evidence for a user to inspect manually. The backend therefore turns a `simulation_id` into a report task, validates that the simulation, project, graph, and simulation requirement exist, creates a `report_id`, and runs generation in a background thread. The frontend then watches logs and sections rather than waiting for one final blob.

Sources: [backend/app/api/report.py:25-192](), [frontend/src/components/Step4Report.vue:2020-2165]()

The important reframe is that report generation is not only "write a summary." It is an evidence assembly loop:

```text
simulation_id
  -> graph_id + simulation_requirement
  -> ReportAgent
  -> outline
  -> section_01.md, section_02.md, ...
  -> full_report.md
  -> interaction workbench
```

Sources: [backend/app/services/report_agent.py:1532-1550](), [backend/app/services/report_agent.py:1630-1729]()

## What Is The Simplest Version?

The simplest useful sandbox question is: "What happened in this simulation?" MiroFish answers that by using graph context to plan a report outline, then by generating each section independently. The planning prompt receives the simulation requirement, graph statistics, entity types, active entity count, and related facts, then asks the model for a JSON outline. If outline planning fails, the code falls back to a three-section default report shape.

Sources: [backend/app/services/zep_tools.py:890-941](), [backend/app/services/report_agent.py:1160-1219]()

The persisted report model is intentionally file-shaped: metadata, outline, progress, per-section Markdown, logs, and the final assembled report all live under the report folder. That makes partial output inspectable and recoverable.

Sources: [backend/app/services/report_agent.py:1884-1955](), [backend/app/services/report_agent.py:2200-2228](), [backend/app/services/report_agent.py:2447-2497]()

## Where Does Complexity Become Necessary?

Complexity becomes necessary once the system must answer "why?" and "what evidence supports that?" The `ReportAgent` uses a ReACT loop for each section: ask for tool calls, execute only the first tool call in a turn, feed the observation back, require enough tool use before accepting a final answer, and log each step. The loop also handles malformed responses, such as a model returning both a tool call and a final answer in the same message.

Sources: [backend/app/services/report_agent.py:1221-1309](), [backend/app/services/report_agent.py:1324-1402](), [backend/app/services/report_agent.py:1432-1530]()

The available tools map to different evidence questions:

| User question | Tool path | What it contributes |
| --- | --- | --- |
| "What are the strongest findings?" | `insight_forge` | Breaks a complex question into sub-queries, searches graph facts, extracts entities, and builds relationship chains. |
| "What changed over time?" | `panorama_search` | Separates active facts from historical or expired facts. |
| "Can you verify this one claim?" | `quick_search` | Runs a direct graph search and returns the most relevant facts. |
| "What would affected agents say?" | `interview_agents` | Selects simulated agents, generates interview questions, and calls the simulation interview API. |

Sources: [backend/app/services/report_agent.py:919-954](), [backend/app/services/zep_tools.py:945-1090](), [backend/app/services/zep_tools.py:1145-1270](), [backend/app/services/zep_tools.py:1272-1325]()

## What Can You Ask Now?

Once the report exists, the user can ask the sandbox questions that combine four evidence surfaces:

1. The report content already generated for the simulation.
2. The Zep graph and its facts, nodes, edges, entity summaries, and timelines.
3. Simulated agent profiles loaded from the simulation artifacts.
4. Live or batch interviews against the running simulation environment.

Sources: [backend/app/services/report_agent.py:1766-1865](), [backend/app/services/zep_tools.py:1505-1549](), [backend/app/api/simulation.py:990-1075](), [backend/app/api/simulation.py:2142-2325]()

Good sandbox questions are therefore not generic prompts. They are evidence-seeking prompts:

- "Which simulated groups amplified the issue, and what graph facts support that?"
- "Where did the report rely on current facts versus historical facts?"
- "Interview the most relevant agents about this risk and compare their answers."
- "Show me the relationship chain behind this conclusion."
- "Which section has the weakest evidence?"

The `/api/report/chat` endpoint supports this mode by reconstructing the simulation and project context, resolving the graph, creating a `ReportAgent`, and returning the agent's response, tool calls, and source queries. The frontend calls that endpoint through `chatWithReport`.

Sources: [backend/app/api/report.py:472-556](), [frontend/src/api/report.js:45-51](), [frontend/src/components/Step5Interaction.vue:682-710]()

## How Does The UI Keep Evidence Visible?

Step 4 is the report-generation cockpit. It renders the planned outline, generated sections, metrics, timeline events, tool calls, tool results, model responses, and console output. It polls `agent-log` every two seconds and `console-log` every 1.5 seconds, extracting `planning_complete`, `section_start`, `section_complete`, and `report_complete` into visible UI state.

Sources: [frontend/src/components/Step4Report.vue:7-65](), [frontend/src/components/Step4Report.vue:88-140](), [frontend/src/components/Step4Report.vue:2024-2165]()

Step 5 is the interaction workbench. It defaults to report-agent chat, but can switch to a selected simulated agent or a survey tab. It loads the completed report from agent logs, loads realtime Reddit profiles for the simulation, and routes messages either to the report agent or to the batch interview API.

Sources: [frontend/src/components/Step5Interaction.vue:421-456](), [frontend/src/components/Step5Interaction.vue:501-542](), [frontend/src/components/Step5Interaction.vue:645-760](), [frontend/src/components/Step5Interaction.vue:876-961]()

`InteractionView` keeps the graph and interaction panel separate. It resolves `reportId -> simulationId -> project -> graph`, then displays the graph panel next to `Step5Interaction`. That separation is what lets the workbench ask questions without hiding the graph evidence behind the chat surface.

Sources: [frontend/src/views/InteractionView.vue:38-61](), [frontend/src/views/InteractionView.vue:145-218]()

## How Provider-Neutral Is This?

The model client is OpenAI-compatible, but the actual API key, base URL, and model name come from environment configuration. That means the architecture is BYOK-friendly and can point at any provider that speaks the same chat-completions shape. It is not fully vendor-agnostic yet, because graph memory is explicitly backed by `zep_cloud.client.Zep` and `ZEP_API_KEY`.

```python
# backend/app/config.py
LLM_API_KEY = os.environ.get('LLM_API_KEY')
LLM_BASE_URL = os.environ.get('LLM_BASE_URL', 'https://api.openai.com/v1')
LLM_MODEL_NAME = os.environ.get('LLM_MODEL_NAME', 'gpt-4o-mini')
ZEP_API_KEY = os.environ.get('ZEP_API_KEY')
```

Sources: [backend/app/config.py:30-37](), [backend/app/config.py:66-74](), [backend/app/utils/llm_client.py:17-33](), [backend/app/services/zep_tools.py:425-440]()

For a Grok-Wiki or skill-pack integration, the portable boundary should be above the repository code: treat file, repository, and catalog skill sources as interchangeable knowledge inputs that shape planning and QA, not as a hosted-provider dependency. MiroFish already has useful internal boundaries to build on: frontend API wrappers, report IDs, simulation IDs, graph IDs, and injectable `llm_client` / `zep_tools` constructor arguments. The missing abstraction is a formal graph-memory adapter beyond Zep.

Sources: [frontend/src/api/report.js:1-51](), [frontend/src/api/simulation.js:35-50](), [frontend/src/api/simulation.js:171-177](), [backend/app/services/report_agent.py:884-910]()

## What Would Break If The Abstractions Disappeared?

If per-section files disappeared, the UI could no longer show section completion incrementally, and failure recovery would be weaker. If structured `agent_log.jsonl` disappeared, Step 4 and Step 5 would lose their source of outline, section, tool-call, and completion events. If the interview API disappeared, "ask the agents" would collapse into ordinary model speculation instead of querying the simulation environment.

Sources: [backend/app/services/report_agent.py:258-291](), [backend/app/services/report_agent.py:2019-2064](), [frontend/src/components/Step4Report.vue:2033-2065](), [backend/app/api/simulation.py:2142-2248]()

The closing answer is: you can now ask the sandbox for evidence-backed interpretation, not just output. The report agent can plan, search, inspect timelines, interview agents, assemble sections, persist evidence, and reopen that evidence in an interactive UI. Keep future integrations provider-neutral by making skill packs and model endpoints replaceable inputs, while preserving the repository's current report, graph, simulation, and file-backed evidence contracts. Sources: [backend/app/services/report_agent.py:1532-1729](), [frontend/src/views/InteractionView.vue:38-61]()

---
