# The First Question

> What is the smallest version of MiroFish: a user question, some seed files, and a provider-neutral runtime that can turn them into a simulated social world?

- Repository: 666ghj/MiroFish
- GitHub: https://github.com/666ghj/MiroFish
- Human wiki: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9
- Complete Markdown: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/llms-full.txt

## Source Files

- `README.md`
- `package.json`
- `backend/pyproject.toml`
- `backend/run.py`
- `backend/app/__init__.py`
- `backend/app/config.py`
- `frontend/src/router/index.js`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [package.json](package.json)
- [backend/pyproject.toml](backend/pyproject.toml)
- [backend/run.py](backend/run.py)
- [backend/app/__init__.py](backend/app/__init__.py)
- [backend/app/config.py](backend/app/config.py)
- [backend/app/api/graph.py](backend/app/api/graph.py)
- [backend/app/api/simulation.py](backend/app/api/simulation.py)
- [backend/app/models/project.py](backend/app/models/project.py)
- [backend/app/services/simulation_manager.py](backend/app/services/simulation_manager.py)
- [backend/app/services/simulation_runner.py](backend/app/services/simulation_runner.py)
- [backend/app/utils/llm_client.py](backend/app/utils/llm_client.py)
- [backend/app/services/ontology_generator.py](backend/app/services/ontology_generator.py)
- [backend/app/services/simulation_config_generator.py](backend/app/services/simulation_config_generator.py)
- [frontend/src/router/index.js](frontend/src/router/index.js)
- [frontend/src/views/Home.vue](frontend/src/views/Home.vue)
- [frontend/src/views/MainView.vue](frontend/src/views/MainView.vue)
- [frontend/src/components/Step1GraphBuild.vue](frontend/src/components/Step1GraphBuild.vue)
- [frontend/src/components/Step2EnvSetup.vue](frontend/src/components/Step2EnvSetup.vue)
- [frontend/src/api/graph.js](frontend/src/api/graph.js)
- [frontend/src/api/simulation.js](frontend/src/api/simulation.js)
</details>

What is the smallest useful MiroFish? Not the full promise of prediction, reporting, and deep interaction, but the first irreducible loop: a user asks a simulation question, uploads seed files, and a backend turns that material into agents, memory, configuration, and a runnable social simulation.

This page treats that first loop as an architecture pattern, with repository code as the source of truth. Its structure (concept, workflow, QA review) follows the selected Compound Engineering profile. Because this checkout contains no `STRATEGY.md` or `docs/solutions/**` sources, no prior local strategy or solved-problem notes are cited.

## What Is The Simplest Version?

The product-level answer is already small: upload seed materials and describe the prediction requirement. The README says MiroFish extracts real-world seed information, builds a parallel digital world, and returns a prediction report plus an interactive simulated world; its workflow begins with graph building, then environment setup, then simulation. Sources: [README.md:27-32](), [README.md:86-92]()

The implementation makes that smaller still. The home screen requires only two inputs before moving forward: at least one `.pdf`, `.md`, or `.txt` file, and a non-empty `simulationRequirement`. It stores that pending upload locally and routes to `/process/new`, where backend calls begin. Sources: [frontend/src/views/Home.vue:136-149](), [frontend/src/views/Home.vue:238-240](), [frontend/src/views/Home.vue:297-310](), [frontend/src/router/index.js:9-25]()

So the smallest version is:

```text
seed files + simulation question
        |
        v
ontology + graph memory
        |
        v
agent profiles + simulation_config.json
        |
        v
OASIS runner process with twitter/reddit/parallel mode
```

## Where Does The First Question Enter The System?

The first question enters as `simulation_requirement`. The frontend posts files plus that requirement to `POST /api/graph/ontology/generate`; the backend requires both a simulation requirement and uploaded files, saves them into a project, extracts text, and calls the ontology generator. Sources: [frontend/src/api/graph.js:3-18](), [backend/app/api/graph.py:120-148](), [backend/app/api/graph.py:153-222]()

The important design choice is that MiroFish does not ask the frontend to keep passing the whole prompt, documents, ontology, and graph state between screens. It creates a server-side `Project` with files, extracted text, ontology, graph id, chunk settings, and the original `simulation_requirement`. Sources: [backend/app/models/project.py:1-4](), [backend/app/models/project.py:26-73](), [backend/app/models/project.py:133-174](), [backend/app/models/project.py:274-290]()

### Minimal Input Contract

| Input | Required? | Evidence | Why it matters |
|---|---:|---|---|
| Seed files | Yes | Backend rejects missing uploads and accepts configured extensions. | Gives the world its source material. |
| Simulation question | Yes | Backend rejects empty `simulation_requirement`. | Defines what the world is being built to explore. |
| Project name | No | Defaults to `Unnamed Project`. | Labels the stored project. |
| Additional context | No | Passed to ontology generation only when present. | Lets the user bias interpretation without changing files. |

Sources: [backend/app/api/graph.py:153-173](), [backend/app/config.py:38-45](), [backend/app/api/graph.py:175-222]()
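The contract above can be sketched as a small validator. This is a hypothetical illustration, not the code in `backend/app/api/graph.py`: the form keys `project_name` and `additional_context`, the `OntologyRequest` type, and the choice to drop unsupported files (rather than reject the whole request) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional

# Assumption: mirrors the extensions the home screen accepts; the backend
# reads its real allow-list from configuration.
ALLOWED_EXTENSIONS = {".pdf", ".md", ".txt"}


class InputError(ValueError):
    """Raised when a request violates the minimal input contract."""


@dataclass
class OntologyRequest:
    simulation_requirement: str
    filenames: list
    project_name: str = "Unnamed Project"
    additional_context: Optional[str] = None


def validate_request(form: dict, filenames: list) -> OntologyRequest:
    """Check the two required inputs and apply defaults to the optional ones."""
    requirement = (form.get("simulation_requirement") or "").strip()
    if not requirement:
        raise InputError("simulation_requirement is required")

    # Keep only files whose extension is on the allow-list.
    accepted = [n for n in filenames if PurePath(n).suffix.lower() in ALLOWED_EXTENSIONS]
    if not accepted:
        raise InputError("at least one .pdf, .md, or .txt seed file is required")

    # Optional fields: a blank context is treated as absent.
    context = (form.get("additional_context") or "").strip() or None
    return OntologyRequest(
        simulation_requirement=requirement,
        filenames=accepted,
        project_name=(form.get("project_name") or "").strip() or "Unnamed Project",
        additional_context=context,
    )
```

Only the two required inputs can fail the request; everything else falls back to a default, which keeps the first step as small as the home screen promises.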

## What Has To Exist Before Agents Can Run?

A user question is not enough. The system first converts documents into a graph-backed social substrate. The `/api/graph/build` endpoint requires an existing project, a generated ontology, extracted text, and a Zep API key. It then chunks the text, creates a graph, sets the ontology, adds the text in batches, waits for processing, fetches the graph data, and marks the project as `graph_completed`. Sources: [backend/app/api/graph.py:260-296](), [backend/app/api/graph.py:308-363](), [backend/app/api/graph.py:389-493]()
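The build sequence can be sketched as a chunk, batch, and upload loop. This is a minimal sketch under stated assumptions: the chunk size, overlap, and batch size defaults are illustrative, and `GraphClient` with its method names is a hypothetical interface, not the Zep SDK that the real endpoint calls.

```python
from __future__ import annotations

from typing import Iterator, Protocol


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into fixed-size windows that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def batched(items: list, batch_size: int) -> Iterator[list]:
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


class GraphClient(Protocol):
    """Hypothetical client surface; not the Zep SDK's actual API."""
    def create_graph(self) -> str: ...
    def set_ontology(self, graph_id: str, ontology: dict) -> None: ...
    def add_batch(self, graph_id: str, chunks: list) -> None: ...
    def wait_until_processed(self, graph_id: str) -> None: ...
    def fetch_graph(self, graph_id: str) -> dict: ...


def build_graph(text: str, ontology: dict, client: GraphClient,
                chunk_size: int = 500, overlap: int = 50, batch_size: int = 3) -> tuple:
    """Run the build steps in the order described above."""
    graph_id = client.create_graph()
    client.set_ontology(graph_id, ontology)
    for batch in batched(chunk_text(text, chunk_size, overlap), batch_size):
        client.add_batch(graph_id, batch)
    client.wait_until_processed(graph_id)
    return graph_id, client.fetch_graph(graph_id)
```

Overlapping windows let an entity mentioned across a chunk boundary still appear whole in at least one chunk; batching bounds the size of each upload call.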

The frontend treats this as a background task. `MainView.vue` starts graph building, polls task status every two seconds, refreshes graph data, and advances the phase when the task completes. Sources: [frontend/src/views/MainView.vue:234-253](), [frontend/src/views/MainView.vue:276-349]()
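`MainView.vue` implements this loop in JavaScript; the sketch below restates the same polling pattern in Python so it can be run and tested. The terminal status names `completed` and `failed`, the timeout, and the `on_update` hook are assumptions, not the real task API.

```python
from __future__ import annotations

import time
from typing import Callable, Optional

# Assumed terminal status names; the real task API may use others.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_task(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    on_update: Optional[Callable[[dict], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if on_update is not None:
            on_update(status)  # e.g. refresh the partially built graph
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP client and makes it testable without waiting.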

In practice, the project carrying the first question can feed a simulation only after it has a graph id:

```text
Project
  project_id
  simulation_requirement
  extracted_text.txt
  ontology
  graph_id       <- required before simulation creation
```

Sources: [backend/app/models/project.py:35-50](), [backend/app/api/simulation.py:197-224]()
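The project record and its graph-id gate can be sketched as follows. The field set follows the diagram above, but the `proj_` id prefix, the `status` values, and the `require_graph_id` helper are hypothetical, not the repository's `Project` model.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    simulation_requirement: str
    project_id: str = field(default_factory=lambda: f"proj_{uuid.uuid4().hex[:12]}")
    files: list = field(default_factory=list)
    extracted_text_path: Optional[str] = None  # points at extracted_text.txt
    ontology: Optional[dict] = None
    graph_id: Optional[str] = None
    status: str = "created"


def require_graph_id(project: Project) -> str:
    """Refuse to create a simulation until the graph build has produced an id."""
    if not project.graph_id:
        raise ValueError(f"project {project.project_id} has no graph_id; build the graph first")
    return project.graph_id
```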

## Where Does Complexity Become Necessary?

Complexity starts when graph entities must become social agents. `POST /api/simulation/create` creates a `SimulationState` with project id, graph id, platform toggles, counts, generated config status, and runtime status. Sources: [backend/app/api/simulation.py:163-229](), [backend/app/services/simulation_manager.py:25-35](), [backend/app/services/simulation_manager.py:43-112](), [backend/app/services/simulation_manager.py:194-228]()
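A simulation record like this can be sketched as a dataclass persisted to `state.json`. The field names below are assumptions that approximate "project id, graph id, platform toggles, counts, generated config status, and runtime status"; they are not copied from `simulation_manager.py`.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class SimulationState:
    simulation_id: str
    project_id: str
    graph_id: str
    enable_twitter: bool = True     # platform toggles
    enable_reddit: bool = True
    entities_count: int = 0         # counts filled in during preparation
    profiles_count: int = 0
    config_generated: bool = False  # has simulation_config.json been written?
    status: str = "created"         # runtime status, e.g. later "ready", "running"


def save_state(state: SimulationState, sim_dir: Path) -> Path:
    """Write the state to <sim_dir>/state.json so it survives restarts."""
    sim_dir.mkdir(parents=True, exist_ok=True)
    path = sim_dir / "state.json"
    path.write_text(json.dumps(asdict(state), ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_state(sim_dir: Path) -> SimulationState:
    data = json.loads((sim_dir / "state.json").read_text(encoding="utf-8"))
    return SimulationState(**data)
```

Persisting to a file per simulation directory keeps lifecycle state durable across backend restarts without requiring a database.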

Then `POST /api/simulation/prepare` performs the expensive work asynchronously. It checks for already prepared files, retrieves the project question and extracted text, previews entity counts from the graph, creates a task, and runs preparation in a background thread. Sources: [backend/app/api/simulation.py:359-399](), [backend/app/api/simulation.py:424-490](), [backend/app/api/simulation.py:507-625]()
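The "create a task, run in a background thread, poll for status" pattern can be sketched with the standard library. `TaskRegistry`, its status names, and the progress-callback convention are hypothetical stand-ins for the project's own task manager.

```python
from __future__ import annotations

import threading
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Task:
    task_id: str
    status: str = "pending"
    progress: int = 0
    result: Any = None
    error: Optional[str] = None


class TaskRegistry:
    """Run slow work on a background thread and expose pollable status."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}
        self._threads: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def start(self, work: Callable[[Callable[[int], None]], Any]) -> str:
        task = Task(task_id=uuid.uuid4().hex)
        with self._lock:
            self._tasks[task.task_id] = task

        def run() -> None:
            self._update(task.task_id, status="processing")
            try:
                # `work` receives a callback it can use to report progress.
                result = work(lambda pct: self._update(task.task_id, progress=pct))
                self._update(task.task_id, status="completed", progress=100, result=result)
            except Exception as exc:  # surface failures through the task record
                self._update(task.task_id, status="failed", error=str(exc))

        thread = threading.Thread(target=run, daemon=True)
        self._threads[task.task_id] = thread
        thread.start()
        return task.task_id  # returned immediately; the caller polls get()

    def _update(self, task_id: str, **changes: Any) -> None:
        with self._lock:
            task = self._tasks[task_id]
            for key, value in changes.items():
                setattr(task, key, value)

    def get(self, task_id: str) -> dict:
        with self._lock:
            return dict(vars(self._tasks[task_id]))

    def join(self, task_id: str, timeout: Optional[float] = None) -> None:
        self._threads[task_id].join(timeout)
```

The HTTP handler only needs to call `start` and return the task id; the expensive preparation never blocks the request thread.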

Inside `SimulationManager.prepare_simulation`, the system reads filtered Zep entities, generates OASIS profiles, writes Reddit JSON and Twitter CSV profile files, generates simulation parameters, writes `simulation_config.json`, and sets the simulation to `ready`. Sources: [backend/app/services/simulation_manager.py:230-302](), [backend/app/services/simulation_manager.py:329-374](), [backend/app/services/simulation_manager.py:384-443]()

### Preparation Outputs

| Output file/state | Produced by | Purpose |
|---|---|---|
| `state.json` | `SimulationManager._save_simulation_state` | Durable simulation lifecycle state. |
| `reddit_profiles.json` | `OasisProfileGenerator.save_profiles` via manager | Agent profiles for Reddit-style simulation. |
| `twitter_profiles.csv` | `OasisProfileGenerator.save_profiles` via manager | Agent profiles in OASIS Twitter CSV format. |
| `simulation_config.json` | `SimulationConfigGenerator.generate_config` | Time, event, platform, and per-agent behavior config. |

Sources: [backend/app/services/simulation_manager.py:145-155](), [backend/app/services/simulation_manager.py:361-425](), [backend/app/api/simulation.py:240-312]()
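Writing these outputs, and the "already prepared" check the prepare endpoint performs, can be sketched as follows. The CSV columns are placeholders and do not reproduce the real OASIS Twitter profile format; `is_prepared` is a hypothetical helper that treats the three generated files as the readiness signal.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path

PREPARED_FILES = ("reddit_profiles.json", "twitter_profiles.csv", "simulation_config.json")

# Placeholder columns; the real OASIS CSV schema may differ.
TWITTER_FIELDS = ["user_id", "name", "username", "description"]


def write_profiles(profiles: list, sim_dir: Path) -> None:
    """Write the same profiles in both platform formats."""
    sim_dir.mkdir(parents=True, exist_ok=True)
    with open(sim_dir / "reddit_profiles.json", "w", encoding="utf-8") as f:
        json.dump(profiles, f, ensure_ascii=False, indent=2)
    with open(sim_dir / "twitter_profiles.csv", "w", encoding="utf-8", newline="") as f:
        # Keys not in TWITTER_FIELDS are dropped; missing keys become "".
        writer = csv.DictWriter(f, fieldnames=TWITTER_FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(profiles)


def write_config(config: dict, sim_dir: Path) -> None:
    text = json.dumps(config, ensure_ascii=False, indent=2)
    (sim_dir / "simulation_config.json").write_text(text, encoding="utf-8")


def is_prepared(sim_dir: Path) -> bool:
    """Skip re-preparation when every generated file is already on disk."""
    return all((sim_dir / name).exists() for name in PREPARED_FILES)
```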

## What Makes The Runtime Provider-Neutral?

The LLM boundary is provider-neutral by API shape, not by absence of dependencies. Configuration uses `LLM_API_KEY`, `LLM_BASE_URL`, and `LLM_MODEL_NAME`, and the README explicitly supports any LLM API compatible with the OpenAI SDK format. `LLMClient` passes the configured key and base URL into the OpenAI client, so bring-your-own-key (BYOK) is already expressed through environment variables. Sources: [README.md:115-127](), [backend/app/config.py:30-37](), [backend/app/utils/llm_client.py:17-33]()

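```python
# backend/app/utils/llm_client.py (excerpt from LLMClient)
# Explicit constructor arguments win; otherwise fall back to env-backed Config.
self.api_key = api_key or Config.LLM_API_KEY
self.base_url = base_url or Config.LLM_BASE_URL
self.model = model or Config.LLM_MODEL_NAME
# The base URL decides which OpenAI-compatible endpoint receives requests.
self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
```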

Sources: [backend/app/utils/llm_client.py:17-33]()
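The same precedence rule can be isolated into a pure function that is easy to test. This is a sketch, not the repository's `Config`: `LLMSettings` and `resolve_llm_settings` are hypothetical names, and mapping an empty base URL to `None` (so the SDK uses its default endpoint) is an assumption.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    base_url: Optional[str]  # None lets the OpenAI SDK use its default endpoint
    model: str


def resolve_llm_settings(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    model: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> LLMSettings:
    """Explicit arguments win; otherwise read the three environment variables."""
    env = os.environ if env is None else env
    settings = LLMSettings(
        api_key=api_key or env.get("LLM_API_KEY", ""),
        base_url=base_url or env.get("LLM_BASE_URL") or None,
        model=model or env.get("LLM_MODEL_NAME", ""),
    )
    if not settings.api_key:
        raise ValueError("LLM_API_KEY is not configured")
    if not settings.model:
        raise ValueError("LLM_MODEL_NAME is not configured")
    return settings
```

The resolved values plug straight into the OpenAI SDK as `OpenAI(api_key=s.api_key, base_url=s.base_url)`, with `s.model` passed to `client.chat.completions.create(model=..., messages=...)`. Swapping providers is then a matter of changing environment variables, not code.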

The current implementation is not fully vendor-agnostic across every boundary. It depends on Zep Cloud for graph memory and OASIS/CAMEL for social simulation packages. For BYOC/BYOK architecture, the LLM side is portable through OpenAI-compatible configuration; the memory and simulation sides would need explicit adapter interfaces if users must bring a different graph store or simulation engine. Sources: [backend/pyproject.toml:11-35](), [backend/app/config.py:35-49](), [backend/app/api/graph.py:286-296](), [backend/app/services/simulation_runner.py:387-420]()
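Such adapter interfaces could be expressed as structural protocols. Everything below is hypothetical: MiroFish does not currently define `GraphStore` or `SimulationEngine`, and the in-memory store's "entity extraction" (capitalized words) is a toy used only to show that a second backend can satisfy the same port.

```python
from __future__ import annotations

import re
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class GraphStore(Protocol):
    """Hypothetical port for graph memory (Zep Cloud today)."""
    def create_graph(self, name: str) -> str: ...
    def add_text(self, graph_id: str, chunks: list) -> None: ...
    def get_entities(self, graph_id: str) -> list: ...


@runtime_checkable
class SimulationEngine(Protocol):
    """Hypothetical port for the simulation runtime (OASIS scripts today)."""
    def start(self, sim_dir: str, platform: str, max_rounds: Optional[int]) -> str: ...


class InMemoryGraphStore:
    """Toy adapter for tests: 'entities' are capitalized words in the text."""

    def __init__(self) -> None:
        self._texts: dict = {}

    def create_graph(self, name: str) -> str:
        graph_id = f"{name}-{len(self._texts)}"
        self._texts[graph_id] = []
        return graph_id

    def add_text(self, graph_id: str, chunks: list) -> None:
        self._texts[graph_id].extend(chunks)

    def get_entities(self, graph_id: str) -> list:
        words = re.findall(r"\b[A-Z][a-z]+\b", " ".join(self._texts[graph_id]))
        return [{"name": w} for w in sorted(set(words))]
```

Because `Protocol` checks structure rather than inheritance, a Zep-backed class and this in-memory class could both satisfy `GraphStore` without sharing a base class.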

## How Does The Question Become A World?

Ontology generation is the first model-mediated translation. `OntologyGenerator` takes document texts, the simulation requirement, and optional context, then asks for entity and relationship types suitable for social-media public opinion simulation. It requires entity type names in English PascalCase, relationship names in English upper snake case, and returns JSON. Sources: [backend/app/services/ontology_generator.py:29-89](), [backend/app/services/ontology_generator.py:176-227](), [backend/app/services/ontology_generator.py:231-260]()
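Because the naming rules live in a prompt, a caller may want to check the returned JSON before trusting it. The sketch below assumes top-level keys `entity_types` and `edge_types`, each a list of objects with a `name`; those keys and the strictness of the regexes are assumptions, not the generator's documented schema.

```python
from __future__ import annotations

import json
import re

# PascalCase: capitalized alphanumeric segments, e.g. "Student", "GovernmentAgency".
PASCAL_CASE = re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)*$")
# UPPER_SNAKE_CASE: e.g. "WORKS_FOR", "REPORTS_ON".
UPPER_SNAKE_CASE = re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")


def parse_ontology(raw: str) -> dict:
    """Parse model output and enforce the naming rules from the prompt."""
    data = json.loads(raw)
    problems = []
    for entity in data.get("entity_types", []):
        name = entity.get("name", "")
        if not PASCAL_CASE.match(name):
            problems.append(f"entity type {name!r} is not PascalCase")
    for edge in data.get("edge_types", []):
        name = edge.get("name", "")
        if not UPPER_SNAKE_CASE.match(name):
            problems.append(f"relationship {name!r} is not UPPER_SNAKE_CASE")
    if problems:
        raise ValueError("; ".join(problems))
    return data
```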

Simulation configuration is the second translation. `SimulationConfigGenerator` turns the question, document text, graph id, and entities into time settings, event settings, per-agent activity settings, and platform settings. It stores model and base URL metadata in the generated config, which is useful for auditability without hard-coding a provider into the product flow. Sources: [backend/app/services/simulation_config_generator.py:1-11](), [backend/app/services/simulation_config_generator.py:147-197](), [backend/app/services/simulation_config_generator.py:200-241](), [backend/app/services/simulation_config_generator.py:243-277]()
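The shape of that output can be sketched as a single assembled dictionary. The section keys (`time_config`, `event_config`, `agent_configs`, `platform_config`, `llm_model`, `llm_base_url`) are illustrative assumptions rather than the exact schema of `simulation_config.json`; the point shown is recording which model and endpoint produced the config while leaving the API key out.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Optional


def assemble_config(
    simulation_id: str,
    graph_id: str,
    requirement: str,
    time_config: dict,
    event_config: dict,
    agent_configs: list,
    platform_config: dict,
    llm_model: str,
    llm_base_url: Optional[str],
) -> dict:
    """Combine generated sections into one config dict.

    Model name and base URL are recorded for auditability; the API key is
    deliberately not part of the signature, so it cannot be written to disk.
    """
    return {
        "simulation_id": simulation_id,
        "graph_id": graph_id,
        "simulation_requirement": requirement,
        "time_config": time_config,
        "event_config": event_config,
        "agent_configs": agent_configs,
        "platform_config": platform_config,
        "llm_model": llm_model,
        "llm_base_url": llm_base_url,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
```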

## How Does It Run?

The run boundary is deliberately process-based. `POST /api/simulation/start` validates `simulation_id`, platform, optional `max_rounds`, optional graph memory updates, and ready state. It then calls `SimulationRunner.start_simulation`, updates simulation status to `running`, and returns the run state. Sources: [backend/app/api/simulation.py:1451-1527](), [backend/app/api/simulation.py:1528-1627]()
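The checks performed before launch can be sketched as one normalizing function. The default platform, the `enable_graph_memory_update` key, and the exact error messages are assumptions; only the validated concepts (simulation id, platform, optional `max_rounds`, optional graph memory updates, ready state) come from the endpoint described above.

```python
from __future__ import annotations

from typing import Optional

VALID_PLATFORMS = {"twitter", "reddit", "parallel"}


def validate_start_request(body: dict, current_status: str) -> dict:
    """Normalize a start request; raise ValueError on anything the runner can't use."""
    simulation_id = body.get("simulation_id")
    if not simulation_id:
        raise ValueError("simulation_id is required")

    platform = body.get("platform", "parallel")  # assumed default
    if platform not in VALID_PLATFORMS:
        raise ValueError(f"platform must be one of {sorted(VALID_PLATFORMS)}")

    max_rounds: Optional[int] = body.get("max_rounds")
    if max_rounds is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(max_rounds, bool) or not isinstance(max_rounds, int) or max_rounds < 1:
            raise ValueError("max_rounds must be a positive integer")

    if current_status != "ready":
        raise ValueError(f"simulation is {current_status!r}, expected 'ready'")

    return {
        "simulation_id": simulation_id,
        "platform": platform,
        "max_rounds": max_rounds,
        "enable_graph_memory_update": bool(body.get("enable_graph_memory_update", False)),
    }
```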

`SimulationRunner` loads `simulation_config.json`, calculates total rounds from total simulated hours and minutes per round, chooses one of `run_twitter_simulation.py`, `run_reddit_simulation.py`, or `run_parallel_simulation.py`, and launches it as a subprocess in the simulation directory. It writes the main log to `simulation.log` and monitors platform-specific `actions.jsonl` files. Sources: [backend/app/services/simulation_runner.py:313-360](), [backend/app/services/simulation_runner.py:387-479](), [backend/app/services/simulation_runner.py:481-547]()
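The runner's core mechanics (round arithmetic, script selection, subprocess launch, and incremental reading of `actions.jsonl`) can be sketched with the standard library. The `--config` command-line flag, the argument layout, and applying `max_rounds` as a `min()` cap are assumptions; the script names and file names come from the description above.

```python
from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path
from typing import Optional

SCRIPTS = {
    "twitter": "run_twitter_simulation.py",
    "reddit": "run_reddit_simulation.py",
    "parallel": "run_parallel_simulation.py",
}


def total_rounds(total_hours: float, minutes_per_round: int, max_rounds: Optional[int] = None) -> int:
    """Simulated time divided by time per round, optionally capped."""
    if minutes_per_round <= 0:
        raise ValueError("minutes_per_round must be positive")
    rounds = int(total_hours * 60 // minutes_per_round)
    return min(rounds, max_rounds) if max_rounds is not None else rounds


def build_command(platform: str, scripts_dir: Path, config_path: Path) -> list:
    """Pick the platform script; the --config flag is a hypothetical CLI."""
    return [sys.executable, str(scripts_dir / SCRIPTS[platform]), "--config", str(config_path)]


def launch(cmd: list, sim_dir: Path) -> subprocess.Popen:
    """Start the script in the simulation directory, appending output to simulation.log."""
    with open(sim_dir / "simulation.log", "a", encoding="utf-8") as log:
        # The child gets its own copy of the file descriptor, so closing the
        # parent's handle when this block exits does not affect it.
        return subprocess.Popen(cmd, cwd=sim_dir, stdout=log, stderr=subprocess.STDOUT)


def read_new_actions(log_path: Path, offset: int) -> tuple:
    """Return actions appended since `offset` (in bytes) and the new offset."""
    if not log_path.exists():
        return [], offset
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # leave a partially written last line for the next poll
    actions = [json.loads(line) for line in data[:end].splitlines() if line.strip()]
    return actions, offset + end
```

Keeping a byte offset per `actions.jsonl` file lets the monitor tail each platform's log without re-reading it, and a half-written line is simply picked up on the next poll.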

The frontend exposes the same abstraction: after preparation, it lets users keep auto-generated rounds or apply a `max_rounds` cap, then moves to the simulation run step. Sources: [frontend/src/components/Step2EnvSetup.vue:438-525](), [frontend/src/components/Step2EnvSetup.vue:694-707](), [frontend/src/components/Step2EnvSetup.vue:743-758]()

## What Would Break If An Abstraction Disappeared?

| Abstraction | If removed | Evidence |
|---|---|---|
| `ProjectManager` | The frontend would need to carry extracted text, ontology, graph id, and files across calls. | Project state is persisted server-side. |
| `TaskManager`-style async flow | Graph build and simulation preparation would block request/response paths. | Both graph build and prepare run in background threads and are polled. |
| OpenAI-compatible LLM config | Users could not bring compatible model endpoints with their own keys. | LLM key/base/model are env-driven. |
| Zep graph boundary | Entity reading, graph build, and optional graph memory updates lose their current storage substrate. | Graph APIs require `ZEP_API_KEY` and graph ids. |
| Script runner boundary | The web backend would have to embed OASIS execution directly. | Runner starts scripts as subprocesses and monitors logs. |

Sources: [backend/app/models/project.py:101-130](), [backend/app/api/graph.py:364-513](), [backend/app/api/simulation.py:490-612](), [backend/app/config.py:30-37](), [backend/app/services/simulation_runner.py:416-448]()

## The First Question, Reframed

MiroFish’s smallest architecture is not “an AI answer generator.” It is a compiler from a user’s question and seed files into a runnable social world: ontology, graph memory, agent profiles, behavior parameters, event seeds, and a process-managed simulation. The current LLM boundary supports BYOK/BYOC through OpenAI-compatible environment configuration, while graph memory and simulation execution are concrete Zep and OASIS choices that would need adapters for full vendor-agnostic substitution. Sources: [README.md:86-92](), [backend/app/config.py:30-49](), [backend/app/services/simulation_manager.py:230-248](), [backend/app/services/simulation_runner.py:196-205]()
