# How Does a Graph Become a Society?

> The preparation stage reframes entities as agents: which nodes are eligible, what profile fields OASIS needs, and where LLM-generated configuration becomes platform-specific behavior without hard-coding one model provider.

- Repository: 666ghj/MiroFish
- GitHub: https://github.com/666ghj/MiroFish
- Human wiki: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9
- Complete Markdown: https://grok-wiki.com/public/wiki/666ghj-mirofish-5af7beba06b9/llms-full.txt

## Source Files

- `backend/app/api/simulation.py`
- `backend/app/services/simulation_manager.py`
- `backend/app/services/zep_entity_reader.py`
- `backend/app/services/oasis_profile_generator.py`
- `backend/app/services/simulation_config_generator.py`
- `backend/scripts/test_profile_format.py`
- `frontend/src/components/Step2EnvSetup.vue`
- `frontend/src/api/simulation.js`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [backend/app/api/simulation.py](backend/app/api/simulation.py)
- [backend/app/services/simulation_manager.py](backend/app/services/simulation_manager.py)
- [backend/app/services/zep_entity_reader.py](backend/app/services/zep_entity_reader.py)
- [backend/app/services/oasis_profile_generator.py](backend/app/services/oasis_profile_generator.py)
- [backend/app/services/simulation_config_generator.py](backend/app/services/simulation_config_generator.py)
- [backend/app/config.py](backend/app/config.py)
- [backend/app/utils/llm_client.py](backend/app/utils/llm_client.py)
- [backend/scripts/test_profile_format.py](backend/scripts/test_profile_format.py)
- [frontend/src/components/Step2EnvSetup.vue](frontend/src/components/Step2EnvSetup.vue)
- [frontend/src/api/simulation.js](frontend/src/api/simulation.js)
</details>

A graph becomes a society only after MiroFish answers three practical questions: which graph nodes are eligible to become agents, what profile fields OASIS needs to run them, and which generated parameters turn static profiles into platform behavior. The preparation stage is where that translation happens.

This page follows the bundled Compound Engineering wiki guidance for page shape, but repository code is the source of truth for implementation claims. No `STRATEGY.md` or `docs/solutions/**` source was found in the focused scan, so this page does not cite product strategy or solved-problem notes as evidence.

## What Problem Exists?

The source graph is not yet a runnable simulation. A raw node can be a default graph artifact, a person, an institution, or an event-adjacent concept. OASIS needs agent profile files, per-agent behavior settings, event triggers, platform knobs, and a simulation state that says the environment is ready.

MiroFish solves this by splitting preparation into four parts: an API orchestration layer, a graph reader, a profile generator, and a simulation config generator. The lifecycle starts with `POST /api/simulation/create` and continues through the asynchronous `POST /api/simulation/prepare`. A simulation counts as ready only once the profile files and `simulation_config.json` exist and the persisted state records that configuration was generated.  
Sources: [backend/app/api/simulation.py:165-229](), [backend/app/api/simulation.py:240-356](), [backend/app/services/simulation_manager.py:230-248]()

```text
Zep graph nodes
  -> eligible EntityNode objects
  -> OASIS agent profiles
  -> generated activity/event/platform config
  -> ready simulation directory
```
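
The readiness rule above can be sketched as a small check over the simulation directory. This is a minimal illustration, not the repository's code: the file names come from this page, while the `config_generated` key in `state.json` is a hypothetical name chosen for the example.

```python
import json
from pathlib import Path

# File names are taken from this page; either profile format may be present.
PROFILE_FILES = ("reddit_profiles.json", "twitter_profiles.csv")


def is_ready(sim_dir: Path) -> bool:
    """Return True when the directory looks like a prepared simulation."""
    has_profiles = any((sim_dir / name).exists() for name in PROFILE_FILES)
    has_config = (sim_dir / "simulation_config.json").exists()
    state_path = sim_dir / "state.json"
    if not (has_profiles and has_config and state_path.exists()):
        return False
    state = json.loads(state_path.read_text(encoding="utf-8"))
    # "config_generated" is a hypothetical key used only for illustration.
    return bool(state.get("config_generated"))
```

Checking both the files and the state flag mirrors the idea that neither one alone is enough to call the environment ready.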

## What Is the Simplest Version?

The simplest society builder would keep only nodes with meaningful domain labels, assign each one a username and persona, and save the files expected by the simulator. MiroFish implements exactly that baseline before adding LLM enrichment.

Eligibility is label-based: a node must have at least one label other than the generic `"Entity"` or `"Node"`. If the caller supplies `defined_entity_types`, the node must match one of those labels. When enrichment is enabled, the reader also attaches related edges and neighbor node summaries, so profile generation can see context instead of just names.  
Sources: [backend/app/services/zep_entity_reader.py:22-51](), [backend/app/services/zep_entity_reader.py:215-331]()

```python
# backend/app/services/zep_entity_reader.py
custom_labels = [l for l in labels if l not in ["Entity", "Node"]]

if not custom_labels:
    continue

if defined_entity_types:
    matching_labels = [l for l in custom_labels if l in defined_entity_types]
    if not matching_labels:
        continue
```

## Where Does Complexity Become Necessary?

Complexity appears when graph nodes need to behave differently as social actors. A student, a media outlet, and a university cannot share the same behavior profile. The preparation manager therefore makes graph reading only the first stage, then generates profiles, writes platform-specific profile files, generates simulation parameters, and marks the state `READY`.

The manager also fails early when no eligible entities are found, which is the correct boundary: no agents means no society.  
Sources: [backend/app/services/simulation_manager.py:272-303](), [backend/app/services/simulation_manager.py:304-443]()

| Stage | Owner | Output |
|---|---|---|
| Read graph | `ZepEntityReader` | `FilteredEntities` with `entities_count` and `entity_types` |
| Generate profiles | `OasisProfileGenerator` | `reddit_profiles.json` and/or `twitter_profiles.csv` |
| Generate config | `SimulationConfigGenerator` | `simulation_config.json` |
| Persist readiness | `SimulationManager` | `state.json` with `status = ready` |
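
The stage order in the table, including the early failure on an empty entity list, can be sketched as a small orchestration function. This is a simplified illustration: the three callables stand in for `ZepEntityReader`, `OasisProfileGenerator`, and `SimulationConfigGenerator`, and their signatures here are assumptions rather than the real interfaces.

```python
from typing import Callable, Dict, List


def prepare(read_entities: Callable[[], List[dict]],
            generate_profiles: Callable[[List[dict]], List[dict]],
            generate_config: Callable[[List[dict], List[dict]], dict]) -> Dict:
    """Run the preparation stages in order and return a ready-state summary."""
    entities = read_entities()
    if not entities:
        # No agents means no society, so stop before any generation work.
        raise ValueError("no eligible entities found")
    profiles = generate_profiles(entities)
    config = generate_config(entities, profiles)
    return {
        "status": "ready",
        "entities_count": len(entities),
        "profiles": profiles,
        "config": config,
    }
```

Passing the stages in as callables keeps the sketch testable without a graph backend or an LLM.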

## Which Profile Fields Does OASIS Need?

MiroFish’s internal profile object includes common identity fields (`user_id`, `user_name`, `name`, `bio`, `persona`), social counters, demographic/persona metadata, topic interests, and source graph references. It then converts the same internal profile into different OASIS-facing formats for Reddit and Twitter.  
Sources: [backend/app/services/oasis_profile_generator.py:29-140]()

Reddit profiles are JSON records with `user_id`, `username`, `name`, `bio`, `persona`, `karma`, `created_at`, and normalized OASIS-required demographic fields such as `age`, `gender`, `mbti`, and `country`. Twitter profiles are CSV rows with `user_id`, `name`, `username`, `user_char`, and `description`; `user_char` merges `bio` and `persona` for internal agent prompting.  
Sources: [backend/app/services/oasis_profile_generator.py:1047-1119](), [backend/app/services/oasis_profile_generator.py:1121-1193]()
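
To make the two output shapes concrete, the sketch below converts one internal profile into a Reddit-style record and a Twitter-style CSV row. The field names follow the lists above. The class name `InternalProfile`, the default values, the space-joined `user_char`, and the choice to fill `description` from `bio` are assumptions for illustration only.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class InternalProfile:
    user_id: int
    user_name: str
    name: str
    bio: str
    persona: str
    karma: int = 0
    created_at: str = ""
    # Placeholder demographic defaults; the real defaults are not shown here.
    age: int = 30
    gender: str = "other"
    mbti: str = "INTJ"
    country: str = "unknown"


def to_reddit_record(p: InternalProfile) -> dict:
    """Build a JSON-serializable record with the Reddit field names."""
    return {
        "user_id": p.user_id, "username": p.user_name, "name": p.name,
        "bio": p.bio, "persona": p.persona, "karma": p.karma,
        "created_at": p.created_at, "age": p.age, "gender": p.gender,
        "mbti": p.mbti, "country": p.country,
    }


TWITTER_COLUMNS = ["user_id", "name", "username", "user_char", "description"]


def to_twitter_row(p: InternalProfile) -> dict:
    """Build one CSV row; user_char merges bio and persona."""
    return {
        "user_id": p.user_id, "name": p.name, "username": p.user_name,
        "user_char": f"{p.bio} {p.persona}".strip(),
        "description": p.bio,  # assumption: description mirrors the bio
    }


def twitter_csv(profiles) -> str:
    """Serialize profiles to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TWITTER_COLUMNS)
    writer.writeheader()
    for profile in profiles:
        writer.writerow(to_twitter_row(profile))
    return buffer.getvalue()
```

Both converters read from the same internal object, which is the property the section describes: one profile, two platform formats.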

The test script checks an expected format for the Twitter CSV and Reddit JSON files, which may predate the current generator. It is useful as validation context, but the generator is the more authoritative reference because it writes the production files used by preparation.  
Sources: [backend/scripts/test_profile_format.py:20-123](), [backend/scripts/test_profile_format.py:130-159]()

## How Does LLM Output Become Behavior?

The LLM does not directly run the simulation. It produces structured JSON that is parsed into dataclasses: time configuration, event configuration, agent activity configuration, and platform configuration. This distinction matters: generation is advisory, but the parsed config becomes platform-specific runtime input.

`SimulationConfigGenerator` builds a context from the simulation requirement, entity summaries, and original document text. It then generates the time config, the event config, and agent configs in batches, assigns initial posts to matching agent types, and finally attaches Twitter and Reddit platform configs when those platforms are enabled.  
Sources: [backend/app/services/simulation_config_generator.py:147-197](), [backend/app/services/simulation_config_generator.py:243-379](), [backend/app/services/simulation_config_generator.py:381-432]()
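
The step from LLM text to typed configuration can be sketched with standard-library dataclasses. This is an illustration under assumptions: the dataclass fields (`total_simulation_hours`, `minutes_per_round`, `hot_topics`, `narrative_direction`) and the `event_config` key are hypothetical names, not the repository's schema.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class TimeConfig:
    total_simulation_hours: int = 24   # hypothetical field and default
    minutes_per_round: int = 60        # hypothetical field and default


@dataclass
class EventConfig:
    hot_topics: list = field(default_factory=list)
    narrative_direction: str = ""


def _build(cls, data: dict):
    """Instantiate cls from data, ignoring keys the dataclass does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in data.items() if key in known})


def parse_generated_config(raw: str):
    """Parse raw LLM JSON into typed config objects, using defaults for gaps."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    time_config = _build(TimeConfig, data.get("time_config", {}))
    event_config = _build(EventConfig, data.get("event_config", {}))
    return time_config, event_config
```

Dropping unknown keys and falling back to defaults is one way to keep generation advisory: the model can suggest values, but only declared fields reach the runtime.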

```mermaid
flowchart LR
  subgraph API["API layer"]
    Create["/api/simulation/create"]
    Prepare["/api/simulation/prepare"]
    Status["/api/simulation/prepare/status"]
  end

  subgraph Prep["Preparation services"]
    Reader["ZepEntityReader"]
    Profiles["OasisProfileGenerator"]
    ConfigGen["SimulationConfigGenerator"]
    Manager["SimulationManager"]
  end

  subgraph Files["Simulation directory"]
    State["state.json"]
    Reddit["reddit_profiles.json"]
    Twitter["twitter_profiles.csv"]
    SimConfig["simulation_config.json"]
  end

  Create --> Manager
  Prepare --> Manager
  Status --> State
  Manager --> Reader
  Manager --> Profiles
  Manager --> ConfigGen
  Profiles --> Reddit
  Profiles --> Twitter
  ConfigGen --> SimConfig
  Manager --> State
```

Sources: [backend/app/api/simulation.py:359-625](), [backend/app/services/simulation_manager.py:329-443](), [backend/app/api/simulation.py:642-745]()

## What Makes It Platform-Specific Without Hard-Coding One Model Provider?

Platform-specific behavior is encoded as data, not as a model-provider branch. Twitter and Reddit differ in profile file format and platform configuration weights, while the LLM provider is selected through `LLM_API_KEY`, `LLM_BASE_URL`, and `LLM_MODEL_NAME`. The implementation uses the OpenAI-compatible Python client, so any provider that exposes an OpenAI-compatible API can be swapped in through configuration. The neutrality stops at that API boundary; this is not a generic SDK abstraction.  
Sources: [backend/app/config.py:30-37](), [backend/app/services/oasis_profile_generator.py:181-199](), [backend/app/services/simulation_config_generator.py:225-241](), [backend/app/utils/llm_client.py:17-33]()

| Concern | Current implementation | BYOC/BYOK implication |
|---|---|---|
| Model key | `LLM_API_KEY` from environment | User can bring their own key |
| Model endpoint | `LLM_BASE_URL` from environment | Any OpenAI-compatible endpoint can be configured |
| Model name | `LLM_MODEL_NAME` from environment | Model selection is config-driven |
| Graph source | `ZEP_API_KEY` and Zep client | Current graph memory provider is Zep-specific |
| Platform output | JSON/CSV plus config objects | OASIS behavior is file/config-driven, not provider-driven |
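
The environment-driven selection in the table can be sketched as a small settings loader. The three variable names come from this page; the `LLMSettings` class, the `load_llm_settings` function, and the placeholder default model name are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    base_url: Optional[str]
    model_name: str


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read provider settings from the environment without naming a provider."""
    api_key = env.get("LLM_API_KEY")
    if not api_key:
        raise RuntimeError("LLM_API_KEY is not set")
    return LLMSettings(
        api_key=api_key,
        base_url=env.get("LLM_BASE_URL"),  # None means the client's default endpoint
        # "example-model" is a placeholder, not the project's actual default.
        model_name=env.get("LLM_MODEL_NAME", "example-model"),
    )
```

With the `openai` package installed, these values could be passed to `OpenAI(api_key=..., base_url=...)`, and the model name supplied per request. Switching providers then means changing environment variables rather than code.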

## How Does the Frontend See the Society Forming?

The UI treats preparation as a live process. It starts `prepareSimulation` and receives a task id and expected entity count. It then polls task status every two seconds and profiles every three seconds, and switches to config polling when the backend reaches the configuration stage. Once config exists, the UI shows agent counts, time settings, agent behavior cards, platform recommendation weights, narrative direction, hot topics, and initial posts mapped to agent ids.  
Sources: [frontend/src/api/simulation.js:11-68](), [frontend/src/components/Step2EnvSetup.vue:64-180](), [frontend/src/components/Step2EnvSetup.vue:182-345](), [frontend/src/components/Step2EnvSetup.vue:771-899](), [frontend/src/components/Step2EnvSetup.vue:955-1016]()
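
The polling cadence described above lives in the Vue component, but the pattern itself can be sketched in Python for consistency with the rest of this page. The `fetch_status` callable and the terminal status names `completed` and `failed` are hypothetical; only the two-second interval comes from the text.

```python
import time
from typing import Callable, Dict


def poll_until_done(fetch_status: Callable[[], Dict],
                    interval_s: float = 2.0,
                    max_polls: int = 100,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call fetch_status repeatedly until it reports a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        # Terminal status names are assumptions made for this sketch.
        if status.get("status") in ("completed", "failed"):
            return status
        sleep(interval_s)
    raise TimeoutError("preparation did not finish within the polling budget")
```

Injecting `sleep` keeps the loop testable without real delays, and the `max_polls` cap prevents a stalled task from polling forever.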

The frontend also computes default run rounds from generated `time_config` rather than a hard-coded value, while still letting the user override the run length before starting the simulation.  
Sources: [frontend/src/components/Step2EnvSetup.vue:438-525](), [frontend/src/components/Step2EnvSetup.vue:694-707](), [frontend/src/components/Step2EnvSetup.vue:743-758]()

## What Would Break If This Abstraction Disappeared?

If eligibility lived inside profile generation, default graph nodes could become agents accidentally. If profile generation wrote only one platform format, OASIS would lose either Reddit JSON or Twitter CSV compatibility. If LLM output were allowed to bypass parsing, invalid JSON, invalid `poster_type`, or overlarge activation counts could leak into runtime behavior.

The code guards these boundaries by filtering labels before generation, normalizing/generating profile defaults, retrying and repairing JSON responses, clamping time activation counts to available agents, and mapping event `poster_type` values to real agent ids with fallbacks.  
Sources: [backend/app/services/zep_entity_reader.py:252-331](), [backend/app/services/oasis_profile_generator.py:524-581](), [backend/app/services/simulation_config_generator.py:434-533](), [backend/app/services/simulation_config_generator.py:611-644](), [backend/app/services/simulation_config_generator.py:728-811]()
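
Two of these guards, clamping activation counts and resolving `poster_type` to a real agent id, are simple enough to sketch directly. The function names, the `agents_by_type` mapping, and the choice of the first matching agent are assumptions for illustration; the repository's fallback logic may differ.

```python
from typing import Dict, List


def clamp_activation(requested: int, agent_count: int) -> int:
    """Keep an activation count between zero and the number of agents."""
    return max(0, min(requested, agent_count))


def resolve_poster(poster_type: str,
                   agents_by_type: Dict[str, List[int]],
                   fallback_id: int) -> int:
    """Map an event's poster_type to an agent id, with a fallback for unknown types."""
    candidate_ids = agents_by_type.get(poster_type)
    if candidate_ids:
        return candidate_ids[0]
    # Unknown or empty type: fall back rather than emit an invalid id.
    return fallback_id
```

For example, a generated request to activate 50 agents in a 12-agent simulation is reduced to 12, and a post attributed to an unrecognized type is reassigned to the fallback agent instead of failing at runtime.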

## Closing Summary

A MiroFish graph becomes a society when labeled graph entities are filtered, enriched with relationship context, transformed into OASIS-compatible profiles, paired with generated time/event/agent/platform behavior, and persisted as a ready simulation directory. The architecture is BYOK-friendly for LLMs through environment-configured OpenAI-compatible endpoints, while the current graph reader remains Zep-specific and should be treated as the provider boundary for any future BYOC graph-memory work.  
Sources: [backend/app/services/simulation_manager.py:230-443](), [backend/app/config.py:30-49]()