# Presenton Plain-Language Wiki
> Presenton is a fully self-hosted, open-source AI presentation generator. You give it a topic or a document, it calls an LLM of your choice, and hands back a polished, editable slide deck — no SaaS subscription required.
## Context Links
- [Agent index](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/llms.txt)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc)
- [GitHub repository](https://github.com/presenton/presenton)
## Repository Metadata
- Repository: presenton/presenton
- Generated: 2026-05-24T05:28:45.844Z
- Updated: 2026-05-24T05:29:07.660Z
- Runtime: Claude Code
- Format: Explain Like I'm 5
- Pages: 8
## Page Index
- 01. [Explain It Simply: What Is Presenton?](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/01-explain-it-simply-what-is-presenton.md) - What this app does in one sentence, the simplest useful analogy, and the three ideas every reader should hold on to before diving deeper.
- 02. [Bring Your Own Brain: How LLM Providers Work](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/02-bring-your-own-brain-how-llm-providers-work.md) - Presenton does not pick an AI for you. This page explains the 15 supported providers, how the app selects a default model per provider, and what BYOK (bring your own key) means in practice.
- 03. [From Prompt to Slides: The Generation Pipeline](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/03-from-prompt-to-slides-the-generation-pipeline.md) - Step by step: how a user's topic travels through outline generation, layout selection, per-slide content calls, image fetching, and final assembly — all coordinated by the FastAPI backend.
- 04. [What You Can Feed It: Documents, PDFs & Images](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/04-what-you-can-feed-it-documents-pdfs-images.md) - Presenton can read uploaded PDFs, Office files, images (via OCR), and plain text to ground the presentation in real content. This page covers the document loader, LiteParse integration, and image generation service.
- 05. [Slide Templates & Layouts: How Designs Get Chosen](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/05-slide-templates-layouts-how-designs-get-chosen.md) - Presenton ships with named design templates (general, pitch-deck, Education, Code, etc.) and per-slide layouts. The LLM picks a layout index for each slide based on content rules baked into the system prompt.
- 06. [The Python Backend: FastAPI Server](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/06-the-python-backend-fastapi-server.md) - The FastAPI server is the engine room: it hosts the REST + SSE API, manages the SQLite/Alembic database, runs background export tasks, validates LLM responses, and serves static assets. Every LLM call, slide write, and export goes through here.
- 07. [The Browser UI: Next.js Frontend](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/07-the-browser-ui-next.js-frontend.md) - The Next.js app is the face the user sees: a dashboard to manage decks, an outline editor, a live slide editor with Tiptap rich text, and an export flow (PDF, PPTX, Google Slides). Routes map directly to stages of the generation workflow.
- 08. [The Desktop App: Electron Wrapper](https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/08-the-desktop-app-electron-wrapper.md) - The Electron app bundles the FastAPI backend (as a sidecar process) and the Next.js frontend into a single installable for Mac, Windows, and Linux — no Docker needed. This page explains how the build scripts wire everything together.
## Source File Index
- `docker-compose.yml`
- `electron/app/main.ts`
- `electron/build_nextjs_resources.js`
- `electron/build.js`
- `electron/copy_fastapi_assets.js`
- `electron/package.json`
- `package.json`
- `README.md`
- `servers/fastapi/api/lifespan.py`
- `servers/fastapi/api/main.py`
- `servers/fastapi/api/middlewares.py`
- `servers/fastapi/api/v1/ppt/background_tasks.py`
- `servers/fastapi/api/v1/ppt/endpoints/ollama.py`
- `servers/fastapi/api/v1/ppt/endpoints/presentation.py`
- `servers/fastapi/api/v1/ppt/endpoints/theme_generate.py`
- `servers/fastapi/api/v1/ppt/router.py`
- `servers/fastapi/constants/documents.py`
- `servers/fastapi/constants/llm.py`
- `servers/fastapi/constants/presentation.py`
- `servers/fastapi/constants/supported_ollama_models.py`
- `servers/fastapi/enums/llm_provider.py`
- `servers/fastapi/services/concurrent_service.py`
- `servers/fastapi/services/database.py`
- `servers/fastapi/services/document_conversion_service.py`
- `servers/fastapi/services/documents_loader.py`
- `servers/fastapi/services/export_task_service.py`
- `servers/fastapi/services/image_generation_service.py`
- `servers/fastapi/services/liteparse_service.py`
- `servers/fastapi/utils/llm_calls/generate_presentation_structure.py`
- `servers/fastapi/utils/model_availability.py`
- `servers/fastapi/utils/outline_utils.py`
- `servers/fastapi/utils/ppt_utils.py`
- `servers/fastapi/utils/theme_utils.py`
- `servers/nextjs/app/(export)/pdf-maker/page.tsx`
- `servers/nextjs/app/(presentation-generator)/(dashboard)/dashboard/page.tsx`
- `servers/nextjs/app/(presentation-generator)/components/EditableLayoutWrapper.tsx`
- `servers/nextjs/app/(presentation-generator)/components/PresentationRender.tsx`
- `servers/nextjs/app/(presentation-generator)/outline/page.tsx`
- `servers/nextjs/app/(presentation-generator)/presentation/page.tsx`
- `servers/nextjs/app/presentation-templates`
- `start.js`
- `VISION.md`
---
## 01. Explain It Simply: What Is Presenton?
> What this app does in one sentence, the simplest useful analogy, and the three ideas every reader should hold on to before diving deeper.
- Page Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/01-explain-it-simply-what-is-presenton.md
- Generated: 2026-05-24T05:20:19.222Z
### Source Files
- `README.md`
- `VISION.md`
- `docker-compose.yml`
- `start.js`
- `package.json`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [README.md](README.md)
- [VISION.md](VISION.md)
- [docker-compose.yml](docker-compose.yml)
- [start.js](start.js)
- [servers/fastapi/api/v1/ppt/endpoints/presentation.py](servers/fastapi/api/v1/ppt/endpoints/presentation.py)
- [servers/fastapi/models/generate_presentation_request.py](servers/fastapi/models/generate_presentation_request.py)
- [servers/fastapi/enums/llm_provider.py](servers/fastapi/enums/llm_provider.py)
- [servers/fastapi/enums/image_provider.py](servers/fastapi/enums/image_provider.py)
- [servers/fastapi/mcp_server.py](servers/fastapi/mcp_server.py)
- [Dockerfile](Dockerfile)
</details>
# Explain It Simply: What Is Presenton?
This page gives a first-principles orientation to Presenton for anyone who just cloned the repo or stumbled across it. It answers the one-sentence question, offers the simplest analogy that holds up under inspection, and pins down the three ideas you must carry into every other page of this wiki.
Presenton is an **open-source, self-hosted AI presentation generator**: you give it text or documents, it calls an LLM and an image provider of your choice, and it hands you back a finished PowerPoint (or PDF) slide deck — all running on your own machine or server, with none of your data sent to a SaaS platform.
---
## The One-Sentence Summary
Presenton is a **bring-your-own-model (BYOM) document engine** that turns a prompt or uploaded file into a styled, exportable slide deck, and exposes the same workflow as a REST API, an MCP server, and a desktop app.
Sources: [README.md:18-34](), [VISION.md:1-17]()
---
## The Simplest Useful Analogy
Think of Presenton as a **smart print shop that you own**.
You bring the raw material — a topic sentence, a PDF report, a CSV of sales figures — and the shop turns it into a finished, designed document. But unlike a cloud print shop, this one runs entirely on your premises:
- You choose which printing press to use (OpenAI, Google Gemini, a local Ollama model, AWS Bedrock, Azure, Anthropic, Fireworks, Together AI, LM Studio, or any OpenAI-compatible endpoint).
- You choose the photo supplier for slide images (DALL-E 3, Gemini Flash, Pexels, Pixabay, ComfyUI, or Open WebUI).
- The finished product (a `.pptx` or `.pdf` file) belongs to you, stored in a volume you mount.
- You can swap press, photo supplier, and template at any time without changing anything else.
The analogy maps directly to code: the `LLM` env variable picks the text press ([`servers/fastapi/enums/llm_provider.py:4-19`]()), the `IMAGE_PROVIDER` variable picks the photo supplier ([`servers/fastapi/enums/image_provider.py:4-13`]()), and the `template` field in the API request picks the design ([`servers/fastapi/models/generate_presentation_request.py:29-30`]()).
---
## The Three Ideas to Hold On To
### 1. Model-Agnostic by Design (BYOK / BYOM)
Presenton makes no assumptions about which AI provider you use. The `LLMProvider` enum lists 15 backends, from `openai` and `anthropic` to `ollama` and `custom` (any OpenAI-compatible URL):
```python
# servers/fastapi/enums/llm_provider.py:4-19
from enum import Enum


class LLMProvider(Enum):
    OLLAMA = "ollama"
    OPENAI = "openai"
    GOOGLE = "google"
    VERTEX = "vertex"
    AZURE = "azure"
    BEDROCK = "bedrock"
    OPENROUTER = "openrouter"
    FIREWORKS = "fireworks"
    TOGETHER = "together"
    CEREBRAS = "cerebras"
    ANTHROPIC = "anthropic"
    LITELLM = "litellm"
    LMSTUDIO = "lmstudio"
    CUSTOM = "custom"
    CODEX = "codex"
```
Image generation is equally flexible — nine providers are enumerated in `ImageProvider`, covering AI-generated images (DALL-E 3, Gemini Flash, ComfyUI) and stock photo APIs (Pexels, Pixabay) ([`servers/fastapi/enums/image_provider.py:4-13`]()). Switching providers is a one-line environment variable change with no code edits.
Sources: [servers/fastapi/enums/llm_provider.py:1-19](), [servers/fastapi/enums/image_provider.py:1-13]()
---
### 2. Three Layers Work Together Inside One Container
The runtime is neither a monolith nor a microservice cluster: it is **three application processes inside one container** (Next.js, FastAPI, and the FastMCP server), fronted by nginx and orchestrated by a single Node.js supervisor:
```text
┌────────────────────────── Docker container (port 80) ──────────────────────────┐
│                                                                                │
│  nginx (reverse proxy, port 80)                                                │
│    │                                                                           │
│    ├──► Next.js (port 3000) ── UI + server-side API proxying                   │
│    │                                                                           │
│    └──► FastAPI (port 8000) ── presentation logic, LLM calls, DB, export       │
│                                                                                │
│  FastMCP (port 8001) ── MCP server, auto-generated from FastAPI's OpenAPI spec │
│                                                                                │
│  start.js ─ launches and supervises all three; exits when any process exits    │
└────────────────────────────────────────────────────────────────────────────────┘
    │
    └── ./app_data/ (mounted volume: presentations, exports, fonts, uploads)
```
`start.js` is the entrypoint. It writes `userConfig.json` from environment variables, ensures `app_data/` directories exist with correct permissions, starts nginx, spawns FastAPI (`python server.py --port 8000`) and Next.js in parallel, and waits for both readiness signals before printing the startup banner. If either process exits, the container exits ([`start.js:468-588`]()).
The MCP server (`mcp_server.py`) is generated automatically from the FastAPI OpenAPI spec using `FastMCP.from_openapi`, meaning any API endpoint is immediately accessible over the Model Context Protocol without a separate implementation ([`servers/fastapi/mcp_server.py:38-42`]()).
Sources: [start.js:28-44](), [start.js:468-588](), [servers/fastapi/mcp_server.py:14-57]()
---
### 3. A Structured Generation Pipeline, Not Just Prompt → File
Presenton does not simply dump a prompt into an LLM and parse the output as a slide deck. The generation endpoint chains multiple steps:
| Step | What happens | Key file |
|---|---|---|
| **Outline** | LLM generates a structured outline (title, sections, per-slide bullet points) | `utils/llm_calls/generate_presentation_outlines.py` |
| **Structure** | Outline is mapped to a concrete slide-layout index sequence | `utils/llm_calls/generate_presentation_structure.py` |
| **Slide content** | Each slide's content is generated from its outline item and layout type | `utils/llm_calls/generate_slide_content.py` |
| **Assets** | Images are fetched or generated per slide; icons are resolved | `utils/process_slides.py`, `services/image_generation_service.py` |
| **Export** | Final deck is rendered to PPTX or PDF | `utils/export_utils.py` |
The `GeneratePresentationRequest` model exposes the user-facing knobs for this pipeline: `content`, `n_slides`, `tone` (default/casual/professional/funny/educational/sales_pitch), `verbosity` (concise/standard/text-heavy), `web_search`, `language`, `template`, and `export_as` ([`servers/fastapi/models/generate_presentation_request.py:8-47`]()).
The same generation request is reachable over three surfaces simultaneously:
- **Web UI** — served by Next.js, calls FastAPI internally
- **REST API** — `POST /api/v1/ppt/presentation/generate` with HTTP Basic auth
- **MCP** — any MCP-compatible AI agent can call the same endpoint through `mcp_server.py`
Sources: [servers/fastapi/models/generate_presentation_request.py:8-47](), [servers/fastapi/api/v1/ppt/endpoints/presentation.py:1-80]()
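
As a rough illustration of the REST surface, the sketch below builds (but does not send) a generation request using only the Python standard library. The endpoint path and field names come from this page; the host and port, the credentials, and the specific values chosen for `language`, `template`, and `export_as` are assumptions for the example, not confirmed defaults.

```python
import base64
import json
import urllib.request

# Assumed deployment details: adjust host, port, and credentials to your setup.
BASE_URL = "http://localhost:5000"
ENDPOINT = "/api/v1/ppt/presentation/generate"


def build_generate_request(content: str, username: str, password: str,
                           **options) -> urllib.request.Request:
    """Build a POST request for the generation endpoint without sending it."""
    payload = {"content": content, **options}
    # HTTP Basic auth is "user:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_generate_request(
    "Quarterly sales review",
    "admin", "change-me",      # placeholder credentials
    n_slides=8,
    tone="professional",
    verbosity="concise",
    language="English",        # assumed value
    template="general",
    export_as="pptx",          # assumed value
)

# With a running server, sending is one more call:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the payload easy to inspect or log before any LLM credits are spent.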
---
## How the Pieces Fit: Architecture at a Glance
```mermaid
flowchart LR
subgraph Input["Input surfaces"]
Browser["Browser / Next.js UI"]
API["REST API client"]
MCPClient["MCP agent"]
end
subgraph Core["FastAPI backend (port 8000)"]
PresentationEndpoint["/api/v1/ppt/presentation/generate"]
OutlineGen["Outline generation"]
StructureGen["Structure mapping"]
SlideGen["Slide content generation"]
ImageFetch["Image & icon resolution"]
ExportUtil["PPTX / PDF export"]
end
subgraph Providers["External / local providers"]
LLMProvider["LLM\n(OpenAI · Gemini · Ollama · Anthropic · …)"]
ImgProvider["Image\n(DALL-E · Pexels · ComfyUI · …)"]
end
subgraph Storage["app_data volume"]
DB["SQLite / Postgres DB"]
Files["exports · images · fonts"]
end
Browser --> PresentationEndpoint
API --> PresentationEndpoint
MCPClient -->|"FastMCP port 8001"| PresentationEndpoint
PresentationEndpoint --> OutlineGen --> StructureGen --> SlideGen --> ImageFetch --> ExportUtil
OutlineGen & SlideGen --> LLMProvider
ImageFetch --> ImgProvider
ExportUtil --> Files
PresentationEndpoint --> DB
```
Sources: [start.js:28-44](), [servers/fastapi/api/v1/ppt/endpoints/presentation.py:1-80](), [docker-compose.yml:1-14]()
---
## Deployment Shapes
Presenton ships in three mutually compatible forms:
| Form | Command / entry | Best for |
|---|---|---|
| **Docker** | `docker run … ghcr.io/presenton/presenton:latest` | Zero-dependency self-hosting |
| **Docker Compose** | `docker compose up production` | Team or cloud deployment with `.env` config |
| **Electron desktop app** | `npm run dev` in `electron/` | Local development, offline use |
In all three forms, the same `start.js` entrypoint runs the same FastAPI + Next.js + MCP stack. The Electron app simply embeds this stack inside a native desktop window ([README.md:161-191]()).
GPU support for local Ollama models is a single flag, `--gpus=all`, on `docker run`; setting `START_OLLAMA=true` installs Ollama at container startup ([docker-compose.yml:96-110]()).
Sources: [README.md:161-212](), [docker-compose.yml:96-110](), [start.js:268-277]()
---
## Summary
Presenton is an open-source, self-hosted AI presentation engine: a structured pipeline (outline → layout → content → images → export) wrapped in a web UI, REST API, and MCP server, all running inside a single Docker container. Its defining commitment is provider neutrality: any supported LLM or image backend can be swapped by changing one environment variable, with no code changes, and your data goes only to the providers you configure. Every other concept in this wiki (templates, memory, chat editing, export formats, authentication) is built on top of these three pillars: model-agnosticism, a three-process runtime, and a multi-step structured generation pipeline.
Sources: [VISION.md:3-17](), [servers/fastapi/enums/llm_provider.py:4-19]()
---
## 02. Bring Your Own Brain: How LLM Providers Work
> Presenton does not pick an AI for you. This page explains the 15 supported providers, how the app selects a default model per provider, and what BYOK (bring your own key) means in practice.
- Page Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/02-bring-your-own-brain-how-llm-providers-work.md
- Generated: 2026-05-24T05:21:13.219Z
### Source Files
- `servers/fastapi/enums/llm_provider.py`
- `servers/fastapi/constants/llm.py`
- `servers/fastapi/utils/model_availability.py`
- `servers/fastapi/constants/supported_ollama_models.py`
- `servers/fastapi/api/v1/ppt/endpoints/ollama.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [servers/fastapi/enums/llm_provider.py](servers/fastapi/enums/llm_provider.py)
- [servers/fastapi/constants/llm.py](servers/fastapi/constants/llm.py)
- [servers/fastapi/utils/llm_provider.py](servers/fastapi/utils/llm_provider.py)
- [servers/fastapi/utils/model_availability.py](servers/fastapi/utils/model_availability.py)
- [servers/fastapi/utils/get_env.py](servers/fastapi/utils/get_env.py)
- [servers/fastapi/utils/available_models.py](servers/fastapi/utils/available_models.py)
- [servers/fastapi/utils/ollama.py](servers/fastapi/utils/ollama.py)
- [servers/fastapi/constants/supported_ollama_models.py](servers/fastapi/constants/supported_ollama_models.py)
- [servers/fastapi/api/v1/ppt/endpoints/ollama.py](servers/fastapi/api/v1/ppt/endpoints/ollama.py)
</details>
# Bring Your Own Brain: How LLM Providers Work
Presenton does not bundle an AI model and does not pick one for you. Instead, it reads a single environment variable — `LLM` — and routes every generation request to whichever AI provider and model you configured. This design is sometimes called BYOK (Bring Your Own Key) or BYOB (Bring Your Own Brain): you own the relationship with the AI vendor, you supply the credentials, and Presenton just calls it.
This page explains the 15 supported providers, how the application picks a default model when you do not specify one, what credentials each provider requires, and how the Ollama local-inference path works in more detail.
---
## The Provider Enum: All 15 Names in One Place
Every provider the app can talk to is listed in the `LLMProvider` Python enum:
```python
# servers/fastapi/enums/llm_provider.py
from enum import Enum


class LLMProvider(Enum):
    OLLAMA = "ollama"
    OPENAI = "openai"
    GOOGLE = "google"
    VERTEX = "vertex"
    AZURE = "azure"
    BEDROCK = "bedrock"
    OPENROUTER = "openrouter"
    FIREWORKS = "fireworks"
    TOGETHER = "together"
    CEREBRAS = "cerebras"
    ANTHROPIC = "anthropic"
    LITELLM = "litellm"
    LMSTUDIO = "lmstudio"
    CUSTOM = "custom"
    CODEX = "codex"
```
Sources: [servers/fastapi/enums/llm_provider.py:4-19]()
The string value of each member — `"openai"`, `"ollama"`, and so on — is what you write in the `LLM` environment variable. The function `get_llm_provider()` reads `os.getenv("LLM")` and converts it to this enum; if the value does not match any member it raises an HTTP 500 with a human-readable error message listing all valid choices.
Sources: [servers/fastapi/utils/llm_provider.py:44-56](), [servers/fastapi/utils/get_env.py:49-51]()
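
The lookup described above can be sketched in a few lines. This is a simplified stand-in rather than the repository's code: it uses a subset of the enum, and `ProviderConfigError` is a hypothetical exception standing in for the HTTP 500 that the real `get_llm_provider()` is described as raising.

```python
import os
from enum import Enum
from typing import Mapping


class LLMProvider(Enum):
    # Subset of the members listed above, for brevity.
    OLLAMA = "ollama"
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    CUSTOM = "custom"


class ProviderConfigError(Exception):
    """Hypothetical stand-in for the HTTP 500 error described above."""


def resolve_provider(env: Mapping[str, str] = os.environ) -> LLMProvider:
    """Convert the LLM environment variable into an enum member."""
    raw = env.get("LLM", "")
    try:
        # Enum lookup by value: LLMProvider("openai") -> LLMProvider.OPENAI
        return LLMProvider(raw)
    except ValueError:
        valid = ", ".join(member.value for member in LLMProvider)
        raise ProviderConfigError(
            f"Invalid LLM provider {raw!r}. Valid choices: {valid}"
        ) from None


print(resolve_provider({"LLM": "anthropic"}))
```

Passing the environment in as a mapping keeps the function easy to test without mutating `os.environ`.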
---
## How the App Selects an Active Provider
The flow is simple: one environment variable, one enum lookup, done.
```text
Environment          FastAPI startup
─────────────────    ──────────────────────────────────────────────
LLM=anthropic  ──►   get_llm_provider_env()    → "anthropic"
                     LLMProvider("anthropic")  → LLMProvider.ANTHROPIC
                     get_model()               → "claude-sonnet-4-20250514"
                                                 (or ANTHROPIC_MODEL if set)
```
`get_model()` in `utils/llm_provider.py` is the single function that resolves the active model name at runtime. It checks the provider-specific environment variable first (e.g. `ANTHROPIC_MODEL`), then falls back to the hardcoded default constant.
Sources: [servers/fastapi/utils/llm_provider.py:119-164]()
---
## Default Models Per Provider
When you set the `LLM` variable but do not supply a model name, Presenton uses these hardcoded defaults:
| Provider | Default model | Environment variable to override |
|---|---|---|
| `openai` | `gpt-4.1` | `OPENAI_MODEL` |
| `google` | `models/gemini-2.5-flash` | `GOOGLE_MODEL` |
| `vertex` | `gemini-2.5-flash` | `VERTEX_MODEL` |
| `azure` | `gpt-4.1` | `AZURE_OPENAI_MODEL` or `AZURE_OPENAI_DEPLOYMENT` |
| `bedrock` | `us.anthropic.claude-3-5-haiku-20241022-v1:0` | `BEDROCK_MODEL` |
| `openrouter` | `openai/gpt-4o` | `OPENROUTER_MODEL` |
| `fireworks` | `accounts/fireworks/models/llama-v3p1-8b-instruct` | `FIREWORKS_MODEL` |
| `together` | `openai/gpt-oss-20b` | `TOGETHER_MODEL` |
| `cerebras` | `llama-3.3-70b` | `CEREBRAS_MODEL` |
| `anthropic` | `claude-sonnet-4-20250514` | `ANTHROPIC_MODEL` |
| `litellm` | `gpt-4.1` | `LITELLM_MODEL` |
| `lmstudio` | `openai/gpt-oss-20b` | `LMSTUDIO_MODEL` |
| `codex` | `gpt-5.2` | `CODEX_MODEL` |
| `ollama` | *(none — required)* | `OLLAMA_MODEL` |
| `custom` | *(none — required)* | `CUSTOM_MODEL` |
The `ollama` and `custom` providers have no default because the app cannot know which models your local or self-hosted server offers. Their model variables are mandatory.
Sources: [servers/fastapi/constants/llm.py:1-17](), [servers/fastapi/utils/llm_provider.py:119-164]()
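
The override-then-default rule can be illustrated with a small sketch. It is not the repository's `get_model()`: the function name `resolve_model` is hypothetical, only a few rows of the table are included, and the `<PROVIDER>_MODEL` naming pattern is assumed to hold only for the providers shown (Azure, for instance, uses differently named variables).

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above; None marks providers with no default.
DEFAULT_MODELS: dict[str, Optional[str]] = {
    "openai": "gpt-4.1",
    "google": "models/gemini-2.5-flash",
    "anthropic": "claude-sonnet-4-20250514",
    "ollama": None,
    "custom": None,
}


def resolve_model(provider: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the env override if set, else the default for the provider."""
    if provider not in DEFAULT_MODELS:
        raise KeyError(f"Unknown provider in this sketch: {provider!r}")
    variable = f"{provider.upper()}_MODEL"
    override = (env.get(variable) or "").strip()
    if override:
        return override
    default = DEFAULT_MODELS[provider]
    if default is None:
        raise ValueError(f"{variable} must be set for provider {provider!r}")
    return default


print(resolve_model("anthropic", {}))
print(resolve_model("anthropic", {"ANTHROPIC_MODEL": "my-override"}))
```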
---
## What BYOK Means in Practice
BYOK is straightforward: **no credentials live inside Presenton**. Every API key is read from the process environment via `os.getenv(...)`. The app validates on startup (when `CAN_CHANGE_KEYS` is not `"false"`) that the required variables are present, and for several providers it goes further and queries the provider's models API to verify that the configured model actually exists in your account.
### Startup Validation by Provider
```text
providers that check key presence:
  openai     → OPENAI_API_KEY
  google     → GOOGLE_API_KEY
  anthropic  → ANTHROPIC_API_KEY
  openrouter → OPENROUTER_API_KEY
  cerebras   → CEREBRAS_API_KEY

providers that also validate model availability:
  openai     → queries https://api.openai.com/v1/models
  google     → queries genai.Client().models.list()
  anthropic  → queries https://api.anthropic.com/v1/models
  fireworks  → queries provider's /v1/models endpoint
  together   → queries provider's /v1/models endpoint
  lmstudio   → queries local server's /v1/models endpoint
  custom     → queries CUSTOM_LLM_URL/v1/models

providers with special credential forms:
  azure      → AZURE_OPENAI_API_KEY + AZURE_OPENAI_API_VERSION
               + (AZURE_OPENAI_ENDPOINT or AZURE_OPENAI_BASE_URL)
  bedrock    → BEDROCK_MODEL (required) +
               BEDROCK_API_KEY OR (BEDROCK_AWS_ACCESS_KEY_ID + BEDROCK_AWS_SECRET_ACCESS_KEY)
  vertex     → VERTEX_API_KEY OR (VERTEX_PROJECT + optional VERTEX_LOCATION)
               (combining both raises an error)
  litellm    → LITELLM_BASE_URL + LITELLM_MODEL (no live model check)
  ollama     → OLLAMA_MODEL present + model in supported list + auto-pull on startup
```
Sources: [servers/fastapi/utils/model_availability.py:64-241]()
The `CAN_CHANGE_KEYS` flag exists for deployment scenarios (e.g. Docker images with pre-baked keys) where runtime key swapping should be disabled. When it is set to `"false"`, the startup checks are skipped.
Sources: [servers/fastapi/utils/model_availability.py:65-66](), [servers/fastapi/utils/get_env.py:10-11]()
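
A presence check of this kind might look like the sketch below. It covers only three providers, the `REQUIRED` table and `missing_variables` helper are hypothetical names, and it models just the "is the variable set" step, not the live model queries.

```python
import os
from typing import Mapping

# Each provider maps to groups of variables; within a group, any one suffices.
# Transcribed from the listing above for a few providers only.
REQUIRED: dict[str, list[tuple[str, ...]]] = {
    "openai": [("OPENAI_API_KEY",)],
    "azure": [
        ("AZURE_OPENAI_API_KEY",),
        ("AZURE_OPENAI_API_VERSION",),
        ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_BASE_URL"),
    ],
    "litellm": [("LITELLM_BASE_URL",), ("LITELLM_MODEL",)],
}


def missing_variables(provider: str,
                      env: Mapping[str, str] = os.environ) -> list[str]:
    """List unmet requirements, or nothing when checks are disabled."""
    if env.get("CAN_CHANGE_KEYS") == "false":
        return []  # startup checks are skipped in this mode
    missing = []
    for group in REQUIRED[provider]:
        if not any(env.get(name) for name in group):
            missing.append(" or ".join(group))
    return missing


print(missing_variables("azure", {"AZURE_OPENAI_API_KEY": "k"}))
```

Grouping alternatives in tuples lets one table express both plain requirements and "either of these" requirements such as Azure's endpoint or base URL.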
---
## How Model Validation Works
For providers with a compatible OpenAI-style API (Fireworks, Together, LM Studio, and the `custom` provider), Presenton calls a shared utility that normalizes the base URL and uses the `AsyncOpenAI` SDK to list available models:
```python
# servers/fastapi/utils/available_models.py:64-75
async def list_available_openai_compatible_models(url: str, api_key: str) -> list[str]:
    url = normalize_openai_compatible_base_url(url)
    if is_together_api_base_url(url):
        return await list_together_models(url, api_key)
    effective_key = (api_key or "").strip() or "EMPTY"
    client = AsyncOpenAI(api_key=effective_key, base_url=url)
    models = (await client.models.list()).data
    ...
```
Together's API returns a non-standard payload, so it gets a dedicated `aiohttp`-based fetcher. Google and Anthropic have their own model listing functions that call their respective SDKs directly.
Sources: [servers/fastapi/utils/available_models.py:1-98]()
---
## The Ollama Path: Local Inference Without an API Key
Ollama requires no external account or API key: Presenton connects to a local Ollama process (default: `http://localhost:11434`, overridable with `OLLAMA_URL`).
### Supported Models
The supported Ollama model list is a hardcoded dictionary of `OllamaModelMetadata` objects grouped by model family:
| Family | Example models | Size range |
|---|---|---|
| Llama 3/3.1/3.2/3.3/4 | `llama3:8b`, `llama3.1:70b`, `llama4:16x17b` | 1.3 GB – 245 GB |
| Gemma 3 | `gemma3:1b` – `gemma3:27b` | 815 MB – 17 GB |
| Gemma 4 | `gemma4:e2b` – `gemma4:31b` (with quantized variants) | 7.2 GB – 63 GB |
| DeepSeek R1 | `deepseek-r1:1.5b` – `deepseek-r1:671b` | 1.1 GB – 404 GB |
| Qwen 3 | `qwen3:0.6b` – `qwen3:235b` | 523 MB – 142 GB |
| Qwen 3.5 | `qwen3.5:2b` – `qwen3.5:122b` | 2.7 GB – 81 GB |
| GPT-OSS | `gpt-oss:20b`, `gpt-oss:120b` | 14 GB – 65 GB |
All families are merged into the single `SUPPORTED_OLLAMA_MODELS` dict that the API and startup validator read.
Sources: [servers/fastapi/constants/supported_ollama_models.py:4-311]()
### Pull-on-Startup
When the `ollama` provider is active and `CAN_CHANGE_KEYS` is not `"false"`, startup automatically pulls the model from the Ollama registry if it is not already present:
```python
# servers/fastapi/utils/model_availability.py:213-226
elif is_ollama_selected():
    ollama_model = get_ollama_model_env()
    if ollama_model not in SUPPORTED_OLLAMA_MODELS:
        raise Exception(f"Model {ollama_model} is not supported")
    async for event in pull_ollama_model(ollama_model):
        print(event)
```
The pull streams NDJSON progress events from the Ollama API (`POST /api/pull`) and prints them to stdout. The REST endpoint (`GET /api/v1/ppt/ollama/model/pull`) exposes the same capability for on-demand pulls from the UI, tracking pull status in a database table (`OllamaPullStatus`) to survive across requests.
Sources: [servers/fastapi/utils/ollama.py:10-31](), [servers/fastapi/api/v1/ppt/endpoints/ollama.py:28-85]()
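
To make the NDJSON stream concrete, here is a minimal sketch of turning pull events into progress messages. It parses sample lines rather than calling a server, the helper name `pull_progress` is hypothetical, and the `status`, `total`, and `completed` fields are assumed to match the shape of Ollama's pull stream rather than taken from Presenton's code.

```python
import json
from typing import Iterable, Iterator


def pull_progress(lines: Iterable[str]) -> Iterator[str]:
    """Yield one human-readable message per NDJSON event."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # NDJSON streams may contain blank keep-alive lines
        event = json.loads(line)
        status = event.get("status", "")
        total = event.get("total")
        completed = event.get("completed")
        if total and completed is not None:
            yield f"{status}: {100 * completed // total}%"
        else:
            yield status


sample = [
    '{"status": "pulling manifest"}',
    '{"status": "downloading", "total": 200, "completed": 50}',
    "",
    '{"status": "success"}',
]
for message in pull_progress(sample):
    print(message)
```

Because each line is a complete JSON document, the parser can report progress as soon as a line arrives instead of waiting for the whole response.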
---
## The `custom` and `litellm` Escape Hatches
Two providers exist specifically to connect Presenton to infrastructure you control:
- **`custom`**: Point `CUSTOM_LLM_URL` at any server that speaks the OpenAI `/v1` protocol (vLLM, Ollama in API mode, a local proxy). Supply `CUSTOM_MODEL` and optionally `CUSTOM_LLM_API_KEY`. Startup queries `/v1/models` to confirm the model is reachable.
- **`litellm`**: Point `LITELLM_BASE_URL` at a running [LiteLLM](https://github.com/BerriAI/litellm) proxy. LiteLLM itself can fan out to hundreds of providers, so this path effectively extends Presenton's provider list to anything LiteLLM supports. A model name (`LITELLM_MODEL`) is required; no live model check is performed at startup because LiteLLM proxies vary in how they respond to `/v1/models`.
Sources: [servers/fastapi/utils/model_availability.py:178-197](), [servers/fastapi/utils/get_env.py:205-215]()
---
## Provider Architecture Overview
```text
┌────────────────────────────────────────────────────────────┐
│  Environment                                               │
│  LLM=<provider>   *_API_KEY=<key>   *_MODEL=<name>         │
└────────────────────────┬───────────────────────────────────┘
                         │ os.getenv()
                         ▼
┌────────────────────────────────────────────────────────────┐
│  FastAPI startup (model_availability.py)                   │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  get_llm_provider() → LLMProvider enum value         │  │
│  │  Validate credentials present                        │  │
│  │  (optional) query provider /models API               │  │
│  │  Ollama: auto-pull if not local                      │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────┬───────────────────────────────────┘
                         │ all OK → serve requests
                         ▼
┌────────────────────────────────────────────────────────────┐
│  Request path (utils/llm_provider.py → get_model())        │
│    env model var set? ─yes─► use it                        │
│           │ no                                             │
│           └──────────────► DEFAULT_*_MODEL constant        │
└────────────────────────┬───────────────────────────────────┘
                         │
          ┌──────────────┼──────────────────────────┐
          ▼              ▼                          ▼
     Cloud APIs    Local servers              Proxy layers
  (OpenAI, Google, (Ollama :11434,        (LiteLLM, OpenRouter,
   Anthropic,       LM Studio :1234,       custom URL)
   Bedrock, …)      custom URL)
```
---
## Quick Reference: Required Environment Variables
| Provider (`LLM=`) | Required | Optional / Alternative |
|---|---|---|
| `openai` | `OPENAI_API_KEY` | `OPENAI_MODEL` |
| `google` | `GOOGLE_API_KEY` | `GOOGLE_MODEL` |
| `vertex` | `VERTEX_API_KEY` **or** `VERTEX_PROJECT` | `VERTEX_LOCATION`, `VERTEX_MODEL` |
| `azure` | `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_API_VERSION`, `AZURE_OPENAI_ENDPOINT` or `AZURE_OPENAI_BASE_URL` | `AZURE_OPENAI_MODEL`, `AZURE_OPENAI_DEPLOYMENT` |
| `bedrock` | `BEDROCK_MODEL` + (`BEDROCK_API_KEY` or access+secret key pair) | `BEDROCK_REGION`, `BEDROCK_AWS_SESSION_TOKEN` |
| `openrouter` | `OPENROUTER_API_KEY` | `OPENROUTER_MODEL` |
| `fireworks` | `FIREWORKS_API_KEY`, `FIREWORKS_MODEL` | `FIREWORKS_BASE_URL` |
| `together` | `TOGETHER_API_KEY`, `TOGETHER_MODEL` | `TOGETHER_BASE_URL` |
| `cerebras` | `CEREBRAS_API_KEY` | `CEREBRAS_MODEL`, `CEREBRAS_BASE_URL` |
| `anthropic` | `ANTHROPIC_API_KEY` | `ANTHROPIC_MODEL` |
| `litellm` | `LITELLM_BASE_URL`, `LITELLM_MODEL` | `LITELLM_API_KEY` |
| `lmstudio` | `LMSTUDIO_MODEL` | `LMSTUDIO_BASE_URL` (default: `localhost:1234`), `LMSTUDIO_API_KEY` |
| `ollama` | `OLLAMA_MODEL` (must be in supported list) | `OLLAMA_URL` (default: `localhost:11434`) |
| `custom` | `CUSTOM_LLM_URL`, `CUSTOM_MODEL` | `CUSTOM_LLM_API_KEY` |
| `codex` | OAuth tokens (`CODEX_ACCESS_TOKEN`, etc.) | `CODEX_MODEL` |
Sources: [servers/fastapi/utils/get_env.py:49-317](), [servers/fastapi/utils/model_availability.py:64-241]()
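The table above can be read as a set of per-provider checks. The following is a minimal sketch of that idea, not Presenton's actual validation code: `REQUIRED_ENV` and `missing_env()` are hypothetical names, and only four rows of the table are encoded. Each inner list is an any-of group, so `vertex` passes when either variable is set.

```python
import os

# Hypothetical encoding of four rows from the table above.
# Each inner list is an "any of" group: at least one name must be set.
REQUIRED_ENV = {
    "openai": [["OPENAI_API_KEY"]],
    "vertex": [["VERTEX_API_KEY", "VERTEX_PROJECT"]],
    "litellm": [["LITELLM_BASE_URL"], ["LITELLM_MODEL"]],
    "custom": [["CUSTOM_LLM_URL"], ["CUSTOM_MODEL"]],
}


def missing_env(provider, env=None):
    """Return the unmet requirement groups for a provider (empty list means OK)."""
    env = os.environ if env is None else env
    missing = []
    for group in REQUIRED_ENV[provider]:
        if not any(env.get(name) for name in group):
            missing.append(" or ".join(group))
    return missing


print(missing_env("vertex", {"VERTEX_PROJECT": "demo-project"}))
```

Passing the environment as a plain dict keeps the check easy to test without touching the real process environment.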
---
## Summary
Presenton treats the AI layer as a pure plug-in: the `LLM` environment variable selects one of 15 providers, credentials stay in environment variables you manage, and the app validates the selected provider's configuration at startup before accepting requests. For most cloud providers a sensible default model is pre-configured so you only need one key to get started; for local inference with Ollama the app will even download the model for you. The `custom` and `litellm` providers extend this to any OpenAI-compatible server, making the provider list effectively unlimited. All of this is implemented across `enums/llm_provider.py`, `constants/llm.py`, `utils/llm_provider.py`, and `utils/model_availability.py`.
Sources: [servers/fastapi/utils/llm_provider.py:119-164]()
---
## 03. From Prompt to Slides: The Generation Pipeline
> Step by step: how a user's topic travels through outline generation, layout selection, per-slide content calls, image fetching, and final assembly — all coordinated by the FastAPI backend.
- Page Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/03-from-prompt-to-slides-the-generation-pipeline.md
- Generated: 2026-05-24T05:21:05.724Z
### Source Files
- `servers/fastapi/api/v1/ppt/endpoints/presentation.py`
- `servers/fastapi/utils/llm_calls/generate_presentation_structure.py`
- `servers/fastapi/utils/outline_utils.py`
- `servers/fastapi/api/v1/ppt/background_tasks.py`
- `servers/fastapi/services/concurrent_service.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [servers/fastapi/api/v1/ppt/endpoints/presentation.py](servers/fastapi/api/v1/ppt/endpoints/presentation.py)
- [servers/fastapi/utils/llm_calls/generate_presentation_outlines.py](servers/fastapi/utils/llm_calls/generate_presentation_outlines.py)
- [servers/fastapi/utils/llm_calls/generate_presentation_structure.py](servers/fastapi/utils/llm_calls/generate_presentation_structure.py)
- [servers/fastapi/utils/llm_calls/generate_slide_content.py](servers/fastapi/utils/llm_calls/generate_slide_content.py)
- [servers/fastapi/utils/outline_utils.py](servers/fastapi/utils/outline_utils.py)
- [servers/fastapi/utils/process_slides.py](servers/fastapi/utils/process_slides.py)
- [servers/fastapi/services/concurrent_service.py](servers/fastapi/services/concurrent_service.py)
- [servers/fastapi/api/v1/ppt/background_tasks.py](servers/fastapi/api/v1/ppt/background_tasks.py)
</details>
# From Prompt to Slides: The Generation Pipeline
When a user types a topic and clicks "Generate," Presenton turns that plain-text prompt into a fully populated, visually designed slide deck. This page traces every step of that journey — from the first HTTP request to the final file written to disk — as it actually runs inside the FastAPI backend.
Understanding this pipeline matters if you want to extend Presenton (adding a new layout, swapping the LLM provider, tuning asset fetching), debug a broken generation, or just reason about where latency comes from.
---
## Overview: The Three Main Phases
The pipeline has three conceptually distinct phases that run in sequence:
```text
┌─────────────────────────────────────────────────────────────────────┐
│ Phase 1 – PLANNING                                                  │
│   Outline generation → TOC injection → Layout selection             │
├─────────────────────────────────────────────────────────────────────┤
│ Phase 2 – CONTENT                                                   │
│   Per-slide content calls (batched, concurrent)                     │
├─────────────────────────────────────────────────────────────────────┤
│ Phase 3 – ASSEMBLY                                                  │
│   Asset fetching (images + icons) → DB persist → Export             │
└─────────────────────────────────────────────────────────────────────┘
```
There are two entry points, both calling the same shared handler:
| Endpoint | Mode | Returns |
|---|---|---|
| `POST /presentation/generate` | Synchronous | `PresentationPathAndEditPath` immediately |
| `POST /presentation/generate/async` | Background task | Task record with a polling `id` |
Both resolve to `generate_presentation_handler()` in `presentation.py`.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:1018-1075]()
---
## Phase 1: Planning — Outlines, TOC, and Layout
### Step 1.1 — Validation and Setup
Before any LLM call, `check_if_api_request_is_valid()` enforces business rules:
- `content`, `slides_markdown`, or `files` must be present.
- `n_slides` must be between 1 and `MAX_NUMBER_OF_SLIDES`.
- If table-of-contents is requested, at least 3 slides are required.
- The `template` must be a known built-in name or a `custom-<uuid>` pointing to a real DB record.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:570-625]()
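The first three rules above can be sketched as a pure function. This is an illustration only: `validate_generation_request()` is a hypothetical name, the limit is passed in as `max_slides` because the real value of `MAX_NUMBER_OF_SLIDES` is not shown here, and the template lookup is omitted since it needs a database.

```python
def validate_generation_request(content, slides_markdown, files, n_slides,
                                include_toc, max_slides):
    """Return a list of rule violations; an empty list means the request is valid."""
    errors = []
    # Rule 1: at least one input source must be present.
    if not (content or slides_markdown or files):
        errors.append("one of content, slides_markdown, or files is required")
    # Rule 2: slide count must be within the configured bounds.
    if not 1 <= n_slides <= max_slides:
        errors.append(f"n_slides must be between 1 and {max_slides}")
    # Rule 3: a table of contents needs room for title + TOC + content.
    if include_toc and n_slides < 3:
        errors.append("table of contents requires at least 3 slides")
    return errors


print(validate_generation_request("AI in healthcare", None, None, 2, True, 50))
```

Collecting all violations instead of raising on the first one is a design choice of this sketch; the real handler may reject the request at the first failed rule.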
### Step 1.2 — Document Loading (optional)
When the request includes uploaded `files`, a `DocumentsLoader` extracts their text and concatenates it into `additional_context`. This context is passed verbatim to the outline prompt, letting the LLM draw from an uploaded PDF or DOCX without the user needing to paste the text manually.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:652-661]()
### Step 1.3 — Outline Generation via Streaming LLM Call
The outline step asks the LLM to produce a structured list of slide summaries — one per slide — using `generate_ppt_outline()`.
The function builds two prompt halves:
- **System prompt** (`get_system_prompt()`): Sets verbosity target (concise ≈ 20 words/slide, standard ≈ 40 words, text-heavy ≈ 60 words), tone, language, title-slide rules, and Markdown format requirements.
- **User prompt** (`get_user_prompt()`): Injects the user's topic, desired slide count, language, tone, today's date, and the extracted document context.
The call uses **streaming JSON**: the LLM emits tokens as it generates them and the backend accumulates them into a single JSON document. If `web_search=True`, a `WebSearchTool` is attached so the model can fetch live facts before drafting slides.
```python
# generate_presentation_outlines.py:205-229
async for event in stream_generate_events(client, **get_generate_kwargs(
    model=model,
    messages=get_messages(...),
    response_format=response_format,
    tools=([WebSearchTool()] if use_search_tool else None),
    stream=True,
)):
    if getattr(event, "type", None) == "content":
        yield event.chunk
```
The streamed text is parsed with `dirtyjson` (a lenient JSON parser), then validated against a `PresentationOutlineModel` with exactly `n_slides` entries.
Sources: [servers/fastapi/utils/llm_calls/generate_presentation_outlines.py:172-237](), [servers/fastapi/api/v1/ppt/endpoints/presentation.py:701-744]()
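The accumulate-then-validate step might look roughly like the sketch below. It uses the standard library `json` module as a stand-in for `dirtyjson`, and both the `collect_outline()` helper and the `slides` / `content` field names are assumptions made for illustration.

```python
import json


def collect_outline(chunks, n_slides):
    """Join streamed text chunks, parse them, and enforce the slide count."""
    text = "".join(chunks)
    # Stand-in: the source parses with the more lenient dirtyjson library.
    data = json.loads(text)
    slides = data["slides"]  # assumed field name
    if len(slides) != n_slides:
        raise ValueError(f"expected {n_slides} outlines, got {len(slides)}")
    return slides


# Chunks can split the JSON anywhere, even in the middle of a string.
chunks = ['{"slides": [{"content": "# In', 'tro"}, {"content": "# Next"}]}']
print(collect_outline(chunks, 2))
```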
### Step 1.4 — Slide Count Adjustment for Table of Contents
If the user requested a TOC, the backend subtracts the number of TOC slides from the outline count so the final deck still totals the requested `n_slides`. The helpers in `outline_utils.py` do the math:
- `get_no_of_outlines_to_generate_for_n_slides()` — how many content outlines to request from the LLM when some slides will be TOC placeholders.
- `get_no_of_toc_required_for_n_outlines()` — how many TOC slides to insert, given the outline count.
- `get_presentation_outline_model_with_toc()` — inserts synthetic TOC `SlideOutlineModel` entries at the correct position (after the title slide) with page-number annotations.
Sources: [servers/fastapi/utils/outline_utils.py:44-137]()
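To make the page-number bookkeeping concrete, here is a simplified sketch that inserts a single TOC entry after the title slide. It is not the logic in `outline_utils.py`: `insert_toc()` is a hypothetical helper, outlines are plain strings rather than `SlideOutlineModel` objects, and it assumes exactly one TOC slide.

```python
def insert_toc(outlines):
    """Insert one TOC entry after the title slide, annotated with page numbers."""
    title, rest = outlines[0], outlines[1:]
    # Page 1 is the title and page 2 is the TOC, so content starts on page 3.
    entries = [f"{text} (page {page})" for page, text in enumerate(rest, start=3)]
    toc = "Table of Contents\n" + "\n".join(entries)
    return [title, toc, *rest]


deck = insert_toc(["Title", "Market", "Product"])
print(len(deck))  # 4: requesting 3 outlines yields a 4-slide deck
```

This is why the outline count is reduced up front: every TOC slide inserted here would otherwise push the deck past the requested `n_slides`.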
### Step 1.5 — Layout Selection (`generate_presentation_structure`)
With outlines ready, the backend asks the LLM to assign a **slide layout index** to each outline — choosing from the slide templates available in the selected theme.
Two prompts exist:
- **Standard prompt** (`GET_MESSAGES_SYSTEM_PROMPT`): Encourages visual variety, content-driven layout choices (process → process layout, data → chart layout, etc.), and alternating adjacent layouts.
- **Markdown input prompt** (`STRUCTURE_FROM_SLIDES_MARKDOWN_SYSTEM_PROMPT`): Used when the user provided raw slide markdown instead of a topic; enforces stricter table-and-chart selection rules.
The response is a JSON array of integers — one layout index per slide — validated against a dynamically-built Pydantic schema that has exactly `n_slides` entries.
```python
# generate_presentation_structure.py:135-184
async def generate_presentation_structure(
    presentation_outline, presentation_layout, instructions, using_slides_markdown
) -> PresentationStructureModel:
    ...
    content = await generate_structured_with_schema_retries(
        client, model, messages=messages,
        response_format=response_format, ...
    )
    return PresentationStructureModel(**content)
```
If the layout is marked `ordered` (a fixed-sequence theme), the LLM step is skipped and indices are derived directly from the theme definition.
Sources: [servers/fastapi/utils/llm_calls/generate_presentation_structure.py:135-184](), [servers/fastapi/api/v1/ppt/endpoints/presentation.py:797-840]()
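The constraint the schema enforces (exactly `n_slides` entries, each a valid layout index) can be illustrated without Pydantic. The sketch below is a plain-Python stand-in with a hypothetical `validate_layout_indices()` helper; the source builds a Pydantic model dynamically instead, and the range check against the number of layouts is an assumption.

```python
def validate_layout_indices(indices, n_slides, n_layouts):
    """Check that the LLM returned one in-range layout index per slide."""
    if len(indices) != n_slides:
        raise ValueError(f"expected {n_slides} indices, got {len(indices)}")
    for position, index in enumerate(indices):
        if not isinstance(index, int) or not 0 <= index < n_layouts:
            raise ValueError(f"slide {position}: invalid layout index {index!r}")
    return list(indices)


print(validate_layout_indices([0, 2, 1], n_slides=3, n_layouts=4))
```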
---
## Phase 2: Content — Per-Slide LLM Calls
### Step 2.1 — Batched Concurrent Content Generation
Each slide now has an outline (what to say) and a layout index (how to display it). The per-slide content step fills in the layout's schema with real, structured JSON.
Slides are processed in **batches of 10**. Within each batch, all content calls run concurrently via `asyncio.gather`:
```python
# presentation.py:881-898
batch_size = 10
for start in range(0, len(slide_layouts), batch_size):
    end = min(start + batch_size, len(slide_layouts))
    content_tasks = [
        get_slide_content_from_type_and_outline(
            slide_layouts[i], presentation_outlines.slides[i],
            language_to_use, request.tone.value, request.verbosity.value, request.instructions,
        )
        for i in range(start, end)
    ]
    batch_contents = await asyncio.gather(*content_tasks)
```
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:879-935]()
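To see where the batch boundaries fall, the same `range`/`min` arithmetic can be isolated in a small helper. `batch_ranges()` is a hypothetical name used only for this illustration.

```python
def batch_ranges(n_slides, batch_size=10):
    """Return the (start, end) index pairs the batching loop iterates over."""
    return [
        (start, min(start + batch_size, n_slides))
        for start in range(0, n_slides, batch_size)
    ]


print(batch_ranges(23))
```

A 23-slide deck therefore runs three rounds of concurrent calls, and each round takes roughly as long as its slowest call rather than the sum of all calls in it.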
### Step 2.2 — Slide Content LLM Call
For each slide, `get_slide_content_from_type_and_outline()` drives a separate LLM call:
1. The slide layout's `json_schema` is stripped of asset placeholder fields (`__image_url__`, `__icon_url__`) — those will be filled later.
2. A `__speaker_note__` field (100–500 chars of plain text) is injected into the schema.
3. The LLM receives the slide's markdown outline plus the schema and must return JSON that matches it exactly.
```python
# generate_slide_content.py:172-187
response_schema = remove_fields_from_schema(
    slide_layout.json_schema, ["__image_url__", "__icon_url__"]
)
response_schema = add_field_in_schema(response_schema, {
    "__speaker_note__": {"type": "string", "minLength": 100, "maxLength": 500, ...}
}, True)
```
The call uses `generate_structured_with_schema_retries` — if the model returns malformed JSON, it retries automatically.
Sources: [servers/fastapi/utils/llm_calls/generate_slide_content.py:161-215]()
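The schema rewrite described above can be approximated on a plain JSON Schema dict. The helpers below, `strip_fields()` and `add_required_field()`, are hypothetical stand-ins for `remove_fields_from_schema` and `add_field_in_schema`, whose real behavior may differ; the example schema is also invented for illustration.

```python
import copy

ASSET_FIELDS = ["__image_url__", "__icon_url__"]


def strip_fields(schema, names):
    """Return a copy of a JSON schema with the named properties removed at any depth."""
    if isinstance(schema, dict):
        cleaned = {}
        for key, value in schema.items():
            if key == "properties" and isinstance(value, dict):
                cleaned[key] = {k: strip_fields(v, names)
                                for k, v in value.items() if k not in names}
            elif key == "required" and isinstance(value, list):
                cleaned[key] = [r for r in value if r not in names]
            else:
                cleaned[key] = strip_fields(value, names)
        return cleaned
    if isinstance(schema, list):
        return [strip_fields(item, names) for item in schema]
    return schema


def add_required_field(schema, name, definition):
    """Return a copy of the schema with one extra required top-level property."""
    result = copy.deepcopy(schema)
    result.setdefault("properties", {})[name] = definition
    required = result.setdefault("required", [])
    if name not in required:
        required.append(name)
    return result


layout_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "image": {
            "type": "object",
            "properties": {
                "__image_url__": {"type": "string"},
                "__image_prompt__": {"type": "string"},
            },
            "required": ["__image_url__", "__image_prompt__"],
        },
    },
    "required": ["title", "image"],
}
response_schema = add_required_field(
    strip_fields(layout_schema, ASSET_FIELDS),
    "__speaker_note__",
    {"type": "string", "minLength": 100, "maxLength": 500},
)
print(sorted(response_schema["properties"]))
```

Removing the URL fields while keeping the prompt fields means the model only describes the asset it wants; the asset step later resolves that description into a real URL.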
---
## Phase 3: Assembly — Assets, Persistence, and Export
### Step 3.1 — Placeholder Injection (Streaming Path)
In the streaming variant (`GET /presentation/stream/{id}`), slides are streamed to the frontend as they complete. To avoid stalling the stream while images load, the backend calls `process_slide_add_placeholder_assets()` immediately after each slide is generated. This writes `/static/images/placeholder.jpg` and `/static/icons/placeholder.svg` into the slide content so the UI can render something right away.
Sources: [servers/fastapi/utils/process_slides.py:220-239](), [servers/fastapi/api/v1/ppt/endpoints/presentation.py:432-433]()
### Step 3.2 — Asset Fetching (Images and Icons)
`process_slide_and_fetch_assets()` resolves every `__image_prompt__` and `__icon_query__` field in each slide's content dict:
- **Images**: If the outline already contained an image URL (parsed by `get_images_for_slides_from_outline()` using a regex that finds `.jpg/.png/.webp` links), that URL is used directly. Otherwise, `ImageGenerationService.generate_image()` is called with the prompt text.
- **Icons**: `ICON_FINDER_SERVICE.search_icons()` takes the icon query string and an `icon_weight` from the layout theme. The first result URL is written back into `__icon_url__`. If nothing is found, a static placeholder SVG is used.
All image and icon fetches within a slide run concurrently via `asyncio.gather`. Asset tasks for one batch start **while the next batch's LLM calls are still running**, so asset I/O overlaps with LLM latency instead of queuing behind it:
```python
# presentation.py:923-935
asset_tasks = [
    asyncio.create_task(
        process_slide_and_fetch_assets(
            image_generation_service, slide,
            outline_image_urls=image_urls_for_batch[offset],
            icon_weight=layout_model.icon_weight,
        )
    )
    for offset, slide in enumerate(batch_slides)
]
async_assets_generation_tasks.extend(asset_tasks)
```
Sources: [servers/fastapi/utils/process_slides.py:16-90](), [servers/fastapi/api/v1/ppt/endpoints/presentation.py:923-944]()
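The URL-detection idea can be sketched with a small regular expression. The pattern and the `image_urls_per_slide()` helper below are assumptions for illustration; the actual regex in `get_images_for_slides_from_outline()` is not shown here and may be stricter.

```python
import re

# Assumed pattern: an http(s) URL ending in one of the three extensions named above.
IMAGE_URL_RE = re.compile(r"https?://\S+?\.(?:jpg|png|webp)", re.IGNORECASE)


def image_urls_per_slide(slides_markdown):
    """Return, for each slide's markdown, the list of image URLs found in it."""
    return [IMAGE_URL_RE.findall(markdown) for markdown in slides_markdown]


slides = [
    "# Revenue\n![chart](https://example.com/a/chart.png) Growth by quarter",
    "# Outlook\nNo image on this slide",
]
print(image_urls_per_slide(slides))
```

Slides with an empty list fall through to the image generation service; slides with a match reuse the linked image directly.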
### Step 3.3 — Database Persistence
Once all slides and assets are ready, everything is written to the database in a single commit:
```python
# presentation.py:950-953
sql_session.add(presentation)
sql_session.add_all(slides)
sql_session.add_all(generated_assets)
await sql_session.commit()
```
`ImageAsset` records are stored alongside the slides so the file paths survive server restarts.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:949-953]()
### Step 3.4 — Export
`export_presentation()` converts the stored presentation into the requested format (`pptx`, PDF, or others). It receives the presentation ID and a cookie header forwarded from the original request, so the export worker can authenticate against the same session.
The completed path and an edit URL (`/presentation?id=<uuid>`) are returned to the caller.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:961-971]()
### Step 3.5 — Webhook Notification
After success (or failure), `CONCURRENT_SERVICE.run_task()` fires a webhook in the background without blocking the response. The `ConcurrentService` wraps each call in an `asyncio.Task` and keeps a reference set to prevent garbage collection before the task completes.
```python
# concurrent_service.py:16-37
def run_task(self, delay, callable, *args, **kwargs):
    async def wrapper():
        if delay:
            await asyncio.sleep(delay)
        await callable(*args, **kwargs)
    task = asyncio.create_task(wrapper())
    self._background_tasks.add(task)
    task.add_done_callback(self.on_task_done)
```
Sources: [servers/fastapi/services/concurrent_service.py:6-40]()
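The reference-set pattern matters because the event loop keeps only weak references to tasks, so a fire-and-forget task with no other reference can be garbage-collected mid-flight. Below is a self-contained sketch of the same pattern; the `BackgroundTasks` class, the `send_webhook()` coroutine, and the event name are hypothetical and stand in for the real service and webhook call.

```python
import asyncio


class BackgroundTasks:
    """Minimal stand-in for a fire-and-forget task runner."""

    def __init__(self):
        self._tasks = set()

    def run_task(self, delay, func, *args, **kwargs):
        async def wrapper():
            if delay:
                await asyncio.sleep(delay)
            await func(*args, **kwargs)

        task = asyncio.create_task(wrapper())
        self._tasks.add(task)                        # strong reference while running
        task.add_done_callback(self._tasks.discard)  # dropped once finished
        return task


async def demo():
    sent = []

    async def send_webhook(event):  # hypothetical webhook sender
        sent.append(event)

    service = BackgroundTasks()
    task = service.run_task(0, send_webhook, "presentation.completed")
    pending_before = len(service._tasks)
    await task
    await asyncio.sleep(0)  # give done-callbacks a chance to run
    return sent, pending_before, len(service._tasks)


print(asyncio.run(demo()))
```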
---
## Concurrency Model
```text
Batch 1 LLM calls (10 slides) ────────┐
                                      ├─ asyncio.gather ─► Batch 1 slides ready
Batch 1 asset tasks (start now) ──────┘         │
                                                │ (running in background)
Batch 2 LLM calls (10 slides) ────────┐         │
                                      ├─ asyncio.gather ─► Batch 2 slides ready
Batch 2 asset tasks (start now) ──────┘
...
await asyncio.gather(*all_asset_tasks)  ← waits for all assets at the end
```
This design means image generation and icon fetching for slide batch N overlap with LLM content generation for slide batch N+1, keeping GPU/network I/O from becoming a sequential bottleneck.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:880-944]()
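The overlap comes from scheduling asset work with `asyncio.create_task` and only awaiting it at the very end. The runnable sketch below shows the shape of that loop with stub coroutines; `generate_content()` and `fetch_assets()` are placeholders for the real LLM and asset calls.

```python
import asyncio


async def generate_content(index):
    await asyncio.sleep(0)  # placeholder for one slide's LLM call
    return {"index": index}


async def fetch_assets(slide):
    await asyncio.sleep(0)  # placeholder for image and icon fetching
    return {**slide, "assets": "ready"}


async def run_pipeline(n_slides, batch_size=10):
    asset_tasks = []
    for start in range(0, n_slides, batch_size):
        end = min(start + batch_size, n_slides)
        batch = await asyncio.gather(*(generate_content(i) for i in range(start, end)))
        # Scheduled now, awaited later: these run while the next batch's
        # content calls are in flight.
        asset_tasks.extend(asyncio.create_task(fetch_assets(slide)) for slide in batch)
    # Single join point for every asset task, in slide order.
    return await asyncio.gather(*asset_tasks)


result = asyncio.run(run_pipeline(12))
print(len(result))
```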
---
## The Streaming Alternative
The `GET /presentation/stream/{id}` endpoint offers a different delivery model. Instead of waiting for all slides, it yields Server-Sent Events (SSE) in real time:
| SSE event type | When sent |
|---|---|
| `chunk` (opening brace) | Before any slide |
| `chunk` (slide JSON) | After each slide's content is generated |
| `slide_assets` | After each slide's assets resolve |
| `chunk` (closing brace) | After all slides |
| `complete` | Full `PresentationWithSlides` payload |
Asset tasks fire immediately for each slide and their results are flushed to the client as soon as they resolve, even while later slides are still being generated.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:385-519]()
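On the consuming side, a client has to split the stream into events. The parser below is a minimal sketch that follows the standard SSE wire format (`event:` and `data:` fields, blank line between events). Whether Presenton puts the type in the `event:` field or inside the JSON payload is an assumption here, and the sample payloads are invented.

```python
def parse_sse(raw):
    """Split a raw SSE stream into (event_type, data) pairs."""
    events = []
    for block in raw.split("\n\n"):
        event_type, data_lines = "message", []  # "message" is the SSE default type
        for line in block.splitlines():
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # Per the SSE format, one leading space after the colon is dropped.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            events.append((event_type, "\n".join(data_lines)))
    return events


raw_stream = (
    "event: chunk\ndata: {\n\n"
    'event: slide_assets\ndata: {"index": 0}\n\n'
    "event: complete\ndata: {}\n\n"
)
print(parse_sse(raw_stream))
```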
---
## Alternate Input: Slides Markdown
Instead of a topic, callers can submit `slides_markdown` — a list of pre-written Markdown strings, one per slide. The pipeline adapts:
- Outline generation is **skipped**; the markdown is wrapped directly into `SlideOutlineModel` instances.
- A different system prompt is used for layout selection that focuses on matching layouts to markdown structure (table detection, image detection, etc.).
- `get_images_for_slides_from_outline()` scans each markdown string for embedded image URLs and passes them into asset fetching so they are used instead of generated images.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:748-756](), [servers/fastapi/utils/outline_utils.py:184-205]()
---
## Three-Step UI Flow (Create + Prepare + Stream)
The API also exposes a more interactive, step-by-step path for the UI:
1. `POST /presentation/create` — saves the raw request and returns a `PresentationModel` ID.
2. `POST /presentation/prepare` — takes user-edited outlines and a chosen layout, runs layout selection, inserts TOC entries, and saves the structure.
3. `GET /presentation/stream/{id}` — streams slide content and assets using the prepared structure.
This allows the UI to show the generated outlines and let the user revise them before content is committed, without re-running the expensive outline LLM call.
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:232-363](), [servers/fastapi/api/v1/ppt/endpoints/presentation.py:365-519]()
---
## Summary
A user's prompt travels through a chain of LLM interactions (one outline call, one layout-selection call, and one content call per slide), all coordinated within a single async Python process. The FastAPI backend pipelines these calls to overlap network I/O: asset fetching runs concurrently with content generation for subsequent slides, so the total wall-clock time is well below the sum of the individual call latencies. The final output is a file on disk plus a database record that the frontend uses to render the editable presentation view. The full orchestration lives in `generate_presentation_handler()` at [servers/fastapi/api/v1/ppt/endpoints/presentation.py:628-989]().
---
## 04. What You Can Feed It: Documents, PDFs & Images
> Presenton can read uploaded PDFs, Office files, images (via OCR), and plain text to ground the presentation in real content. This page covers the document loader, LiteParse integration, and image generation service.
- Page Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/04-what-you-can-feed-it-documents-pdfs-images.md
- Generated: 2026-05-24T05:22:39.690Z
### Source Files
- `servers/fastapi/services/documents_loader.py`
- `servers/fastapi/services/liteparse_service.py`
- `servers/fastapi/services/image_generation_service.py`
- `servers/fastapi/services/document_conversion_service.py`
- `servers/fastapi/constants/documents.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [servers/fastapi/services/documents_loader.py](servers/fastapi/services/documents_loader.py)
- [servers/fastapi/services/liteparse_service.py](servers/fastapi/services/liteparse_service.py)
- [servers/fastapi/services/document_conversion_service.py](servers/fastapi/services/document_conversion_service.py)
- [servers/fastapi/services/image_generation_service.py](servers/fastapi/services/image_generation_service.py)
- [servers/fastapi/constants/documents.py](servers/fastapi/constants/documents.py)
</details>
# What You Can Feed It: Documents, PDFs & Images
When you ask Presenton to build a presentation, you are not limited to typing a topic into a text box. You can hand it real documents — a PDF report, a Word file, a spreadsheet, or even a scanned image — and it will extract the text content from those files to ground the generated slides in actual source material. This page explains exactly which file types are accepted, how each one is processed under the hood, and how the image generation system works as a separate concern.
## Accepted File Types
The full list of accepted formats is defined in a single constants file, so every API endpoint and upload validation step pulls from the same source.
**PDFs**
`.pdf`
**Plain text**
`.txt`
**Word-processor files (Office)**
`.doc`, `.docx`, `.docm`, `.odt`, `.rtf`
**Spreadsheets**
`.xls`, `.xlsx`, `.xlsm`, `.ods`, `.csv`, `.tsv`
**Presentations**
`.ppt`, `.pptx`, `.pptm`, `.odp`
**Images**
`.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.tif`, `.webp`, `.svg`
Each of these groups also has a matching list of MIME types. The server checks both the MIME type and the file extension when deciding whether to accept an upload, because some clients (particularly older browsers uploading legacy Office files) send a generic `application/octet-stream` content type that says nothing about the real format.
Sources: [servers/fastapi/constants/documents.py:1-82](servers/fastapi/constants/documents.py)
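A rough sketch of that double check is shown below. The sets are a small hand-picked subset rather than the real contents of `constants/documents.py`, `is_accepted_upload()` is a hypothetical helper, and treating a match on either signal as sufficient is an assumption inferred from the `application/octet-stream` case.

```python
import os

# Illustrative subset only; the real lists live in constants/documents.py.
ACCEPTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".xlsx", ".pptx", ".png", ".jpg"}
ACCEPTED_MIME_TYPES = {"application/pdf", "text/plain", "image/png", "image/jpeg"}


def is_accepted_upload(filename, content_type):
    """Accept when either the extension or the MIME type is recognized."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in ACCEPTED_EXTENSIONS or content_type in ACCEPTED_MIME_TYPES


# A legacy client sends a generic content type; the extension still identifies it.
print(is_accepted_upload("report.DOCX", "application/octet-stream"))
```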
---
## How DocumentsLoader Routes Each File
`DocumentsLoader` is the entry point that decides what to do with each uploaded file. It inspects the file extension and dispatches to the appropriate handler:
```python
# servers/fastapi/services/documents_loader.py (simplified)
if extension in PDF_EXTENSIONS:
    document, imgs = await self.load_pdf(file_path, load_text, load_images, temp_dir)
elif extension in TEXT_EXTENSIONS:
    document = await self.load_text(file_path)
elif extension in OFFICE_EXTENSIONS:
    document = await asyncio.to_thread(self.load_office_document, file_path, temp_dir)
elif extension in IMAGE_EXTENSIONS:
    document = await asyncio.to_thread(self.load_image, file_path, temp_dir)
else:
    document = await asyncio.to_thread(self._parse_with_liteparse, file_path)
```
Notice that `.txt` files skip the LiteParse pipeline entirely — they are read with a direct `open()` call since no conversion is needed. Everything else eventually passes through LiteParse, either directly or after a format-conversion step.
After extraction, every document passes through `clean_extracted_document_text()`, a function that strips any stray LiteParse JSON wrapper that may have been stored literally (for example, if the whole runner JSON line ended up as the document body). This is a resilience step that runs up to four passes until the text stabilises.
Sources: [servers/fastapi/services/documents_loader.py:197-240](servers/fastapi/services/documents_loader.py)
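The "repeat until stable" cleanup can be sketched as a bounded fixed-point loop. This is not the body of `clean_extracted_document_text()`: the helper names are hypothetical, and the wrapper shape (a JSON object with a string `text` field) is assumed from the runner's JSON output mode.

```python
import json


def unwrap_once(text):
    """If text is a JSON object with a string "text" field, return that field."""
    stripped = text.strip()
    if stripped.startswith("{") and stripped.endswith("}"):
        try:
            payload = json.loads(stripped)
        except ValueError:
            return text
        if isinstance(payload, dict) and isinstance(payload.get("text"), str):
            return payload["text"]
    return text


def clean_document_text(text, max_passes=4):
    """Unwrap nested wrappers until the text stops changing or the pass limit is hit."""
    for _ in range(max_passes):
        unwrapped = unwrap_once(text)
        if unwrapped == text:
            break
        text = unwrapped
    return text


wrapped = json.dumps({"ok": True, "text": "# Quarterly Report"})
print(clean_document_text(wrapped))
```

The pass limit guards against pathological input: text that keeps unwrapping into more wrappers stops after four rounds instead of looping indefinitely.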
---
## The Two-Step Pipeline for Office Files and Images
Office documents (Word, PowerPoint, spreadsheets) and image files cannot be fed to LiteParse directly in their native formats. They go through a conversion step first.
```
┌────────────────────────────────────────────────────────┐
│                     Uploaded file                      │
└──────────────┬────────────────────────┬────────────────┘
               │ Office (.docx etc.)    │ Image (.jpg etc.)
               ▼                        ▼
    LibreOffice (soffice)   ImageMagick (magick/convert)
        converts → .pdf          converts → .png
               │                        │
               └──────────┬─────────────┘
                          ▼
                  LiteParse runner
                (Node.js subprocess)
                    extracts text
                     as Markdown
```
**Office to PDF via LibreOffice**
`DocumentConversionService.convert_office_to_pdf()` calls `soffice --headless --convert-to pdf`. The binary path comes from the `SOFFICE_PATH` environment variable, falling back to `soffice` (Linux/macOS) or `soffice.exe` (Windows). If the expected `<stem>.pdf` file is not found after conversion, the service picks the newest `.pdf` in the output directory as a fallback.
Sources: [servers/fastapi/services/document_conversion_service.py:83-170](servers/fastapi/services/document_conversion_service.py)
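The command construction and the "newest PDF" fallback can be sketched as two small functions. Both names are hypothetical, and the `--outdir` argument is an addition of this sketch (it is a standard LibreOffice option, but the text above does not say whether the service passes it).

```python
import os
from pathlib import Path


def soffice_command(input_path, out_dir):
    """Build the headless LibreOffice conversion command as an argument list."""
    default = "soffice.exe" if os.name == "nt" else "soffice"
    binary = os.environ.get("SOFFICE_PATH") or default
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(input_path)]


def find_converted_pdf(input_path, out_dir):
    """Prefer <stem>.pdf; otherwise fall back to the newest PDF in out_dir."""
    out_dir = Path(out_dir)
    expected = out_dir / (Path(input_path).stem + ".pdf")
    if expected.exists():
        return expected
    candidates = sorted(out_dir.glob("*.pdf"), key=lambda path: path.stat().st_mtime)
    return candidates[-1] if candidates else None


print(soffice_command("deck.pptx", "/tmp/out")[1:4])
```

Returning an argument list rather than a shell string avoids quoting problems with file names that contain spaces when the command is handed to `subprocess.run`.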
**Image to PNG via ImageMagick**
`DocumentConversionService.convert_image_to_png()` calls `magick <input> <output>.png` (or `convert` on Linux systems where ImageMagick 6 is installed). The binary is resolved at startup: the `IMAGEMAGICK_BINARY` environment variable wins; otherwise the service probes for `magick` and then `convert` by running `<binary> -version`.
Sources: [servers/fastapi/services/document_conversion_service.py:172-241](servers/fastapi/services/document_conversion_service.py)
---
## LiteParse: The Core Text Extraction Engine
LiteParse is a Node.js package (`@llamaindex/liteparse`) that does the heavy lifting of turning PDFs and converted office documents into structured Markdown text, including OCR for scanned pages.
### How It Works
`LiteParseService.parse()` builds a command like:
```
node liteparse_runner.mjs \
--file /tmp/upload.pdf \
--ocr-enabled true \
--ocr-language eng \
--dpi 120 \
--num-workers 1 \
--python-bridge plain
```
The runner is a `.mjs` (ES module) file. The service looks for it in several candidate paths (Docker container path, repo resource directory, PyInstaller bundle layout) and picks the first one that exists.
There are two output modes, selected by the `LITEPARSE_RUNNER_OUTPUT` environment variable:
- **`plain` (default)** — stdout is the raw Markdown text. For large PDFs this avoids the overhead of JSON-encoding multi-megabyte strings.
- **`json`** — stdout is a single JSON line with `{ ok, text, filePath, pageCount }`. The service scans output lines in reverse to find the last valid JSON object, tolerating any stray log output before the payload.
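The reverse scan used in `json` mode can be sketched as follows. `last_json_payload()` is a hypothetical helper, and the sample output is invented; only the payload keys come from the description above.

```python
import json


def last_json_payload(stdout):
    """Return the last line of stdout that parses as a JSON object, or None."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue  # cheap filter for ordinary log lines
        try:
            payload = json.loads(line)
        except ValueError:
            continue  # looked like JSON but was not; keep scanning upward
        if isinstance(payload, dict):
            return payload
    return None


runner_stdout = (
    "loading OCR model\n"
    "{partial log line}\n"
    '{"ok": true, "text": "# Doc", "filePath": "/tmp/upload.pdf", "pageCount": 2}\n'
)
print(last_json_payload(runner_stdout)["pageCount"])
```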
### OCR Language Support
The language passed to LiteParse comes from the presentation language selected by the user. `presentation_language_to_ocr_code()` maps a human language label (e.g. `"French"`) to a Tesseract language code (e.g. `"fra"`). This means scanned documents in non-English languages can still be read correctly.
If an external OCR server is configured via `LITEPARSE_OCR_SERVER_URL`, the URL is appended to the command so LiteParse offloads recognition to that server instead of running Tesseract locally.
Sources: [servers/fastapi/services/liteparse_service.py:67-399](servers/fastapi/services/liteparse_service.py)
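A label-to-code mapping of this kind is essentially a dictionary lookup. The sketch below covers four languages with their Tesseract codes; the `ocr_code_for()` helper and the fallback to `eng` for unknown labels are assumptions, not the behavior of `presentation_language_to_ocr_code()`.

```python
# Tesseract language codes for a few common labels (illustrative subset).
OCR_CODES = {
    "english": "eng",
    "french": "fra",
    "german": "deu",
    "spanish": "spa",
}


def ocr_code_for(language_label, default="eng"):
    """Map a human-readable language label to a Tesseract code."""
    return OCR_CODES.get(language_label.strip().lower(), default)


print(ocr_code_for("French"))
```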
### Fallback Parser
If LiteParse fails (network issue, Node.js not installed, package missing), the loader tries an optional lightweight fallback:
```python
# servers/fastapi/services/documents_loader.py:27-30
try:
    from services.lightweight_document_service import DocumentService as DocumentServiceCls
except Exception:
    DocumentServiceCls = None
```
If the fallback also fails, a `500` HTTP error is raised with the original error message. This keeps the stack simple: try the best tool first, recover gracefully, then surface the failure clearly.
Sources: [servers/fastapi/services/documents_loader.py:303-330](servers/fastapi/services/documents_loader.py)
---
## PDF Page Images (for Vision-Based Slide Mapping)
Beyond text extraction, `DocumentsLoader` can also render each page of a PDF as a PNG image. This is used when the caller sets `load_images=True` and supplies a `temp_dir`. The implementation uses `pdfplumber` at 150 DPI:
```python
# servers/fastapi/services/documents_loader.py:332-341
with pdfplumber.open(file_path) as pdf:
    for page in pdf.pages:
        img = page.to_image(resolution=150)
        image_path = os.path.join(temp_dir, f"page_{page.page_number}.png")
        img.save(image_path)
        images.append(image_path)
```
The resulting image paths are stored separately from the extracted text (`self._images` vs `self._documents`), so downstream code can pass page images to a vision model while still having the extracted text for context.
Sources: [servers/fastapi/services/documents_loader.py:332-347](servers/fastapi/services/documents_loader.py)
---
## Image Generation Service (Slide Illustrations)
`ImageGenerationService` is a separate concern from document ingestion. It does not read uploaded files — it generates new images to illustrate slides. It is provider-neutral by design: a single environment variable picks which backend to use, and the rest of the code is unchanged.
| Provider | Selection check | Backend |
|---|---|---|
| Pixabay | `is_pixabay_selected()` | Pixabay REST API (stock photos) |
| Pexels | `is_pixels_selected()` | Pexels REST API (stock photos) |
| DALL·E 3 | `is_dalle3_selected()` | OpenAI `dall-e-3` |
| GPT Image 1.5 | `is_gpt_image_1_5_selected()` | OpenAI `gpt-image-1.5` |
| Gemini Flash | `is_gemini_flash_selected()` | Google `gemini-2.5-flash-image` |
| NanoBanana Pro | `is_nanobanana_pro_selected()` | Google `gemini-3-pro-image-preview` |
| ComfyUI | `is_comfyui_selected()` | Self-hosted ComfyUI workflow API |
| Open WebUI | `is_open_webui_selected()` | OpenAI-compatible self-hosted endpoint |
| OpenAI-compatible | `is_openai_compatible_selected()` | Custom base URL + model |
| Disabled / none | fallthrough | Static placeholder image |
Stock photo providers (Pexels, Pixabay) receive only the image subject prompt; generative providers receive the full prompt including the slide theme. If any provider call fails, the service catches the exception and returns a placeholder image rather than failing the whole presentation generation.
Sources: [servers/fastapi/services/image_generation_service.py:41-122](servers/fastapi/services/image_generation_service.py)
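The fail-soft behavior can be sketched in a few lines. This is a simplification with hypothetical names: `backend` stands for whichever provider function was selected, and the placeholder path is borrowed from the streaming section of this wiki rather than confirmed as the value this service returns.

```python
PLACEHOLDER_IMAGE = "/static/images/placeholder.jpg"  # assumed placeholder path


def generate_image(prompt, backend=None):
    """Call the selected backend; fall back to a placeholder on any failure."""
    if backend is None:  # disabled / no provider configured
        return PLACEHOLDER_IMAGE
    try:
        return backend(prompt)
    except Exception:
        # One failed image should not abort the whole presentation.
        return PLACEHOLDER_IMAGE


def flaky_backend(prompt):  # hypothetical provider that always fails
    raise TimeoutError("provider did not respond")


print(generate_image("a city skyline at dusk", flaky_backend))
```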
### ComfyUI Workflow Injection
The ComfyUI integration is worth calling out: instead of a simple prompt-to-API call, it accepts a full ComfyUI workflow JSON (supplied via the `COMFYUI_WORKFLOW` environment variable). Presenton traverses the workflow graph to find the node titled `"Input Prompt"` and injects the generated prompt text into that node's text field. It then submits the workflow, polls the `/history/{prompt_id}` endpoint until the job completes, and downloads the resulting image.
Sources: [servers/fastapi/services/image_generation_service.py:401-540](servers/fastapi/services/image_generation_service.py)
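The injection step can be illustrated on a tiny workflow dict. The sketch assumes ComfyUI's API-format export, where each node is keyed by id and carries `inputs`, `class_type`, and a `_meta.title`; `inject_prompt()` and the sample workflow are hypothetical and do not reproduce the service's own traversal code.

```python
import copy


def inject_prompt(workflow, prompt, title="Input Prompt"):
    """Return a copy of the workflow with the prompt written into the titled node."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("_meta", {}).get("title") == title:
            node.setdefault("inputs", {})["text"] = prompt
            return patched
    raise ValueError(f'no node titled "{title}" in workflow')


workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 1}, "_meta": {"title": "KSampler"}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""},
          "_meta": {"title": "Input Prompt"}},
}
patched = inject_prompt(workflow, "a minimalist chart illustration")
print(patched["6"]["inputs"]["text"])
```

Looking the node up by title rather than by id lets users rearrange or extend their workflow freely, as long as one node keeps the expected title.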
---
## Summary
Presenton ingests uploaded files through a layered pipeline: plain text is read directly, Office files go through LibreOffice → LiteParse, images go through ImageMagick → LiteParse (with OCR), and PDFs go straight to LiteParse. Every path produces clean Markdown text that grounds the AI's slide generation in real document content. Image generation for slide illustrations is a completely separate, provider-swappable service that falls back gracefully to a placeholder when no backend is configured. The full set of accepted MIME types and extensions is declared once in `constants/documents.py` and reused everywhere, so adding a new format requires a change in only one place.
Sources: [servers/fastapi/constants/documents.py:67-82](servers/fastapi/constants/documents.py)
---
## 05. Slide Templates & Layouts: How Designs Get Chosen
> Presenton ships with named design templates (general, pitch-deck, Education, Code, etc.) and per-slide layouts. The LLM picks a layout index for each slide based on content rules baked into the system prompt.
- Page Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/05-slide-templates-layouts-how-designs-get-chosen.md
- Generated: 2026-05-24T05:28:45.839Z
### Source Files
- `servers/nextjs/app/presentation-templates`
- `servers/fastapi/constants/presentation.py`
- `servers/fastapi/utils/ppt_utils.py`
- `servers/fastapi/utils/theme_utils.py`
- `servers/fastapi/api/v1/ppt/endpoints/theme_generate.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [servers/nextjs/app/presentation-templates/index.tsx](servers/nextjs/app/presentation-templates/index.tsx)
- [servers/nextjs/app/presentation-templates/defaultSchemes.ts](servers/nextjs/app/presentation-templates/defaultSchemes.ts)
- [servers/nextjs/app/presentation-templates/general/IntroSlideLayout.tsx](servers/nextjs/app/presentation-templates/general/IntroSlideLayout.tsx)
- [servers/nextjs/app/presentation-templates/general/settings.json](servers/nextjs/app/presentation-templates/general/settings.json)
- [servers/nextjs/app/presentation-templates/neo-modern/settings.json](servers/nextjs/app/presentation-templates/neo-modern/settings.json)
- [servers/nextjs/app/presentation-templates/pitch-deck/settings.json](servers/nextjs/app/presentation-templates/pitch-deck/settings.json)
- [servers/fastapi/constants/presentation.py](servers/fastapi/constants/presentation.py)
- [servers/fastapi/templates/presentation_layout.py](servers/fastapi/templates/presentation_layout.py)
- [servers/fastapi/templates/get_layout_by_name.py](servers/fastapi/templates/get_layout_by_name.py)
- [servers/fastapi/templates/handler.py](servers/fastapi/templates/handler.py)
- [servers/fastapi/templates/providers.py](servers/fastapi/templates/providers.py)
- [servers/fastapi/utils/ppt_utils.py](servers/fastapi/utils/ppt_utils.py)
- [servers/fastapi/utils/theme_utils.py](servers/fastapi/utils/theme_utils.py)
- [servers/fastapi/utils/llm_calls/generate_presentation_structure.py](servers/fastapi/utils/llm_calls/generate_presentation_structure.py)
- [servers/fastapi/utils/llm_calls/generate_slide_content.py](servers/fastapi/utils/llm_calls/generate_slide_content.py)
- [servers/fastapi/api/v1/ppt/endpoints/theme_generate.py](servers/fastapi/api/v1/ppt/endpoints/theme_generate.py)
</details>
# Slide Templates & Layouts: How Designs Get Chosen
Presenton ships with a library of named design template groups — General, Modern, Standard, Swift, Code, Education, Product Overview, Report, Pitch Deck, and their "Neo" variants — each containing a fixed set of per-slide layouts. When the LLM generates a presentation, it receives a plain-text menu of available slide layouts and returns a list of integer indices, one per slide, telling the renderer which React component to use for each slide. The color palette is then generated independently and applied on top of whichever layout was chosen.
Understanding this system is essential for anyone adding new layouts, debugging unexpected slide appearances, or extending the template library. The design decision — layout index selection — happens entirely inside a structured LLM call, driven by two distinct system prompts. The React components on the Next.js side and the FastAPI backend on the Python side share layout metadata through a JSON schema that is served at runtime via the export service.
---
## Template Groups
Presenton organizes layouts into named template groups. Each group is a folder under `servers/nextjs/app/presentation-templates/` and has a `settings.json` that declares metadata for the group.
**Verified template group directories:**
| Group ID | Display Name | `default` in settings | Notes |
|---|---|---|---|
| `general` | General | `true` | Fallback/default group |
| `modern` | Modern | `false` | |
| `standard` | Standard | `false` | |
| `swift` | Swift | `false` | |
| `code` | Code | `false` | Code/API-focused layouts |
| `education` | Education | `false` | |
| `product-overview` | Product Overview | `false` | |
| `report` | Report | `false` | |
| `pitch-deck` | Pitch Deck | `false` | |
| `neo-general` | Neo General | `false` | Refreshed general |
| `neo-standard` | Neo Standard | `false` | |
| `neo-modern` | Neo Modern | `false` | |
| `neo-swift` | Neo Swift | `false` | |
Each `settings.json` contains a short `description`, an `ordered` boolean (whether layouts have a fixed sequence), and an `icon_weight` hint used when rendering icons across slides.
Example — `general/settings.json`:
```json
{
  "description": "General purpose layouts for common presentation elements",
  "ordered": false,
  "default": true,
  "icon_weight": "regular"
}
```
Example — `neo-modern/settings.json`:
```json
{
  "description": "New modern purpose layouts for common presentation elements",
  "ordered": false,
  "default": false,
  "icon_weight": "regular"
}
```
Sources: [servers/nextjs/app/presentation-templates/general/settings.json](servers/nextjs/app/presentation-templates/general/settings.json), [servers/nextjs/app/presentation-templates/neo-modern/settings.json](servers/nextjs/app/presentation-templates/neo-modern/settings.json)
The FastAPI layer exposes four template names as "default" templates that do not require a database record:
```python
DEFAULT_TEMPLATES = ["general", "modern", "standard", "swift"]
```
Sources: [servers/fastapi/constants/presentation.py:1](servers/fastapi/constants/presentation.py)
Custom user-created templates are stored in the database and referenced with a `custom-<uuid>` ID prefix.
---
## How Layouts Are Defined (Next.js Side)
Each slide layout lives in its own `.tsx` file inside a group subdirectory. Every layout file exports four named constants that the registry uses to describe the layout to the LLM:
| Export | Purpose |
|---|---|
| `layoutId` | Stable string ID, e.g. `'general-intro-slide'` |
| `layoutName` | Human-readable name, e.g. `'Intro Slide'` |
| `layoutDescription` | One-sentence description fed to the LLM |
| `Schema` | Zod schema describing the slide's data fields |
Example from `general/IntroSlideLayout.tsx`:
```tsx
export const layoutId = 'general-intro-slide'
export const layoutName = 'Intro Slide'
export const layoutDescription = 'A clean slide layout with title, description text, presenter info, and a supporting image.'
const introSlideSchema = z.object({
  title: z.string().min(3).max(40)...
  description: z.string().min(10).max(150)...
  image: ImageSchema...
})
export const Schema = introSlideSchema
```
Sources: [servers/nextjs/app/presentation-templates/general/IntroSlideLayout.tsx:4-26](servers/nextjs/app/presentation-templates/general/IntroSlideLayout.tsx)
Shared image and icon schemas are defined in `defaultSchemes.ts`:
```ts
export const ImageSchema = z.object({
  __image_url__: z.url()...,
  __image_prompt__: z.string().min(10).max(50)...
})
export const IconSchema = z.object({
  __icon_url__: z.string()...,
  __icon_query__: z.string().min(5).max(20)...
})
```
Sources: [servers/nextjs/app/presentation-templates/defaultSchemes.ts:1-20](servers/nextjs/app/presentation-templates/defaultSchemes.ts)
The `__image_url__`, `__icon_url__`, `__image_prompt__`, and `__icon_query__` field names are reserved. The FastAPI backend strips or normalizes them before and after LLM calls.
---
## The Central Registry (index.tsx)
All layouts from every group are imported and assembled in `servers/nextjs/app/presentation-templates/index.tsx`. It exports three things:
1. **Per-group arrays** such as `generalTemplates`, `codeTemplates`, `pitchDeckTemplates`, etc.
2. **`allLayouts`** — a flat array combining all templates in a specific priority order (neo variants first, then classic, then domain-specific groups).
3. **`templates`** — the typed `TemplateLayoutsWithSettings[]` array that pairs each group's layouts with its settings metadata for the UI.
```tsx
export const allLayouts: TemplateWithData[] = [
  ...neoGeneralTemplates,
  ...neoModernTemplates,
  ...neoStandardTemplates,
  ...neoSwiftTemplates,
  ...generalTemplates,
  ...modernTemplates,
  ...standardTemplates,
  ...swiftTemplates,
  ...codeTemplates,
  ...educationTemplates,
  ...productOverviewTemplates,
  ...reportTemplates,
  ...pitchDeckTemplates,
];
```
Sources: [servers/nextjs/app/presentation-templates/index.tsx:498-512](servers/nextjs/app/presentation-templates/index.tsx)
Each entry is created with `createTemplateEntry(Component, Schema, id, name, description, groupId, fileName)`. The group-to-settings wiring in `templates` gives the UI everything it needs to render a template picker:
```tsx
export const templates: TemplateLayoutsWithSettings[] = [
  { id: "general", name: "General", description: generalSettings.description,
    settings: generalSettings, layouts: generalTemplates },
  { id: "pitch-deck", name: "Pitch Deck", ...},
  ...
]
```
Sources: [servers/nextjs/app/presentation-templates/index.tsx:517-610](servers/nextjs/app/presentation-templates/index.tsx)
---
## How the FastAPI Backend Loads Layouts
When a presentation is generated, the backend needs the template's layout metadata (the id, name, description, and Zod-derived JSON schema of each slide layout). The function `get_layout_by_name` in `servers/fastapi/templates/get_layout_by_name.py` resolves a template group name to a `PresentationLayoutModel`.
**Resolution order:**
1. **Primary:** calls the export service's `extract_schema` endpoint at `http://localhost/schema?group=<name>`. The export service renders the Next.js page and extracts the schema from the React component tree.
2. **Fallback:** if the primary fails (older export runtimes), calls `http://localhost/api/template?group=<name>`.
3. **Local settings overlay:** reads `<layout_name>/settings.json` directly from the filesystem to apply the `icon_weight` setting accurately, since the export service may not preserve it.
```python
async def get_layout_by_name(layout_name: str) -> PresentationLayoutModel:
    ...
    local_settings = _read_builtin_template_settings(layout_name)
    if local_settings:
        local_icon_weight = extract_icon_weight_from_settings(local_settings)
        schema_payload["icon_weight"] = local_icon_weight
    ...
    return PresentationLayoutModel(**schema_payload)
```
Sources: [servers/fastapi/templates/get_layout_by_name.py:125-224](servers/fastapi/templates/get_layout_by_name.py)
The `PresentationLayoutModel` Pydantic model captures: `name`, `ordered`, `icon_weight`, and a list of `SlideLayoutModel` entries (each with `id`, `name`, `description`, `json_schema`).
Its `to_string()` method serializes the layout for the LLM:
```python
def to_string(self) -> str:
    message = "## Presentation Layout\n\n"
    for index, slide in enumerate(self.slides):
        message += f"### Slide Layout: {index}\n"
        message += f"- Name: {slide.name or slide.json_schema.get('title')}\n"
        message += f"- Description: {slide.description}\n\n"
    return message
```
Sources: [servers/fastapi/templates/presentation_layout.py:45-51](servers/fastapi/templates/presentation_layout.py)
This plain-text representation is what the LLM actually reads when choosing layouts.
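To make the menu format concrete, the following is a minimal, self-contained sketch that mirrors the `to_string()` logic above. The `SlideLayout` dataclass and the `layout_menu` function are hypothetical stand-ins for `SlideLayoutModel` and the real method, and the second sample layout is invented purely to show the fallback to the schema title.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlideLayout:
    """Hypothetical stand-in for SlideLayoutModel (not the real Pydantic model)."""
    id: str
    name: Optional[str]
    description: str
    json_schema: dict = field(default_factory=dict)


def layout_menu(slides: list) -> str:
    """Build the numbered plain-text menu in the same shape as to_string()."""
    message = "## Presentation Layout\n\n"
    for index, slide in enumerate(slides):
        message += f"### Slide Layout: {index}\n"
        # Fall back to the schema title when the layout has no explicit name.
        message += f"- Name: {slide.name or slide.json_schema.get('title')}\n"
        message += f"- Description: {slide.description}\n\n"
    return message


menu = layout_menu([
    SlideLayout("general-intro-slide", "Intro Slide",
                "A clean slide layout with title, description text, presenter info, and a supporting image."),
    # Invented example layout with no name, to exercise the fallback.
    SlideLayout("example-unnamed", None, "Example layout without a name.",
                {"title": "Bullet List"}),
])
print(menu)
```

The index printed in each `### Slide Layout:` heading is the integer the LLM is expected to return for a slide, so the order of `slides` matters.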
---
## How the LLM Picks Layout Indices
Layout selection is a dedicated LLM call in `servers/fastapi/utils/llm_calls/generate_presentation_structure.py`. Two system prompts are used depending on the input mode:
### Standard Mode — Content-Driven Selection
Used for normal text-based input. The system prompt is `GET_MESSAGES_SYSTEM_PROMPT`:
```
You're a professional presentation designer with creative freedom to design engaging presentations.
# Layout Selection Guidelines
1. Content-driven choices: Let the slide's purpose guide layout selection
- Opening/closing → Title layouts
- Processes/workflows → Visual process layouts
- Comparisons/contrasts → Side-by-side layouts
- Data/metrics → Chart/graph layouts
- Concepts/ideas → Image + text layouts
- Key insights → Emphasis layouts
2. Visual variety: Aim for diverse slide layouts across the presentation.
- Don't use same layout for multiple slides unless necessary.
- Adjacent slide layouts should be different unless instructed/necessary otherwise.
4. Table of contents:
- Must only use table of contents layout if slide content contains table of contents.
```
The user message is the `presentation_layout.to_string()` (slide index list with names and descriptions) followed by the presentation outline content.
Sources: [servers/fastapi/utils/llm_calls/generate_presentation_structure.py:60-118](servers/fastapi/utils/llm_calls/generate_presentation_structure.py)
### Markdown/Slide Mode — Rule-Based Selection
Used when the input is pre-structured slide markdown. The system prompt is `STRUCTURE_FROM_SLIDES_MARKDOWN_SYSTEM_PROMPT` with stricter content-matching rules:
```
# Selection Rules
- If content contains table, then select either table layout or graph layout.
- Don't select layout with image unless content contains image.
- Don't select table layout if content does not contain table.
- You are allowed to select same layout for multiple slides.
# Graph Layout Selection Rules
- Must only select a layout with chart if the content contains table with numeric data.
- Must select a layout that supports n-1 charts for n columns.
- Must prioritize layouts that support multiple charts.
```
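In this mode, `presentation_layout.to_string(with_schema=True)` is called, so the LLM receives the full JSON schema of each layout rather than just names and descriptions. (The `to_string()` excerpt shown earlier is simplified and omits the `with_schema` parameter.)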
Sources: [servers/fastapi/utils/llm_calls/generate_presentation_structure.py:16-57](servers/fastapi/utils/llm_calls/generate_presentation_structure.py), [servers/fastapi/utils/llm_calls/generate_presentation_structure.py:121-132](servers/fastapi/utils/llm_calls/generate_presentation_structure.py)
### Output Format
Both prompts ask the LLM to respond with a JSON array of integers: one layout index per slide, referencing the position in the template group's ordered layout list.
```
# Output Rules:
- One layout index for each slide.
- Example: [0, 1, 2, 3, 4]
```
The response is validated against a dynamically generated Pydantic model (`get_presentation_structure_model_with_n_slides`) that enforces exactly N integers.
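The kind of check this validation performs can be sketched with the standard library alone. This is not the project's implementation (which uses a dynamically built Pydantic model); `parse_layout_indices` is a hypothetical helper, and the range check against the number of layouts is an added assumption rather than something the source confirms.

```python
import json


def parse_layout_indices(raw: str, n_slides: int, n_layouts: int) -> list:
    """Parse an LLM reply and require exactly n_slides integer layout indices."""
    value = json.loads(raw)
    if not isinstance(value, list) or len(value) != n_slides:
        raise ValueError(f"expected a list of exactly {n_slides} indices")
    for item in value:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError(f"not an integer index: {item!r}")
        # Assumption: an index must point at an existing layout in the menu.
        if not 0 <= item < n_layouts:
            raise ValueError(f"index {item} is outside 0..{n_layouts - 1}")
    return value


indices = parse_layout_indices("[0, 3, 1, 5, 2]", n_slides=5, n_layouts=6)
print(indices)
```

Rejecting a reply with the wrong length early is useful because every later step pairs slide outlines with layout indices by position.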
---
## Layout Selection Flow (End to End)
```text
User picks template group (e.g. "general")
│
▼
FastAPI calls get_layout_by_name("general")
→ extract_schema from export service
→ reads settings.json for icon_weight
→ returns PresentationLayoutModel
(ordered list of SlideLayoutModels with ids, names, descriptions, schemas)
│
▼
generate_presentation_structure(outline, layout, instructions)
→ to_string() → plain text: "Slide Layout: 0\n- Name: Intro Slide\n..."
→ LLM receives: layout menu + slide outlines
→ LLM returns: [0, 3, 1, 5, 2, ...] (one index per slide)
│
▼
PresentationStructureModel stores index list
│
▼
For each slide: look up SlideLayoutModel at chosen index
→ extract json_schema
→ generate_slide_content fills in structured data
│
▼
Next.js renders: layout React component + structured data + theme colors
```
---
## Regex-Based Layout Fallback (ppt_utils.py)
When a specific layout type must be guaranteed regardless of what the LLM chose — for example, always finding a "Table of Contents" layout for the agenda slide — `servers/fastapi/utils/ppt_utils.py` provides `find_slide_layout_index_by_regex` and `select_toc_or_list_slide_layout_index`.
These functions search the layout list by scanning each slide's `id`, `name`, `description`, and `json_schema.title` against regex patterns:
```python
def find_slide_layout_index_by_regex(layout, patterns):
    def _find_index(pattern):
        regex = re.compile(pattern, re.IGNORECASE)
        for index, slide_layout in enumerate(layout.slides):
            candidates = [
                slide_layout.id or "",
                (slide_layout.name or ""),
                (slide_layout.description or ""),
                (slide_layout.json_schema.get("title") if slide_layout.json_schema else ""),
            ]
            for text in candidates:
                if text and regex.search(text):
                    return index
        return -1
    ...
```
For TOC detection, the patterns tried (in priority order) are: `table of contents`, `agenda`, `contents`, `outline`, `index`, `toc`. If none match, it falls back to bullet-list patterns like `bullet list`, `numbered list`, `list`.
Sources: [servers/fastapi/utils/ppt_utils.py:34-82](servers/fastapi/utils/ppt_utils.py)
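The priority-ordered search with a list fallback can be sketched as follows. This is an illustrative reconstruction, not the code in `ppt_utils.py`: the helper names are hypothetical, each layout is reduced to a tuple of its searchable strings, and the patterns are treated as plain literals (the exact regular expressions used by the project are not shown above).

```python
import re

# Pattern lists taken from the description above, treated as plain literals (assumption).
TOC_PATTERNS = ["table of contents", "agenda", "contents", "outline", "index", "toc"]
LIST_PATTERNS = ["bullet list", "numbered list", "list"]


def first_match(layout_texts, patterns):
    """Return the index of the first layout matching the highest-priority pattern, or -1."""
    for pattern in patterns:  # earlier patterns win over later ones
        regex = re.compile(pattern, re.IGNORECASE)
        for index, texts in enumerate(layout_texts):
            if any(text and regex.search(text) for text in texts):
                return index
    return -1


def select_toc_or_list(layout_texts):
    """Prefer a table-of-contents layout; otherwise fall back to a list layout."""
    index = first_match(layout_texts, TOC_PATTERNS)
    return index if index != -1 else first_match(layout_texts, LIST_PATTERNS)


# Each tuple holds (id, name, description) for one invented layout.
with_agenda = [
    ("demo-intro", "Intro Slide", "Title and image"),
    ("demo-bullets", "Bullet List", "Short points"),
    ("demo-agenda", "Agenda", "Numbered sections"),
]
print(select_toc_or_list(with_agenda))
print(select_toc_or_list(with_agenda[:2]))
```

Looping over patterns on the outside means a layout matching `agenda` is preferred over one that only matches the looser `toc` or `list` patterns, regardless of where it sits in the layout list.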
---
## Color Theme Generation (Independent of Layout)
The layout choice and the color theme are completely independent. After layouts are selected, the endpoint `POST /theme/generate` in `servers/fastapi/api/v1/ppt/endpoints/theme_generate.py` generates a color palette.
All color work happens in the OKLCH perceptual color space (`theme_utils.py`). Given an optional set of seed colors (primary, background, accent 1, accent 2, text 1, text 2), the system:
1. Generates a random primary color if none is provided.
2. Generates a background color with at least **6:1 WCAG contrast** against the primary, retrying up to 200 times.
3. Derives accent colors at 90-degree hue intervals.
4. Derives text colors with at least 6:1 contrast against their base.
5. Produces ten lightness-scale variants of the primary, background, and accent colors.
The `ThemeData` response includes named CSS variables (`primary`, `background`, `card`, `stroke`, `background_text`, `primary_text`, `graph_0` through `graph_9`) which the Next.js templates consume via CSS custom properties like `var(--background-color)`.
Sources: [servers/fastapi/utils/theme_utils.py:179-191](servers/fastapi/utils/theme_utils.py), [servers/fastapi/api/v1/ppt/endpoints/theme_generate.py:25-74](servers/fastapi/api/v1/ppt/endpoints/theme_generate.py)
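For readers unfamiliar with the contrast requirement in steps 2 and 4, here is a small sketch of the WCAG 2.x contrast-ratio formula and a bounded retry loop. It works on 8-bit sRGB tuples for simplicity, whereas `theme_utils.py` operates in OKLCH; the function names, the random candidate strategy, and returning `None` after exhausting attempts are all assumptions made for illustration.

```python
import random


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a, b) -> float:
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    la, lb = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def pick_background(primary, minimum=6.0, attempts=200, rng=None):
    """Try random candidates until one reaches the minimum contrast (hypothetical strategy)."""
    rng = rng or random.Random()
    for _ in range(attempts):
        candidate = tuple(rng.randrange(256) for _ in range(3))
        if contrast_ratio(candidate, primary) >= minimum:
            return candidate
    return None  # assumption: the caller decides what to do when every attempt fails


background = pick_background((20, 40, 120), rng=random.Random(0))
print(background)
```

A 6:1 target sits above the 4.5:1 WCAG AA threshold for normal text, which leaves some margin once lightness-scale variants are derived from the base colors.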
---
## Per-Slide Content Generation
Once a layout index is selected for a slide, `generate_slide_content.py` fills the slide's Zod-derived JSON schema with real content. The LLM receives the slide's raw outline text and the schema (as a JSON literal in the system prompt), and must return a valid JSON object matching it.
Special fields are excluded from the schema sent to the LLM (`__image_url__`, `__icon_url__`) so the LLM never writes image URLs directly. Image URLs are resolved separately by an image service; the LLM writes only `__image_prompt__` and `__icon_query__` hints.
```python
response_schema = remove_fields_from_schema(
    slide_layout.json_schema, ["__image_url__", "__icon_url__"]
)
```
Sources: [servers/fastapi/utils/llm_calls/generate_slide_content.py:172-176](servers/fastapi/utils/llm_calls/generate_slide_content.py)
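The body of `remove_fields_from_schema` is not shown here, so the following is only a plausible sketch of what such a helper has to do: walk a JSON schema recursively, drop the named properties, and keep `required` lists consistent. The function name `strip_fields` and the sample schema are hypothetical.

```python
from typing import Any


def strip_fields(schema: Any, names: set) -> Any:
    """Return a copy of a JSON schema without the named properties (hypothetical helper)."""
    if isinstance(schema, dict):
        cleaned = {}
        for key, value in schema.items():
            if key == "properties" and isinstance(value, dict):
                # Drop the reserved properties, recurse into the ones that remain.
                cleaned[key] = {k: strip_fields(v, names)
                                for k, v in value.items() if k not in names}
            elif key == "required" and isinstance(value, list):
                # Keep "required" consistent with the remaining properties.
                cleaned[key] = [item for item in value if item not in names]
            else:
                cleaned[key] = strip_fields(value, names)
        return cleaned
    if isinstance(schema, list):
        return [strip_fields(item, names) for item in schema]
    return schema


sample_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "image": {
            "type": "object",
            "properties": {
                "__image_url__": {"type": "string"},
                "__image_prompt__": {"type": "string"},
            },
            "required": ["__image_url__", "__image_prompt__"],
        },
    },
}
llm_schema = strip_fields(sample_schema, {"__image_url__", "__icon_url__"})
print(llm_schema["properties"]["image"])
```

Returning a copy rather than mutating in place keeps the original schema available for the later step where resolved URLs are merged back into the slide data.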
---
## Adding a New Layout (Developer Guide)
The `index.tsx` file contains explicit `TODO` step markers. The workflow is:
1. **Step 1** — Create the `.tsx` file in the target group directory, exporting `layoutId`, `layoutName`, `layoutDescription`, and `Schema`.
2. **Step 2** — Import the layout (component, schema, id, name, description) at the top of `index.tsx`.
3. **Step 3** — Add a `createTemplateEntry(...)` call to the matching group array.
4. **Step 4/5** — The group array is already spread into `allLayouts` and `templates`; no additional wiring is needed unless creating a wholly new group.
When a new group is needed, a `settings.json` must also be created and imported, and a new entry added to both `allLayouts` and the `templates` array.
Sources: [servers/nextjs/app/presentation-templates/index.tsx:4-6](servers/nextjs/app/presentation-templates/index.tsx), [servers/nextjs/app/presentation-templates/index.tsx:239-260](servers/nextjs/app/presentation-templates/index.tsx)
---
## Summary
Presenton's template system works in three independent layers that compose at runtime. The Next.js side holds the visual library: React components organized into named groups, each described by a `layoutId`, a human-readable name, a one-sentence description, and a Zod schema. The FastAPI backend fetches this schema at generation time through the export service, serializes it as a numbered plain-text list, and sends it to the LLM alongside the slide outline. The LLM returns a list of integer indices, one per slide, selecting from that menu based on content-type rules baked into the system prompt. Color theming runs entirely separately, generating an OKLCH-based palette with guaranteed WCAG contrast ratios that the chosen React components apply through CSS custom properties. The complete layout catalogue and its group-to-template wiring are maintained in `servers/nextjs/app/presentation-templates/index.tsx`, which serves as the single source of truth for what layouts exist and how they are grouped.
Sources: [servers/fastapi/utils/llm_calls/generate_presentation_structure.py:16-95](servers/fastapi/utils/llm_calls/generate_presentation_structure.py)
---
## 06. The Python Backend: FastAPI Server
> The FastAPI server is the engine room: it hosts the REST + SSE API, manages the SQLite/Alembic database, runs background export tasks, validates LLM responses, and serves static assets. Every LLM call, slide write, and export goes through here.
- Page Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/06-the-python-backend-fastapi-server.md
- Generated: 2026-05-24T05:24:55.172Z
### Source Files
- `servers/fastapi/api/main.py`
- `servers/fastapi/api/lifespan.py`
- `servers/fastapi/api/middlewares.py`
- `servers/fastapi/services/database.py`
- `servers/fastapi/services/export_task_service.py`
- `servers/fastapi/api/v1/ppt/router.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [servers/fastapi/api/main.py](servers/fastapi/api/main.py)
- [servers/fastapi/api/lifespan.py](servers/fastapi/api/lifespan.py)
- [servers/fastapi/api/middlewares.py](servers/fastapi/api/middlewares.py)
- [servers/fastapi/services/database.py](servers/fastapi/services/database.py)
- [servers/fastapi/services/export_task_service.py](servers/fastapi/services/export_task_service.py)
- [servers/fastapi/api/v1/ppt/router.py](servers/fastapi/api/v1/ppt/router.py)
- [servers/fastapi/api/v1/ppt/endpoints/presentation.py](servers/fastapi/api/v1/ppt/endpoints/presentation.py)
- [servers/fastapi/api/v1/ppt/endpoints/outlines.py](servers/fastapi/api/v1/ppt/endpoints/outlines.py)
- [servers/fastapi/models/sql/presentation.py](servers/fastapi/models/sql/presentation.py)
- [servers/fastapi/models/sql/slide.py](servers/fastapi/models/sql/slide.py)
- [servers/fastapi/models/sse_response.py](servers/fastapi/models/sse_response.py)
- [servers/fastapi/utils/user_config.py](servers/fastapi/utils/user_config.py)
- [servers/fastapi/utils/llm_calls/generate_presentation_outlines.py](servers/fastapi/utils/llm_calls/generate_presentation_outlines.py)
- [servers/fastapi/migrations.py](servers/fastapi/migrations.py)
</details>
# The Python Backend: FastAPI Server
The FastAPI server at `servers/fastapi/` is the central engine of Presenton. Every action the user takes — typing a topic, picking a template, watching slides appear one by one, downloading a finished deck — flows through this server. It hosts the REST and Server-Sent Events (SSE) API, manages the SQLite database, bridges the LLM of your choice, runs Node.js export subprocesses, and serves static assets to the Next.js frontend.
This page walks through how all those responsibilities are organized: application startup, middleware, routing, database, LLM calls, streaming, and the export pipeline.
---
## Application Entry Point
The application object is created in `main.py` using FastAPI's standard constructor, wired to a lifespan context manager for startup/shutdown logic:
```python
# servers/fastapi/api/main.py:57
app = FastAPI(lifespan=app_lifespan)
```
Four top-level routers are registered immediately after:
| Router import | Mounted prefix |
|---|---|
| `API_V1_PPT_ROUTER` | `/api/v1/ppt` |
| `API_V1_WEBHOOK_ROUTER` | `/api/v1/webhook` |
| `API_V1_MOCK_ROUTER` | `/api/v1/mock` |
| `API_V1_AUTH_ROUTER` | `/api/v1/auth` |
Two `StaticFiles` mounts are added next: one for `/app_data` (user-generated images, exported files) and one for `/static` (bundled UI icons and assets). A special `static_icon_fallback_middleware` catches any 404 under `/static/icons/` and returns a `placeholder.svg` rather than a broken-image error — handling cases where Phosphor icon names changed between versions.
```python
# servers/fastapi/api/main.py:89-101
@app.middleware("http")
async def static_icon_fallback_middleware(request: Request, call_next):
    ...
    if not path.startswith("/static/icons/"):
        return response
    placeholder = get_resource_path("static/icons/placeholder.svg")
    ...
    return FileResponse(placeholder, media_type="image/svg+xml")
```
Sentry error tracking is initialized before the app object is built (`_maybe_init_sentry()`), pulling `SENTRY_DSN`, `SENTRY_TRACES_SAMPLE_RATE`, and `SENTRY_SEND_DEFAULT_PII` from environment variables. The SDK import is guarded in a try/except so Sentry remains optional in builds that omit it.
Sources: [servers/fastapi/api/main.py:1-101](servers/fastapi/api/main.py)
---
## Lifespan: Startup and Shutdown
`app_lifespan` is an `asynccontextmanager` that runs once at process start and once at shutdown. In order, it:
1. Configures Python logging from the `LOG_LEVEL` environment variable (default `INFO`).
2. Ensures the `APP_DATA_DIRECTORY` folder exists on disk.
3. Calls `migrate_database_on_startup()` (Alembic migrations, when `MIGRATE_DATABASE_ON_STARTUP=true`).
4. Calls `create_db_and_tables()` (SQLModel `CREATE TABLE IF NOT EXISTS` for all registered models).
5. Bootstraps single-user credentials from `AUTH_USERNAME`/`AUTH_PASSWORD` env vars if provided.
6. Calls `check_llm_and_image_provider_api_or_model_availability()` to warn early if an API key or model is missing.
7. On shutdown, disposes the async SQLAlchemy connection pool via `dispose_engines()`.
```python
# servers/fastapi/api/lifespan.py:83-100
@asynccontextmanager
async def app_lifespan(_: FastAPI):
    _configure_application_logging()
    os.makedirs(get_app_data_directory_env(), exist_ok=True)
    await migrate_database_on_startup()
    await create_db_and_tables()
    _bootstrap_auth_from_env()
    await check_llm_and_image_provider_api_or_model_availability()
    yield
    await dispose_engines()
```
The auth bootstrap supports three environment-driven scenarios: `RESET_AUTH=true` wipes stored credentials (recovery), setting `AUTH_USERNAME`+`AUTH_PASSWORD` on a fresh instance seeds credentials without visiting the UI (first-run preseed), and `AUTH_OVERRIDE_FROM_ENV=true` forces an overwrite of existing credentials. Errors in this block are caught and logged rather than crashing startup.
Sources: [servers/fastapi/api/lifespan.py:36-100](servers/fastapi/api/lifespan.py)
---
## Middleware Stack
Two custom `BaseHTTPMiddleware` classes sit in the stack. Starlette wraps middleware in reverse registration order, so the class added last is outermost and sees each request first:
### `UserConfigEnvUpdateMiddleware`
Runs on every request (unless `CAN_CHANGE_KEYS=false`). It reads a `userConfig.json` file from disk and merges the stored LLM/image provider credentials into the process's environment variables. This is how a user can configure their API keys through the UI without restarting the container — the settings are persisted to disk and re-applied from the JSON file on each request.
```python
# servers/fastapi/api/middlewares.py:15-19
class UserConfigEnvUpdateMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        if get_can_change_keys_env() != "false":
            update_env_with_user_config()
        return await call_next(request)
```
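What "merging stored settings into the environment" amounts to can be sketched as below. This is not the body of `update_env_with_user_config()`: the helper name is hypothetical, and the sketch assumes the JSON keys are already environment-variable names and that empty values are skipped, neither of which is confirmed by the excerpt above. The value `"openai"` for `LLM` is likewise only illustrative.

```python
import json
import os
import tempfile
from pathlib import Path


def apply_user_config(config_path, environ=None) -> dict:
    """Copy non-empty settings from a JSON file into an environment mapping."""
    environ = os.environ if environ is None else environ
    path = Path(config_path)
    if not path.is_file():
        return {}  # nothing stored yet, leave the environment untouched
    stored = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: keys are env var names; None and "" mean "not configured".
    applied = {key: str(value) for key, value in stored.items()
               if value not in (None, "")}
    environ.update(applied)
    return applied


# Demo against a plain dict so the real process environment is not modified.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "userConfig.json"
    config_file.write_text(
        json.dumps({"LLM": "openai", "OPENAI_MODEL": "", "OLLAMA_URL": None}),
        encoding="utf-8",
    )
    fake_env = {"LLM": "ollama"}
    applied = apply_user_config(config_file, fake_env)
    missing = apply_user_config(Path(tmp) / "does-not-exist.json", fake_env)
print(fake_env)
```

Re-reading the file on every request trades a small amount of disk I/O for the ability to change providers without restarting the process, which matches the behaviour described above.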
### `SessionAuthMiddleware`
Guards all `/api/` routes and `/app_data/` paths (except `/app_data/images/`, which must be accessible by the export subprocess). The check is skipped entirely when `DISABLE_AUTH=true`.
The gate works in two layers:
1. **Session token check:** reads a cookie (`SESSION_COOKIE_NAME`) and validates it against stored credentials. Returns HTTP `428` (Precondition Required, used here to signal "setup required") if no credentials exist yet.
2. **Basic auth fallback:** if no session cookie is present, the middleware tries HTTP Basic credentials from the `Authorization` header, letting headless API callers (including the PPTX exporter) authenticate without a session cookie.
The `/api/v1/auth/` prefix is always exempt so the login endpoint itself can be reached.
Sources: [servers/fastapi/api/middlewares.py:1-83](servers/fastapi/api/middlewares.py)
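The path rules described for `SessionAuthMiddleware` can be condensed into a single predicate. The function below is a hypothetical sketch of those rules only; it does not reproduce the cookie or Basic-auth validation, and the order in which the real middleware evaluates the exemptions is an assumption.

```python
def requires_auth(path: str, disable_auth: bool = False) -> bool:
    """Decide whether a request path falls under the session/basic-auth gate."""
    if disable_auth:  # corresponds to DISABLE_AUTH=true
        return False
    if path.startswith("/api/v1/auth/"):  # login endpoints must stay reachable
        return False
    if path.startswith("/app_data/images/"):  # readable by the export subprocess
        return False
    return path.startswith("/api/") or path.startswith("/app_data/")


for sample in ("/api/v1/ppt/presentation/generate",
               "/api/v1/auth/login",  # hypothetical path under the exempt prefix
               "/app_data/images/example.png",
               "/app_data/exports/deck.pptx",
               "/static/icons/placeholder.svg"):
    print(sample, requires_auth(sample))
```

Checking the exemptions before the broad `/api/` and `/app_data/` prefixes keeps the rule set easy to read: anything not explicitly exempt under those prefixes is protected.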
---
## Database Layer
The database layer is built on **SQLAlchemy async** + **SQLModel** (a thin Pydantic-aware wrapper).
```python
# servers/fastapi/services/database.py:27-36
database_url, connect_args = get_database_url_and_connect_args()
_pool_kwargs = get_pool_kwargs() if "sqlite" not in database_url else {}
sql_engine: AsyncEngine = create_async_engine(
    database_url, connect_args=connect_args, **_pool_kwargs
)
async_session_maker = async_sessionmaker(sql_engine, expire_on_commit=False)
```
Pool configuration (`pool_size`, `max_overflow`, etc.) is applied only for server-class databases; for SQLite URLs the code passes no pool keyword arguments at all. The `get_async_session()` generator is used as a FastAPI `Depends` throughout all endpoints.
### Schema
The following SQLModel tables are created on startup:
| Table | Model class | Purpose |
|---|---|---|
| `presentations` | `PresentationModel` | One row per presentation; stores content, outline, layout, structure as JSON |
| `slides` | `SlideModel` | One row per slide, foreign key to `presentations` with `CASCADE DELETE` |
| `key_value` | `KeyValueSqlModel` | General key-value store |
| `chat_history_messages` | `ChatHistoryMessageModel` | Per-presentation chat thread |
| `image_assets` | `ImageAsset` | Tracks generated/fetched images |
| `presentation_layout_codes` | `PresentationLayoutCodeModel` | Custom template code + font lists |
| `template_create_infos` | `TemplateCreateInfoModel` | Template creation metadata |
| `templates` | `TemplateModel` | User-created layout templates |
| `webhook_subscriptions` | `WebhookSubscription` | Registered webhook URLs |
| `async_presentation_generation_tasks` | `AsyncPresentationGenerationTaskModel` | Background task status rows |
| `ollama_pull_status` | `OllamaPullStatus` | Tracks Ollama model download progress |
`PresentationModel` stores complex objects (outlines, layout, structure) as JSON blobs and exposes typed accessors like `get_layout() -> PresentationLayoutModel` and `get_structure() -> PresentationStructureModel`.
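Sources: [servers/fastapi/services/database.py:1-77](servers/fastapi/services/database.py), [servers/fastapi/models/sql/presentation.py:1-82](servers/fastapi/models/sql/presentation.py), [servers/fastapi/models/sql/slide.py:1-33](servers/fastapi/models/sql/slide.py)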
---
## Routing: The `/api/v1/ppt` Tree
The `API_V1_PPT_ROUTER` aggregates twenty sub-routers under `/api/v1/ppt`. Each major feature area has its own `APIRouter`:
| Sub-router | Typical responsibility |
|---|---|
| `PRESENTATION_ROUTER` | Create, fetch, stream, generate, edit, derive presentations |
| `OUTLINES_ROUTER` | Stream outline generation over SSE |
| `SLIDE_ROUTER` | Fetch/update individual slides |
| `CHAT_ROUTER` | Per-presentation conversational chat |
| `LAYOUT_MANAGEMENT_ROUTER` | Slide-to-HTML rendering |
| `PPTX_SLIDES_ROUTER` | Import slides from an uploaded PPTX |
| `PDF_SLIDES_ROUTER` | Import slides from an uploaded PDF |
| `FILES_ROUTER` | Upload supporting documents |
| `IMAGES_ROUTER` | Image search/generation |
| `ICONS_ROUTER` | Icon search |
| `FONTS_ROUTER` | Font listing |
| `OLLAMA_ROUTER` | Ollama model management |
| `OPENAI_ROUTER` / `ANTHROPIC_ROUTER` / `GOOGLE_ROUTER` | Provider-specific API key validation |
| `CODEX_AUTH_ROUTER` | OAuth token exchange for hosted Codex provider |
| `TEMPLATE_ROUTER` | Custom template CRUD |
| `THEMES_ROUTER` / `THEME_ROUTER` | Theme management and AI theme generation |
Sources: [servers/fastapi/api/v1/ppt/router.py:1-47](servers/fastapi/api/v1/ppt/router.py)
---
## Presentation Generation: End-to-End
The generation flow is the most complex part of the backend. There are two surface-level endpoints that share the same core `generate_presentation_handler` function:
- `POST /api/v1/ppt/presentation/generate` — **synchronous**: blocks until the full deck is generated and exported, then returns a file path.
- `POST /api/v1/ppt/presentation/generate/async` — **background**: immediately returns an `AsyncPresentationGenerationTaskModel` row; the actual work runs in FastAPI's `BackgroundTasks`. Clients poll `GET /api/v1/ppt/presentation/status/{id}` for progress.
A third interactive flow uses SSE for the step-by-step UI experience:
- `GET /api/v1/ppt/outlines/stream/{id}` — streams outline text tokens over SSE as the LLM generates them.
- `GET /api/v1/ppt/presentation/stream/{id}` — streams completed slide objects one at a time over SSE, with parallel asset fetching in the background.
### Step-by-step inside `generate_presentation_handler`
```
1. Validate inputs (slide count limits, template existence)
2. If file attachments are provided → load and extract document text
3. Call LLM to generate outlines (streamed internally, collected into text)
4. Parse outlines using `dirtyjson` (tolerates LLM formatting imperfections)
5. Ask LLM to map each outline to a slide layout index (structure)
6. Insert table-of-contents placeholder slides if requested
7. Generate slide content: batches of 10 concurrently via asyncio.gather
8. For each batch: immediately start asset fetch tasks in parallel
9. await all asset tasks
10. Persist PresentationModel + SlideModel + ImageAsset rows to DB
11. Call ExportTaskService.export_from_url() → spawns Node.js subprocess → returns file path
12. Fire webhook (success or failure) via ConcurrentService
```
Sources: [servers/fastapi/api/v1/ppt/endpoints/presentation.py:628-989](servers/fastapi/api/v1/ppt/endpoints/presentation.py)
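Steps 7 to 9 describe a common asyncio pattern: generate content in fixed-size concurrent batches and start asset fetching for each finished batch without waiting for the rest. The sketch below illustrates that pattern under stated assumptions; `generate_deck`, `fake_generate`, and `fake_fetch_assets` are hypothetical names, and the real handler's data structures and error handling are not reproduced.

```python
import asyncio


async def generate_deck(outlines, generate_one, fetch_assets, batch_size=10):
    """Generate slides batch by batch, overlapping asset fetches with later batches."""
    slides, asset_tasks = [], []
    for start in range(0, len(outlines), batch_size):
        batch = outlines[start:start + batch_size]
        # All content calls of one batch run concurrently; gather keeps input order.
        batch_slides = await asyncio.gather(*(generate_one(o) for o in batch))
        slides.extend(batch_slides)
        # Schedule asset fetches now so they run while the next batch generates.
        asset_tasks.extend(asyncio.create_task(fetch_assets(s)) for s in batch_slides)
    await asyncio.gather(*asset_tasks)  # step 9: wait for every asset task
    return slides


async def fake_generate(outline):  # hypothetical stand-in for the LLM content call
    await asyncio.sleep(0)
    return {"title": outline.upper(), "assets_ready": False}


async def fake_fetch_assets(slide):  # hypothetical stand-in for image/icon fetching
    await asyncio.sleep(0)
    slide["assets_ready"] = True


slides = asyncio.run(
    generate_deck([f"slide {i}" for i in range(23)], fake_generate, fake_fetch_assets)
)
print(len(slides), slides[0]["title"])
```

Capping each batch at a fixed size bounds the number of simultaneous LLM requests, while scheduling asset tasks immediately hides image and icon latency behind the next batch of content calls.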
---
## Server-Sent Events (SSE) Protocol
Both streaming endpoints return `StreamingResponse` with `media_type="text/event-stream"`. The wire format is defined by four Pydantic models:
```text
# servers/fastapi/models/sse_response.py
SSEResponse          → event: response\ndata: {"type": "chunk", "chunk": "..."}
SSEStatusResponse    → event: response\ndata: {"type": "status", "status": "..."}
SSEErrorResponse     → event: response\ndata: {"type": "error", "detail": "..."}
SSECompleteResponse  → event: response\ndata: {"type": "complete", "<key>": {...}}
```
The outline stream emits one `chunk` event per text token from the LLM, then a single `complete` event holding the saved `PresentationModel`. The slide stream emits chunk events containing raw slide JSON as each slide finishes LLM generation, then additional `slide_assets` events as background image/icon downloads complete — allowing the frontend to progressively enrich slides without waiting for all assets.
```python
# servers/fastapi/api/v1/ppt/endpoints/presentation.py:400-403
yield SSEResponse(
    event="response",
    data=json.dumps({"type": "chunk", "chunk": '{ "slides": [ '}),
).to_string()
```
Sources: [servers/fastapi/models/sse_response.py:1-50](), [servers/fastapi/api/v1/ppt/endpoints/outlines.py:103-127](), [servers/fastapi/api/v1/ppt/endpoints/presentation.py:365-519]()
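To make the wire format concrete, here is a small sketch of how one such message could be serialized and read back. The helper names `format_sse` and `parse_sse` are hypothetical; the real `to_string()` implementation is not shown on this page. The trailing blank line is the standard SSE message terminator, assumed here rather than taken from the source.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    # One SSE message: an "event:" line, a "data:" line, then a blank line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def parse_sse(message: str) -> tuple:
    # Minimal parser for the single-line "data:" messages produced above.
    event, data = "message", ""
    for line in message.strip().split("\n"):
        field, _, value = line.partition(": ")
        if field == "event":
            event = value
        elif field == "data":
            data = value
    return event, json.loads(data)
```

A client would typically switch on the `type` field of the decoded payload (`chunk`, `status`, `error`, `complete`) rather than on the SSE event name, since every message uses `event: response`.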
---
## LLM Provider Abstraction
All LLM calls go through a thin `llmai` abstraction layer (`from llmai import get_client`). The actual provider is selected at runtime from the `LLM` environment variable. The `UserConfigEnvUpdateMiddleware` keeps these variables current on every request, so a user can switch providers through the UI without restarting the server.
Providers verified in `utils/user_config.py` (env vars read/written per request):
| Provider | Key env var |
|---|---|
| OpenAI | `OPENAI_API_KEY`, `OPENAI_MODEL` |
| Anthropic | `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL` |
| Google Gemini | `GOOGLE_API_KEY`, `GOOGLE_MODEL` |
| Google Vertex AI | `VERTEX_*` |
| Azure OpenAI | `AZURE_OPENAI_*` |
| AWS Bedrock | `BEDROCK_*` |
| Ollama (local) | `OLLAMA_URL`, `OLLAMA_MODEL` |
| OpenRouter | `OPENROUTER_*` |
| Fireworks, Together, Cerebras | provider-specific vars |
| LiteLLM, LM Studio | proxy URL + API key |
| Custom OpenAI-compatible | `CUSTOM_LLM_URL`, `CUSTOM_LLM_API_KEY` |
| Codex (hosted) | OAuth tokens via `CODEX_*` |
This is a BYOK (Bring Your Own Key) design: the server never hard-codes a provider. Users supply keys through the settings UI or via environment variables; the middleware propagates them into `os.environ` before each request.
Sources: [servers/fastapi/utils/user_config.py:1-76](), [servers/fastapi/utils/llm_calls/generate_presentation_outlines.py:1-24]()
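The per-request propagation can be pictured with the sketch below. It assumes, purely for illustration, that the stored configuration is a flat JSON object whose keys are already the environment variable names; the real `userConfig.json` schema and the middleware's actual logic may differ. `apply_user_config` is a hypothetical name.

```python
import json
from typing import MutableMapping


def apply_user_config(config_text: str, environ: MutableMapping) -> list:
    """Copy non-empty string settings into environ; return the names that changed."""
    changed = []
    for name, value in json.loads(config_text).items():
        # Skip empty or non-string values so unset fields do not clobber the environment.
        if isinstance(value, str) and value and environ.get(name) != value:
            environ[name] = value
            changed.append(name)
    return changed
```

Passing `os.environ` as the second argument would apply the settings process-wide, which is what lets a provider switch take effect on the next request without a restart.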
---
## Export Task Service
Exporting a finished presentation to PPTX or PDF is delegated to a Node.js runtime (`presentation-export/index.cjs`). The Python side spawns it as an async subprocess and communicates through temporary JSON files:
```
Python                          Node.js (presentation-export)
──────                          ─────────────────────────────
write export_task.json     →    read task, run headless export
                                write export_task.response.json
read response.json         ←    (process exits 0 on success)
```
The `ExportTaskService` class (`services/export_task_service.py`) handles:
- **Runtime discovery**: finds `index.cjs` (or `index.js`) by probing several candidate paths under `EXPORT_RUNTIME_DIR`, `EXPORT_PACKAGE_ROOT`, or relative to the working directory.
- **Converter binary**: selects a platform+arch-specific native binary (`convert-linux-x64`, `convert-darwin-arm64`, etc.) for the PPTX-to-HTML conversion step.
- **Child process management**: uses `asyncio.create_subprocess_exec` with a 300-second timeout and a `BoundedTextBuffer` to capture stdout/stderr without unbounded memory growth.
- **Three task types**: `export` (render slides URL → PDF/PPTX), `pptx-to-html` (parse an uploaded PPTX into slide HTML), `extract-schema` (derive a JSON schema from a template URL).
```python
# servers/fastapi/services/export_task_service.py:291-344
process = await asyncio.create_subprocess_exec(
    *command,
    cwd=self.export_dir,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    env=env,
    **_windows_hidden_subprocess_kwargs(),
)
```
On Windows, `CREATE_NO_WINDOW` is passed to prevent a console window from appearing. The `EXPORT_TASK_SERVICE` singleton is instantiated at module level and shared across all requests.
Sources: [servers/fastapi/services/export_task_service.py:70-467]()
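The idea behind capturing child output without unbounded memory growth can be sketched as a buffer that keeps only the most recent text. `BoundedTail` below is a hypothetical illustration; the real `BoundedTextBuffer` may use a different policy (for example, keeping the head of the output instead of the tail).

```python
class BoundedTail:
    """Keeps only the most recent max_chars characters of appended text."""

    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars
        self._text = ""
        self.dropped = 0  # number of characters discarded so far

    def append(self, chunk: str) -> None:
        self._text += chunk
        overflow = len(self._text) - self.max_chars
        if overflow > 0:
            # Discard the oldest characters once the cap is exceeded.
            self._text = self._text[overflow:]
            self.dropped += overflow

    def getvalue(self) -> str:
        return self._text
```

Keeping the tail is a common choice for subprocess logs because the final lines usually carry the error that caused a failure.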
---
## Request Lifecycle Summary
```text
HTTP Request
│
▼
CORSMiddleware (allow all origins)
│
▼
SessionAuthMiddleware (cookie or Basic auth; 401/428 on failure)
│
▼
UserConfigEnvUpdateMiddleware (sync userConfig.json → os.environ)
│
▼
static_icon_fallback_middleware (404 on /static/icons/ → placeholder.svg)
│
▼
Router dispatch
├── /api/v1/auth/ (exempt from auth)
├── /api/v1/ppt/ (presentation, slides, chat, export…)
├── /api/v1/webhook/
└── /api/v1/mock/
│
▼
Endpoint handler
├── Depends(get_async_session) → SQLAlchemy AsyncSession
├── LLM call via llmai abstraction
└── ExportTaskService (Node.js subprocess, when export needed)
```
The stack is not tied to a single LLM provider or database engine: SQLite is the default, and PostgreSQL or MySQL can be used with connection-pool settings. There are no hard dependencies on a specific cloud vendor, making the backend fully self-hostable and BYOK-friendly. The `lifespan` hook ensures the database is up to date before the first request is served, and that connection pools are cleanly released on shutdown.
Sources: [servers/fastapi/api/main.py:57-101](), [servers/fastapi/api/lifespan.py:83-100](), [servers/fastapi/services/database.py:27-76]()
---
## 07. The Browser UI: Next.js Frontend
> The Next.js app is the face the user sees: a dashboard to manage decks, an outline editor, a live slide editor with Tiptap rich text, and an export flow (PDF, PPTX, Google Slides). Routes map directly to stages of the generation workflow.
- Page Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/07-the-browser-ui-next.js-frontend.md
- Generated: 2026-05-24T05:24:36.465Z
### Source Files
- `servers/nextjs/app/(presentation-generator)/outline/page.tsx`
- `servers/nextjs/app/(presentation-generator)/presentation/page.tsx`
- `servers/nextjs/app/(presentation-generator)/components/PresentationRender.tsx`
- `servers/nextjs/app/(presentation-generator)/components/EditableLayoutWrapper.tsx`
- `servers/nextjs/app/(export)/pdf-maker/page.tsx`
- `servers/nextjs/app/(presentation-generator)/(dashboard)/dashboard/page.tsx`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [servers/nextjs/app/(presentation-generator)/(dashboard)/dashboard/page.tsx](servers/nextjs/app/(presentation-generator)/(dashboard)/dashboard/page.tsx)
- [servers/nextjs/app/(presentation-generator)/(dashboard)/dashboard/components/DashboardPage.tsx](servers/nextjs/app/(presentation-generator)/(dashboard)/dashboard/components/DashboardPage.tsx)
- [servers/nextjs/app/(presentation-generator)/outline/page.tsx](servers/nextjs/app/(presentation-generator)/outline/page.tsx)
- [servers/nextjs/app/(presentation-generator)/outline/components/OutlinePage.tsx](servers/nextjs/app/(presentation-generator)/outline/components/OutlinePage.tsx)
- [servers/nextjs/app/(presentation-generator)/presentation/page.tsx](servers/nextjs/app/(presentation-generator)/presentation/page.tsx)
- [servers/nextjs/app/(presentation-generator)/presentation/components/PresentationPage.tsx](servers/nextjs/app/(presentation-generator)/presentation/components/PresentationPage.tsx)
- [servers/nextjs/app/(presentation-generator)/presentation/components/PresentationHeader.tsx](servers/nextjs/app/(presentation-generator)/presentation/components/PresentationHeader.tsx)
- [servers/nextjs/app/(presentation-generator)/presentation/components/Chat.tsx](servers/nextjs/app/(presentation-generator)/presentation/components/Chat.tsx)
- [servers/nextjs/app/(presentation-generator)/presentation/hooks/useAutoSave.tsx](servers/nextjs/app/(presentation-generator)/presentation/hooks/useAutoSave.tsx)
- [servers/nextjs/app/(presentation-generator)/components/PresentationRender.tsx](servers/nextjs/app/(presentation-generator)/components/PresentationRender.tsx)
- [servers/nextjs/app/(presentation-generator)/components/EditableLayoutWrapper.tsx](servers/nextjs/app/(presentation-generator)/components/EditableLayoutWrapper.tsx)
- [servers/nextjs/app/(presentation-generator)/components/TiptapText.tsx](servers/nextjs/app/(presentation-generator)/components/TiptapText.tsx)
- [servers/nextjs/app/(export)/pdf-maker/page.tsx](servers/nextjs/app/(export)/pdf-maker/page.tsx)
- [servers/nextjs/app/(export)/pdf-maker/PdfMakerPage.tsx](servers/nextjs/app/(export)/pdf-maker/PdfMakerPage.tsx)
</details>
# The Browser UI: Next.js Frontend
The Next.js application is the complete visual interface for Presenton. It covers every stage a user moves through: starting at a dashboard that lists existing decks, moving through an outline editor where slide topics are arranged, then to a live slide editor where content can be refined with direct text and image editing, and finally to an export flow that produces a PDF or PPTX file. Each stage maps to a distinct URL, and the routes are organized in a way that makes the workflow easy to follow in the file system.
This page describes how those routes are structured, what each screen does, and how the UI components work together—including the rich-text editor, the image/icon editing layer, the AI chat panel, and the export renderer.
---
## Route Map
The Next.js app uses the App Router and groups routes by concern inside parenthesized layout groups. The main generation workflow lives under `(presentation-generator)`, and a separate `(export)` group hosts the headless PDF renderer.
```text
app/
├── (presentation-generator)/
│ ├── (dashboard)/
│ │ ├── dashboard/ → /dashboard (deck list)
│ │ ├── settings/ → /settings
│ │ ├── templates/ → /templates
│ │ └── theme/ → /theme
│ ├── outline/ → /outline (slide outline editor)
│ ├── presentation/ → /presentation?id=<id> (live slide editor)
│ └── documents-preview/ → /documents-preview
└── (export)/
└── pdf-maker/ → /pdf-maker?id=<id> (headless print renderer)
```
Each route group can share its own layout file without polluting sibling groups. The `(dashboard)` sub-group provides the sidebar navigation and shared header used by settings, templates, and themes, while the `outline` and `presentation` pages use a narrower layout focused on the editing canvas.
Sources: [servers/nextjs/app/(presentation-generator)/outline/page.tsx:1-42](), [servers/nextjs/app/(presentation-generator)/presentation/page.tsx:1-27]()
---
## Stage 1: Dashboard (`/dashboard`)
The dashboard is the home screen. It fetches every saved presentation from the backend via `DashboardApi.getPresentations()`, sorts the list by `updated_at` descending by default, and renders each deck as a card in `PresentationGrid`. Users can toggle between ascending and descending sort order with a single button.
The page also exposes a prominent "Create Presentation" card that links to `/upload`, the entry point for creating a new deck. Usage events (page viewed, new presentation clicked) are sent to Mixpanel on load and on interaction.
```tsx
// DashboardPage.tsx – fetching and sorting decks
const data = await DashboardApi.getPresentations();
data.sort(
  (a, b) =>
    new Date(b.updated_at).getTime() - new Date(a.updated_at).getTime()
);
setPresentations(data);
```
Sources: [servers/nextjs/app/(presentation-generator)/(dashboard)/dashboard/components/DashboardPage.tsx:65-90]()
---
## Stage 2: Outline Editor (`/outline`)
After uploading a document or entering a topic, the user lands on the outline page. This is where the AI-generated slide outline is presented, reviewed, and adjusted before full slide rendering begins.
`OutlinePage` is the top-level client component. It has two tabs, **Outline & Content** and **Select Template**, rendered as a tab bar that sticks to the top of the viewport. Switching tabs is blocked while the outline is still streaming from the server.
Three custom hooks coordinate the page's logic:
| Hook | Responsibility |
|---|---|
| `useOutlineStreaming` | Tracks whether the AI outline is still being streamed and which slide index is active |
| `useOutlineManagement` | Handles drag-and-drop reordering (`handleDragEnd`) and adding new slides (`handleAddSlide`) |
| `usePresentationGeneration` | Owns the "Generate" button state and submits the finalized outline to produce slides |
A floating "Generate" button in the bottom-right corner kicks off slide generation. An `OverlayLoader` with a progress message appears while generation runs.
```tsx
// OutlinePage.tsx – tab switch is blocked during streaming
const handleTabChange = (tab: string) => {
  if (streamState.isStreaming) {
    return;
  }
  setActiveTab(tab);
};
```
Sources: [servers/nextjs/app/(presentation-generator)/outline/components/OutlinePage.tsx:29-123]()
---
## Stage 3: Slide Editor (`/presentation?id=<id>`)
This is the most complex screen in the application. It renders all slides in a scrollable column, provides a thumbnail side panel for navigation, and houses a chat panel for AI-assisted edits.
### Layout
The editor is a full-height (`h-screen overflow-hidden`) three-column layout:
```text
┌──────────┬──────────────────────────────┬──────────────┐
│ 120px │ Slide canvas (flex-1) │ 370px max │
│ Side │ SlideContent list │ Chat │
│ Panel │ (scrollable) │ Panel │
│ (thumbs) │ │ │
└──────────┴──────────────────────────────┴──────────────┘
```
The slide canvas scrolls automatically during streaming: as each new slide arrives, the container scrolls to keep the latest slide in view using `requestAnimationFrame` and `scrollTo({ behavior: "smooth" })`.
Sources: [servers/nextjs/app/(presentation-generator)/presentation/components/PresentationPage.tsx:299-365]()
### Slide Scaling (`SlideScale` / `PresentationRender`)
Every slide is rendered at a fixed design resolution of **1280 × 720 px** and then scaled down to fit its container using a `ResizeObserver`. In the editor, the scale factor is `(safeWidth / 1280) * 0.98`, where `safeWidth` is the container width minus a 20 px allowance (clamped at zero); the result is capped at 1.0 so slides never enlarge beyond their design size. In presentation mode, both axes are considered and the minimum of `sx` and `sy` is used, so the whole slide fits the viewport without cropping.
```tsx
// PresentationRender.tsx – responsive scale calculation
const scale = useMemo(() => {
  if (fixedSize) return 1;
  if (presentMode) {
    // Fit both axes; the smaller ratio wins so nothing is cropped.
    const sx = (box.w / BASE_WIDTH) * 0.995;
    const sy = (box.h / BASE_HEIGHT) * 0.995;
    return Math.min(sx, sy);
  }
  // Leave a 20 px allowance, never going below zero.
  const safeWidth = Math.max(0, box.w - 20);
  return Math.min((safeWidth / BASE_WIDTH) * 0.98, 1);
}, [fixedSize, presentMode, box.w, box.h]);
```
The actual slide content is rendered by `V1ContentRender` inside the scaled `div`, which keeps all layout and font sizes pixel-perfect regardless of screen size.
Sources: [servers/nextjs/app/(presentation-generator)/components/PresentationRender.tsx:31-113]()
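The scaling arithmetic is easy to check by hand. The sketch below restates it in Python for illustration only (the component itself is TypeScript); `editor_scale` and `present_scale` are hypothetical names, and `usable_width` stands for whatever container width the component ends up using.

```python
BASE_WIDTH, BASE_HEIGHT = 1280, 720  # fixed design resolution


def editor_scale(usable_width: float) -> float:
    # Width-only fit with a 2% margin, never scaling above the design size.
    return min((max(0.0, usable_width) / BASE_WIDTH) * 0.98, 1.0)


def present_scale(width: float, height: float) -> float:
    # Fit both axes: the smaller ratio decides, so the slide is never cropped.
    return min(width / BASE_WIDTH, height / BASE_HEIGHT) * 0.995
```

For example, a 640 px wide container gives an editor scale of 0.49, while a 1920 × 720 viewport in presentation mode is limited by its height and yields 0.995.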
### Rich Text Editing with Tiptap
Text blocks inside slides use Tiptap, a headless rich-text editor built on ProseMirror. The `TiptapText` component configures three extensions—`StarterKit`, `tiptap-markdown`, and `Underline`—giving every text field support for bold, italic, underline, strikethrough, and inline code formatting. A floating `BubbleMenu` toolbar appears when text is selected.
Changes are flushed to the Redux store on blur, not on every keystroke, which prevents excessive re-renders. The stored format is Markdown, converted back from the editor's internal state via `editor.storage.markdown.getMarkdown()`.
```tsx
// TiptapText.tsx – save Markdown on blur
onBlur: ({ editor }) => {
  const markdown = editor?.storage.markdown.getMarkdown();
  if (onContentChange) {
    onContentChange(markdown);
  }
},
```
Sources: [servers/nextjs/app/(presentation-generator)/components/TiptapText.tsx:32-50]()
### Image and Icon Editing (`EditableLayoutWrapper`)
`EditableLayoutWrapper` is a transparent overlay that wraps each rendered slide and makes images and SVG icons clickable. On mount (with a 400ms delay to wait for lazy-loaded assets), it queries the DOM for all `<img>` and `<svg>` elements that have not yet been processed, maps each one back to its corresponding data path in the slide's JSON structure, and attaches click and hover handlers.
Clicking an image opens an `ImageEditor` modal. Clicking an SVG icon opens an `IconsEditor` modal. Both dispatch Redux actions (`updateSlideImage`, `updateSlideIcon`, `updateImageProperties`) to keep the store in sync.
The path-matching logic uses a multi-strategy URL comparison: exact match, protocol-stripped match, `/app_data/` filename match, and general filename match—falling back through each until a confident pairing is found.
A `MutationObserver` watches the slide's DOM subtree so that images arriving after the initial render (from lazy loading or streaming) are also picked up.
Sources: [servers/nextjs/app/(presentation-generator)/components/EditableLayoutWrapper.tsx:42-372]()
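The fallback chain for URL comparison might look roughly like the sketch below, written in Python for illustration (the component itself is TypeScript). The function name, the strategy labels, and the exact rules for each tier are assumptions; only the order of the four strategies comes from the description above.

```python
import posixpath
import re
from typing import Optional
from urllib.parse import urlsplit


def _without_protocol(url: str) -> str:
    return re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)


def _filename(url: str) -> str:
    # Last path segment, ignoring any query string or fragment.
    return posixpath.basename(urlsplit(url).path)


def match_strategy(dom_src: str, data_url: str) -> Optional[str]:
    """Return the first strategy that pairs a DOM image src with a slide data URL."""
    if dom_src == data_url:
        return "exact"
    if _without_protocol(dom_src) == _without_protocol(data_url):
        return "protocol-stripped"
    name_a, name_b = _filename(dom_src), _filename(data_url)
    if name_a and name_a == name_b:
        if "/app_data/" in dom_src and "/app_data/" in data_url:
            return "app_data-filename"
        return "filename"
    return None
```

Ordering the strategies from strictest to loosest means a filename-only match is used only when nothing more specific applies, which reduces the chance of pairing the wrong image.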
### Auto-Save
While the editor is open and slides are not streaming, changes to `presentationData` in the Redux store trigger a debounced auto-save (default 2000 ms). The `useAutoSave` hook compares the current serialized data against what was last saved and only calls `PresentationGenerationApi.updatePresentationContent` when the content has actually changed. Each save also appends to the undo/redo history via the `addToHistory` action.
Sources: [servers/nextjs/app/(presentation-generator)/presentation/hooks/useAutoSave.tsx:13-88]()
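The "save only when something changed" rule can be sketched as below, in Python for illustration (the hook itself is TypeScript). `ChangeGatedSaver` is a hypothetical name, the debounce timer is left out for brevity, and the comparison via sorted-key JSON is an assumption about how the serialized snapshots are compared.

```python
import json
from typing import Callable, Optional


class ChangeGatedSaver:
    """Calls save() only when the serialized data differs from the last saved snapshot."""

    def __init__(self, save: Callable[[dict], None]) -> None:
        self._save = save
        self._last_saved: Optional[str] = None

    def on_change(self, data: dict, is_streaming: bool = False) -> bool:
        if is_streaming:
            return False  # never save while slides are still streaming in
        serialized = json.dumps(data, sort_keys=True)
        if serialized == self._last_saved:
            return False  # nothing changed since the last save
        self._save(data)
        self._last_saved = serialized
        return True
```

In the real hook this check would run after the debounce delay has elapsed, so a burst of edits results in at most one save call.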
### Header Controls
The `PresentationHeader` bar (sticky, always on top) provides:
| Control | Behavior |
|---|---|
| Title (click to edit) | Inline input; commits on Enter or blur, cancels on Escape |
| Theme selector | Loads custom + default themes; applies CSS variables to the slide wrapper |
| Undo / Redo | Powered by a separate `undoRedoSlice` in Redux |
| Regenerate | Clears slide data and history, re-routes to `?stream=true` |
| Present | Switches to presentation mode via `?mode=present&slide=N` |
| Export | Popover with PDF and PPTX options |
Export in a browser context POSTs to `/api/export-presentation` with the format and presentation ID. In the Electron desktop app, it calls `window.electron.exportPresentation` via IPC instead.
```tsx
// PresentationHeader.tsx – dual export path (web vs Electron)
if (window.electron?.exportPresentation) {
  await exportViaIpc("pdf", safePdfTitle);
} else {
  const response = await fetch("/api/export-presentation", {
    method: "POST",
    body: JSON.stringify({ format: "pdf", id: presentation_id, title: safePdfTitle }),
  });
}
```
Sources: [servers/nextjs/app/(presentation-generator)/presentation/components/PresentationHeader.tsx:187-307]()
### AI Chat Panel
A `Chat` component occupies the right column. It calls `PresentationChatApi` with the user's message and the current slide index. The API streams back responses including `ChatStreamTrace` events that identify which slide the AI agent is currently working on. The `PresentationPage` uses those events to:
1. **Follow mode** – automatically scroll the canvas to the slide being edited by the agent.
2. **Glow effect** – apply a visual highlight (`isChatEditing`) to the active slide and a softer highlight (`isChatTargeted`) to other slides touched in the same request.
3. **Clear on finish** – remove all highlights 900 ms after the agent stops sending.
Sources: [servers/nextjs/app/(presentation-generator)/presentation/components/PresentationPage.tsx:159-249]()
### Presentation Mode
When the user clicks "Present", the page re-renders as `PresentationMode` (no editor chrome, slides fill the viewport). The mode is encoded in the URL query string (`?mode=present`) so it survives a browser refresh. `toggleFullscreen` uses the Fullscreen API.
Sources: [servers/nextjs/app/(presentation-generator)/presentation/components/PresentationPage.tsx:253-265]()
---
## Stage 4: PDF Export Renderer (`/pdf-maker?id=<id>`)
The `(export)` route group contains a special headless page used only for generating PDFs. The application (or Electron shell) opens this page in a hidden browser window, waits for all slides to render, and then invokes the browser's print function.
`PdfMakerPage` loads the presentation data from the API, applies theme CSS variables and fonts to a wrapper element, then renders every slide using the same `SlideScale` component used in the editor—but with `fixedSize={true}` and `isEditMode={false}` so slides render at exactly 1280 × 720 px with no scaling math and no edit overlays.
A `<style>` block injected via `jsx global` configures print-specific CSS: each slide becomes a separate printed page (`break-after: page`), the page size is set to `1280px 720px`, and all margins are zero.
```tsx
// PdfMakerPage.tsx – print page size matches slide canvas
@page {
  size: 1280px 720px;
  margin: 0;
}
```
If the export is triggered from Electron, an `exportCookie` header is passed and the slide data is fetched from `/api/export-presentation-data/<id>` with that cookie, so authentication is preserved across the headless render context.
Sources: [servers/nextjs/app/(export)/pdf-maker/PdfMakerPage.tsx:20-89](), [servers/nextjs/app/(export)/pdf-maker/PdfMakerPage.tsx:158-211]()
---
## State Management
The frontend uses Redux (with a `presentationGeneration` slice) as the single source of truth for the current presentation. Both the editor canvas and the chat panel read from and write to the same slice, which is what makes features like auto-save, undo/redo, and follow mode work without prop-drilling through deeply nested component trees. Mixpanel events are fired at each major user action (page viewed, export started, title updated, regenerate triggered) to capture usage telemetry.
---
## Summary
The Next.js frontend is organized as a four-stage funnel—dashboard → outline → editor → export—where each URL corresponds to a concrete user action. The slide canvas always renders at 1280 × 720 px and uses `ResizeObserver`-driven scaling to fit any screen. Tiptap provides rich-text editing directly inside slide content; `EditableLayoutWrapper` adds click-to-edit behavior for images and icons through DOM introspection; and a streaming chat panel lets the AI agent modify slides while the user watches in real time. The `(export)` route group keeps the print renderer cleanly separated from the interactive editor, reusing the same `SlideScale` component but without edit overlays or scaling.
Sources: [servers/nextjs/app/(presentation-generator)/components/PresentationRender.tsx:6-116]()
---
## 08. The Desktop App: Electron Wrapper
> The Electron app bundles the FastAPI backend (as a sidecar process) and the Next.js frontend into a single installable for Mac, Windows, and Linux — no Docker needed. This page explains how the build scripts wire everything together.
- Page Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/pages/08-the-desktop-app-electron-wrapper.md
- Generated: 2026-05-24T05:25:01.621Z
### Source Files
- `electron/app/main.ts`
- `electron/build.js`
- `electron/copy_fastapi_assets.js`
- `electron/build_nextjs_resources.js`
- `electron/package.json`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [electron/app/main.ts](electron/app/main.ts)
- [electron/build.js](electron/build.js)
- [electron/copy_fastapi_assets.js](electron/copy_fastapi_assets.js)
- [electron/build_nextjs_resources.js](electron/build_nextjs_resources.js)
- [electron/package.json](electron/package.json)
- [electron/app/utils/servers.ts](electron/app/utils/servers.ts)
- [electron/app/utils/constants.ts](electron/app/utils/constants.ts)
</details>
# The Desktop App: Electron Wrapper
Presenton ships as a conventional installable desktop application (a `.dmg` for macOS, an `AppImage` or `.deb` for Linux, and an NSIS installer or `.appx` for Windows) with no Docker or separate server setup required. The Electron shell bundles both the FastAPI Python backend and the Next.js frontend as ordinary files inside the package, then launches them as child processes when the app opens. From the user's perspective it looks like any other native app; under the hood, three processes are running: the Electron shell and two servers bound to `localhost`.
This page explains how the build scripts assemble those three layers, how `main.ts` orchestrates them at runtime, and what platform-specific quirks the code handles.
---
## How the pieces fit together
```text
┌──────────────────────────────────────────────────────┐
│ Electron shell (app_dist/main.js) │
│ │
│ ┌────────────────────┐ ┌────────────────────────┐ │
│ │ FastAPI sidecar │ │ Next.js sidecar │ │
│ │ resources/fastapi │ │ resources/nextjs │ │
│ │ (PyInstaller bin) │ │ (standalone server.js)│ │
│ └────────────────────┘ └────────────────────────┘ │
│ │
│ BrowserWindow → http://127.0.0.1:<nextjsPort> │
└──────────────────────────────────────────────────────┘
```
Electron is the outer container — it creates the native window and provides the OS integration (single-instance lock, IPC, update checker). Both servers are spawned as ordinary child processes on dynamically chosen `localhost` ports. The `BrowserWindow` simply loads the Next.js URL once that server is ready.
---
## Build pipeline
The full build is orchestrated through npm scripts defined in `electron/package.json`. Running `npm run build:all` executes every step in order:
| Step | npm script | What it does |
|------|-----------|--------------|
| 1 | `clean:build` | Wipes `resources/nextjs`, `resources/fastapi`, and `app_dist` |
| 2 | `setup:env` | Installs Node deps for Electron, `uv sync` for FastAPI, and npm deps for Next.js |
| 3 | `build:ts` | Compiles `electron/app/**/*.ts` → `app_dist/` via `tsc` |
| 4 | `build:nextjs` | Builds the Next.js standalone bundle and copies it to `resources/nextjs` |
| 5 | `build:fastapi` | Runs PyInstaller via `uv run --with pyinstaller`, then copies static assets |
| 6 | `build:electron` | Generates version info, fetches the export runtime, type-checks, and calls `electron-builder` |
Sources: [electron/package.json:45-50]()
### Building the FastAPI bundle
The `build:fastapi` script runs PyInstaller inside `servers/fastapi` via `uv run --with pyinstaller python -m PyInstaller --distpath ../../electron/resources server.spec`. This produces a self-contained binary (no Python interpreter on the user's machine is needed). After PyInstaller finishes, `copy_fastapi_assets.js` copies two additional asset directories — `static/` and `assets/` — from the FastAPI source tree into `resources/fastapi/`:
```js
// electron/copy_fastapi_assets.js
const sources = [
  { name: "static", src: path.join(fastapiDir, "static"), dest: path.join(resourcesFastapiDir, "static") },
  { name: "assets", src: path.join(fastapiDir, "assets"), dest: path.join(resourcesFastapiDir, "assets") },
];
```
Sources: [electron/copy_fastapi_assets.js:7-10]()
### Building the Next.js bundle
`build_nextjs_resources.js` runs `npm run build` inside `servers/nextjs` with the environment variable `BUILD_TARGET=electron`, which tells Next.js to use its `output: "standalone"` mode. The resulting standalone directory (a self-contained Node.js server) is then copied wholesale into `resources/nextjs/`. The script handles a layout quirk in Next.js 16+: the standalone server runs from a nested `servers/nextjs/` path, so static files and the `public/` directory must be duplicated next to `server.js`:
```js
// Next.js 16 standalone traces the app under servers/nextjs/; the server process
// runs from that directory, so static assets and public files must live beside server.js
const nestedStandaloneDir = path.join(outDir, "servers", "nextjs")
```
Sources: [electron/build_nextjs_resources.js:57-61]()
### Packaging with electron-builder
`build.js` calls `electron-builder` directly with an inline configuration object. Key settings:
- `asar: false`: files are left unpacked so the FastAPI binary and Next.js `server.js` can be executed directly.
- `files: ["resources", "app_dist", "node_modules", "NOTICE"]`: only these four entries (three directories plus the `NOTICE` file) are bundled.
- `afterPack` hook: runs on macOS to `chmod 0o755` the FastAPI binary and any export converter binary, because zip extraction loses executable bits.
```js
// electron/build.js
const afterPack = async (context) => {
  if (context.electronPlatformName === "darwin") {
    fs.chmodSync(fastapiPath, 0o755)
  }
}
```
| Platform | Artifact format(s) |
|----------|--------------------|
| macOS | `.dmg` |
| Linux | `AppImage`, `.deb` |
| Windows | NSIS installer, `.appx` (Microsoft Store) |
Sources: [electron/build.js:6-58](), [electron/build.js:61-117]()
---
## Runtime: what happens when you launch the app
### 1. Single-instance lock
`main.ts` immediately calls `app.requestSingleInstanceLock()`. If another instance is already running, the new one quits; the existing window is focused instead.
Sources: [electron/app/main.ts:167-170]()
### 2. Path initialization
`initializeAppPaths()` resolves platform-appropriate directories for user data, temp files, logs, and cache before anything else happens. On macOS this is `~/Library/Application Support/Presenton Open Source`; on Linux it follows `XDG_CONFIG_HOME`; on Windows it uses `%APPDATA%`.
Sources: [electron/app/utils/constants.ts:234-292]()
### 3. Dependency check
Before showing the main UI, `checkDependenciesBeforeWindow()` verifies that LibreOffice, ImageMagick, and a bundled Chromium (for PDF export) are present. If any are missing, a setup window appears that installs them one after another. If the user cancels, the app quits.
Sources: [electron/app/main.ts:461-467]()
### 4. Port discovery and server startup
Two free `localhost` ports are found dynamically with `findUnusedPorts()`. Then `startServers()` launches both child processes sequentially — FastAPI first, Next.js second — and waits for each to become reachable via HTTP before proceeding.
```text
findUnusedPorts()
→ startFastApiServer(fastapiDir, fastApiPort, env) → await fastApi.ready (polls /docs)
→ startNextJsServer(nextjsDir, nextjsPort, env) → await nextjs.ready (polls /)
→ mainWindow.loadURL(`http://127.0.0.1:${nextjsPort}`)
```
Sources: [electron/app/main.ts:551-575]()
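One common way to find free ports is to bind to port 0 and let the operating system choose. The sketch below shows that technique in Python for illustration; it is an assumption about the general approach, not a transcription of `findUnusedPorts()`, whose implementation is not shown here.

```python
import socket


def find_unused_ports(count: int = 2, host: str = "127.0.0.1") -> list:
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((host, 0))  # port 0 asks the OS for any free port
            sockets.append(s)
        # All sockets stay open until every port is chosen, so the ports are distinct.
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()
```

There is an unavoidable gap between closing the probe socket and the server binding the port, which is one reason to follow up with a readiness poll rather than assuming the server started.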
### 5. Dev mode vs. packaged mode
`isDev` is `!app.isPackaged`. The two modes differ significantly in how servers are started:
| Mode | FastAPI command | Next.js command |
|------|----------------|-----------------|
| Dev | `uv run python server.py --port N --reload true` | `npm run dev -- -p N` |
| Packaged | `resources/fastapi/fastapi --port N` (binary) | `process.execPath server.js` (Node via Electron) |
In packaged mode, the Next.js standalone `server.js` is run using Electron's own Node.js runtime (`process.execPath`) with `ELECTRON_RUN_AS_NODE=1`, avoiding the need for a separate system Node installation.
Sources: [electron/app/utils/servers.ts:107-114](), [electron/app/utils/servers.ts:237-255]()
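The table can be restated as a small lookup, shown here in Python for illustration. `server_commands` is a hypothetical helper; the command lines are taken from the table, and the sketch deliberately passes no port to the packaged Next.js server because this page does not say how that server receives it.

```python
def server_commands(is_dev: bool, fastapi_port: int, nextjs_port: int, exec_path: str) -> dict:
    if is_dev:
        return {
            "fastapi": ["uv", "run", "python", "server.py",
                        "--port", str(fastapi_port), "--reload", "true"],
            "nextjs": ["npm", "run", "dev", "--", "-p", str(nextjs_port)],
            "nextjs_env": {},
        }
    return {
        "fastapi": ["resources/fastapi/fastapi", "--port", str(fastapi_port)],
        # Port handling for the packaged Next.js server is not described, so none is passed.
        "nextjs": [exec_path, "server.js"],
        # Makes the Electron binary behave as a plain Node.js runtime.
        "nextjs_env": {"ELECTRON_RUN_AS_NODE": "1"},
    }
```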
### 6. Environment injection
Before spawning either server, `startServers()` passes a large block of environment variables — LLM provider keys, image provider keys, feature flags, path overrides, and tool binary locations — directly into each child process's `env`. This is how user configuration (stored in `userConfig.json`) flows from the Electron main process down to the FastAPI and Next.js layers.
```ts
// electron/app/main.ts (excerpt, ~line 299-368)
const fastApi = await startFastApiServer(fastapiDir, fastApiPort, {
  ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY,
  OPENAI_API_KEY: process.env.OPENAI_API_KEY,
  OLLAMA_URL: process.env.OLLAMA_URL,
  // ... dozens more
}, isDev);
```
Sources: [electron/app/main.ts:299-370]()
### 7. Graceful shutdown
When the window closes or the OS signals a quit, `forceQuitApp()` runs a coordinated teardown: it stops any active export or LibreOffice install processes, then calls `stop()` on both the FastAPI and Next.js managed server objects. Each `stop()` sends `SIGTERM` to its child process and waits for it to exit; once both servers have stopped, `forceQuitApp()` calls `app.exit()`.
Sources: [electron/app/main.ts:400-429]()
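The teardown order described above can be sketched as follows. This is an assumption-laden illustration, not the actual `forceQuitApp()`: `ExitableChild`, `stopChild`, and `shutdownAll` are hypothetical names, and the interface lists only the two methods the sketch needs (a real `ChildProcess` from `node:child_process` provides both).

```ts
// Hypothetical sketch: SIGTERM each child, wait for exit, then quit the app.
interface ExitableChild {
  kill(signal: string): boolean;
  once(event: "exit", listener: () => void): unknown;
}

function stopChild(child: ExitableChild): Promise<void> {
  return new Promise<void>((resolve) => {
    // Register the listener before signalling so the exit event is not missed.
    child.once("exit", () => resolve());
    child.kill("SIGTERM");
  });
}

async function shutdownAll(children: ExitableChild[], exitApp: () => void): Promise<void> {
  // Signal all children at once, then wait for every one of them to exit.
  await Promise.all(children.map(stopChild));
  exitApp(); // stands in for app.exit() in the Electron main process
}
```

A production version would likely also need a timeout with a stronger fallback signal for children that ignore `SIGTERM`; the source does not say whether Presenton does this.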
---
## Linux-specific workarounds
Two Linux issues are handled proactively at startup, before `app.whenReady()`:
1. **Chrome sandbox permissions** — The bundled `chrome-sandbox` binary must be root-owned with mode `4755` (setuid). If it is not, the app appends `--no-sandbox` to avoid a crash.
2. **Shared memory** — `/dev/shm` may be unavailable on some distributions, so `--disable-dev-shm-usage` is always appended on Linux to use `/tmp` instead.
Sources: [electron/app/main.ts:54-73]()
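The two checks above reduce to a small decision function. The sketch below is hypothetical (`linuxChromiumFlags` and `SandboxStat` are names invented for the example), and treating a missing `chrome-sandbox` file the same as a misconfigured one is an assumption.

```ts
// Hypothetical sketch: decide which Chromium flags to add on Linux.
interface SandboxStat {
  uid: number; // owner user id (0 = root)
  mode: number; // st_mode as returned by a stat call, file-type bits included
}

function linuxChromiumFlags(platform: string, sandbox: SandboxStat | null): string[] {
  if (platform !== "linux") return [];
  // Always avoid /dev/shm on Linux.
  const flags = ["--disable-dev-shm-usage"];
  const rootOwned = sandbox !== null && sandbox.uid === 0;
  // Mask off the file-type bits and compare permission bits with setuid 4755.
  const setuid4755 = sandbox !== null && (sandbox.mode & 0o7777) === 0o4755;
  if (!(rootOwned && setuid4755)) flags.push("--no-sandbox");
  return flags;
}
```

In an Electron main process such flags are typically applied with `app.commandLine.appendSwitch`, which takes the switch name without the leading dashes.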
---
## Directory layout inside the installed package
```text
<app bundle>/
├── app_dist/ ← compiled Electron main process (TypeScript → JS)
│ └── main.js
├── resources/
│ ├── fastapi/ ← PyInstaller binary + static/ + assets/
│ │ └── fastapi (or fastapi.exe on Windows)
│ ├── nextjs/ ← Next.js standalone server
│ │ ├── server.js
│ │ ├── .next-build/static/
│ │ └── public/
│ ├── export/ ← export runtime (Chromium, converter binary)
│ └── ui/ ← splash screen HTML and static images
└── node_modules/ ← Electron runtime dependencies
```
`asar: false` in `build.js` is what allows the app to spawn `resources/fastapi/fastapi` and run `resources/nextjs/server.js` directly from disk; inside a packed `.asar` archive, these files would not be reachable by `spawn()`.
Sources: [electron/build.js:63-74]()
---
## Summary
The Electron wrapper's core job is coordination, not computation. At build time, three separate build tools (PyInstaller, Next.js standalone mode, and electron-builder) each package their own runtime into the `resources/` directory. At launch time, `main.ts` picks two free ports, injects configuration as environment variables, and waits for both child processes (FastAPI first, then Next.js) to report ready before revealing the UI. The arrangement is deliberately BYOK (bring your own keys): every LLM provider key is passed through as an environment variable, so no provider is hard-wired into the packaged binary.
Sources: [electron/app/utils/constants.ts:9-16](), [electron/app/main.ts:290-398]()
---