# What You Can Feed It: Documents, PDFs & Images

> Presenton can read uploaded PDFs, Office files, images (via OCR), and plain text to ground the presentation in real content. This page covers the document loader, LiteParse integration, and image generation service.

- Repository: presenton/presenton
- GitHub: https://github.com/presenton/presenton
- Human wiki: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc
- Complete Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/llms-full.txt

## Source Files

- `servers/fastapi/services/documents_loader.py`
- `servers/fastapi/services/liteparse_service.py`
- `servers/fastapi/services/image_generation_service.py`
- `servers/fastapi/services/document_conversion_service.py`
- `servers/fastapi/constants/documents.py`

---

When you ask Presenton to build a presentation, you are not limited to typing a topic into a text box. You can hand it real documents — a PDF report, a Word file, a spreadsheet, or even a scanned image — and it will extract the text content from those files to ground the generated slides in actual source material. This page explains exactly which file types are accepted, how each one is processed under the hood, and how the image generation system works as a separate concern.

## Accepted File Types

The full list of accepted formats is defined in a single constants file, so every API endpoint and upload validation step pulls from the same source.

**PDFs**
`.pdf`

**Plain text**
`.txt`

**Word-processor files (Office)**
`.doc`, `.docx`, `.docm`, `.odt`, `.rtf`

**Spreadsheets**
`.xls`, `.xlsx`, `.xlsm`, `.ods`, `.csv`, `.tsv`

**Presentations**
`.ppt`, `.pptx`, `.pptm`, `.odp`

**Images**
`.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.tif`, `.webp`, `.svg`

Each group also has a matching list of MIME types. Because some clients (particularly older browsers uploading legacy Office files) send a generic `application/octet-stream` content type, the server checks both the MIME type and the file extension before accepting an upload.

Sources: [servers/fastapi/constants/documents.py:1-82](servers/fastapi/constants/documents.py)
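
The dual check described above can be sketched as follows. This is a minimal illustration, not the project's validation code: the constant names, the shortened lists, and the rule that a generic MIME type is tolerated only when the extension is known are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, shortened lists; the real ones live in constants/documents.py.
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".docx", ".xlsx", ".pptx", ".png", ".jpg"}
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "text/plain",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "image/png",
    "image/jpeg",
}
# Content types that carry no real information about the file.
GENERIC_MIME_TYPES = {"", "application/octet-stream"}


def is_upload_accepted(filename: str, content_type: Optional[str]) -> bool:
    """Accept a file whose extension is known and whose MIME type is known or generic."""
    extension = Path(filename).suffix.lower()
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = (content_type or "").split(";")[0].strip().lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False
    return mime in ALLOWED_MIME_TYPES or mime in GENERIC_MIME_TYPES
```

With this rule, `legacy.docx` sent as `application/octet-stream` is accepted on the strength of its extension, while `script.exe` with the same content type is rejected.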

---

## How DocumentsLoader Routes Each File

`DocumentsLoader` is the entry point that decides what to do with each uploaded file. It inspects the file extension and dispatches to the appropriate handler:

```python
# servers/fastapi/services/documents_loader.py (simplified)
if extension in PDF_EXTENSIONS:
    document, imgs = await self.load_pdf(file_path, load_text, load_images, temp_dir)
elif extension in TEXT_EXTENSIONS:
    document = await self.load_text(file_path)
elif extension in OFFICE_EXTENSIONS:
    document = await asyncio.to_thread(self.load_office_document, file_path, temp_dir)
elif extension in IMAGE_EXTENSIONS:
    document = await asyncio.to_thread(self.load_image, file_path, temp_dir)
else:
    document = await asyncio.to_thread(self._parse_with_liteparse, file_path)
```

Notice that `.txt` files skip the LiteParse pipeline entirely — they are read with a direct `open()` call since no conversion is needed. Everything else eventually passes through LiteParse, either directly or after a format-conversion step.

After extraction, every document passes through `clean_extracted_document_text()`, which strips any stray LiteParse JSON wrapper that was stored literally (for example, when the whole runner JSON line ended up as the document body). This is a resilience step: it repeats for up to four passes and stops early once the text no longer changes.

Sources: [servers/fastapi/services/documents_loader.py:197-240](servers/fastapi/services/documents_loader.py)
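
The multi-pass cleanup can be illustrated with a short sketch. The function name `unwrap_runner_json` is hypothetical, and the assumption that the wrapper is a JSON object with a string `text` field is taken from the runner's JSON output shape described later on this page; the real `clean_extracted_document_text()` may do more.

```python
import json

MAX_PASSES = 4  # upper bound on unwrapping passes, as described above


def unwrap_runner_json(text: str) -> str:
    """Peel off literal {"text": ...} wrappers until the text stops changing."""
    for _ in range(MAX_PASSES):
        stripped = text.strip()
        # Only attempt to decode strings that look like a JSON object.
        if not (stripped.startswith("{") and stripped.endswith("}")):
            break
        try:
            payload = json.loads(stripped)
        except json.JSONDecodeError:
            break
        inner = payload.get("text") if isinstance(payload, dict) else None
        if not isinstance(inner, str) or inner == text:
            break  # nothing left to unwrap
        text = inner
    return text
```

Text that is already plain Markdown, or that merely starts with a brace without being valid JSON, is returned unchanged.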

---

## The Two-Step Pipeline for Office Files and Images

Office documents (Word, PowerPoint, spreadsheets) and image files cannot be fed to LiteParse directly in their native formats. They go through a conversion step first.

```
┌────────────────────────────────────────────────────────┐
│                    Uploaded file                       │
└──────────────┬────────────────────────┬───────────────┘
               │ Office (.docx etc.)    │ Image (.jpg etc.)
               ▼                        ▼
     LibreOffice (soffice)        ImageMagick (magick/convert)
       converts → .pdf              converts → .png
               │                        │
               └──────────┬─────────────┘
                          ▼
                     LiteParse runner
                  (Node.js subprocess)
                     extracts text
                     as Markdown
```

**Office to PDF via LibreOffice**

`DocumentConversionService.convert_office_to_pdf()` calls `soffice --headless --convert-to pdf`. The binary path comes from the `SOFFICE_PATH` environment variable, falling back to `soffice` (Linux/macOS) or `soffice.exe` (Windows). If the expected `<stem>.pdf` file is not found after conversion, the service picks the newest `.pdf` in the output directory as a fallback.

Sources: [servers/fastapi/services/document_conversion_service.py:83-170](servers/fastapi/services/document_conversion_service.py)
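
The binary lookup, command construction, and newest-PDF fallback might look roughly like this. The helper names are hypothetical, and the use of LibreOffice's `--outdir` flag is an assumption about how the output directory is passed; only `--headless --convert-to pdf` and `SOFFICE_PATH` come from the description above.

```python
import os
import sys
from pathlib import Path
from typing import List, Mapping, Optional


def resolve_soffice_binary(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer SOFFICE_PATH, otherwise fall back to the platform default name."""
    env = os.environ if env is None else env
    default = "soffice.exe" if sys.platform.startswith("win") else "soffice"
    return env.get("SOFFICE_PATH") or default


def build_convert_command(binary: str, input_path: Path, out_dir: Path) -> List[str]:
    """Build the headless conversion command (the --outdir usage is assumed)."""
    return [
        binary, "--headless", "--convert-to", "pdf",
        "--outdir", str(out_dir), str(input_path),
    ]


def locate_converted_pdf(input_path: Path, out_dir: Path) -> Optional[Path]:
    """Return <stem>.pdf if present, else the most recently modified PDF."""
    expected = out_dir / f"{input_path.stem}.pdf"
    if expected.is_file():
        return expected
    candidates = sorted(out_dir.glob("*.pdf"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None
```

The command list can be handed to `subprocess.run` as-is; passing a list rather than a shell string avoids quoting problems with file names that contain spaces.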

**Image to PNG via ImageMagick**

`DocumentConversionService.convert_image_to_png()` calls `magick <input> <output>.png` (or `convert` on Linux systems where ImageMagick 6 is installed). The binary is resolved at startup: the `IMAGEMAGICK_BINARY` environment variable wins; otherwise the service probes for `magick` and then `convert` by running `<binary> -version`.

Sources: [servers/fastapi/services/document_conversion_service.py:172-241](servers/fastapi/services/document_conversion_service.py)
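
The resolution order can be sketched as below. The function names are hypothetical and the probe is injectable so the logic can be exercised without ImageMagick installed; the 10-second timeout is an arbitrary choice for the example.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional


def binary_responds(binary: str) -> bool:
    """Return True when `<binary> -version` runs and exits with status 0."""
    try:
        result = subprocess.run([binary, "-version"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def resolve_imagemagick_binary(
    env: Optional[Mapping[str, str]] = None,
    probe: Callable[[str], bool] = binary_responds,
) -> Optional[str]:
    """IMAGEMAGICK_BINARY wins; otherwise probe `magick`, then `convert`."""
    env = os.environ if env is None else env
    override = env.get("IMAGEMAGICK_BINARY")
    if override:
        return override
    for candidate in ("magick", "convert"):
        if probe(candidate):
            return candidate
    return None
```

Probing `magick` first means ImageMagick 7 is preferred when both command names are available.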

---

## LiteParse: The Core Text Extraction Engine

LiteParse is a Node.js package (`@llamaindex/liteparse`) that does the heavy lifting of turning PDFs, converted Office documents, and converted images into structured Markdown text, including OCR for scanned pages.

### How It Works

`LiteParseService.parse()` builds a command like:

```
node liteparse_runner.mjs \
  --file /tmp/upload.pdf \
  --ocr-enabled true \
  --ocr-language eng \
  --dpi 120 \
  --num-workers 1 \
  --python-bridge plain
```

The runner is a `.mjs` (ES module) file. The service looks for it in several candidate paths (Docker container path, repo resource directory, PyInstaller bundle layout) and picks the first one that exists.

There are two output modes, selected by the `LITEPARSE_RUNNER_OUTPUT` environment variable:
- **`plain` (default)** — stdout is the raw Markdown text. For large PDFs this avoids the overhead of JSON-encoding multi-megabyte strings.
- **`json`** — stdout is a single JSON line with `{ ok, text, filePath, pageCount }`. The service scans output lines in reverse to find the last valid JSON object, tolerating any stray log output before the payload.
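
The reverse scan used in `json` mode can be sketched as follows. The function name is hypothetical, and the sketch assumes the payload occupies a single line, as described above.

```python
import json
from typing import Any, Dict, Optional


def extract_last_json_object(stdout: str) -> Optional[Dict[str, Any]]:
    """Walk the output bottom-up and return the last line that parses as a JSON object."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip log lines and blanks cheaply
        try:
            candidate = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(candidate, dict):
            return candidate
    return None
```

Scanning from the end means warnings printed before the payload never need to be parsed, and a trailing log line after the payload is simply skipped.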

### OCR Language Support

The language passed to LiteParse comes from the presentation language selected by the user. `presentation_language_to_ocr_code()` maps a human language label (e.g. `"French"`) to a Tesseract language code (e.g. `"fra"`), so scanned documents in non-English languages are recognized with the matching OCR language.

If an external OCR server is configured via `LITEPARSE_OCR_SERVER_URL`, the URL is appended to the command so LiteParse offloads recognition to that server instead of running Tesseract locally.

Sources: [servers/fastapi/services/liteparse_service.py:67-399](servers/fastapi/services/liteparse_service.py)
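
A label-to-code mapping of this kind can be as small as a dictionary lookup. The sketch below uses a hypothetical function name and a deliberately short table; the codes shown are standard Tesseract language codes, but the fallback to `eng` for unknown labels is an assumption rather than documented behaviour.

```python
from typing import Optional

# Small illustrative subset of label -> Tesseract code pairs.
OCR_CODES = {
    "english": "eng",
    "french": "fra",
    "german": "deu",
    "spanish": "spa",
}
DEFAULT_OCR_CODE = "eng"  # assumed fallback for missing or unknown labels


def language_label_to_ocr_code(label: Optional[str]) -> str:
    """Map a human-readable language label to a Tesseract language code."""
    if not label:
        return DEFAULT_OCR_CODE
    return OCR_CODES.get(label.strip().lower(), DEFAULT_OCR_CODE)
```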

### Fallback Parser

If LiteParse fails (network issue, Node.js not installed, package missing), the loader tries an optional lightweight fallback:

```python
# servers/fastapi/services/documents_loader.py:27-30
try:
    from services.lightweight_document_service import DocumentService as DocumentServiceCls
except Exception:
    DocumentServiceCls = None
```

If the fallback also fails, a `500` HTTP error is raised with the original error message. This keeps the stack simple: try the best tool first, recover gracefully, then surface the failure clearly.

Sources: [servers/fastapi/services/documents_loader.py:303-330](servers/fastapi/services/documents_loader.py)
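
The try, recover, surface sequence can be sketched without any web framework. `DocumentParseError` here is a stand-in for the HTTP 500 error the loader raises, and `parse_with_fallback` is a hypothetical name; both exist only for this example.

```python
from typing import Callable, Optional


class DocumentParseError(Exception):
    """Stand-in for the HTTP 500 error raised by the real loader."""

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.status_code = 500
        self.detail = detail


def parse_with_fallback(
    file_path: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Try the primary parser, then the optional fallback, then fail loudly."""
    try:
        return primary(file_path)
    except Exception as primary_error:
        if fallback is not None:
            try:
                return fallback(file_path)
            except Exception:
                pass  # report the primary failure, not the fallback's
        raise DocumentParseError(str(primary_error)) from primary_error
```

Reporting the primary parser's message keeps the error pointed at the root cause, since a fallback failure is usually a consequence of the same bad input or missing dependency.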

---

## PDF Page Images (for Vision-Based Slide Mapping)

Beyond text extraction, `DocumentsLoader` can also render each page of a PDF as a PNG image. This is used when the caller sets `load_images=True` and supplies a `temp_dir`. The implementation uses `pdfplumber` at 150 DPI:

```python
# servers/fastapi/services/documents_loader.py:332-341
with pdfplumber.open(file_path) as pdf:
    for page in pdf.pages:
        img = page.to_image(resolution=150)
        image_path = os.path.join(temp_dir, f"page_{page.page_number}.png")
        img.save(image_path)
        images.append(image_path)
```

The resulting image paths are stored separately from the extracted text (`self._images` vs `self._documents`), so downstream code can pass page images to a vision model while still having the extracted text for context.

Sources: [servers/fastapi/services/documents_loader.py:332-347](servers/fastapi/services/documents_loader.py)

---

## Image Generation Service (Slide Illustrations)

`ImageGenerationService` is a separate concern from document ingestion. It does not read uploaded files — it generates new images to illustrate slides. It is provider-neutral by design: a single environment variable picks which backend to use, and the rest of the code is unchanged.

| Provider | Selection check | Backend |
|---|---|---|
| Pixabay | `is_pixabay_selected()` | Pixabay REST API (stock photos) |
| Pexels | `is_pixels_selected()` | Pexels REST API (stock photos) |
| DALL·E 3 | `is_dalle3_selected()` | OpenAI `dall-e-3` |
| GPT Image 1.5 | `is_gpt_image_1_5_selected()` | OpenAI `gpt-image-1.5` |
| Gemini Flash | `is_gemini_flash_selected()` | Google `gemini-2.5-flash-image` |
| NanoBanana Pro | `is_nanobanana_pro_selected()` | Google `gemini-3-pro-image-preview` |
| ComfyUI | `is_comfyui_selected()` | Self-hosted ComfyUI workflow API |
| Open WebUI | `is_open_webui_selected()` | OpenAI-compatible self-hosted endpoint |
| OpenAI-compatible | `is_openai_compatible_selected()` | Custom base URL + model |
| Disabled / none | fallthrough | Static placeholder image |

Stock photo providers (Pexels, Pixabay) receive only the image subject prompt; generative providers receive the full prompt including the slide theme. If any provider call fails, the service catches the exception and returns a placeholder image rather than failing the whole presentation generation.

Sources: [servers/fastapi/services/image_generation_service.py:41-122](servers/fastapi/services/image_generation_service.py)
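
The prompt selection and placeholder fallback can be sketched as a small dispatch function. All names here are hypothetical, including the placeholder path and the way the subject and theme are joined into a full prompt; the real service selects the provider through its `is_*_selected()` checks rather than a dictionary.

```python
from typing import Callable, Dict

PLACEHOLDER_IMAGE = "/static/images/placeholder.jpg"  # hypothetical path
STOCK_PROVIDERS = {"pexels", "pixabay"}


def build_prompt(provider: str, subject: str, theme: str) -> str:
    """Stock photo search gets only the subject; generative models also get the theme."""
    if provider in STOCK_PROVIDERS:
        return subject
    return f"{subject}, {theme}"  # assumed way of combining the two parts


def generate_image(
    provider: str,
    subject: str,
    theme: str,
    backends: Dict[str, Callable[[str], str]],
) -> str:
    """Return an image path or URL, or the placeholder if anything goes wrong."""
    backend = backends.get(provider)
    if backend is None:
        return PLACEHOLDER_IMAGE  # disabled or unknown provider
    try:
        return backend(build_prompt(provider, subject, theme))
    except Exception:
        return PLACEHOLDER_IMAGE  # never fail the whole presentation
```

Catching the exception at this level keeps a single failed image request from aborting generation of the remaining slides.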

### ComfyUI Workflow Injection

The ComfyUI integration is worth calling out: instead of a simple prompt-to-API call, it accepts a full ComfyUI workflow JSON (supplied via the `COMFYUI_WORKFLOW` environment variable). Presenton traverses the workflow graph to find the node titled `"Input Prompt"` and injects the generated prompt text into that node's text field. It then submits the workflow, polls the `/history/{prompt_id}` endpoint until the job completes, and downloads the resulting image.

Sources: [servers/fastapi/services/image_generation_service.py:401-540](servers/fastapi/services/image_generation_service.py)
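
The injection step can be illustrated on its own. The sketch assumes the workflow is in ComfyUI's API export format, where each node is a dictionary with `inputs` and a `_meta.title` entry; the function name and the choice to raise when no matching node exists are assumptions for the example.

```python
import copy
from typing import Any, Dict


def inject_prompt(
    workflow: Dict[str, Any], prompt: str, node_title: str = "Input Prompt"
) -> Dict[str, Any]:
    """Return a copy of the workflow with the prompt written into the titled node."""
    patched = copy.deepcopy(workflow)  # leave the caller's template untouched
    for node in patched.values():
        if not isinstance(node, dict):
            continue
        title = (node.get("_meta") or {}).get("title")
        if title == node_title:
            node.setdefault("inputs", {})["text"] = prompt
            return patched
    raise ValueError(f'No node titled "{node_title}" found in workflow')
```

Working on a deep copy lets the same workflow template be reused for every slide image without one prompt leaking into the next request.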

---

## Summary

Presenton ingests uploaded files through a layered pipeline: plain text is read directly, Office files go through LibreOffice → LiteParse, images go through ImageMagick → LiteParse (with OCR), and PDFs go straight to LiteParse. Every path produces clean Markdown text that grounds the AI's slide generation in real document content. Image generation for slide illustrations is a completely separate, provider-swappable service that falls back to a placeholder when no backend is configured or a provider call fails. The full set of accepted MIME types and extensions is declared once in `constants/documents.py` and reused everywhere, so adding a new extension to an existing group requires a change in only one place.

Sources: [servers/fastapi/constants/documents.py:67-82](servers/fastapi/constants/documents.py)
