# Bring Your Own Brain: How LLM Providers Work

> Presenton does not pick an AI for you. This page explains the 15 supported providers, how the app selects a default model per provider, and what BYOK (bring your own key) means in practice.

- Repository: presenton/presenton
- GitHub: https://github.com/presenton/presenton
- Human wiki: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc
- Complete Markdown: https://grok-wiki.com/public/wiki/presenton-presenton-f6685dc028cc/llms-full.txt

## Source Files

- `servers/fastapi/enums/llm_provider.py`
- `servers/fastapi/constants/llm.py`
- `servers/fastapi/utils/model_availability.py`
- `servers/fastapi/constants/supported_ollama_models.py`
- `servers/fastapi/api/v1/ppt/endpoints/ollama.py`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [servers/fastapi/enums/llm_provider.py](servers/fastapi/enums/llm_provider.py)
- [servers/fastapi/constants/llm.py](servers/fastapi/constants/llm.py)
- [servers/fastapi/utils/llm_provider.py](servers/fastapi/utils/llm_provider.py)
- [servers/fastapi/utils/model_availability.py](servers/fastapi/utils/model_availability.py)
- [servers/fastapi/utils/get_env.py](servers/fastapi/utils/get_env.py)
- [servers/fastapi/utils/available_models.py](servers/fastapi/utils/available_models.py)
- [servers/fastapi/utils/ollama.py](servers/fastapi/utils/ollama.py)
- [servers/fastapi/constants/supported_ollama_models.py](servers/fastapi/constants/supported_ollama_models.py)
- [servers/fastapi/api/v1/ppt/endpoints/ollama.py](servers/fastapi/api/v1/ppt/endpoints/ollama.py)
</details>


Presenton does not bundle an AI model and does not pick one for you. Instead, it reads a single environment variable — `LLM` — and routes every generation request to whichever AI provider and model you configured. This design is sometimes called BYOK (Bring Your Own Key) or BYOB (Bring Your Own Brain): you own the relationship with the AI vendor, you supply the credentials, and Presenton just calls it.

This page explains the 15 supported providers, how the application picks a default model when you do not specify one, what credentials each provider requires, and how the Ollama local-inference path works in more detail.

---

## The Provider Enum: All 15 Names in One Place

Every provider the app can talk to is listed in the `LLMProvider` Python enum:

```python
# servers/fastapi/enums/llm_provider.py
class LLMProvider(Enum):
    OLLAMA      = "ollama"
    OPENAI      = "openai"
    GOOGLE      = "google"
    VERTEX      = "vertex"
    AZURE       = "azure"
    BEDROCK     = "bedrock"
    OPENROUTER  = "openrouter"
    FIREWORKS   = "fireworks"
    TOGETHER    = "together"
    CEREBRAS    = "cerebras"
    ANTHROPIC   = "anthropic"
    LITELLM     = "litellm"
    LMSTUDIO    = "lmstudio"
    CUSTOM      = "custom"
    CODEX       = "codex"
```

Sources: [servers/fastapi/enums/llm_provider.py:4-19]()

The string value of each member — `"openai"`, `"ollama"`, and so on — is what you write in the `LLM` environment variable. The function `get_llm_provider()` reads `os.getenv("LLM")` and converts it to this enum; if the value does not match any member it raises an HTTP 500 with a human-readable error message listing all valid choices.

Sources: [servers/fastapi/utils/llm_provider.py:44-56](), [servers/fastapi/utils/get_env.py:49-51]()
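
The lookup described above can be sketched as follows. This is a minimal illustration, not the project's code: the enum is trimmed to three members, and `ProviderConfigError` and `parse_llm_provider` are hypothetical names standing in for the HTTP 500 error and for `get_llm_provider()`.

```python
from enum import Enum
from typing import Optional


class LLMProvider(Enum):
    # Trimmed to three members for brevity; the real enum lists all 15.
    OLLAMA = "ollama"
    OPENAI = "openai"
    ANTHROPIC = "anthropic"


class ProviderConfigError(Exception):
    """Hypothetical stand-in for the HTTP 500 error described in the text."""

    status_code = 500


def parse_llm_provider(raw: Optional[str]) -> LLMProvider:
    """Convert the raw LLM value to an enum member, or fail with the valid choices."""
    try:
        # Enum lookup by value: LLMProvider("openai") -> LLMProvider.OPENAI
        return LLMProvider(raw)
    except ValueError:
        valid = ", ".join(member.value for member in LLMProvider)
        raise ProviderConfigError(
            f"Invalid LLM provider {raw!r}. Valid choices: {valid}"
        ) from None


provider = parse_llm_provider("anthropic")
```

In the application the raw value would come from `os.getenv("LLM")`; it is passed in explicitly here so the sketch runs without any environment setup.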

---

## How the App Selects an Active Provider

The flow is simple: one environment variable, one enum lookup, done.

```text
Environment          FastAPI startup
─────────────────    ──────────────────────────────────────────────
LLM=anthropic   ──►  get_llm_provider_env()   → "anthropic"
                      LLMProvider("anthropic") → LLMProvider.ANTHROPIC
                      get_model()             → "claude-sonnet-4-20250514"
                                                 (or ANTHROPIC_MODEL if set)
```

`get_model()` in `utils/llm_provider.py` is the single function that resolves the active model name at runtime. It checks the provider-specific environment variable first (e.g. `ANTHROPIC_MODEL`), then falls back to the hardcoded default constant.

Sources: [servers/fastapi/utils/llm_provider.py:119-164]()

---

## Default Models Per Provider

When you set the `LLM` variable but do not supply a model name, Presenton uses these hardcoded defaults:

| Provider | Default model | Environment variable to override |
|---|---|---|
| `openai` | `gpt-4.1` | `OPENAI_MODEL` |
| `google` | `models/gemini-2.5-flash` | `GOOGLE_MODEL` |
| `vertex` | `gemini-2.5-flash` | `VERTEX_MODEL` |
| `azure` | `gpt-4.1` | `AZURE_OPENAI_MODEL` or `AZURE_OPENAI_DEPLOYMENT` |
| `bedrock` | `us.anthropic.claude-3-5-haiku-20241022-v1:0` | `BEDROCK_MODEL` |
| `openrouter` | `openai/gpt-4o` | `OPENROUTER_MODEL` |
| `fireworks` | `accounts/fireworks/models/llama-v3p1-8b-instruct` | `FIREWORKS_MODEL` |
| `together` | `openai/gpt-oss-20b` | `TOGETHER_MODEL` |
| `cerebras` | `llama-3.3-70b` | `CEREBRAS_MODEL` |
| `anthropic` | `claude-sonnet-4-20250514` | `ANTHROPIC_MODEL` |
| `litellm` | `gpt-4.1` | `LITELLM_MODEL` |
| `lmstudio` | `openai/gpt-oss-20b` | `LMSTUDIO_MODEL` |
| `codex` | `gpt-5.2` | `CODEX_MODEL` |
| `ollama` | *(none — required)* | `OLLAMA_MODEL` |
| `custom` | *(none — required)* | `CUSTOM_MODEL` |

Two providers, `ollama` and `custom`, have no default because the app cannot know which models your Ollama instance or custom endpoint serves. Their model variables are mandatory.

Sources: [servers/fastapi/constants/llm.py:1-17](), [servers/fastapi/utils/llm_provider.py:119-164]()
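
The precedence described in this section (override variable first, then the hardcoded default, and an error when neither exists) might look roughly like this. The sketch covers four providers from the table; `resolve_model`, `DEFAULT_MODELS`, and `MODEL_ENV_VARS` are hypothetical names, and treating a blank variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above (subset). None marks "no default".
DEFAULT_MODELS: dict[str, Optional[str]] = {
    "openai": "gpt-4.1",
    "anthropic": "claude-sonnet-4-20250514",
    "ollama": None,
    "custom": None,
}

# Override variable for each provider, also from the table above.
MODEL_ENV_VARS: dict[str, str] = {
    "openai": "OPENAI_MODEL",
    "anthropic": "ANTHROPIC_MODEL",
    "ollama": "OLLAMA_MODEL",
    "custom": "CUSTOM_MODEL",
}


def resolve_model(provider: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the override if set, else the default, else raise."""
    var_name = MODEL_ENV_VARS[provider]
    # Assumption: a blank value counts as "not set".
    override = env.get(var_name, "").strip()
    if override:
        return override
    default = DEFAULT_MODELS[provider]
    if default is None:
        raise ValueError(f"{var_name} must be set when LLM={provider}")
    return default
```

For example, `resolve_model("anthropic", {})` falls back to the default, while `resolve_model("ollama", {})` raises because no default exists for that provider.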

---

## What BYOK Means in Practice

BYOK is straightforward: **no credentials live inside Presenton**. Every API key is read from the process environment via `os.getenv(...)`. The app validates on startup (when `CAN_CHANGE_KEYS` is not `"false"`) that the required variables are present, and for several providers it goes further and queries the provider's models API to verify that the configured model actually exists in your account.

### Startup Validation by Provider

```text
providers that require an API key to be present:
  openai    → OPENAI_API_KEY
  google    → GOOGLE_API_KEY
  anthropic → ANTHROPIC_API_KEY
  openrouter→ OPENROUTER_API_KEY
  cerebras  → CEREBRAS_API_KEY

providers that also validate model availability:
  openai    → queries https://api.openai.com/v1/models
  google    → queries genai.Client().models.list()
  anthropic → queries https://api.anthropic.com/v1/models
  fireworks → queries provider's /v1/models endpoint
  together  → queries provider's /v1/models endpoint
  lmstudio  → queries local server's /v1/models endpoint
  custom    → queries CUSTOM_LLM_URL/v1/models

providers with special credential forms:
  azure     → AZURE_OPENAI_API_KEY + AZURE_OPENAI_API_VERSION
              + (AZURE_OPENAI_ENDPOINT or AZURE_OPENAI_BASE_URL)
  bedrock   → BEDROCK_MODEL (required) +
              BEDROCK_API_KEY  OR  (BEDROCK_AWS_ACCESS_KEY_ID + BEDROCK_AWS_SECRET_ACCESS_KEY)
  vertex    → VERTEX_API_KEY  OR  (VERTEX_PROJECT + optional VERTEX_LOCATION)
              (combining both raises an error)
  litellm   → LITELLM_BASE_URL + LITELLM_MODEL (no live model check)
  ollama    → OLLAMA_MODEL present + model in supported list + auto-pull on startup
```

Sources: [servers/fastapi/utils/model_availability.py:64-241]()
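
The two "either/or" credential forms listed above, Vertex and Bedrock, are the easiest to get wrong, so here is a hedged sketch of the rules as this page states them. The function names and error messages are hypothetical; only the variable names come from the listing.

```python
from typing import Mapping


def check_vertex(env: Mapping[str, str]) -> None:
    """Vertex accepts an API key or a project, but not both."""
    has_key = bool(env.get("VERTEX_API_KEY"))
    has_project = bool(env.get("VERTEX_PROJECT"))
    if has_key and has_project:
        raise ValueError("Set VERTEX_API_KEY or VERTEX_PROJECT, not both")
    if not (has_key or has_project):
        raise ValueError("Set VERTEX_API_KEY or VERTEX_PROJECT")


def check_bedrock(env: Mapping[str, str]) -> None:
    """Bedrock needs a model plus an API key or a full AWS key pair."""
    if not env.get("BEDROCK_MODEL"):
        raise ValueError("BEDROCK_MODEL is required")
    has_api_key = bool(env.get("BEDROCK_API_KEY"))
    has_key_pair = bool(env.get("BEDROCK_AWS_ACCESS_KEY_ID")) and bool(
        env.get("BEDROCK_AWS_SECRET_ACCESS_KEY")
    )
    if not (has_api_key or has_key_pair):
        raise ValueError(
            "Set BEDROCK_API_KEY, or both BEDROCK_AWS_ACCESS_KEY_ID "
            "and BEDROCK_AWS_SECRET_ACCESS_KEY"
        )
```

Passing the environment as a mapping, rather than reading `os.environ` inside each function, keeps the checks easy to exercise with plain dictionaries.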

The `CAN_CHANGE_KEYS` flag exists for deployment scenarios (e.g. Docker images with pre-baked keys) where runtime key swapping should be disabled. When it is set to `"false"`, the startup checks are skipped.

Sources: [servers/fastapi/utils/model_availability.py:65-66](), [servers/fastapi/utils/get_env.py:10-11]()

---

## How Model Validation Works

For providers that expose an OpenAI-compatible API (Fireworks, Together, LM Studio, and the `custom` provider), Presenton calls a shared utility that normalizes the base URL and uses the `AsyncOpenAI` client to list available models:

```python
# servers/fastapi/utils/available_models.py:64-75
async def list_available_openai_compatible_models(url: str, api_key: str) -> list[str]:
    url = normalize_openai_compatible_base_url(url)
    if is_together_api_base_url(url):
        return await list_together_models(url, api_key)
    effective_key = (api_key or "").strip() or "EMPTY"
    client = AsyncOpenAI(api_key=effective_key, base_url=url)
    models = (await client.models.list()).data
    ...
```

Together's API returns a non-standard payload, so it gets a dedicated `aiohttp`-based fetcher. Google and Anthropic have their own model listing functions that call their respective SDKs directly.

Sources: [servers/fastapi/utils/available_models.py:1-98]()
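
Once a model list is available, the startup check reduces to a membership test. The sketch below assumes that shape; `ensure_model_available` is a hypothetical helper, and the lister is injected as a callable so the example runs without network access (in the app it would wrap a function such as `list_available_openai_compatible_models`).

```python
import asyncio
from typing import Awaitable, Callable


async def ensure_model_available(
    model: str,
    list_models: Callable[[], Awaitable[list[str]]],
) -> None:
    """Raise if the configured model is missing from the provider's list."""
    available = await list_models()
    if model not in available:
        preview = ", ".join(sorted(available)[:5])
        raise ValueError(
            f"Model {model!r} is not available. Known models include: {preview}"
        )


async def fake_list_models() -> list[str]:
    # Stand-in for a real provider call; the model names are made up.
    return ["model-a", "model-b"]


# Passes silently because "model-a" is in the fake list.
asyncio.run(ensure_model_available("model-a", fake_list_models))
```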

---

## The Ollama Path: Local Inference Without an API Key

Ollama needs no API key and no external account. Presenton connects to a local Ollama process (default: `http://localhost:11434`, overridable with `OLLAMA_URL`).

### Supported Models

The supported Ollama model list is a hardcoded dictionary of `OllamaModelMetadata` objects grouped by model family:

| Family | Example models | Size range |
|---|---|---|
| Llama 3/3.1/3.2/3.3/4 | `llama3:8b`, `llama3.1:70b`, `llama4:16x17b` | 1.3 GB – 245 GB |
| Gemma 3 | `gemma3:1b` – `gemma3:27b` | 815 MB – 17 GB |
| Gemma 4 | `gemma4:e2b` – `gemma4:31b` (with quantized variants) | 7.2 GB – 63 GB |
| DeepSeek R1 | `deepseek-r1:1.5b` – `deepseek-r1:671b` | 1.1 GB – 404 GB |
| Qwen 3 | `qwen3:0.6b` – `qwen3:235b` | 523 MB – 142 GB |
| Qwen 3.5 | `qwen3.5:2b` – `qwen3.5:122b` | 2.7 GB – 81 GB |
| GPT-OSS | `gpt-oss:20b`, `gpt-oss:120b` | 14 GB – 65 GB |

All families are merged into the single `SUPPORTED_OLLAMA_MODELS` dict that the API and startup validator read.

Sources: [servers/fastapi/constants/supported_ollama_models.py:4-311]()
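
A per-family layout merged into one lookup dict could be sketched like this. The fields of `OllamaModelMetadata` are hypothetical (the page does not list them), only two families are shown, and the sizes assume that the range endpoints in the table belong to the named models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OllamaModelMetadata:
    # Hypothetical fields; the real class may differ.
    label: str
    value: str
    size: str


GEMMA3_MODELS = {
    "gemma3:1b": OllamaModelMetadata("Gemma 3 1B", "gemma3:1b", "815MB"),
    "gemma3:27b": OllamaModelMetadata("Gemma 3 27B", "gemma3:27b", "17GB"),
}

GPT_OSS_MODELS = {
    "gpt-oss:20b": OllamaModelMetadata("GPT-OSS 20B", "gpt-oss:20b", "14GB"),
    "gpt-oss:120b": OllamaModelMetadata("GPT-OSS 120B", "gpt-oss:120b", "65GB"),
}

# Merge every family into the single dict that validators would read.
SUPPORTED_OLLAMA_MODELS = {**GEMMA3_MODELS, **GPT_OSS_MODELS}


def is_supported(model: str) -> bool:
    """Membership test against the merged dict."""
    return model in SUPPORTED_OLLAMA_MODELS
```

Keying the dict by the exact Ollama tag (for example `gpt-oss:20b`) means the startup check is a single `in` lookup on the value of `OLLAMA_MODEL`.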

### Pull-on-Startup

When the `ollama` provider is active and `CAN_CHANGE_KEYS` is not `"false"`, startup automatically pulls the model from the Ollama registry if it is not already present:

```python
# servers/fastapi/utils/model_availability.py:213-226
elif is_ollama_selected():
    ollama_model = get_ollama_model_env()
    if ollama_model not in SUPPORTED_OLLAMA_MODELS:
        raise Exception(f"Model {ollama_model} is not supported")
    async for event in pull_ollama_model(ollama_model):
        print(event)
```

The pull streams NDJSON progress events from the Ollama API (`POST /api/pull`) and prints them to stdout. The REST endpoint (`GET /api/v1/ppt/ollama/model/pull`) exposes the same capability for on-demand pulls from the UI and records pull status in a database table (`OllamaPullStatus`) so that progress survives across requests.

Sources: [servers/fastapi/utils/ollama.py:10-31](), [servers/fastapi/api/v1/ppt/endpoints/ollama.py:28-85]()
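
NDJSON is one JSON object per line, and a network stream can split a line across chunks. The sketch below shows one way to reassemble and decode such a stream; `iter_ndjson` is a hypothetical helper, not the project's `pull_ollama_model`, and the sample events are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per newline-terminated line."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line; keep the unfinished tail in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Handle a final line that arrived without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


# Illustrative chunks; the second event is split across two chunks.
sample_chunks = [
    b'{"status": "pulling manifest"}\n{"status": "downl',
    b'oading", "total": 100, "completed": 40}\n',
    b'{"status": "success"}\n',
]

events = list(iter_ndjson(sample_chunks))
for event in events:
    print(event)
```

Buffering until a newline arrives is what makes this safe for streamed HTTP bodies, where chunk boundaries do not line up with event boundaries.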

---

## The `custom` and `litellm` Escape Hatches

Two providers exist specifically to connect Presenton to infrastructure you control:

- **`custom`**: Point `CUSTOM_LLM_URL` at any server that speaks the OpenAI `/v1` protocol (vLLM, Ollama in API mode, a local proxy). Supply `CUSTOM_MODEL` and optionally `CUSTOM_LLM_API_KEY`. Startup queries `/v1/models` to confirm the model is reachable.

- **`litellm`**: Point `LITELLM_BASE_URL` at a running [LiteLLM](https://github.com/BerriAI/litellm) proxy. LiteLLM itself can fan out to hundreds of providers, so this path effectively extends Presenton's provider list to anything LiteLLM supports. A model name (`LITELLM_MODEL`) is required; no live model check is performed at startup because LiteLLM proxies vary in how they respond to `/v1/models`.

Sources: [servers/fastapi/utils/model_availability.py:178-197](), [servers/fastapi/utils/get_env.py:205-215]()

---

## Provider Architecture Overview

```text
┌─────────────────────────────────────────────────────────────┐
│  Environment                                                │
│  LLM=<provider>   *_API_KEY=<key>   *_MODEL=<name>         │
└────────────────────────┬────────────────────────────────────┘
                         │ os.getenv()
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  FastAPI startup  (model_availability.py)                   │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  get_llm_provider()  →  LLMProvider enum value       │  │
│  │  Validate credentials present                        │  │
│  │  (optional) query provider /models API               │  │
│  │  Ollama: auto-pull if not local                      │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────┬────────────────────────────────────┘
                         │ all OK → serve requests
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Request path  (utils/llm_provider.py → get_model())        │
│  env model var set?  ─yes─►  use it                        │
│        │ no                                                 │
│        └──────────────►  DEFAULT_*_MODEL constant          │
└────────────────────────┬────────────────────────────────────┘
                         │
          ┌──────────────┼──────────────────────────┐
          ▼              ▼                           ▼
  Cloud APIs        Local servers           Proxy layers
  (OpenAI, Google,  (Ollama :11434,         (LiteLLM, OpenRouter,
   Anthropic,        LM Studio :1234,        custom URL)
   Bedrock, …)       custom URL)
```

---

## Quick Reference: Required Environment Variables

| Provider (`LLM=`) | Required | Optional / Alternative |
|---|---|---|
| `openai` | `OPENAI_API_KEY` | `OPENAI_MODEL` |
| `google` | `GOOGLE_API_KEY` | `GOOGLE_MODEL` |
| `vertex` | `VERTEX_API_KEY` **or** `VERTEX_PROJECT` | `VERTEX_LOCATION`, `VERTEX_MODEL` |
| `azure` | `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_API_VERSION`, `AZURE_OPENAI_ENDPOINT` or `AZURE_OPENAI_BASE_URL` | `AZURE_OPENAI_MODEL`, `AZURE_OPENAI_DEPLOYMENT` |
| `bedrock` | `BEDROCK_MODEL` + (`BEDROCK_API_KEY` or access+secret key pair) | `BEDROCK_REGION`, `BEDROCK_AWS_SESSION_TOKEN` |
| `openrouter` | `OPENROUTER_API_KEY` | `OPENROUTER_MODEL` |
| `fireworks` | `FIREWORKS_API_KEY`, `FIREWORKS_MODEL` | `FIREWORKS_BASE_URL` |
| `together` | `TOGETHER_API_KEY`, `TOGETHER_MODEL` | `TOGETHER_BASE_URL` |
| `cerebras` | `CEREBRAS_API_KEY` | `CEREBRAS_MODEL`, `CEREBRAS_BASE_URL` |
| `anthropic` | `ANTHROPIC_API_KEY` | `ANTHROPIC_MODEL` |
| `litellm` | `LITELLM_BASE_URL`, `LITELLM_MODEL` | `LITELLM_API_KEY` |
| `lmstudio` | `LMSTUDIO_MODEL` | `LMSTUDIO_BASE_URL` (default: `localhost:1234`), `LMSTUDIO_API_KEY` |
| `ollama` | `OLLAMA_MODEL` (must be in supported list) | `OLLAMA_URL` (default: `localhost:11434`) |
| `custom` | `CUSTOM_LLM_URL`, `CUSTOM_MODEL` | `CUSTOM_LLM_API_KEY` |
| `codex` | OAuth tokens (`CODEX_ACCESS_TOKEN`, etc.) | `CODEX_MODEL` |

Sources: [servers/fastapi/utils/get_env.py:49-317](), [servers/fastapi/utils/model_availability.py:64-241]()

---

## Summary

Presenton treats the AI layer as a pure plug-in: the `LLM` environment variable selects one of 15 providers, credentials stay in environment variables you manage, and the app validates the configuration at startup before accepting requests. For most cloud providers a sensible default model is pre-configured so you only need one key to get started; for local inference with Ollama the app will even download the model for you. The `custom` and `litellm` providers extend this to any OpenAI-compatible server or LiteLLM proxy, making the provider list effectively unlimited. The logic lives mainly in four files: `enums/llm_provider.py`, `constants/llm.py`, `utils/llm_provider.py`, and `utils/model_availability.py`.

Sources: [servers/fastapi/utils/llm_provider.py:119-164]()
