# Cookbook environment setup

> Shared uv/Docker setup for all backends: HF auth, CUDA backend tags, Cosmos Framework clone/sync, Diffusers venv, vLLM + vllm-cosmos3 plugin, vLLM-Omni Docker image, and GPU verification probes.

- Repository: NVIDIA/cosmos
- GitHub: https://github.com/NVIDIA/cosmos
- Human docs: https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9
- Complete Markdown: https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/llms-full.txt

## Source Files

- `cookbooks/cosmos3/README.md`
- `README.md`
- `cookbooks/cosmos3/generator/audiovisual/run_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/reasoner/run_with_cosmos_framework.ipynb`
- `.gitignore`

---


`cookbooks/cosmos3/README.md` is the canonical environment guide for every Cosmos3 Reasoner and Generator notebook. Each backend uses a separate install path (framework checkout under `packages/`, a repo-root `.venv` for Diffusers or vLLM, or the `vllm/vllm-omni:cosmos3` image). Pick one backend, complete its section, then run the cookbook that links to it.

## Backend map

| Backend | Install surface | Primary cookbooks |
| --- | --- | --- |
| Cosmos Framework | `packages/cosmos3/.venv` via `uv sync --group=cu130-train` or `cu128-train` | Reasoner, Generator (audiovisual, action) |
| Diffusers | Repo-root `.venv` via `uv pip install --torch-backend=…` | Generator (audiovisual) |
| Transformers | Coming soon | Reasoner |
| vLLM + `vllm-cosmos3` | Repo-root `.venv` | Reasoner |
| vLLM-Omni | Docker `vllm/vllm-omni:cosmos3` (or PR-branch venv) | Generator (audiovisual, action) |

```mermaid
flowchart TB
  subgraph cosmos_repo["cosmos repo checkout"]
    readme["cookbooks/cosmos3/README.md"]
    nb["*.ipynb cookbooks"]
  end
  subgraph fw["packages/cosmos3 — cosmos-framework"]
    uv_sync["uv sync --group=cu130-train | cu128-train"]
    fw_venv[".venv — torchrun / python -m cosmos_framework.scripts.inference"]
  end
  subgraph local_venv["repo-root .venv"]
    diff["Diffusers + Cosmos3OmniPipeline"]
    vllm_r["vllm + vllm-cosmos3 plugin"]
  end
  subgraph docker["Docker"]
    omni["vllm/vllm-omni:cosmos3 — vllm serve --omni"]
  end
  hf["Hugging Face gated models"]
  readme --> fw
  readme --> local_venv
  readme --> docker
  nb --> fw_venv
  nb --> diff
  nb --> vllm_r
  nb --> omni
  uv_sync --> fw_venv
  hf --> fw_venv
  hf --> diff
  hf --> omni
```

<Note>
The framework checkout lives under `packages/` (gitignored). Notebooks resolve `COSMOS3_REPO` from `packages/cosmos3` or `packages/cosmos-framework` when `pyproject.toml` and `cosmos_framework` are present.
</Note>
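
The lookup described in the note can be sketched in a few lines of Python. This is a hypothetical helper, not the notebooks' actual code: the function name `resolve_cosmos3_repo` and the rule that an explicit `COSMOS3_REPO` wins over auto-detection are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cosmos3_repo(
    repo_root: Path, env: Mapping[str, str] = os.environ
) -> Optional[Path]:
    """Return a usable framework checkout, or None when nothing matches.

    Hypothetical helper; the notebooks may resolve the path differently.
    """
    # Assumption: an explicit COSMOS3_REPO overrides auto-detection.
    override = env.get("COSMOS3_REPO")
    if override:
        return Path(override).expanduser()

    # Try the two checkout names mentioned in the note, in order.
    for name in ("cosmos3", "cosmos-framework"):
        candidate = repo_root / "packages" / name
        has_project = (candidate / "pyproject.toml").is_file()
        has_package = (candidate / "cosmos_framework").is_dir()
        if has_project and has_package:
            return candidate
    return None
```

Calling `resolve_cosmos3_repo(Path.cwd())` from the `cosmos` repo root returns `None` until a framework clone exists, which makes a missing checkout easy to report before any install cell runs.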

## Prerequisites

| Requirement | Detail |
| --- | --- |
| OS / GPU | Linux with NVIDIA GPU access |
| Tools | `uv`, `git`, `git-lfs` |
| Hugging Face | Gated Cosmos3 model access; authenticate before first download |
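| Framework / vLLM plugin | Access to the `NVIDIA/cosmos-framework` repository on GitHub when cloning the framework or installing `vllm-cosmos3` from it. The commands on this page use HTTPS; use `git@github.com:NVIDIA/cosmos-framework.git` if you clone over SSH |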
| Disk | Tens of GiB for venvs, `uv` cache, and model weights |

<Steps>
<Step title="Authenticate to Hugging Face">

```bash
uvx hf@latest auth login
```

Or set a token for non-interactive runs:

```bash
export HF_TOKEN=<your_token>
```

Optional: redirect the model cache to a larger disk with `HF_HOME`.
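
Before a long download, it can help to confirm that a credential is actually in place. The sketch below only checks where a token would come from and never reads or prints it. It assumes the login command stores the token at `<HF_HOME>/token`, which is the `huggingface_hub` default as far as we know; the function name `hf_credential_source` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping


def hf_credential_source(env: Mapping[str, str] = os.environ) -> str:
    """Report where a Hugging Face token would come from.

    Returns "env", "file", or "none". Hypothetical helper for illustration.
    """
    if env.get("HF_TOKEN"):
        return "env"

    hf_home = Path(env.get("HF_HOME", "~/.cache/huggingface")).expanduser()
    # Assumption: an interactive login writes the token to <HF_HOME>/token.
    if (hf_home / "token").is_file():
        return "file"
    return "none"
```

A result of `"none"` suggests running the login command above or exporting `HF_TOKEN` before the first gated download.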

</Step>
<Step title="Confirm uv version">

Cosmos Framework and the notebooks require **`uv >= 0.11.3`**. Older `uv` builds fail on `[tool.uv.audit]` and may not accept `--torch-backend=cu130`.

```bash
uv self update
```
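
To check the installed version programmatically, a minimal sketch follows. It assumes `uv --version` prints a line such as `uv 0.11.3 (...)`; the helper names are hypothetical. Versions are compared as integer tuples because a plain string comparison would rank `0.9.x` above `0.11.x`.

```python
import re
import subprocess
from typing import Tuple

MIN_UV = (0, 11, 3)  # minimum version stated in this guide


def parse_uv_version(output: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from text such as 'uv 0.11.3 (...)'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def uv_is_new_enough() -> bool:
    """Run `uv --version` and compare the result against MIN_UV."""
    result = subprocess.run(
        ["uv", "--version"], capture_output=True, text=True, check=True
    )
    return parse_uv_version(result.stdout) >= MIN_UV
```

`uv_is_new_enough()` raises `FileNotFoundError` when `uv` is not on `PATH`, which is itself a useful signal that the tool still needs to be installed.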

</Step>
<Step title="Match CUDA backend tags to the driver">

Several backends pin a CUDA build of `torch` / `vllm` that must match the NVIDIA driver. Do not rely on `--torch-backend=auto` for vLLM cookbook installs.

| Driver CUDA | Backend tag | vLLM pin (Reasoner) |
| --- | --- | --- |
| 13.x | `cu130` | `vllm==0.21.0` |
| 12.x | `cu128` | `vllm==0.19.1` |

Framework notebooks use dependency groups `cu130-train` (default) or `cu128-train` instead of bare `cu130` / `cu128`.
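
The pairing above can be captured as a small lookup so scripts pick consistent values. This is a sketch: the values are copied from the table, while the `BACKENDS` mapping and `backend_for` function are hypothetical names. The input is the driver's CUDA version, for example the `CUDA Version` field in the `nvidia-smi` header.

```python
from typing import Dict

# Values copied from the table above, keyed by the driver's CUDA major version.
BACKENDS: Dict[int, Dict[str, str]] = {
    13: {
        "torch_backend": "cu130",
        "uv_group": "cu130-train",
        "vllm_pin": "vllm==0.21.0",
    },
    12: {
        "torch_backend": "cu128",
        "uv_group": "cu128-train",
        "vllm_pin": "vllm==0.19.1",
    },
}


def backend_for(driver_cuda: str) -> Dict[str, str]:
    """Map a driver CUDA version string such as '12.8' to its install tags."""
    major = int(driver_cuda.strip().split(".")[0])
    try:
        return BACKENDS[major]
    except KeyError:
        raise ValueError(
            f"no backend tag listed for CUDA {driver_cuda}"
        ) from None
```

For example, `backend_for("12.8")["uv_group"]` yields `cu128-train`, the value to export as `COSMOS3_UV_GROUP` for Framework notebooks.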

</Step>
</Steps>

## Shared environment variables

| Variable | Default | When to override |
| --- | --- | --- |
| `COSMOS3_UV_GROUP` | `cu130-train` | `cu128-train` on CUDA 12.x drivers (Framework notebooks) |
| `COSMOS3_TORCH_BACKEND` | `cu130` | `cu128` for Diffusers notebook installs |
| `COSMOS3_REPO` | Auto: `packages/cosmos3` or `packages/cosmos-framework` | Custom framework checkout path |
| `HF_HOME` | `~/.cache/huggingface` | Shared or high-capacity cache |
| `GIT_LFS_SKIP_SMUDGE` | unset | Set to `1` during Framework `uv sync` to skip optional LFS test blobs |
| `VLLM_USE_DEEP_GEMM` | enabled in build | `export VLLM_USE_DEEP_GEMM=0` if DeepGEMM is unavailable |
| `UV_PROJECT_ENVIRONMENT` | Framework `.venv` path | Separate venv location for large installs |
| `CUDA_VISIBLE_DEVICES` | Notebook-specific | GPU selection for inference |
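
As an illustration of how the defaults in the table combine with overrides, the sketch below resolves three of the variables and flags a host where the Framework group and the Diffusers backend point at different CUDA builds. The helper names are hypothetical, and the consistency check is our own suggestion rather than something the notebooks are known to perform.

```python
import os
from typing import Dict, Mapping

# Defaults copied from the table above.
DEFAULTS: Dict[str, str] = {
    "COSMOS3_UV_GROUP": "cu130-train",
    "COSMOS3_TORCH_BACKEND": "cu130",
    "HF_HOME": "~/.cache/huggingface",
}


def effective_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each variable's value, falling back to the documented default."""
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


def tags_agree(settings: Mapping[str, str]) -> bool:
    """True when the uv group and torch backend name the same CUDA build."""
    group = settings["COSMOS3_UV_GROUP"]
    backend = settings["COSMOS3_TORCH_BACKEND"]
    return group.startswith(backend)
```

On a CUDA 12.x host that overrides only `COSMOS3_UV_GROUP`, `tags_agree` returns `False`, a hint that `COSMOS3_TORCH_BACKEND` should be set to `cu128` as well.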

## Cosmos Framework

Native PyTorch inference uses a **cosmos-framework** checkout. From the `cosmos` repo root:

```bash
mkdir -p packages
git clone https://github.com/NVIDIA/cosmos-framework.git packages/cosmos3
cd packages/cosmos3
```

Inference currently imports training dependencies, so sync the **`*-train`** group that matches your driver:

```bash
export GIT_LFS_SKIP_SMUDGE=1

# CUDA 13.x driver (default):
uv sync --all-extras --group=cu130-train

# CUDA 12.x driver:
# uv sync --all-extras --group=cu128-train
```

The sync creates `packages/cosmos3/.venv`. Either activate it with `source .venv/bin/activate` or call `.venv/bin/python` and `.venv/bin/torchrun` directly.

<Tip>
Set `export COSMOS3_UV_GROUP=cu128-train` before launching Framework notebooks on CUDA 12.x systems so install cells pick the correct group.
</Tip>

## Diffusers

Generator audiovisual notebooks use a **repo-root** managed Python 3.13 venv:

```bash
uv venv --python 3.13 --seed --managed-python
source .venv/bin/activate

uv pip install --torch-backend=cu130 \
  "diffusers @ git+https://github.com/huggingface/diffusers.git" \
  accelerate \
  av \
  cosmos_guardrail \
  huggingface_hub \
  imageio \
  imageio-ffmpeg \
  torch \
  torchvision \
  transformers
```

For CUDA 12.x, use `--torch-backend=cu128` instead of `cu130`. The root README quickstart uses `--torch-backend=auto` for Diffusers only; cookbook notebooks pin `COSMOS3_TORCH_BACKEND` explicitly.

<Warning>
On headless hosts, imports may fail because `libxcb.so.1` is missing. Install `libxcb1`, `libgl1`, and `libglib2.0-0` before running pipelines (see [Troubleshooting](/troubleshooting)).
</Warning>

## Transformers

Transformers-based Reasoner inference is documented as **coming soon** in the cookbooks guide; no install steps are published yet.

## vLLM (Reasoner)

OpenAI-compatible **reasoning** serving requires vLLM plus the **`vllm-cosmos3`** plugin (registers `Cosmos3ReasonerForConditionalGeneration`):

```bash
uv venv --python 3.13 --seed --managed-python
source .venv/bin/activate

# CUDA 13.x driver:
uv pip install --torch-backend=cu130 "vllm==0.21.0" \
  "vllm-cosmos3 @ git+https://github.com/NVIDIA/cosmos-framework.git#subdirectory=packages/vllm-cosmos3"

# CUDA 12.x driver:
# uv pip install --torch-backend=cu128 "vllm==0.19.1" \
#   "vllm-cosmos3 @ git+https://github.com/NVIDIA/cosmos-framework.git#subdirectory=packages/vllm-cosmos3"
```

If DeepGEMM is unavailable in your build:

```bash
export VLLM_USE_DEEP_GEMM=0
```

<Info>
When launching `.venv/bin/vllm` without activating the venv, keep `.venv/bin` on `PATH` so FlashInfer’s JIT build can find `ninja` in the venv.
</Info>
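
When the server is started from a Python launcher rather than an activated shell, the same `PATH` requirement can be met by building the child environment explicitly. A minimal sketch, with the hypothetical helper name `env_with_venv_on_path`:

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def env_with_venv_on_path(
    venv_dir: Path, env: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Copy env and put <venv_dir>/bin first on PATH, without duplicates."""
    new_env = dict(env)
    bin_dir = str(venv_dir / "bin")
    existing = new_env.get("PATH", "").split(os.pathsep)
    # Drop empty entries and any earlier copy of the venv's bin directory.
    kept = [entry for entry in existing if entry and entry != bin_dir]
    new_env["PATH"] = os.pathsep.join([bin_dir, *kept])
    return new_env
```

Pass the returned mapping as `env=` to `subprocess.run` when invoking `.venv/bin/vllm`, so tools installed in the venv, such as `ninja`, resolve first.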

## vLLM-Omni (Generator)

The recommended path is the prebuilt image **`vllm/vllm-omni:cosmos3`**, which covers every modality used in the cookbooks:

```bash
docker pull vllm/vllm-omni:cosmos3
```

**Cosmos3-Nano** (port 8000):

```bash
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v "$(pwd):/workspace" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-omni:cosmos3 \
  vllm serve nvidia/Cosmos3-Nano \
  --omni \
  --model-class-name Cosmos3OmniDiffusersPipeline \
  --allowed-local-media-path / \
  --port 8000
```

**Cosmos3-Super** (tensor parallel + optional layer offload):

```bash
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v "$(pwd):/workspace" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-omni:cosmos3 \
  vllm serve nvidia/Cosmos3-Super \
  --omni \
  --model-class-name Cosmos3OmniDiffusersPipeline \
  --allowed-local-media-path / \
  --tensor-parallel-size 4 \
  --enable-layerwise-offload \
  --port 8000
```

If you installed vLLM-Omni from the upstreaming PR branch instead, run the same `vllm serve ... --omni ...` command on the host without the Docker wrapper.
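
The two `docker run` invocations above differ only in the model name and the two parallelism flags. For launcher scripts, that difference can be expressed as a small argument builder. This is a sketch that reuses only the flags shown above; the function name and its parameters are hypothetical.

```python
import os
from typing import List


def omni_docker_command(
    model: str,
    tensor_parallel: int = 1,
    layerwise_offload: bool = False,
    port: int = 8000,
) -> List[str]:
    """Build the docker argv for serving a Cosmos3 model with vLLM-Omni."""
    hf_cache = os.path.expanduser("~/.cache/huggingface")
    cmd = [
        "docker", "run", "--runtime", "nvidia", "--gpus", "all",
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        "-v", f"{os.getcwd()}:/workspace",  # mirrors "$(pwd):/workspace"
        "-p", f"{port}:{port}",
        "--ipc=host",
        "vllm/vllm-omni:cosmos3",
        "vllm", "serve", model,
        "--omni",
        "--model-class-name", "Cosmos3OmniDiffusersPipeline",
        "--allowed-local-media-path", "/",
    ]
    # Only the multi-GPU configuration adds the parallelism flags.
    if tensor_parallel > 1:
        cmd += ["--tensor-parallel-size", str(tensor_parallel)]
    if layerwise_offload:
        cmd.append("--enable-layerwise-offload")
    cmd += ["--port", str(port)]
    return cmd
```

`omni_docker_command("nvidia/Cosmos3-Super", tensor_parallel=4, layerwise_offload=True)` reproduces the second command as an argument list suitable for `subprocess.run`; building a list avoids shell quoting issues with the mount paths.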

## Verify the environment

### PyTorch GPU probe (Framework, Diffusers, vLLM venvs)

```bash
.venv/bin/python - <<'PY'
import torch

print("torch:", torch.__version__)
print("torch cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))
PY
```

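Run the probe from the directory that owns the venv: `packages/cosmos3` for the Framework, the repo root for Diffusers and vLLM. It succeeds when it prints `cuda available: True` and a device name. `cuda available: False` usually means a `cu130` wheel is installed on a CUDA 12.x driver; switch to `cu128` (or `cu128-train` for the Framework) per the tables above.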

### vLLM / vLLM-Omni server probe

With the server listening on port 8000:

```bash
curl http://localhost:8000/v1/models
```

vLLM-Omni logs `Application startup complete.` when the API is ready.
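
Model loading can take a while, so a notebook or script may prefer to poll the endpoint rather than run `curl` by hand. The sketch below uses only the standard library. It assumes the response follows the OpenAI-style shape `{"data": [{"id": ...}]}`; the function names and the timeout defaults are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import List, Optional


def model_ids(payload: dict) -> List[str]:
    """Pull model ids out of an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def wait_for_models(
    url: str = "http://localhost:8000/v1/models",
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> Optional[List[str]]:
    """Poll the endpoint until it answers; return model ids, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                return model_ids(json.load(response))
        except (urllib.error.URLError, OSError, json.JSONDecodeError):
            # Server not up yet (or not ready); wait and retry.
            time.sleep(interval_s)
    return None
```

A `None` result means the server never answered within the timeout, in which case the container or server logs are the next place to look.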

## Runtime layout and ignored paths

```text
cosmos/                          # this repo
├── cookbooks/cosmos3/
│   ├── README.md                # environment guide (this page’s source)
│   ├── generator/…/*.ipynb
│   └── reasoner/…/*.ipynb
├── packages/                    # gitignored — framework clone
│   └── cosmos3/
│       ├── .venv/
│       └── cosmos_framework/
└── .venv/                       # gitignored — Diffusers or vLLM (repo root)
```

`.gitignore` excludes `packages/`, `.venv`, cookbook `outputs/`, and `**/env.sh` (machine-specific secrets).

## Related pages

<CardGroup>
<Card title="Installation" href="/installation">
Prerequisites, CUDA pairing, and top-level verification from the root README.
</Card>
<Card title="Quickstart" href="/quickstart">
Minimal first-run commands after environment setup.
</Card>
<Card title="Choose an integration" href="/choose-integration">
Pick Diffusers, vLLM-Omni, vLLM, or Cosmos Framework by goal.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Driver mismatches, `uv` version errors, libxcb, and DeepGEMM workarounds.
</Card>
</CardGroup>
