# Cookbook environment setup

> Shared uv/Docker setup for all backends: HF auth, CUDA backend tags, Cosmos Framework clone/sync, Diffusers venv, vLLM + vllm-cosmos3 plugin, vLLM-Omni Docker image, and GPU verification probes.

- Repository: NVIDIA/cosmos
- GitHub: https://github.com/NVIDIA/cosmos
- Human docs: https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9
- Complete Markdown: https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/llms-full.txt

## Source Files

- `cookbooks/cosmos3/README.md`
- `README.md`
- `cookbooks/cosmos3/generator/audiovisual/run_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/reasoner/run_with_cosmos_framework.ipynb`
- `.gitignore`

---

`cookbooks/cosmos3/README.md` is the canonical environment guide for every Cosmos3 Reasoner and Generator notebook. Each backend uses a separate install path (framework checkout under `packages/`, a repo-root `.venv` for Diffusers or vLLM, or the `vllm/vllm-omni:cosmos3` image). Pick one backend, complete its section, then run the cookbook that links to it.

## Backend map

| Backend | Install surface | Primary cookbooks |
| --- | --- | --- |
| Cosmos Framework | `packages/cosmos3/.venv` via `uv sync --group=cu130-train` or `cu128-train` | Reasoner, Generator (audiovisual, action) |
| Diffusers | Repo-root `.venv` via `uv pip install --torch-backend=…` | Generator (audiovisual) |
| Transformers | Coming soon | Reasoner |
| vLLM + `vllm-cosmos3` | Repo-root `.venv` | Reasoner |
| vLLM-Omni | Docker `vllm/vllm-omni:cosmos3` (or PR-branch venv) | Generator (audiovisual, action) |

```mermaid
flowchart TB
  subgraph cosmos_repo["cosmos repo checkout"]
    readme["cookbooks/cosmos3/README.md"]
    nb["*.ipynb cookbooks"]
  end
  subgraph fw["packages/cosmos3: cosmos-framework"]
    uv_sync["uv sync --group=cu130-train | cu128-train"]
    fw_venv[".venv: torchrun / python -m cosmos_framework.scripts.inference"]
  end
  subgraph local_venv["repo-root .venv"]
    diff["Diffusers + Cosmos3OmniPipeline"]
    vllm_r["vllm + vllm-cosmos3 plugin"]
  end
  subgraph docker["Docker"]
    omni["vllm/vllm-omni:cosmos3: vllm serve --omni"]
  end
  hf["Hugging Face gated models"]
  readme --> fw
  readme --> local_venv
  readme --> docker
  nb --> fw_venv
  nb --> diff
  nb --> vllm_r
  nb --> omni
  uv_sync --> fw_venv
  hf --> fw_venv
  hf --> diff
  hf --> omni
```

The framework checkout lives under `packages/` (gitignored). Notebooks resolve `COSMOS3_REPO` from `packages/cosmos3` or `packages/cosmos-framework` when `pyproject.toml` and `cosmos_framework` are present.

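The `COSMOS3_REPO` lookup described above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the helper names `looks_like_framework` and `resolve_cosmos3_repo` are hypothetical, the probing order is an assumption, and treating `cosmos_framework` as a directory inside the checkout is inferred from the runtime layout in this guide.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Candidate checkout locations named in this guide, relative to the cosmos repo root.
# The order in which they are probed is an assumption.
CANDIDATES = ("packages/cosmos3", "packages/cosmos-framework")


def looks_like_framework(path: Path) -> bool:
    """True when the directory holds pyproject.toml and a cosmos_framework package dir."""
    return (path / "pyproject.toml").is_file() and (path / "cosmos_framework").is_dir()


def resolve_cosmos3_repo(
    repo_root: Path, env: Optional[Mapping[str, str]] = None
) -> Optional[Path]:
    """Hypothetical helper: honor COSMOS3_REPO if set, otherwise probe the candidates."""
    env = os.environ if env is None else env
    override = env.get("COSMOS3_REPO")
    if override:
        # An explicit override is returned as-is, without validating its contents.
        return Path(override).expanduser()
    for rel in CANDIDATES:
        candidate = repo_root / rel
        if looks_like_framework(candidate):
            return candidate
    return None  # no framework checkout found
```

A notebook cell could call `resolve_cosmos3_repo(Path.cwd())` and stop with a clear message when the result is `None`, rather than failing later on a missing import.
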
## Prerequisites

| Requirement | Detail |
| --- | --- |
| OS / GPU | Linux with NVIDIA GPU access |
| Tools | `uv`, `git`, `git-lfs` |
| Hugging Face | Gated Cosmos3 model access; authenticate before first download |
| Framework / vLLM plugin | SSH access to `git@github.com:NVIDIA/cosmos-framework.git` when cloning the framework or installing `vllm-cosmos3` from that repo |
| Disk | Tens of GiB for venvs, `uv` cache, and model weights |

```bash
uvx hf@latest auth login
```

Or set a token for non-interactive runs:

```bash
export HF_TOKEN=
```

Optional: redirect the model cache to a larger disk with `HF_HOME`.

Cosmos Framework and the notebooks require **`uv >= 0.11.3`**. Older `uv` builds fail on `[tool.uv.audit]` and may not accept `--torch-backend=cu130`.

```bash
uv self update
```

Several backends pin a CUDA build of `torch` / `vllm` that must match the NVIDIA driver. Do not rely on `--torch-backend=auto` for vLLM cookbook installs.

| Driver CUDA | Backend tag | vLLM pin (Reasoner) |
| --- | --- | --- |
| 13.x | `cu130` | `vllm==0.21.0` |
| 12.x | `cu128` | `vllm==0.19.1` |

Framework notebooks use dependency groups `cu130-train` (default) or `cu128-train` instead of bare `cu130` / `cu128`.

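The driver pairing above amounts to a lookup keyed on the CUDA major version. The sketch below copies the rows from the table; the names `BackendChoice` and `choose_backend` are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class BackendChoice(NamedTuple):
    torch_backend: str  # value for --torch-backend
    uv_group: str       # dependency group used by Framework notebooks
    vllm_pin: str       # vLLM requirement for the Reasoner cookbook


# Rows copied from the driver table in this guide, keyed by CUDA major version.
_BY_MAJOR = {
    13: BackendChoice("cu130", "cu130-train", "vllm==0.21.0"),
    12: BackendChoice("cu128", "cu128-train", "vllm==0.19.1"),
}


def choose_backend(driver_cuda: str) -> BackendChoice:
    """Map a driver CUDA version string such as "12.8" to the matching tags."""
    major = int(driver_cuda.strip().split(".")[0])
    try:
        return _BY_MAJOR[major]
    except KeyError:
        raise ValueError(f"no backend tag listed for CUDA {driver_cuda}") from None
```

The input would typically be the `CUDA Version` value shown in the `nvidia-smi` header. Versions outside the table raise an error instead of guessing a tag.
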
## Shared environment variables

| Variable | Default | When to override |
| --- | --- | --- |
| `COSMOS3_UV_GROUP` | `cu130-train` | `cu128-train` on CUDA 12.x drivers (Framework notebooks) |
| `COSMOS3_TORCH_BACKEND` | `cu130` | `cu128` for Diffusers notebook installs |
| `COSMOS3_REPO` | Auto: `packages/cosmos3` or `packages/cosmos-framework` | Custom framework checkout path |
| `HF_HOME` | `~/.cache/huggingface` | Shared or high-capacity cache |
| `GIT_LFS_SKIP_SMUDGE` | unset | Set to `1` during Framework `uv sync` to skip optional LFS test blobs |
| `VLLM_USE_DEEP_GEMM` | enabled in build | `export VLLM_USE_DEEP_GEMM=0` if DeepGEMM is unavailable |
| `UV_PROJECT_ENVIRONMENT` | Framework `.venv` path | Separate venv location for large installs |
| `CUDA_VISIBLE_DEVICES` | Notebook-specific | GPU selection for inference |

## Cosmos Framework

Native PyTorch inference uses a **cosmos-framework** checkout. From the `cosmos` repo root:

```bash
mkdir -p packages
git clone https://github.com/NVIDIA/cosmos-framework.git packages/cosmos3
cd packages/cosmos3
```

Inference imports training extras today, so sync the **`*-train`** group that matches your driver:

```bash
export GIT_LFS_SKIP_SMUDGE=1
# CUDA 13 driver (default):
uv sync --all-extras --group=cu130-train
# CUDA 12.x driver:
# uv sync --all-extras --group=cu128-train
```

Result: `packages/cosmos3/.venv`. Run commands after `source .venv/bin/activate` or via `.venv/bin/python` / `.venv/bin/torchrun`.

Set `export COSMOS3_UV_GROUP=cu128-train` before launching Framework notebooks on CUDA 12.x systems so install cells pick the correct group.

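As an illustration of how an install cell might honor `COSMOS3_UV_GROUP`, the sketch below builds the same `uv sync` command shown above from the environment. The function name `framework_sync_command` and the validation step are assumptions; the real notebook cells may be structured differently.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# The two dependency groups this guide documents for the framework checkout.
ALLOWED_GROUPS = ("cu130-train", "cu128-train")


def framework_sync_command(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: return (argv, extra_env) for the Framework sync."""
    env = os.environ if env is None else env
    group = env.get("COSMOS3_UV_GROUP", "cu130-train")  # documented default
    if group not in ALLOWED_GROUPS:
        raise ValueError(f"COSMOS3_UV_GROUP must be one of {ALLOWED_GROUPS}, got {group!r}")
    argv = ["uv", "sync", "--all-extras", f"--group={group}"]
    # Skip optional LFS test blobs during the sync, as recommended in this guide.
    extra_env = {"GIT_LFS_SKIP_SMUDGE": "1"}
    return argv, extra_env
```

The result can be passed to `subprocess.run(argv, cwd=repo, env={**os.environ, **extra_env}, check=True)`, where `repo` is the framework checkout. Rejecting unknown group names early avoids a long failed sync caused by a typo.
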
## Diffusers

Generator audiovisual notebooks use a **repo-root** managed Python 3.13 venv:

```bash
uv venv --python 3.13 --seed --managed-python
source .venv/bin/activate
uv pip install --torch-backend=cu130 \
  "diffusers @ git+https://github.com/huggingface/diffusers.git" \
  accelerate \
  av \
  cosmos_guardrail \
  huggingface_hub \
  imageio \
  imageio-ffmpeg \
  torch \
  torchvision \
  transformers
```

For CUDA 12.x, use `--torch-backend=cu128` instead of `cu130`. The root README quickstart uses `--torch-backend=auto` for Diffusers only; cookbook notebooks pin `COSMOS3_TORCH_BACKEND` explicitly.

On headless hosts, imports may fail with `libxcb.so.1`. Install `libxcb1`, `libgl1`, and `libglib2.0-0` before running pipelines (see [Troubleshooting](/troubleshooting)).

## Transformers

Transformers-based Reasoner inference is documented as **coming soon** in the cookbooks guide; no install steps are published yet.

## vLLM (Reasoner)

OpenAI-compatible **reasoning** serving requires vLLM plus the **`vllm-cosmos3`** plugin (registers `Cosmos3ReasonerForConditionalGeneration`):

```bash
uv venv --python 3.13 --seed --managed-python
source .venv/bin/activate
# CUDA 13 driver:
uv pip install --torch-backend=cu130 "vllm==0.21.0" \
  "vllm-cosmos3 @ git+https://github.com/NVIDIA/cosmos-framework.git#subdirectory=packages/vllm-cosmos3"
# CUDA 12.x driver:
# uv pip install --torch-backend=cu128 "vllm==0.19.1" \
#   "vllm-cosmos3 @ git+https://github.com/NVIDIA/cosmos-framework.git#subdirectory=packages/vllm-cosmos3"
```

If DeepGEMM is unavailable in your build:

```bash
export VLLM_USE_DEEP_GEMM=0
```

When launching `.venv/bin/vllm` without activating the venv, keep `.venv/bin` on `PATH` so FlashInfer’s JIT build can find `ninja` in the venv.

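When the server is started from a script rather than an activated shell, the `PATH` and DeepGEMM notes above can be applied by preparing the child environment explicitly. This is a minimal sketch; `vllm_launch_env` is a hypothetical helper, not part of vLLM or the cookbooks.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def vllm_launch_env(
    venv_dir: str,
    base_env: Optional[Mapping[str, str]] = None,
    use_deep_gemm: bool = True,
) -> Dict[str, str]:
    """Hypothetical helper: copy the environment and put <venv>/bin first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    # Use an absolute path so the entry stays valid if the child changes directory.
    bin_dir = str(Path(venv_dir).resolve() / "bin")
    # Drop empty entries and any existing copy of bin_dir before prepending it.
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != bin_dir]
    env["PATH"] = os.pathsep.join([bin_dir, *parts])
    if not use_deep_gemm:
        env["VLLM_USE_DEEP_GEMM"] = "0"  # same effect as the export shown above
    return env
```

The returned mapping can be passed as `env=` to `subprocess.Popen` together with the `.venv/bin/vllm` executable, which leaves the parent shell's environment untouched.
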
## vLLM-Omni (Generator)

The recommended path is the prebuilt image **`vllm/vllm-omni:cosmos3`** (all modalities in the cookbooks):

```bash
docker pull vllm/vllm-omni:cosmos3
```

**Cosmos3-Nano** (port 8000):

```bash
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v "$(pwd):/workspace" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-omni:cosmos3 \
  vllm serve nvidia/Cosmos3-Nano \
  --omni \
  --model-class-name Cosmos3OmniDiffusersPipeline \
  --allowed-local-media-path / \
  --port 8000
```

**Cosmos3-Super** (tensor parallel + optional layer offload):

```bash
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v "$(pwd):/workspace" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-omni:cosmos3 \
  vllm serve nvidia/Cosmos3-Super \
  --omni \
  --model-class-name Cosmos3OmniDiffusersPipeline \
  --allowed-local-media-path / \
  --tensor-parallel-size 4 \
  --enable-layerwise-offload \
  --port 8000
```

If you installed vLLM-Omni from the upstreaming PR branch instead, run the same `vllm serve ... --omni ...` command on the host without the Docker wrapper.

## Verify the environment

### PyTorch GPU probe (Framework, Diffusers, vLLM venvs)

```bash
.venv/bin/python - <<'PY'
import torch
print("torch:", torch.__version__)
print("torch cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))
PY
```

Success: `cuda available: True` and a valid device name. `False` usually means a `cu130` wheel on a CUDA 12.x driver; switch to `cu128` / `cu128-train` per the tables above.

### vLLM / vLLM-Omni server probe

With the server listening on port 8000:

```bash
curl http://localhost:8000/v1/models
```

vLLM-Omni logs `Application startup complete.` when the API is ready.

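Model loading can take a while, so a script may prefer to poll `/v1/models` rather than run `curl` once. The sketch below uses only the Python standard library. The function name `wait_for_models` is hypothetical, and the response shape (a JSON object with a `data` list of entries carrying an `id`) is assumed to follow the OpenAI-compatible models listing.

```python
import json
import time
import urllib.error
import urllib.request
from typing import List


def wait_for_models(
    base_url: str = "http://localhost:8000",
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
) -> List[str]:
    """Poll GET /v1/models until it answers, then return the listed model ids."""
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                payload = json.load(resp)
            # Assumed OpenAI-style payload: {"data": [{"id": "..."}, ...]}
            return [entry["id"] for entry in payload.get("data", [])]
        except (urllib.error.URLError, ConnectionError, TimeoutError, ValueError) as exc:
            last_error = exc  # server not up yet, or reply was not valid JSON
            time.sleep(interval_s)
    raise TimeoutError(f"{url} not ready after {timeout_s}s: {last_error}")
```

Calling `wait_for_models()` before the first request in a notebook gives a single clear timeout error instead of a connection failure partway through a cell.
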
## Runtime layout and ignored paths

```text
cosmos/                        # this repo
├── cookbooks/cosmos3/
│   ├── README.md              # environment guide (this page's source)
│   ├── generator/…/*.ipynb
│   └── reasoner/…/*.ipynb
├── packages/                  # gitignored: framework clone
│   └── cosmos3/
│       ├── .venv/
│       └── cosmos_framework/
└── .venv/                     # gitignored: Diffusers or vLLM (repo root)
```

`.gitignore` excludes `packages/`, `.venv`, cookbook `outputs/`, and `**/env.sh` (machine-specific secrets).

## Related pages

- Prerequisites, CUDA pairing, and top-level verification from the root README.
- Minimal first-run commands after environment setup.
- Pick Diffusers, vLLM-Omni, vLLM, or Cosmos Framework by goal.
- Driver mismatches, `uv` version errors, libxcb, and DeepGEMM workarounds.