# Installation

> Prerequisites (Linux, NVIDIA GPU, uv, Hugging Face auth), CUDA driver pairing (cu130/cu128), venv and Docker setup paths, and environment verification commands.

- Repository: NVIDIA/cosmos
- GitHub: https://github.com/NVIDIA/cosmos
- Human docs: https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9
- Complete Markdown: https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/llms-full.txt

## Source Files

- `README.md`
- `cookbooks/cosmos3/README.md`
- `cookbooks/cosmos3/generator/audiovisual/run_with_diffusers.ipynb`
- `cookbooks/cosmos3/reasoner/run_with_vllm.ipynb`
- `cookbooks/cosmos3/generator/audiovisual/run_with_vllm_omni.ipynb`
- `.gitignore`

---


Cosmos 3 installs through **uv-managed Python 3.13 virtual environments** for Diffusers, Cosmos Framework, and vLLM Reasoner paths, or through the **`vllm/vllm-omni:cosmos3` Docker image** for Generator production serving. Every path requires **Linux**, **NVIDIA GPU access**, **Hugging Face authentication** for gated checkpoints, and a **CUDA driver / PyTorch backend pair** (`cu130` or `cu128`) that matches what `nvidia-smi` reports.

## Prerequisites

| Requirement | Details |
| --- | --- |
| Operating system | Linux (documented and tested for Cosmos 3) |
| GPU | NVIDIA Ampere, Hopper, or Blackwell with working driver |
| Package manager | [`uv`](https://docs.astral.sh/uv/getting-started/installation/) **≥ 0.11.3** (older versions fail on framework `pyproject.toml` and newer `--torch-backend` values such as `cu130`) |
| Version control | `git` and `git-lfs` |
| Model access | Hugging Face account with access to gated Cosmos 3 repos |
| Disk | Tens of GiB for venv, uv cache, and model weights (Nano downloads plus CUDA deps) |

<Warning>
Upgrade `uv` before installing if you see `a value is required for '--torch-backend'` or if the list of accepted values stops at `cu129`. Run `uv self update`, or reinstall from https://astral.sh/uv.
</Warning>
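
The version floor above can be checked before any install step runs. The following is a minimal sketch, assuming `uv --version` prints a line that contains a dotted `major.minor.patch` number; the helper names are hypothetical and not part of the cookbook.

```python
import re

MIN_UV = (0, 11, 3)  # minimum from the prerequisites table


def parse_uv_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `uv --version` style output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def uv_is_new_enough(output: str, minimum: tuple = MIN_UV) -> bool:
    """Compare as integer tuples so that 0.9.x sorts below 0.11.x."""
    return parse_uv_version(output) >= minimum


# Sample strings stand in for real `uv --version` output.
print(uv_is_new_enough("uv 0.11.3"))  # True
print(uv_is_new_enough("uv 0.9.30"))  # False
```

To check the installed binary, pass in the output of `subprocess.run(["uv", "--version"], capture_output=True, text=True, check=True).stdout`. Comparing integer tuples rather than strings matters here because `"0.9"` sorts after `"0.11"` lexically.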

### Hugging Face authentication

Authenticate once before the first model download:

```bash
uvx hf@latest auth login
```

Alternatively set a token in the environment:

```bash
export HF_TOKEN=<your_token>
```

<ParamField body="HF_HOME" type="string">
Optional cache root for checkpoints. Notebooks default to `~/.cache/huggingface`; point this at a volume with enough space for multi‑tens‑of‑GiB downloads.
</ParamField>
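
Before the first download it can help to confirm where the cache will land and how much room that volume has. This sketch assumes the cache root is `HF_HOME` when set and `~/.cache/huggingface` otherwise, as described above; `hf_cache_root` and `free_gib` are hypothetical helper names.

```python
import os
import shutil
from pathlib import Path


def hf_cache_root(env: dict) -> Path:
    """Return HF_HOME if set, else the ~/.cache/huggingface default."""
    override = env.get("HF_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "huggingface"


def free_gib(path: Path) -> float:
    """Free space in GiB on the filesystem that holds `path`.

    Walks up to the nearest existing parent so the check also works
    before the cache directory has been created.
    """
    probe = path
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free / 2**30


root = hf_cache_root(dict(os.environ))
print(f"cache root: {root}, free: {free_gib(root):.1f} GiB")
```

If the reported free space is well below the tens of GiB noted in the prerequisites, export `HF_HOME` to a larger volume before downloading.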

### Cosmos Framework checkout access

Cosmos Framework and vLLM Reasoner installs pull from `NVIDIA/cosmos-framework` (HTTPS or SSH). For SSH:

```bash
export COSMOS3_GIT_URL=git@github.com:NVIDIA/cosmos-framework.git
```

The cookbook clones into `packages/cosmos3` at the repo root; that directory is **gitignored** and created locally.

## CUDA driver and backend pairing

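The CUDA version supported by the driver and the CUDA version of the PyTorch build must align. Compare the `CUDA Version` field in the `nvidia-smi` header with the value PyTorch reports: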

```bash
nvidia-smi
python -c "import torch; print(torch.version.cuda)"
```

| Driver CUDA | Backend tag | Typical pairing |
| --- | --- | --- |
| 13.x | `cu130` | `vllm==0.21.0`, `COSMOS3_UV_GROUP=cu130-train`, Diffusers `--torch-backend=cu130` |
| 12.x | `cu128` | `vllm==0.19.1`, `COSMOS3_UV_GROUP=cu128-train`, Diffusers `--torch-backend=cu128` |

<Note>
CUDA **13** is recommended; **12.8** is supported. vLLM does not publish wheels for every CUDA minor version, so **do not rely on `--torch-backend=auto` for vLLM**; pick the explicit pair above.
</Note>
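
The pairing table can be turned into a small lookup keyed on the driver's CUDA major version. This is a sketch, assuming the `nvidia-smi` header contains a `CUDA Version: X.Y` field; the `Pins` type and function names are hypothetical, and the sample header line is illustrative rather than captured from a real machine.

```python
import re
from typing import NamedTuple


class Pins(NamedTuple):
    torch_backend: str
    vllm_version: str
    uv_group: str


# Values copied from the pairing table, keyed by driver CUDA major version.
PINS_BY_MAJOR = {
    13: Pins("cu130", "0.21.0", "cu130-train"),
    12: Pins("cu128", "0.19.1", "cu128-train"),
}


def driver_cuda_major(nvidia_smi_output: str) -> int:
    """Read the major version from the `CUDA Version: X.Y` header field."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1))


def pins_for(nvidia_smi_output: str) -> Pins:
    """Return the documented pins, or fail loudly for undocumented drivers."""
    major = driver_cuda_major(nvidia_smi_output)
    if major not in PINS_BY_MAJOR:
        raise ValueError(f"no documented pairing for CUDA {major}.x")
    return PINS_BY_MAJOR[major]


# Illustrative header line; real output spans several lines.
sample = "| NVIDIA-SMI 580.00    Driver Version: 580.00    CUDA Version: 13.0 |"
print(pins_for(sample))
```

Raising on an unknown major version is a deliberate choice in this sketch: silently falling back to a default would reproduce the driver mismatch the table is meant to prevent.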

For Diffusers-only venvs, `--torch-backend=auto` lets uv pick the wheel that matches the driver. Without an explicit backend, uv may install the newest CUDA wheel (`cu130`); on an older driver this fails with `The NVIDIA driver on your system is too old`, and `torch.cuda.is_available()` returns `False`.

### NGC base containers (optional)

When using NVIDIA NGC PyTorch images instead of bare-metal uv:

| CUDA | Container |
| --- | --- |
| 13 | `nvcr.io/nvidia/pytorch:25.09-py3` |
| 12 | `nvcr.io/nvidia/pytorch:25.06-py3` |

## Setup paths overview

```text
                    ┌─────────────────────────────────────┐
                    │  Linux + NVIDIA GPU + HF auth       │
                    └─────────────────┬───────────────────┘
                                      │
          ┌───────────────────────────┼───────────────────────────┐
          │                           │                           │
          v                           v                           v
   ┌──────────────┐           ┌──────────────┐            ┌─────────────────┐
   │ Diffusers    │           │ Cosmos       │            │ vLLM Reasoner   │
   │ .venv (uv)   │           │ Framework    │            │ .venv (uv)      │
   │ Generator    │           │ packages/    │            │ + vllm-cosmos3  │
   │ research     │           │ cosmos3/     │            │ plugin          │
   └──────────────┘           └──────────────┘            └─────────────────┘
          │                           │
          │                           │
          └───────────────┬───────────┘
                          v
                 ┌────────────────────┐
                 │ vLLM-Omni Docker   │
                 │ vllm/vllm-omni:    │
                 │ cosmos3            │
                 │ Generator serving  │
                 └────────────────────┘
```

| Path | Model surface | Install method | Default artifact |
| --- | --- | --- | --- |
| Diffusers venv | Generator | `uv venv` + `uv pip install` at repo root or `COSMOS3_DIFFUSERS_VENV` | `.venv` or `.venv-cosmos3-diffusers` |
| Cosmos Framework | Generator, Reasoner | `git clone` + `uv sync --all-extras --group=cu130-train` | `packages/cosmos3/.venv` |
| vLLM venv | Reasoner | `uv venv` + `vllm` + `vllm-cosmos3` | `.venv` at repo root |
| vLLM-Omni Docker | Generator | `docker pull` + `docker run` | Prebuilt image, port 8000 |

Backend-specific runbooks live under `cookbooks/cosmos3/README.md`; this page covers shared install mechanics.

## Virtual environment setup

<Tabs>
<Tab title="Diffusers (Generator)">

<Steps>
<Step title="Create and activate venv">

From the `cosmos` repo root:

```bash
uv venv --python 3.13 --seed --managed-python
source .venv/bin/activate
```

Notebooks may use a dedicated path:

```bash
export COSMOS3_DIFFUSERS_VENV=/path/to/.venv-cosmos3-diffusers
export COSMOS3_TORCH_BACKEND=cu130   # or cu128 on CUDA 12.x drivers
```

</Step>
<Step title="Install dependencies">

```bash
uv pip install --torch-backend=cu130 \
  "diffusers @ git+https://github.com/huggingface/diffusers.git" \
  accelerate \
  av \
  cosmos_guardrail \
  huggingface_hub \
  imageio \
  imageio-ffmpeg \
  torch \
  torchvision \
  transformers
```

For driver auto-detection (Diffusers only): replace `--torch-backend=cu130` with `--torch-backend=auto`.

</Step>
</Steps>

</Tab>
<Tab title="Cosmos Framework">

<Steps>
<Step title="Clone framework">

```bash
mkdir -p packages
git clone https://github.com/NVIDIA/cosmos-framework.git packages/cosmos3
cd packages/cosmos3
```

Alternatively, set `COSMOS3_REPO` or `COSMOS3_GIT_URL` and let the notebook cells run the clone for you.

</Step>
<Step title="Sync training extras">

Inference currently imports the training extras, so sync the `*-train` group that matches your driver:

```bash
export GIT_LFS_SKIP_SMUDGE=1

# CUDA 13 driver (default):
uv sync --all-extras --group=cu130-train

# CUDA 12.x driver:
# uv sync --all-extras --group=cu128-train
```

Notebooks honor `COSMOS3_UV_GROUP` (default `cu130-train`). Export `COSMOS3_UV_GROUP=cu128-train` on CUDA 12.x before launch.
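
The default-and-override behaviour described above can be sketched as a small command builder. This is an illustration under the stated default (`cu130-train`), not the notebooks' actual code; `uv_sync_command` is a hypothetical name.

```python
VALID_GROUPS = ("cu130-train", "cu128-train")


def uv_sync_command(env: dict) -> list:
    """Build the `uv sync` argv, defaulting the group to cu130-train."""
    group = env.get("COSMOS3_UV_GROUP", "cu130-train")
    if group not in VALID_GROUPS:
        raise ValueError(f"unexpected COSMOS3_UV_GROUP: {group!r}")
    return ["uv", "sync", "--all-extras", f"--group={group}"]


# A CUDA 12.x host exports the override before launch.
print(" ".join(uv_sync_command({"COSMOS3_UV_GROUP": "cu128-train"})))
# uv sync --all-extras --group=cu128-train
```

Pass `dict(os.environ)` to resolve the group from the real environment, and hand the resulting list to `subprocess.run(..., cwd="packages/cosmos3")`.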

</Step>
<Step title="Use the venv">

Activate `packages/cosmos3/.venv`, or invoke its `bin/python` and `bin/torchrun` by absolute path.

</Step>
</Steps>

</Tab>
<Tab title="vLLM (Reasoner)">

<Steps>
<Step title="Create venv">

```bash
uv venv --python 3.13 --seed --managed-python
source .venv/bin/activate
```

</Step>
<Step title="Install vLLM and plugin">

```bash
# CUDA 13 driver:
uv pip install --torch-backend=cu130 "vllm==0.21.0" \
  "vllm-cosmos3 @ git+https://github.com/NVIDIA/cosmos-framework.git#subdirectory=packages/vllm-cosmos3"

# CUDA 12.x driver:
# uv pip install --torch-backend=cu128 "vllm==0.19.1" \
#   "vllm-cosmos3 @ git+https://github.com/NVIDIA/cosmos-framework.git#subdirectory=packages/vllm-cosmos3"
```

The Reasoner notebook also installs `transformers-cosmos3` from the same framework checkout when building a local venv beside `packages/cosmos3`.

</Step>
<Step title="Optional DeepGEMM workaround">

If the build reports that DeepGEMM is unavailable, disable it:

```bash
export VLLM_USE_DEEP_GEMM=0
```

</Step>
</Steps>

<Tip>
When launching `.venv/bin/vllm` without activating the venv, keep `.venv/bin` on `PATH` so FlashInfer’s JIT build can find `ninja` in the venv.
</Tip>
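
When the server is started from a Python launcher instead of an activated shell, the same `PATH` requirement applies. The sketch below prepends the venv's `bin` directory to a copy of the environment; `env_with_venv_bin` is a hypothetical helper.

```python
import os
from pathlib import Path


def env_with_venv_bin(venv: Path, base_env: dict) -> dict:
    """Copy `base_env` with `<venv>/bin` prepended to PATH."""
    env = dict(base_env)
    venv_bin = str(venv / "bin")
    existing = env.get("PATH", "")
    env["PATH"] = venv_bin + (os.pathsep + existing if existing else "")
    return env


env = env_with_venv_bin(Path(".venv").resolve(), dict(os.environ))
print(env["PATH"].split(os.pathsep)[0])  # first PATH entry is now <venv>/bin
```

Pass the result as `env=env` to `subprocess.run` or `subprocess.Popen` when launching `.venv/bin/vllm`. Copying the environment keeps the change scoped to the child process.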

</Tab>
<Tab title="vLLM-Omni (venv, partial)">

Until upstream PRs merge all modalities, the **Docker image** is the supported full-modality build. To run text-to-image, text-to-video, and image-to-video from the PR branch, install into a venv:

```bash
uv venv --python 3.13 --seed --managed-python
source .venv/bin/activate
uv pip install --torch-backend=cu130 \
  "vllm-omni @ git+https://github.com/vllm-project/vllm-omni.git@refs/pull/3454/head"
# CUDA 12.x: use --torch-backend=cu128 instead
```

Then run `vllm serve` directly without the Docker wrapper.

</Tab>
</Tabs>

### Headless graphics libraries

On minimal servers, imports may fail with `libxcb.so.1: cannot open shared object file`. On Debian or Ubuntu, install the missing libraries as root or with `sudo`:

```bash
apt-get install -y libxcb1 libgl1 libglib2.0-0
```

## Docker setup (vLLM-Omni Generator)

<Steps>
<Step title="Pull image">

```bash
docker pull vllm/vllm-omni:cosmos3
```

</Step>
<Step title="Run Cosmos3-Nano server">

Mount Hugging Face cache and any host directory with local media or action files:

```bash
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v "$(pwd):/workspace" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-omni:cosmos3 \
  vllm serve nvidia/Cosmos3-Nano \
  --omni \
  --model-class-name Cosmos3OmniDiffusersPipeline \
  --allowed-local-media-path / \
  --port 8000 \
  --init-timeout 1800
```

Success signal: log line **`Application startup complete.`**

</Step>
<Step title="Run Cosmos3-Super (optional)">

On four GPUs with layer offload:

```bash
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v "$(pwd):/workspace" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-omni:cosmos3 \
  vllm serve nvidia/Cosmos3-Super \
  --omni \
  --model-class-name Cosmos3OmniDiffusersPipeline \
  --allowed-local-media-path / \
  --tensor-parallel-size 4 \
  --enable-layerwise-offload \
  --port 8000 \
  --init-timeout 1800
```

Parallelism degrees multiply: ensure GPU count ≥ `tensor_parallel_size × cfg_parallel_size × ulysses_degree`.
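
The constraint above is a simple product, which makes it easy to check before launching a container. In this sketch the parameter names follow the formula in the text; the default of 1 for each degree is an assumption, and `required_gpus` and `fits` are hypothetical helpers.

```python
def required_gpus(tensor_parallel_size: int = 1,
                  cfg_parallel_size: int = 1,
                  ulysses_degree: int = 1) -> int:
    """Product of the three parallelism degrees."""
    return tensor_parallel_size * cfg_parallel_size * ulysses_degree


def fits(available_gpus: int, **degrees: int) -> bool:
    """True when the host has at least as many GPUs as the product needs."""
    return available_gpus >= required_gpus(**degrees)


print(required_gpus(tensor_parallel_size=4))                 # 4
print(fits(4, tensor_parallel_size=4, cfg_parallel_size=2))  # False
```

The second call shows why the degrees cannot be chosen independently: tensor parallelism of 4 combined with CFG parallelism of 2 needs eight GPUs, not four.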

</Step>
</Steps>

Generator notebooks expect a running server and accept optional endpoint overrides:

```bash
export COSMOS3_VLLM_BASE_URL=http://localhost:8000
export COSMOS3_VLLM_NANO_BASE_URL=http://localhost:8000
export COSMOS3_VLLM_SUPER_BASE_URL=http://localhost:8000
```
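
A client script can resolve its endpoint from the same variables. The precedence used here (model-specific variable, then `COSMOS3_VLLM_BASE_URL`, then `http://localhost:8000`) is an assumption made for illustration; the notebooks may resolve the variables differently, and `base_url_for` is a hypothetical name.

```python
DEFAULT_BASE_URL = "http://localhost:8000"


def base_url_for(model: str, env: dict) -> str:
    """Pick the endpoint for 'nano' or 'super'.

    Assumed precedence: COSMOS3_VLLM_<MODEL>_BASE_URL, then the shared
    COSMOS3_VLLM_BASE_URL, then the localhost default.
    """
    specific = f"COSMOS3_VLLM_{model.upper()}_BASE_URL"
    url = env.get(specific) or env.get("COSMOS3_VLLM_BASE_URL") or DEFAULT_BASE_URL
    return url.rstrip("/")  # avoid a double slash when joining paths later


print(base_url_for("nano", {}))  # http://localhost:8000
```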

## Environment verification

### PyTorch GPU probe (venv paths)

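Run from the directory that contains the `.venv` you want to check: the repo root for the Diffusers or vLLM venv, or `packages/cosmos3` for the framework venv. Activation is not required because the command calls the interpreter by path; adjust the path if you created the venv at `COSMOS3_DIFFUSERS_VENV`.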

```bash
.venv/bin/python - <<'PY'
import torch

print("torch:", torch.__version__)
print("torch cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))
PY
```

<Check>
Expected: `cuda available: True`, non-zero `device count`, and a GPU name on device 0. `False` usually means a **cu130 wheel on a CUDA 12.x driver**; reinstall with `cu128` / `cu128-train` / `vllm==0.19.1`.
</Check>

Framework notebooks run the same probe inside `packages/cosmos3` and respect `CUDA_VISIBLE_DEVICES`.

### vLLM / vLLM-Omni server probe

```bash
curl http://localhost:8000/v1/models
```

A JSON model list confirms the OpenAI-compatible API is up.
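
The same probe can be scripted with the standard library. This sketch assumes the response follows the OpenAI-style list shape, with a `data` array whose entries carry an `id`; the sample payload is hand-written for illustration, not captured from a server.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list:
    """Pull the `id` of each entry from an OpenAI-style model list."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8000",
                    timeout: float = 5.0) -> list:
    """GET <base_url>/v1/models and return the served model ids."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as response:
        return model_ids(json.load(response))


# Offline illustration; call fetch_model_ids() against a running server.
sample = {"object": "list", "data": [{"id": "nvidia/Cosmos3-Nano", "object": "model"}]}
print(model_ids(sample))  # ['nvidia/Cosmos3-Nano']
```

`fetch_model_ids()` raises `urllib.error.URLError` while the server is still starting, so a launcher can retry it until the list is non-empty.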

### Driver vs wheel cross-check

```bash
nvidia-smi
python -c "import torch; print(torch.version.cuda)"
```

The two major versions should agree with each other and with the chosen install path: 13 for `cu130`, 12 for `cu128`.
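
The cross-check reduces to comparing two major version numbers. A minimal sketch, with hypothetical helper names, that takes the driver's CUDA version string and the value of `torch.version.cuda`:

```python
from typing import Optional


def cuda_major(version: str) -> int:
    """Return the major component of a dotted version such as '12.8'."""
    return int(version.split(".")[0])


def wheel_matches_driver(driver_cuda: str, torch_cuda: Optional[str]) -> bool:
    """True when driver and wheel share a CUDA major version."""
    if torch_cuda is None:  # torch.version.cuda is None on CPU-only builds
        return False
    return cuda_major(driver_cuda) == cuda_major(torch_cuda)


print(wheel_matches_driver("12.8", "13.0"))  # False: cu130 wheel, CUDA 12.x driver
print(wheel_matches_driver("13.0", "13.0"))  # True
```

Handling `None` explicitly covers the case where a CPU-only wheel was installed by mistake, which would otherwise surface as an `AttributeError` on `split`.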

## Common install variables

| Variable | Used by | Purpose |
| --- | --- | --- |
| `HF_TOKEN` | All backends | Hugging Face auth when not using `hf auth login` |
| `HF_HOME` | Framework, Diffusers notebooks | Checkpoint cache location |
| `HF_HUB_DISABLE_XET` | Diffusers notebook | Disables XET transfer (notebook default) |
| `COSMOS3_UV_GROUP` | Framework notebooks | `cu130-train` or `cu128-train` for `uv sync` |
| `COSMOS3_TORCH_BACKEND` | Diffusers notebook | `cu130` or `cu128` for `uv pip install` |
| `COSMOS3_REPO` | Framework / vLLM notebooks | Framework checkout path (default `packages/cosmos3`) |
| `GIT_LFS_SKIP_SMUDGE` | Framework `uv sync` | Skips LFS smudge for unused test artifacts |
| `VLLM_USE_DEEP_GEMM` | vLLM Reasoner | Set `0` when DeepGEMM is unavailable |
| `COSMOS3_VLLM_BASE_URL` | vLLM-Omni notebooks | Generator API base URL |

Local runtime outputs under `cookbooks/cosmos3/**/outputs/` and `packages/` are gitignored — safe to delete and regenerate.

## Related pages

<CardGroup>
<Card title="Cookbook environment setup" href="/cookbook-environment">
Per-backend install detail, Framework clone/sync, and notebook env vars shared across Reasoner and Generator cookbooks.
</Card>
<Card title="Quickstart" href="/quickstart">
First-run commands after install: Diffusers generation, vLLM-Omni curl, and Reasoner chat completion.
</Card>
<Card title="Choose an integration" href="/choose-integration">
Pick Diffusers, vLLM-Omni, vLLM, or Cosmos Framework by research vs production goal.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
CUDA/driver mismatches, NGC containers, `torch.cuda` false negatives, libxcb, uv version errors, and DeepGEMM workarounds.
</Card>
</CardGroup>