# NVIDIA Cosmos 3 Documentation

> Source-grounded reference for Cosmos 3 omnimodal world models: Reasoner and Generator runtime surfaces, Hugging Face checkpoints, integration paths (Diffusers, vLLM-Omni, vLLM, Cosmos Framework), runnable cookbooks, and OpenAI-compatible serving APIs for Physical AI developers.

This is a source-grounded documentation set that Grok-Wiki generated from the NVIDIA/cosmos repository. Use the complete Markdown link under Context Links when an agent needs the full repository context.

## Context Links

- [Complete Markdown docs](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/llms-full.txt)
- [Complete Markdown alias](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9.md)
- [Human interactive docs](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9)
- [GitHub repository](https://github.com/NVIDIA/cosmos)

## Repository

- Repository: NVIDIA/cosmos
- Generated: 2026-06-01T20:39:21.817Z
- Updated: 2026-06-01T20:39:30.764Z
- Runtime: Grok CLI
- Format: Documentation
- Pages: 25

## Pages

- [Overview](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/01-overview.md): Cosmos 3 omnimodal world model surfaces (Reasoner vs Generator), primary entry points, supported modalities, and the shortest path to a first generation or reasoning call.
- [Installation](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/02-installation.md): Prerequisites (Linux, NVIDIA GPU, uv, Hugging Face auth), CUDA driver pairing (cu130/cu128), venv and Docker setup paths, and environment verification commands.
- [Quickstart](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/03-quickstart.md): Minimal first-run commands for Generator (Diffusers text-to-video, vLLM-Omni curl) and Reasoner (vLLM serve + OpenAI chat completion), including HF login and expected success signals.
- [Choose an integration](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/04-choose-an-integration.md): Decision matrix for Diffusers, vLLM-Omni, vLLM, Transformers (coming soon), and Cosmos Framework by goal: research, production inference, training, or evaluation.
- [Reasoner and Generator](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/05-reasoner-and-generator.md): MoT architecture modes: autoregressive Reasoner (text/vision in, text out) vs diffusion Generator (multimodal in, vision/sound/action out), shared mRoPE, and when to use each surface.
- [Model family](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/06-model-family.md): Checkpoint catalog (Nano 16B, Super 64B, Text2Image, Image2Video, Nano-Policy-DROID), Hugging Face IDs, capability focus, and size tradeoffs for serving.
- [Input and output specifications](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/07-input-and-output-specifications.md): Supported input/output types and formats, resolution tiers (256p–720p), aspect ratios, frame rates/counts, vision conditioning frame counts, prompt length limits, and sound output specs.
- [Action modality](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/08-action-modality.md): Action token semantics, embodiment dimensions (AV 9D, DROID 10D, UMI 10D, humanoid 29D), policy/inverse/forward dynamics modes, and domain_name conditioning for Generator action workflows.
- [Cookbook environment setup](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/09-cookbook-environment-setup.md): Shared uv/Docker setup for all backends: HF auth, CUDA backend tags, Cosmos Framework clone/sync, Diffusers venv, vLLM + vllm-cosmos3 plugin, vLLM-Omni Docker image, and GPU verification probes.
- [Run Generator with Diffusers](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/10-run-generator-with-diffusers.md): Install Cosmos3OmniPipeline dependencies, configure UniPC scheduler flow_shift, run text-to-image/video and image-to-video with structured JSON prompts, and export MP4 outputs.
- [Run Generator with vLLM-Omni](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/11-run-generator-with-vllm-omni.md): Start vllm/vllm-omni:cosmos3 Docker server, tensor-parallel and CFG/Ulysses options for Super, POST vision/action endpoints, guardrails toggles, and deploy-config for server-wide guardrail disable.
- [Run Generator with Cosmos Framework](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/12-run-generator-with-cosmos-framework.md): Clone cosmos-framework, uv sync cu130-train/cu128-train groups, torchrun cosmos_framework.scripts.inference with parallelism presets, checkpoint-path, and JSON input specs from cookbook assets.
- [Run Generator action workflows](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/13-run-generator-action-workflows.md): Forward dynamics (image + action trajectory) and inverse dynamics (video + instruction) across Framework torchrun and vLLM-Omni multipart /v1/videos requests with domain_name and action_mode extra_params.
- [Run Reasoner with vLLM](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/14-run-reasoner-with-vllm.md): Install vllm-cosmos3 plugin, serve Cosmos3ReasonerForConditionalGeneration with mm-encoder and media-io-kwargs, Qwen3-VL-compatible chat messages, and reasoning-format prompt suffix.
- [Run Reasoner with Cosmos Framework](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/15-run-reasoner-with-cosmos-framework.md): Build reasoner JSON inputs (model_mode, vision_path, enable_sound), run cosmos_framework.scripts.inference with latency preset, and read reasoner_text.txt outputs; scale Nano to Super via torchrun.
- [vLLM-Omni API reference](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/16-vllm-omni-api-reference.md): OpenAI-compatible endpoints (/v1/images/generations, /v1/videos, /v1/videos/sync), request fields (prompt, size, num_frames, guidance_scale, extra_params), action_mode values, and curl --form-string constraints.
- [Diffusers pipeline reference](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/17-diffusers-pipeline-reference.md): Cosmos3OmniPipeline.from_pretrained modes (text-to-image, text-to-video, image-to-video, text-to-video-with-sound), key call arguments, export_to_video, and torch-backend install pairing.
- [Reasoner vLLM configuration](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/18-reasoner-vllm-configuration.md): vllm serve flags: hf-overrides architectures, tensor-parallel-size, mm-encoder-tp-mode, async-scheduling, allowed-local-media-path, media-io-kwargs, VLLM_USE_DEEP_GEMM, and vLLM/cu130 version pairs.
- [Sampling and prompt parameters](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/19-sampling-and-prompt-parameters.md): Generator prompt-upsampling defaults, Reasoner sampling tables (with/without reasoning), structured JSON prompt schema, Qwen3-VL message shape, and redacted_reasoning format instruction.
- [Audiovisual cookbook recipes](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/20-audiovisual-cookbook-recipes.md): End-to-end notebooks for text-to-image, text-to-video, image-to-video with optional sound across Diffusers, Cosmos Framework, and vLLM-Omni; asset layout under assets/prompts and assets/images.
- [Action cookbook recipes](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/21-action-cookbook-recipes.md): Forward-dynamics and inverse-dynamics notebooks for AV, DROID, and UMI with checked-in trajectories, LeRobot sample data, and Framework vs vLLM-Omni output directories.
- [Reasoner cookbook recipes](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/22-reasoner-cookbook-recipes.md): Runnable workflows for captioning, temporal localization, embodied/common-sense reasoning, 2D grounding, describe-anything, action CoT, physical plausibility, and situation understanding with bundled media assets.
- [Inference benchmarks](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/23-inference-benchmarks.md): Published latency tables for Cosmos3-Nano/Super Generator (PyTorch, vLLM-Omni, Diffusers by GPU/resolution/TP) and Reasoner vLLM serving metrics (TTFT, throughput at concurrency tiers).
- [Troubleshooting](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/24-troubleshooting.md): CUDA/driver mismatches, NGC container selection, torch.cuda unavailable fixes, libxcb headless imports, uv version and --torch-backend errors, and VLLM_USE_DEEP_GEMM workaround.
- [Ecosystem, license, and release](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/25-ecosystem-license-and-release.md): Related Cosmos projects (Framework, Curator, Evaluator), OpenMDW-1.1 license terms, known model limitations, release cadence pointers, and third-party dependency notices.
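
Each entry in the page list above follows the pattern `- [Title](URL): description`, so an agent can turn the index into structured records before deciding which page to fetch. The following is a minimal sketch, assuming that one-line pattern holds for every entry. `PageEntry` and `parse_pages` are hypothetical names introduced for illustration and are not part of any Cosmos or Grok-Wiki tooling.

```python
import re
from typing import NamedTuple


class PageEntry(NamedTuple):
    """One row of the page index (hypothetical helper type)."""
    title: str
    url: str
    description: str


# Matches a single index line of the form "- [Title](URL): description".
_ENTRY_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\):\s*(?P<description>.+)$"
)


def parse_pages(index_text: str) -> list[PageEntry]:
    """Return one PageEntry per line that matches the index pattern.

    Lines without a trailing ": description" (such as the Context Links
    entries) are skipped rather than treated as errors.
    """
    entries = []
    for line in index_text.splitlines():
        match = _ENTRY_RE.match(line.strip())
        if match:
            entries.append(PageEntry(**match.groupdict()))
    return entries


# Two lines copied from this index: one page entry and one context link.
sample = (
    "- [Installation](https://grok-wiki.com/public/docs/"
    "nvidia-cosmos-82de3e90abd9/pages/02-installation.md): "
    "Prerequisites (Linux, NVIDIA GPU, uv, Hugging Face auth).\n"
    "- [GitHub repository](https://github.com/NVIDIA/cosmos)\n"
)
pages = parse_pages(sample)
```

With the parsed records in hand, an agent can match a task keyword against `title` and `description` and request only the relevant page URL instead of the complete Markdown file.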

## Source Files

- `.gitignore`
- `cookbooks/cosmos3/cosmos3-model-architecture.png`
- `cookbooks/cosmos3/generator/action/assets/actions/av_traj_forward.json`
- `cookbooks/cosmos3/generator/action/assets/actions/av_traj_left.json`
- `cookbooks/cosmos3/generator/action/assets/actions/umi.json`
- `cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/meta/info.json`
- `cookbooks/cosmos3/generator/action/assets/images/av_0.jpg`
- `cookbooks/cosmos3/generator/action/assets/videos/av_0.mp4`
- `cookbooks/cosmos3/generator/action/README.md`
- `cookbooks/cosmos3/generator/action/run_fd_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/generator/action/run_fd_with_vllm.ipynb`
- `cookbooks/cosmos3/generator/action/run_id_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/generator/action/run_id_with_vllm.ipynb`
- `cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/car_driving.jpg`
- `cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/humanoid_robot.jpg`
- `cookbooks/cosmos3/generator/audiovisual/assets/negative_prompts/image2video/neg_prompt.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/negative_prompts/text2video/neg_prompt.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/coastal_road_audio.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/humanoid_robot.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2image/robot_draping.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/car_colliding.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/robot_kitchen.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/robot_pouring_water_audio.json`
- `cookbooks/cosmos3/generator/audiovisual/README.md`
- `cookbooks/cosmos3/generator/audiovisual/run_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/generator/audiovisual/run_with_diffusers.ipynb`
- `cookbooks/cosmos3/generator/audiovisual/run_with_vllm_omni.ipynb`
- `cookbooks/cosmos3/README.md`
- `cookbooks/cosmos3/reasoner/assets/action_cot_driving_scene.mp4`
- `cookbooks/cosmos3/reasoner/assets/common_sense_reasoning.mp4`
- `cookbooks/cosmos3/reasoner/assets/describe_anything.png`
- `cookbooks/cosmos3/reasoner/assets/grounding_2d.png`
- `cookbooks/cosmos3/reasoner/assets/physical_plausibility.mp4`
- `cookbooks/cosmos3/reasoner/assets/robot_planning.png`
- `cookbooks/cosmos3/reasoner/assets/robotics_next_action.mp4`
- `cookbooks/cosmos3/reasoner/assets/temporal_localization_1.mp4`
- `cookbooks/cosmos3/reasoner/assets/video_caption.mp4`
- `cookbooks/cosmos3/reasoner/README.md`
- `cookbooks/cosmos3/reasoner/run_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/reasoner/run_with_vllm.ipynb`
- `inference_benchmarks.md`
- `LICENSE`
- `README.md`
- `RELEASE.md`