# NVIDIA Cosmos 3 Documentation

> Source-grounded reference for Cosmos 3 omnimodal world models: Reasoner and Generator runtime surfaces, Hugging Face checkpoints, integration paths (Diffusers, vLLM-Omni, vLLM, Cosmos Framework), runnable cookbooks, and OpenAI-compatible serving APIs for Physical AI developers.

This is a Grok-Wiki source-grounded repository documentation set. Use the complete Markdown link when an agent needs the full repo context.

## Context Links

- [Complete Markdown docs](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/llms-full.txt)
- [Complete Markdown alias](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9.md)
- [Human interactive docs](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9)
- [GitHub repository](https://github.com/NVIDIA/cosmos)

## Repository

- Repository: NVIDIA/cosmos
- Generated: 2026-06-01T20:39:21.817Z
- Updated: 2026-06-01T20:39:30.764Z
- Runtime: Grok CLI
- Format: Documentation
- Pages: 25

## Pages

- [Overview](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/01-overview.md): Cosmos 3 omnimodal world model surfaces (Reasoner vs Generator), primary entry points, supported modalities, and the shortest path to a first generation or reasoning call.
- [Installation](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/02-installation.md): Prerequisites (Linux, NVIDIA GPU, uv, Hugging Face auth), CUDA driver pairing (cu130/cu128), venv and Docker setup paths, and environment verification commands.
- [Quickstart](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/03-quickstart.md): Minimal first-run commands for Generator (Diffusers text-to-video, vLLM-Omni curl) and Reasoner (vLLM serve + OpenAI chat completion), including HF login and expected success signals.
- [Choose an integration](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/04-choose-an-integration.md): Decision matrix for Diffusers, vLLM-Omni, vLLM, Transformers (coming soon), and Cosmos Framework by goal: research, production inference, training, or evaluation.
- [Reasoner and Generator](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/05-reasoner-and-generator.md): MoT architecture modes: autoregressive Reasoner (text/vision in, text out) vs diffusion Generator (multimodal in, vision/sound/action out), shared mRoPE, and when to use each surface.
- [Model family](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/06-model-family.md): Checkpoint catalog (Nano 16B, Super 64B, Text2Image, Image2Video, Nano-Policy-DROID), Hugging Face IDs, capability focus, and size tradeoffs for serving.
- [Input and output specifications](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/07-input-and-output-specifications.md): Supported input/output types and formats, resolution tiers (256p–720p), aspect ratios, frame rates/counts, vision conditioning frame counts, prompt length limits, and sound output specs.
- [Action modality](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/08-action-modality.md): Action token semantics, embodiment dimensions (AV 9D, DROID 10D, UMI 10D, humanoid 29D), policy/inverse/forward dynamics modes, and domain_name conditioning for Generator action workflows.
- [Cookbook environment setup](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/09-cookbook-environment-setup.md): Shared uv/Docker setup for all backends: HF auth, CUDA backend tags, Cosmos Framework clone/sync, Diffusers venv, vLLM + vllm-cosmos3 plugin, vLLM-Omni Docker image, and GPU verification probes.
- [Run Generator with Diffusers](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/10-run-generator-with-diffusers.md): Install Cosmos3OmniPipeline dependencies, configure UniPC scheduler flow_shift, run text-to-image/video and image-to-video with structured JSON prompts, and export MP4 outputs.
- [Run Generator with vLLM-Omni](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/11-run-generator-with-vllm-omni.md): Start vllm/vllm-omni:cosmos3 Docker server, tensor-parallel and CFG/Ulysses options for Super, POST vision/action endpoints, guardrails toggles, and deploy-config for server-wide guardrail disable.
- [Run Generator with Cosmos Framework](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/12-run-generator-with-cosmos-framework.md): Clone cosmos-framework, uv sync cu130-train/cu128-train groups, torchrun cosmos_framework.scripts.inference with parallelism presets, checkpoint-path, and JSON input specs from cookbook assets.
- [Run Generator action workflows](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/13-run-generator-action-workflows.md): Forward dynamics (image + action trajectory) and inverse dynamics (video + instruction) across Framework torchrun and vLLM-Omni multipart /v1/videos requests with domain_name and action_mode extra_params.
- [Run Reasoner with vLLM](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/14-run-reasoner-with-vllm.md): Install vllm-cosmos3 plugin, serve Cosmos3ReasonerForConditionalGeneration with mm-encoder and media-io-kwargs, Qwen3-VL-compatible chat messages, and reasoning-format prompt suffix.
- [Run Reasoner with Cosmos Framework](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/15-run-reasoner-with-cosmos-framework.md): Build reasoner JSON inputs (model_mode, vision_path, enable_sound), run cosmos_framework.scripts.inference with latency preset, and read reasoner_text.txt outputs; scale Nano to Super via torchrun.
- [vLLM-Omni API reference](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/16-vllm-omni-api-reference.md): OpenAI-compatible endpoints (/v1/images/generations, /v1/videos, /v1/videos/sync), request fields (prompt, size, num_frames, guidance_scale, extra_params), action_mode values, and curl --form-string constraints.
- [Diffusers pipeline reference](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/17-diffusers-pipeline-reference.md): Cosmos3OmniPipeline.from_pretrained modes (text-to-image, text-to-video, image-to-video, text-to-video-with-sound), key call arguments, export_to_video, and torch-backend install pairing.
- [Reasoner vLLM configuration](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/18-reasoner-vllm-configuration.md): vllm serve flags: hf-overrides architectures, tensor-parallel-size, mm-encoder-tp-mode, async-scheduling, allowed-local-media-path, media-io-kwargs, VLLM_USE_DEEP_GEMM, and vLLM/cu130 version pairs.
- [Sampling and prompt parameters](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/19-sampling-and-prompt-parameters.md): Generator prompt-upsampling defaults, Reasoner sampling tables (with/without reasoning), structured JSON prompt schema, Qwen3-VL message shape, and redacted_reasoning format instruction.
- [Audiovisual cookbook recipes](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/20-audiovisual-cookbook-recipes.md): End-to-end notebooks for text-to-image, text-to-video, image-to-video with optional sound across Diffusers, Cosmos Framework, and vLLM-Omni; asset layout under assets/prompts and assets/images.
- [Action cookbook recipes](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/21-action-cookbook-recipes.md): Forward-dynamics and inverse-dynamics notebooks for AV, DROID, and UMI with checked-in trajectories, LeRobot sample data, and Framework vs vLLM-Omni output directories.
- [Reasoner cookbook recipes](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/22-reasoner-cookbook-recipes.md): Runnable workflows for captioning, temporal localization, embodied/common-sense reasoning, 2D grounding, describe-anything, action CoT, physical plausibility, and situation understanding with bundled media assets.
- [Inference benchmarks](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/23-inference-benchmarks.md): Published latency tables for Cosmos3-Nano/Super Generator (PyTorch, vLLM-Omni, Diffusers by GPU/resolution/TP) and Reasoner vLLM serving metrics (TTFT, throughput at concurrency tiers).
- [Troubleshooting](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/24-troubleshooting.md): CUDA/driver mismatches, NGC container selection, torch.cuda unavailable fixes, libxcb headless imports, uv version and --torch-backend errors, and VLLM_USE_DEEP_GEMM workaround.
- [Ecosystem, license, and release](https://grok-wiki.com/public/docs/nvidia-cosmos-82de3e90abd9/pages/25-ecosystem-license-and-release.md): Related Cosmos projects (Framework, Curator, Evaluator), OpenMDW-1.1 license terms, known model limitations, release cadence pointers, and third-party dependency notices.

## Source Files

- `.gitignore`
- `cookbooks/cosmos3/cosmos3-model-architecture.png`
- `cookbooks/cosmos3/generator/action/assets/actions/av_traj_forward.json`
- `cookbooks/cosmos3/generator/action/assets/actions/av_traj_left.json`
- `cookbooks/cosmos3/generator/action/assets/actions/umi.json`
- `cookbooks/cosmos3/generator/action/assets/droid_lerobot_example/meta/info.json`
- `cookbooks/cosmos3/generator/action/assets/images/av_0.jpg`
- `cookbooks/cosmos3/generator/action/assets/videos/av_0.mp4`
- `cookbooks/cosmos3/generator/action/README.md`
- `cookbooks/cosmos3/generator/action/run_fd_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/generator/action/run_fd_with_vllm.ipynb`
- `cookbooks/cosmos3/generator/action/run_id_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/generator/action/run_id_with_vllm.ipynb`
- `cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/car_driving.jpg`
- `cookbooks/cosmos3/generator/audiovisual/assets/images/image2video/humanoid_robot.jpg`
- `cookbooks/cosmos3/generator/audiovisual/assets/negative_prompts/image2video/neg_prompt.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/negative_prompts/text2video/neg_prompt.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/coastal_road_audio.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/image2video/humanoid_robot.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2image/robot_draping.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/car_colliding.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/robot_kitchen.json`
- `cookbooks/cosmos3/generator/audiovisual/assets/prompts/text2video/robot_pouring_water_audio.json`
- `cookbooks/cosmos3/generator/audiovisual/README.md`
- `cookbooks/cosmos3/generator/audiovisual/run_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/generator/audiovisual/run_with_diffusers.ipynb`
- `cookbooks/cosmos3/generator/audiovisual/run_with_vllm_omni.ipynb`
- `cookbooks/cosmos3/README.md`
- `cookbooks/cosmos3/reasoner/assets/action_cot_driving_scene.mp4`
- `cookbooks/cosmos3/reasoner/assets/common_sense_reasoning.mp4`
- `cookbooks/cosmos3/reasoner/assets/describe_anything.png`
- `cookbooks/cosmos3/reasoner/assets/grounding_2d.png`
- `cookbooks/cosmos3/reasoner/assets/physical_plausibility.mp4`
- `cookbooks/cosmos3/reasoner/assets/robot_planning.png`
- `cookbooks/cosmos3/reasoner/assets/robotics_next_action.mp4`
- `cookbooks/cosmos3/reasoner/assets/temporal_localization_1.mp4`
- `cookbooks/cosmos3/reasoner/assets/video_caption.mp4`
- `cookbooks/cosmos3/reasoner/README.md`
- `cookbooks/cosmos3/reasoner/run_with_cosmos_framework.ipynb`
- `cookbooks/cosmos3/reasoner/run_with_vllm.ipynb`
- `inference_benchmarks.md`
- `LICENSE`
- `README.md`
- `RELEASE.md`