NVIDIA has unveiled its Cosmos platform, a suite of world foundation models (WFMs) designed to accelerate the development of physical AI for robotics, autonomous vehicles, and industrial applications. Announced on August 11, 2025, at SIGGRAPH, Cosmos gives robot developers simulation, reasoning, and data generation capabilities that narrow the gap between digital training and real-world deployment. The initiative targets two key challenges in robotics, data scarcity and the need for physically accurate training environments, so that AI agents can perceive, reason about, and interact with the physical world.
At the core of Cosmos are generative multimodal models that developers can use out of the box or customize via post-training. Cosmos Predict generates up to 30 seconds of high-fidelity video from text, image, or video prompts, which makes it suited to creating diverse training data for robots or self-driving cars. Cosmos Transfer accelerates synthetic data generation from 3D scenes built in tools like NVIDIA Isaac Sim or CARLA, and the upcoming Cosmos Transfer-2 simplifies prompting for photorealistic outputs. A distilled version of Cosmos Transfer cuts the multi-step distillation process to a single step, so the model can run on NVIDIA RTX PRO Servers.
The standout model, Cosmos Reason, is a 7-billion-parameter vision language model (VLM) tailored for physical AI. Trained via supervised fine-tuning and reinforcement learning, it processes video and text inputs to enable human-like reasoning, incorporating memory, physics understanding, and common sense. Robots and AI agents can use it for data curation, annotation, and decision-making, for example when adapting vehicles to new cities. Benchmarks show performance gains of over 10% from fine-tuning on robotics-specific data such as the RoboVQA dataset, which boosts visual question-answering accuracy. The model connects multimodal perception to real-world actions and improves chain-of-thought reasoning without manual annotations.
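To make the data curation use case concrete, here is a minimal sketch of a filtering loop in which a reasoning model scores each clip for physical plausibility. It is not the Cosmos Reason API: `Clip`, `curate`, and `stub_judge` are hypothetical names, and `stub_judge` is a simple keyword rule standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Clip:
    """A candidate training clip (hypothetical structure)."""
    clip_id: str
    caption: str


def curate(clips: Iterable[Clip],
           judge: Callable[[Clip], float],
           threshold: float = 0.5) -> List[Clip]:
    """Keep clips whose plausibility score is at or above the threshold."""
    return [clip for clip in clips if judge(clip) >= threshold]


def stub_judge(clip: Clip) -> float:
    """Stand-in for a VLM call (assumption): rejects one implausible phrase."""
    return 0.0 if "floats upward" in clip.caption else 1.0


clips = [
    Clip("a", "robot arm places a cup on a table"),
    Clip("b", "dropped cup floats upward"),
    Clip("c", "forklift lowers a pallet"),
]
kept = curate(clips, stub_judge)
print([clip.clip_id for clip in kept])
```

In a real pipeline, the judge would wrap a call to the reasoning model, and the threshold would be tuned against a small hand-labeled sample.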
These models power robot intelligence by enabling scalable synthetic data pipelines, physically accurate simulations, and advanced analytics. For instance, in robotics, Cosmos generates controllable data to train perception and policy models, while in autonomous vehicles, it amplifies datasets with varied weather or sensor views. Video analytics AI agents leverage Cosmos Reason for real-time insights, improving safety and efficiency in industrial settings.
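One way to picture amplifying a dataset with varied weather or sensor views is as a grid of prompt variations built from a single base scene. The sketch below only builds the text prompts and does not call any Cosmos model. The condition lists and the `build_prompts` helper are illustrative assumptions, not part of the platform.

```python
from itertools import product
from typing import List

# Hypothetical condition lists; a real pipeline would choose its own.
WEATHER = ["clear", "rain", "fog", "snow"]
TIME_OF_DAY = ["day", "dusk", "night"]
SENSOR_VIEW = ["front camera", "rear camera"]


def build_prompts(base_scene: str) -> List[str]:
    """Return one prompt per (weather, time of day, sensor view) combination."""
    return [
        f"{base_scene}, {weather}, {time}, {view} view"
        for weather, time, view in product(WEATHER, TIME_OF_DAY, SENSOR_VIEW)
    ]


prompts = build_prompts("a delivery van approaching a four-way intersection")
print(len(prompts))  # 4 * 3 * 2 = 24 variations
print(prompts[0])
```

Each prompt could then be paired with the same 3D scene or source clip, so a single recorded scenario yields many training variants.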
Supporting infrastructure includes NVIDIA Omniverse libraries such as NuRec for 3D world reconstruction and NVIDIA DGX Cloud for high-performance computing. Cosmos Curator aids in data filtering and annotation. The platform is available on Hugging Face and GitHub under the NVIDIA Open Model License, has seen over 2 million downloads, and has been adopted by companies such as Boston Dynamics, Uber, and Magna. By integrating with NVIDIA’s ecosystem, Cosmos reduces development time and costs, paving the way for intelligent, adaptable robots in real-world scenarios.