Ph.D. Student’s Framework Used to Bolster Nvidia’s Cosmos Predict-2 Model

A new deep learning architectural framework could make autonomous vehicles and humanoid robots more efficient to develop and deploy. The framework is expected to lower training costs and reduce the amount of real-world data needed for training.

World foundation models (WFMs) enable physical AI systems to learn and operate within synthetic worlds created by generative artificial intelligence (genAI). For example, these models use predictive capabilities to generate up to 30 seconds of video that accurately reflects the real world.

The new framework, developed by a Georgia Tech researcher, enhances the processing speed of the neural networks that simulate these real-world environments from text, images, or video inputs.

The neural networks that make up the architectures of large language models like ChatGPT and visual models like Sora process contextual information using the “attention mechanism.”

Attention refers to a model’s ability to focus on the most relevant parts of its input.
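As a rough illustration of what "focusing on the most relevant parts" means in practice, the sketch below computes standard scaled dot-product attention for a single query in plain Python. The function names (`softmax`, `attend`) and the toy vectors are made up for this example and are not taken from any model mentioned in the article.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector (illustrative only)."""
    scale = 1.0 / math.sqrt(len(query))
    # Relevance score of each input position: scaled dot product with the query.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the weighted average of the value vectors.
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Toy input: three positions with 2-dimensional keys and 1-dimensional values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
weights, output = attend([1.0, 0.0], keys, values)
```

The weights sum to 1, and the positions whose keys align with the query (the first and third here) receive more weight than the one that does not.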

The new framework, called the Neighborhood Attention Extension (NATTEN), allows models that require GPUs or high-performance computing systems to process information and generate outputs more efficiently.
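The article does not describe how the speedup is achieved. The general idea behind neighborhood attention, though, is that each position attends only to a fixed-size window of its nearest neighbors rather than to the whole input. The following is a minimal one-dimensional sketch of that idea in plain Python. It is not NATTEN's API or implementation; the names `neighborhood` and `neighborhood_attention` and the window size `k` are assumptions chosen for illustration.

```python
import math

def neighborhood(i, n, k):
    """Indices of the k positions nearest to i, shifted at the edges to stay in [0, n)."""
    start = min(max(i - k // 2, 0), n - k)
    return list(range(start, start + k))

def neighborhood_attention(queries, keys, values, k):
    """1-D neighborhood attention sketch: each query sees only k nearby keys."""
    n = len(queries)
    assert 1 <= k <= n, "window size must fit inside the sequence"
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs, scored = [], 0
    for i, q in enumerate(queries):
        idx = neighborhood(i, n, k)
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in idx]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # Weighted average over the neighborhood only, not the full sequence.
        outputs.append([sum(e / total * values[j][d] for e, j in zip(exps, idx))
                        for d in range(dim)])
        scored += len(idx)  # count query-key pairs actually evaluated
    return outputs, scored

# Toy sequence of 8 tokens with a window of 3 neighbors per token.
tokens = [[float(i), 1.0] for i in range(8)]
outputs, scored = neighborhood_attention(tokens, tokens, tokens, k=3)
full_pairs = len(tokens) ** 2  # pairs that full attention would evaluate
```

In this toy case the windowed version evaluates 24 query-key pairs where full attention would evaluate 64. That count only illustrates why restricting attention reduces work; it says nothing about the 2.6-times figure quoted in the article, which depends on the model and hardware.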

Processing speeds can increase by up to 2.6 times, said Ali Hassani, a Ph.D. student in the School of Interactive Computing and the creator of NATTEN. Hassani is advised by Associate Professor Humphrey Shi.

Hassani is also a research scientist at Nvidia, where he introduced NATTEN to Cosmos — a family of WFMs the company uses to train robots, autonomous vehicles, and other physical AI applications.

“You can map just about anything from a prompt or an image or any combination of frames from an existing video to predict future videos,” Hassani said. “Instead of generating words with an LLM, you’re generating a world.

“Unlike LLMs that generate a single token at a time, these models are compute-heavy. They generate many images — often hundreds of frames at a time — so the models put a lot of work on the GPU. NATTEN lets us decrease some of that work and proportionately accelerate the model.”


Summary

Georgia Tech Ph.D. student Ali Hassani developed the Neighborhood Attention Extension (NATTEN), a deep learning architectural framework that is being integrated into Nvidia's Cosmos Predict-2 world foundation model. NATTEN speeds up the neural networks that simulate real-world environments, which are used to train physical AI systems such as autonomous vehicles and humanoid robots.

Created by: Nathan Deen
Created: 11/13/2025
Modified By: Nathan Deen
Modified: 11/13/2025

Georgia Institute of Technology
