PhD Defense by Naoki H Yokoyama
Date: Friday, April 18th, 2025
Time: 1:00 PM – 3:00 PM ET
Location: Zoom (https://gatech.zoom.us/j/5825218212?pwd=NnBMcmNDTlFoNVcxTC91dndacFRadz09)
Committee:
Dr. Sehoon Ha (Advisor) – School of Interactive Computing, Georgia Institute of Technology
Dr. Dhruv Batra (Advisor) – School of Interactive Computing, Georgia Institute of Technology
Dr. Jie Tan – School of Interactive Computing, Georgia Institute of Technology
Dr. Vladlen Koltun – Distinguished Scientist, Apple
Dr. Mrinal Kalakrishnan – Research Lead, Meta
Title:
From Web to World: Harnessing Foundation Models for Intelligent Robotic Assistants in Real-World Environments
Abstract:
In this dissertation, we explore how simulated embodied experience and spatial grounding can enhance foundation models for robotics, bridging the gap between abstract reasoning capabilities and physical robotic interaction. We present three key contributions:
(1) Adaptive Skill Coordination (ASC) and Language-guided Skill Coordination (LSC): These approaches address open-vocabulary, long-horizon mobile manipulation tasks, demonstrating how simulation can be used to train fundamental sensorimotor skills. The result is a robust repertoire of skills that foundation models can employ for real-world interaction.
(2) Vision-Language Frontier Maps (VLFM): This approach combines pre-trained vision-language models with low-level navigation policies trained in simulation. By grounding these models with explicit spatial maps of the environment, VLFM enhances their ability to reason about and navigate in the real world.
(3) A novel method for fine-tuning multi-modal large language models using simulated data: This approach enables models to develop reasoning capabilities beyond semantic understanding for navigation tasks. By fine-tuning with diverse simulated scenarios, we demonstrate that models leverage knowledge from both their pre-training on web-scale data and navigation training to achieve superior navigation performance.
This research aims to create embodied AI systems that leverage the strengths of foundation models while effectively operating in real-world environments.
Status
- Workflow Status: Published
- Created By: Tatianna Richardson
- Created: 04/04/2025
- Modified By: Tatianna Richardson
- Modified: 04/04/2025