PhD Defense by Naoki H Yokoyama

Date: Friday, April 18, 2025

Time: 1:00 PM – 3:00 PM ET

Location: Zoom (https://gatech.zoom.us/j/5825218212?pwd=NnBMcmNDTlFoNVcxTC91dndacFRadz09)

 

Committee:

Dr. Sehoon Ha (Advisor) – School of Interactive Computing, Georgia Institute of Technology

Dr. Dhruv Batra (Advisor) – School of Interactive Computing, Georgia Institute of Technology

Dr. Jie Tan – School of Interactive Computing, Georgia Institute of Technology

Dr. Vladlen Koltun – Distinguished Scientist, Apple

Dr. Mrinal Kalakrishnan – Research Lead, Meta

 

Title:

From Web to World: Harnessing Foundation Models for Intelligent Robotic Assistants in Real-World Environments

 

Abstract:

In this dissertation, we explore how simulated embodied experience and spatial grounding can enhance foundation models for robotics, bridging the gap between abstract reasoning capabilities and physical robotic interaction. We present three key contributions:

(1) Adaptive Skill Coordination (ASC) and Language-guided Skill Coordination (LSC): These approaches address open-vocabulary, long-horizon mobile manipulation tasks, demonstrating how simulation can be used to train fundamental sensorimotor skills. The result is a robust repertoire of capabilities that foundation models can employ for real-world interaction.

(2) Vision-Language Frontier Maps (VLFM): This approach combines pre-trained vision-language models with low-level navigation policies trained in simulation. By grounding these models with explicit spatial maps of the environment, VLFM enhances their ability to reason about and navigate in the real world.

(3) A novel method for fine-tuning multi-modal large language models using simulated data: This approach enables models to develop reasoning capabilities for navigation tasks that go beyond semantic understanding. By fine-tuning on diverse simulated scenarios, we demonstrate that models leverage knowledge from both their pre-training on web-scale data and their navigation training to achieve superior navigation performance.

This research aims to create embodied AI systems that leverage the strengths of foundation models while effectively operating in real-world environments. 

 

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 04/04/2025
  • Modified By: Tatianna Richardson
  • Modified: 04/04/2025
