PhD Defense by Naoki H Yokoyama

Date: Friday, April 18, 2025

Time: 1:00 PM – 3:00 PM EDT

Location: Zoom (https://gatech.zoom.us/j/5825218212?pwd=NnBMcmNDTlFoNVcxTC91dndacFRadz09)

Committee:

Dr. Sehoon Ha (Advisor) – School of Interactive Computing, Georgia Institute of Technology

Dr. Dhruv Batra (Advisor) – School of Interactive Computing, Georgia Institute of Technology

Dr. Jie Tan – School of Interactive Computing, Georgia Institute of Technology

Dr. Vladlen Koltun – Distinguished Scientist, Apple

Dr. Mrinal Kalakrishnan – Research Lead, Meta

Title:

From Web to World: Harnessing Foundation Models for Intelligent Robotic Assistants in Real-World Environments

Abstract:

In this dissertation, we explore how simulated embodied experience and spatial grounding can enhance foundation models for robotics, bridging the gap between abstract reasoning capabilities and physical robotic interaction. We present three key contributions:

(1) Adaptive Skill Coordination (ASC) and Language-guided Skill Coordination (LSC): These approaches address open-vocabulary, long-horizon mobile manipulation tasks, demonstrating how simulation can be used to train fundamental sensorimotor skills. The result is a robust repertoire of skills that foundation models can employ for real-world interaction.

(2) Vision-Language Frontier Maps (VLFM): This approach combines pre-trained vision-language models with low-level navigation policies trained in simulation. By grounding these models with explicit spatial maps of the environment, VLFM enhances their ability to reason about and navigate in the real world.

(3) A novel method for fine-tuning multi-modal large language models using simulated data: This approach enables models to develop reasoning capabilities beyond semantic understanding for navigation tasks. By fine-tuning with diverse simulated scenarios, we demonstrate that models leverage knowledge from both their pre-training on web-scale data and navigation training to achieve superior navigation performance.

This research aims to create embodied AI systems that leverage the strengths of foundation models while effectively operating in real-world environments.


Groups

Graduate Studies

Status

Workflow Status: Published
Created By: Tatianna Richardson
Created: 04/04/2025
Modified By: Tatianna Richardson
Modified: 04/04/2025
