PhD Defense by Anh Thai

Title: Mutual Exclusivity Bias & Spatial Reasoning In Vision-Language Models

Date: Friday, March 14, 2025

Time: 12:30PM-2:30PM ET

In-person Location: CODA conference room 234

Zoom link: https://gatech.zoom.us/j/93917488333

 

Anh Thai

PhD Student in Computer Science

School of Interactive Computing

College of Computing

Georgia Institute of Technology

 

Committee

Dr. James M. Rehg (advisor), College of Computing, Georgia Institute of Technology; Department of Computer Science and Industrial and Enterprise Systems Engineering, University of Illinois Urbana-Champaign

Dr. Judy Hoffman (co-advisor), College of Computing, Georgia Institute of Technology

Dr. James Hays, College of Computing, Georgia Institute of Technology

Dr. Michael C. Frank, Department of Psychology, Stanford University

Dr. Jia-Bin Huang, Department of Computer Science, University of Maryland, College Park

 

Summary

 

Despite rapid advances in machine learning that enable models to generalize beyond their training data, models still fall far behind the learning pace of young children. In this dissertation, we draw inspiration from developmental psychology, specifically children's learning environments and strategies, to inform machine learning algorithms. To achieve this, we focus on two key aspects of children's word and object learning: (1) spatial preposition comprehension through 3D information, and (2) the mutual exclusivity bias, which aids in object-word association. We begin by investigating the generalization ability of 3D reconstruction models and identifying the key factors that influence this capability. Extending this exploration, we demonstrate that 2D feature representations with strong semantic correspondence matching can be effectively utilized for 3D object part segmentation.

Building on the rapid progress in large vision-language models (VLMs), we introduce a novel method that leverages multi-view RGB images to tackle the 3D Visual Question Answering (3D VQA) task, where 3D spatial understanding is essential for achieving high performance. To further examine the capabilities of VLMs and assess whether they exhibit human-like learning biases, particularly those observed in young children, we introduce MEBench, a benchmark for object recognition that challenges computational models to leverage the mutual exclusivity bias to rapidly associate new semantic concepts with novel objects. Beyond traditional mutual exclusivity bias evaluation, we explore whether VLMs can effectively use spatial information to reason about scenes and resolve ambiguities in uncertain learning environments.

 
