PhD Defense by Anh Thai

Title: Mutual Exclusivity Bias & Spatial Reasoning In Vision-Language Models

Date: Friday, March 14, 2025

Time: 12:30PM-2:30PM ET

In-person Location: CODA conference room 234

Zoom link: https://gatech.zoom.us/j/93917488333

 

Anh Thai

PhD Student in Computer Science

School of Interactive Computing

College of Computing

Georgia Institute of Technology

 

Committee

Dr. James M. Rehg (advisor), College of Computing, Georgia Institute of Technology; Department of Computer Science and Industrial and Enterprise Systems Engineering, University of Illinois Urbana-Champaign

Dr. Judy Hoffman (co-advisor), College of Computing, Georgia Institute of Technology

Dr. James Hays, College of Computing, Georgia Institute of Technology

Dr. Michael C. Frank, Department of Psychology, Stanford University

Dr. Jia-Bin Huang, Department of Computer Science, University of Maryland, College Park

 

Summary

 

Despite rapid advances in machine learning that enable models to generalize beyond their training data, models still fall far behind the learning pace of young children. In this dissertation, we draw inspiration from developmental psychology, specifically children's learning environments and strategies, to inform machine learning algorithms. To achieve this, we focus on two key aspects of children's word and object learning: (1) spatial preposition comprehension through 3D information, and (2) the mutual exclusivity bias, which aids in object-word association. We begin by investigating the generalization ability of 3D reconstruction models and identifying the key factors that influence this capability. Extending this exploration, we demonstrate that 2D feature representations with strong semantic correspondence matching can be effectively utilized for 3D object part segmentation.

Building on the rapid progress in large vision-language models (VLMs), we introduce a novel method that leverages multi-view RGB images to tackle the 3D Visual Question Answering (3D VQA) task, where 3D spatial understanding is essential for achieving high performance. To further examine the capabilities of VLMs and assess whether they exhibit human-like learning biases, particularly those observed in young children, we introduce MEBench, a benchmark for object recognition that challenges computational models to leverage the mutual exclusivity bias to rapidly associate new semantic concepts with novel objects. Beyond traditional mutual exclusivity bias evaluation, we explore whether VLMs can effectively use spatial information to reason about scenes and resolve ambiguities in uncertain learning environments.

 
