PhD Defense by Anh Thai
Title: Mutual Exclusivity Bias & Spatial Reasoning In Vision-Language Models
Date: Friday, March 14, 2025
Time: 12:30PM-2:30PM ET
In-person Location: CODA conference room 234
Zoom link: https://gatech.zoom.us/j/93917488333
Anh Thai
PhD Student in Computer Science
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Committee
Dr. James M. Rehg (advisor), College of Computing, Georgia Institute of Technology,
Department of Computer Science and Industrial and Enterprise Systems Engineering, University of Illinois Urbana-Champaign
Dr. Judy Hoffman (co-advisor), College of Computing, Georgia Institute of Technology
Dr. James Hays, College of Computing, Georgia Institute of Technology
Dr. Michael C. Frank, Department of Psychology, Stanford University
Dr. Jia-Bin Huang, Department of Computer Science, University of Maryland, College Park
Summary
Despite rapid advancements in machine learning that enable models to generalize beyond their training data, models still fall far behind the learning pace of young children. In this dissertation, we draw inspiration from developmental psychology, specifically children's learning environments and strategies, to inform machine learning algorithms. To this end, we focus on two key aspects of children's word and object learning: (1) spatial preposition comprehension through 3D information, and (2) the mutual exclusivity bias, which aids object-word association. We begin by investigating the generalization ability of 3D reconstruction models, identifying the key factors that influence this capability. Extending this exploration, we demonstrate that 2D feature representations with strong semantic correspondence matching can be effectively utilized for 3D object part segmentation. Building on the rapid progress in large vision-language models (VLMs), we introduce a novel method that leverages multi-view RGB images to tackle the 3D Visual Question Answering (3D VQA) task, where 3D spatial understanding is essential for high performance. To further examine the capabilities of VLMs and assess whether they exhibit human-like learning biases, particularly those observed in young children, we introduce MEBench, an object recognition benchmark that challenges computational models to leverage the mutual exclusivity bias to rapidly associate new semantic concepts with novel objects. Beyond traditional mutual exclusivity bias evaluation, we explore whether VLMs can effectively use spatial information to reason about scenes and resolve ambiguities in uncertain learning environments.
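For readers unfamiliar with the mutual exclusivity bias mentioned above, the core inference is simple: a child who hears a novel word in the presence of several objects tends to map it to the object that has no known name. The following is a minimal illustrative sketch of that inference in Python; it is not the MEBench evaluation protocol or any code from the dissertation, and the function and variable names are hypothetical.

```python
def resolve_novel_word(novel_word, candidate_objects, lexicon):
    """Pick the referent a mutual-exclusivity-biased learner would choose.

    lexicon: dict mapping already-known words to the objects they name.
    Returns the single unnamed candidate object, or None if the choice
    is ambiguous (zero or multiple unnamed candidates).
    """
    known_objects = set(lexicon.values())
    # Mutual exclusivity: prefer a candidate that no known word already names.
    unnamed = [obj for obj in candidate_objects if obj not in known_objects]
    return unnamed[0] if len(unnamed) == 1 else None


# A learner who knows "ball" and "cup" maps the novel word "dax"
# to the one object without a known label.
lexicon = {"ball": "ball", "cup": "cup"}
print(resolve_novel_word("dax", ["ball", "cup", "toy_x"], lexicon))  # toy_x
```

Benchmarks such as MEBench test whether vision-language models apply this kind of one-shot word-object mapping, rather than hard-coding it as above.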