PhD Proposal by Fiona Ryan
Title: Contextual Behavior Modeling with Computer Vision
Fiona Ryan
Ph.D. Student in Computer Science
School of Interactive Computing
Georgia Institute of Technology
Date: Wednesday, June 25th, 2025
Time: 3-5pm EDT
Location: (Virtual) https://gatech.zoom.us/j/95216024861
Committee
Dr. Judy Hoffman (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. James Rehg (Advisor) - Department of Computer Science, University of Illinois Urbana-Champaign
Dr. Zsolt Kira - School of Interactive Computing, Georgia Institute of Technology
Dr. James Hays - School of Interactive Computing, Georgia Institute of Technology
Dr. Josef Sivic - Czech Institute of Informatics, Robotics, and Cybernetics, Czech Technical University in Prague
Abstract
Understanding human behavior with computer vision is a core challenge for developing AI systems that can effectively interact with and assist people in everyday life. Modeling human behavior is challenging because it requires not only visually recognizing behaviors such as gaze, gesture, and movement, but also interpreting them in context. Human behavior is shaped by intent and higher-level goals, the surrounding physical environment, interactions with other people, and additional modalities such as speech, making it inherently multimodal and situated.
This thesis proposal explores how to model human behavior in context by addressing two core needs: (1) multimodal datasets that capture naturalistic human interactions in everyday environments, enabling new behavior modeling tasks, and (2) methods that leverage foundation models, which encode general-purpose world knowledge such as visual semantics, physical structure, and commonsense understanding, to contextualize human behavior in relation to its environment. I will present contributions to large-scale multimodal egocentric datasets that capture social interactions and object interactions during activities; a modeling approach and dataset for identifying targets of selective auditory attention during social conversations in noisy environments; a framework for estimating gaze targets in scenes from general visual representations; and a method for efficiently adapting vision-language retrieval models to represent new concepts and recognize them in different contexts. Finally, I will propose new work on integrating visual behavioral cues into understanding conversation transcripts with large language models.