PhD Proposal by Fiona Ryan
Title: Contextual Behavior Modeling with Computer Vision
Fiona Ryan
Ph.D. Student in Computer Science
School of Interactive Computing
Georgia Institute of Technology
Date: Wednesday, June 25th, 2025
Time: 3-5pm EDT
Location: (Virtual) https://gatech.zoom.us/j/95216024861
Committee
Dr. Judy Hoffman (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. James Rehg (Advisor) - Department of Computer Science, University of Illinois Urbana-Champaign
Dr. Zsolt Kira - School of Interactive Computing, Georgia Institute of Technology
Dr. James Hays - School of Interactive Computing, Georgia Institute of Technology
Dr. Josef Sivic - Czech Institute of Informatics, Robotics, and Cybernetics, Czech Technical University in Prague
Abstract
Understanding human behavior with computer vision is a core challenge for developing AI systems that can effectively interact with and assist people in everyday life. Modeling human behavior is challenging because it requires not only visually recognizing behaviors such as gaze, gesture, and movement, but also interpreting them in context. Human behavior is shaped by intent and higher-level goals, the surrounding physical environment, interactions with other people, and additional modalities such as speech, making it inherently multimodal and situated.
This thesis proposal explores how to model human behavior in context by addressing two core needs: (1) multimodal datasets that capture naturalistic human interactions in everyday environments, enabling new behavior modeling tasks, and (2) methods that leverage foundation models, which encode general-purpose world knowledge such as visual semantics, physical structure, and commonsense understanding, to contextualize human behavior in relation to its environment. I will present contributions to large-scale multimodal egocentric datasets that capture social interactions and object interactions during activities; a modeling approach and dataset for identifying targets of selective auditory attention during social conversations in noisy environments; a framework for estimating gaze targets in scenes from general visual representations; and a method for efficiently adapting vision-language retrieval models to represent new concepts and recognize them in different contexts. Finally, I will propose new work on integrating visual behavioral cues into understanding conversation transcripts with large language models.