PhD Defense by Fiona Ryan
Title: Towards Human-Centric Perception: Grounding Human Behavior in Multimodal Context
Date: Tuesday, April 7, 2026
Time: 3:00-5:00 PM ET
Location: Coda 0915 & Zoom (https://gatech.zoom.us/j/95248425147)
Fiona Ryan
Ph.D. Student
School of Interactive Computing
Georgia Institute of Technology
Committee
Dr. Judy Hoffman (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. James Rehg (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. James Hays - School of Interactive Computing, Georgia Institute of Technology
Dr. Zsolt Kira - School of Interactive Computing, Georgia Institute of Technology
Dr. Josef Sivic - Czech Institute of Informatics, Robotics, and Cybernetics, Czech Technical University in Prague
Abstract
Perceiving and understanding human behavior with computer vision is a core challenge for developing AI systems that can effectively interact with and assist people in everyday life. Modeling human behavior is challenging because it requires not only visually recognizing behaviors like gaze, gesture, and movement, but also grounding them in the context in which they occur. Human behavior is shaped by intent and higher-level goals, the surrounding physical environment, social interactions with other people, and additional modalities such as speech and language, making it inherently multimodal and situated.
This thesis explores how to model human behavior in context by addressing three core needs: (1) datasets that capture naturalistic human interactions in everyday environments, enabling new behavior modeling tasks, (2) multimodal methods that ground behavior by leveraging information across modalities including vision, audio, and language, and (3) robust methods for recognizing behavioral cues that leverage advances in foundation models to encode context. First, I present contributions to large-scale multimodal egocentric datasets that capture social interactions and human-object interactions during activities. Second, I present a modeling approach and dataset for the novel task of identifying targets of selective auditory attention during social conversations in noisy environments. Third, I present a method for efficiently adapting vision-language retrieval models to represent new concepts and recognize them in different contexts. Fourth, I propose a framework for estimating gaze targets in scenes using representations from a visual foundation model. Finally, I extend this framework to forecasting gaze behavior in egocentric video.