PhD Defense by Miao Liu
Title: Egocentric Action Understanding by Learning Embodied Attention
Date: Thursday, June 30, 2022
Time: 12:00 pm to 1:30 pm (EDT)
Robotics Ph.D. Candidate
School of Electrical and Computer Engineering
Georgia Institute of Technology
Committee:
Dr. James M. Rehg (Advisor, School of Interactive Computing, Georgia Institute of Technology)
Dr. Diyi Yang (School of Interactive Computing, Georgia Institute of Technology)
Dr. Zsolt Kira (School of Interactive Computing, Georgia Institute of Technology)
Dr. James Hays (School of Interactive Computing, Georgia Institute of Technology)
Dr. Jitendra Malik (Department of Electrical Engineering and Computer Science, University of California at Berkeley)
Abstract:
Videos captured from wearable cameras, known as egocentric videos, create a continuous record of a person's daily visual experience and thereby offer a new perspective on human activity understanding. Importantly, egocentric video aligns gaze, embodied movement, and action in the same "first-person" coordinate system. These rich egocentric cues reflect the attended scene context of an action and thereby provide a novel means of reasoning about human daily routines.
In this thesis, I describe my efforts to develop novel computational models that learn embodied egocentric attention for the automatic analysis of egocentric actions. First, I introduce a probabilistic model for learning gaze and actions in egocentric video, and further demonstrate that attention can serve as a robust tool for learning motion-aware video representations. Second, I develop a novel deep model to address the challenging problem of jointly recognizing and localizing the actions of a mobile user on a known 3D map from egocentric videos. Third, I present a novel deep latent variable model that uses intentional human body movement (motor attention) as a key representation for forecasting human-object interaction in egocentric video. Finally, I propose the novel task of future hand segmentation from egocentric videos, and show how explicitly modeling future head motion can facilitate forecasting of future hand movement.