<node id="545371">
  <nid>545371</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1466081841</created>
  <changed>1475893087</changed>
  <title><![CDATA[PhD Proposal by Yin Li]]></title>
  <body><![CDATA[<p><strong>Title: Learning Embodied Models of Actions from First Person Video</strong><br /> <br /> <strong>Yin Li</strong><br /> Computer Science Ph.D. Student<br /> School of Interactive Computing<br /> College of Computing<br /> Georgia Institute of Technology<br /> <br /> Date: Monday, June 20th, 2016<br /> Time:&nbsp;1:00pm to 3:00pm (EST)<br /> Location: TSRB GVU Cafe<br />&nbsp;<br /><strong>Committee:</strong><br /> ---------------<br /> Dr. James M. Rehg&nbsp;(Advisor), School of Interactive Computing, Georgia Institute of Technology&nbsp;</p><p>Dr. Irfan Essa, School of Interactive Computing, Georgia Institute of Technology&nbsp;</p><p>Dr. James Hays, School of Interactive Computing, Georgia Institute of Technology&nbsp;</p><p>Dr. Kristen Grauman, Department of Computer Science, University of Texas at Austin<br /> <br /> Abstract:<br /> -----------</p><p>The development of wearable cameras and the advancement of computer vision make it possible, for the first time in history, to collect and analyze a large-scale record of our daily visual experiences in the form of first person videos. My thesis work focuses on the automatic analysis of these first person videos, an area known as First Person Vision (FPV). My goal is to develop novel embodied representations for understanding the camera wearer's actions by leveraging first person visual cues derived from first person videos, including body motion, hand locations, and gaze. This "embodied" representation differs from traditional visual representations in that it derives from the purposive body movements of the first person and captures the concept of objects within the context of actions.</p><p>By considering actions as intentional body movements, I propose to investigate three important parts of first person actions. First, I present a method to estimate egocentric gaze, which reveals the visual trajectory of an action. Our work demonstrates for the first time that egocentric gaze can be reliably estimated using only head motion and hand locations derived from first person video, without the need for object or action information. Second, I develop a method for first person action recognition. Our work demonstrates that an embodied representation combining egocentric cues and visual cues can inform the location of actions and significantly improve recognition accuracy. Finally, I propose a novel task of object interaction prediction, which uncovers the plan of a future object manipulation and thus explains the purposive motions. I will develop novel learning schemes for this task and learn an embodied object representation from it.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Learning Embodied Models of Actions from First Person Video]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2016-06-20T14:00:00-04:00]]></value>
      <value2><![CDATA[2016-06-20T16:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>102851</tid>
        <value><![CDATA[Phd proposal]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
