
PhD Proposal by Yin Li


Title: Learning Embodied Models of Actions from First Person Video

Yin Li
Computer Science Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology

Date: Monday, June 20th, 2016
Time: 1:00pm to 3:00pm (EST)
Location: TSRB GVU Cafe
Committee:
---------------
Dr. James M. Rehg (Advisor), School of Interactive Computing, Georgia Institute of Technology 

Dr. Irfan Essa, School of Interactive Computing, Georgia Institute of Technology 

Dr. James Hays, School of Interactive Computing, Georgia Institute of Technology 

Dr. Kristen Grauman, Department of Computer Science, University of Texas at Austin

Abstract:
-----------

The development of wearable cameras and advances in computer vision make it possible, for the first time in history, to collect and analyze a large-scale record of our daily visual experiences in the form of first person videos. My thesis work focuses on the automatic analysis of these videos, an area known as First Person Vision (FPV). My goal is to develop novel embodied representations for understanding the camera wearer's actions by leveraging first person visual cues derived from the video, including body motion, hand locations, and gaze. This "embodied" representation differs from traditional visual representations in that it derives from the purposive body movements of the first person and captures the concept of objects within the context of actions.

 

By considering actions as intentional body movements, I propose to investigate three important aspects of first person actions. First, I present a method to estimate egocentric gaze, which reveals the visual trajectory of an action. Our work demonstrates for the first time that egocentric gaze can be reliably estimated using only head motion and hand locations derived from first person video, without the need for object or action information. Second, I develop a method for first person action recognition. Our work demonstrates that an embodied representation combining egocentric cues and visual cues can inform the location of actions and significantly improve recognition accuracy. Finally, I propose a novel task of object interaction prediction to uncover the plan of a future object manipulation and thus explain the purposive motions. I will develop novel learning schemes for this task and learn an embodied object representation from it.

 
