PhD Defense by Daniel A. Castro Chin

Ph.D. Thesis Defense Announcement


Title: Understanding the Motion of a Human Pose in Video Classification

Daniel A. Castro Chin

Ph.D. Student
Computer Vision
School of Interactive Computing, College of Computing
Georgia Institute of Technology

Date: Monday, March 25th, 2019
Time: 2:00PM EST
Location: College of Computing Room 312A



Committee:

Dr. Irfan Essa (Advisor), School of Interactive Computing, Georgia Institute of Technology

Dr. James Hays, School of Interactive Computing, Georgia Institute of Technology

Dr. Devi Parikh, School of Interactive Computing, Georgia Institute of Technology

Dr. Dhruv Batra, School of Interactive Computing, Georgia Institute of Technology

Dr. Rahul Sukthankar, Google



Abstract:

For the last 50 years, we have studied the correspondence between human motion and the action or goal a person is attempting to accomplish. Humans themselves subconsciously learn subtle cues about other individuals that give them insight into those individuals' motivation and overall sincerity. In contrast, computers require significant guidance to correctly identify deceptively basic activities. With the recent advent of widespread video recording and the sheer amount of video data being stored, the ability to study human motion has never been more essential. In this thesis, we propose that explicit representations of the human gait can be used to provide intuitive improvements in high-level human action recognition.


We explore three existing motion representations that attempt to integrate motion parameters into video categorization: (1) regular video frames, (2) optical flow, and (3) skeletal joint representations. Regular video frames are most commonly used in video analysis on a per-frame basis due to the nature of most video categories. First, we introduce a technique that enables us to combine contextual features with a traditional neural network to improve the classification of human actions in egocentric video. Then, we introduce a dataset focused on humans performing various dances, an activity which inherently requires its motion to be identified. We discuss the value and relevance of this dataset alongside the most commonly used video datasets and a handful of recently released datasets that are relevant to human motion. Next, we analyze the performance of existing algorithms with each of the motion parameterizations mentioned above. This helps us understand the intrinsic value of each representation and gain a better understanding of each algorithm. Following this, we introduce an approach that uses all of the motion parameterizations concurrently in order to better understand the video. From here, we propose a method to represent human skeletons over time to improve human video categorization. Specifically, we look at particular joint distances over time to generate features that represent the distribution of human poses over time. The performance of each individual metric will be computed and analyzed in order to assess its intrinsic value. The main objective and contribution of our work is to introduce a parameterization of human poses that improves action recognition in video.
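
As a rough illustration of the joint-distance idea described above, the following sketch turns a sequence of skeletal poses into a fixed-length feature vector by histogramming the distance between chosen joint pairs over time. It is not the method presented in the thesis: the function name `joint_distance_features`, the `(frames, joints, 2)` array layout, the choice of joint pairs, the bin count, and the `max_dist` cutoff are all assumptions made for this example.

```python
import numpy as np

def joint_distance_features(poses, joint_pairs, bins=8, max_dist=2.0):
    """Summarize joint-pair distances over time as normalized histograms.

    poses:       array of shape (T, J, 2), the 2D position of J joints
                 in each of T frames (hypothetical layout).
    joint_pairs: list of (a, b) joint index pairs to measure.
    Returns a 1D vector of length len(joint_pairs) * bins.
    """
    poses = np.asarray(poses, dtype=float)
    features = []
    for a, b in joint_pairs:
        # Euclidean distance between the two joints in every frame: shape (T,)
        dists = np.linalg.norm(poses[:, a] - poses[:, b], axis=-1)
        # Clip so that distances beyond max_dist fall into the last bin
        dists = np.clip(dists, 0.0, max_dist)
        # Distribution of this distance across the whole clip
        hist, _ = np.histogram(dists, bins=bins, range=(0.0, max_dist))
        features.append(hist / max(hist.sum(), 1))
    return np.concatenate(features)
```

A vector built this way discards frame ordering and keeps only how often each distance range occurs, which is one simple reading of "the distribution of human poses over time"; it could be passed to any standard classifier.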


  • Created By: Tatianna Richardson
  • Modified By: Tatianna Richardson

