PhD Proposal by Unaiza Ahsan

Ph.D. Thesis Proposal Announcement

Title: Leveraging Mid-Level Representations For Complex Activity Recognition

Unaiza Ahsan
Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology

Date: Friday, April 28, 2017
Time: 4:30 PM-6:30 PM EDT (1:30 PM-3:30 PM PDT)
Location: MiRC Rm 102A (Pettit Building)

Committee:
Dr. Irfan Essa (Advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. James Hays, School of Interactive Computing, Georgia Institute of Technology
Dr. Devi Parikh, School of Interactive Computing, Georgia Institute of Technology
Dr. Munmun De Choudhury, School of Interactive Computing, Georgia Institute of Technology
Dr. Chen Sun, Google

Abstract:
Dynamic scene understanding requires learning representations of the components of a scene, including objects, environments, actions, and events. This understanding enables activity and event prediction. Automatic aggregation and recognition of activities from images and videos has applications in social media, journalism, and automatic browsing and retrieval of multimedia content. One important way to achieve this is to design a mid-level, or intermediate, feature representation, which is useful in cases where strong supervision may not be available. In this thesis, we propose that such mid-level representations significantly contribute to higher-level event and action understanding in both still images and videos.

First, we propose an automatic approach to aggregate social event images from different sources and users by leveraging an intermediate joint representation of images and their associated text. Recognizing the events in clusters of images requires us to go beyond recognizing specific objects or scenes; hence, we next propose an event concept-based intermediate representation that learns concepts from the Web and uses this representation to identify events even from a single labelled example. To demonstrate the strength of the proposed approaches, we contribute two diverse social event datasets to the community. We then present a use case of event concepts as a mid-level representation that generalizes to sentiment recognition in diverse social event images. Our proposed work involves training Generative Adversarial Networks (GANs) on video frames (which requires no labels), using the trained GAN discriminator as an intermediate representation, and fine-tuning it on a smaller labelled video dataset to recognize actions in videos. This unsupervised pre-training step avoids manual feature engineering, video frame encoding, and searching for the best video frame sampling technique. Our preliminary experiments yield action recognition performance competitive with the state of the art using only appearance information.
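The transfer-learning pattern in the last step above (reuse a discriminator trained without labels as a feature extractor, then fine-tune only a new classification head on a small labelled set) can be sketched in miniature. Everything here is a toy stand-in: the "pre-trained" discriminator is a single random-weight hidden layer rather than the convolutional network trained adversarially in the proposal, and the data, layer sizes, and label count are placeholders chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a GAN discriminator trunk: one hidden layer whose
# weights we pretend were learned by adversarial pre-training on
# unlabeled video frames (here they are just random placeholders).
D_in, D_hidden, n_actions = 64, 32, 5
W1 = rng.normal(scale=0.1, size=(D_in, D_hidden))
b1 = np.zeros(D_hidden)

def features(x):
    # Discriminator trunk reused as a mid-level representation.
    return np.maximum(0.0, x @ W1 + b1)  # ReLU activations

# Small "labelled" set: stand-in frame descriptors and action labels.
X = rng.normal(size=(200, D_in))
y = rng.integers(0, n_actions, size=200)

# Fine-tune ONLY a new softmax head; the trunk stays frozen.
W2 = np.zeros((D_hidden, n_actions))
for _ in range(100):
    H = features(X)
    logits = H @ W2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0            # softmax cross-entropy grad
    W2 -= 0.1 * (H.T @ p) / len(y)            # gradient step on the head

pred = (features(X) @ W2).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```

In the actual proposal the head would instead be fine-tuned (possibly together with the trunk) on a real action-recognition dataset; the point of the sketch is only the reuse of unsupervised-trained discriminator features in place of hand-engineered ones.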
