Ph.D. Thesis Proposal by Vinay Bettadapura

Event Details
  • Date/Time:
    • Monday April 6, 2015
      10:00 am - 12:00 noon EST
  • Location: CCB 153

Summary Sentence: Leveraging Contextual Cues for Dynamic Scene Understanding


Ph.D. Thesis Proposal Announcement

Title: Leveraging Contextual Cues for Dynamic Scene Understanding

Vinay Bettadapura
Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology

Date: Monday, April 6th, 2015
Time: 10 AM to 12 Noon EST
Location: CCB 153

Committee:
  • Dr. Irfan Essa, School of Interactive Computing, Georgia Tech
  • Dr. Gregory Abowd, School of Interactive Computing, Georgia Tech
  • Dr. Thad Starner, School of Interactive Computing, Georgia Tech
  • Dr. Rahul Sukthankar, Google
  • Dr. Caroline Pantofaru, Google

Abstract:

Environments with people are complex, with many activities and events that need to be represented and explained. The goal of scene understanding is either to determine what the objects and people in such complex and dynamic environments are doing, or to capture the overall happenings, such as a summary of the scene. The context within which the activities and events unfold provides key insights that cannot be derived by studying the activities and events alone. In this thesis, we propose that this rich contextual information can be successfully leveraged, along with the video data, to support dynamic scene understanding.

We explore four different types of contextual cues: (1) spatio-temporal context, (2) egocentric context, (3) geographic context, and (4) environmental context, and show that they improve dynamic scene understanding tasks across several different application domains. First, we present data-driven techniques to enrich spatio-temporal context by augmenting Bag-of-Words models with temporal, local, and global causality information, and show that this improves activity recognition, anomaly detection, and scene assessment from videos. Next, we leverage the egocentric context derived from sensor data captured by first-person point-of-view devices to perform field-of-view localization, in order to understand the user's focus of attention. We demonstrate single- and multi-user field-of-view localization in both indoor and outdoor environments, with applications in augmented reality, event understanding, and the study of social interactions. Next, we look at how geographic context can be leveraged to make challenging "in-the-wild" object recognition tasks more tractable, using the problem of food recognition in restaurants as a case study.

Supported by our previous results on the first three types of contextual cues, we propose to investigate the fourth type, i.e., environmental context. Dynamic scenes such as sporting events, which take place in responsive environments such as stadiums and gymnasiums, are ideal for the proposed work. Specifically, our proposed research will focus on analyzing sports such as basketball, with the goal of generating game highlights and summaries, as an effective demonstration of dynamic scene understanding using environmental context. Video data, along with the contextual information, will be collected during basketball games in college gymnasiums, and a prototype system will be built and demonstrated that analyzes the games and generates the summaries.
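To illustrate the idea of augmenting Bag-of-Words models with temporal causality information, the following minimal Python sketch adds ordered bigram counts to a standard Bag-of-Words histogram over a sequence of quantized visual words. This is an illustrative simplification, not the proposal's actual feature pipeline; the function name and the toy word sequence are assumptions made for the example.

```python
from collections import Counter

def bow_with_temporal_bigrams(word_sequence):
    """Combine a standard Bag-of-Words histogram (unigrams) with
    ordered-bigram counts, which retain the local temporal ordering
    that a plain Bag-of-Words representation discards."""
    unigrams = Counter(word_sequence)
    bigrams = Counter(zip(word_sequence, word_sequence[1:]))
    # Flatten both histograms into one feature dictionary, tagging
    # each key so unigram and bigram counts cannot collide.
    features = {("uni", w): c for w, c in unigrams.items()}
    features.update({("bi",) + pair: c for pair, c in bigrams.items()})
    return features

# Toy sequence of quantized visual words from a hypothetical video clip.
seq = ["walk", "walk", "stop", "walk"]
feats = bow_with_temporal_bigrams(seq)
```

Because the bigram keys are ordered pairs, the resulting histogram distinguishes "walk then stop" from "stop then walk", whereas an unaugmented Bag-of-Words model assigns both orderings identical features.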

Additional Information

In Campus Calendar

Graduate Studies

Invited Audience
  • Created By: Danielle Ramirez
  • Workflow Status: Published
  • Created On: Mar 30, 2015 - 5:40am
  • Last Updated: Oct 7, 2016 - 9:49pm