PhD Proposal by Steven Hickson

Title: Encoding 3D Contextual Information For Dynamic Scene Understanding

 

Steven Hickson

Ph.D. Student in Computer Science

School of Interactive Computing

College of Computing

Georgia Institute of Technology

 

Date: Friday, December 13, 2019

Time: 2:00 - 3:30pm (EST)

Location: Coda C1108 Brookhaven

 

 

Committee:

------------

Dr. Irfan Essa (Advisor), Senior Associate Dean, School of Interactive Computing, Georgia Institute of Technology

Dr. Frank Dellaert, School of Interactive Computing, Georgia Institute of Technology

Dr. Zsolt Kira, School of Interactive Computing, Georgia Institute of Technology

Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology

Dr. Rahul Sukthankar, Principal Scientist/Director at Google AI Perception / Robotics Institute, Carnegie Mellon University

 

Abstract:

-----------

 

Humans have an inherent understanding of the shape of their environment and the objects contained in it. Given a description of a room, a person can form a reasonable approximation of the space and the objects in it. However, current methods lack this type of contextual understanding (e.g., that a chair has a particular shape and that this shape indicates you can sit on it). This work is motivated by the idea that there is an inherent relationship between 3D information, such as shape, and scene understanding and object classification. Objects such as tables, chairs, and cups have specific shapes, and our models should learn and leverage that information. Depth and surface normals have frequently been used as additional signals in semantic labeling work; however, there is still limited understanding of how to use and learn shape and labels jointly. Our work examines 3D cues in unsupervised and supervised approaches to segmentation and semantic labeling. We show how to use 3D information for robust unsupervised segmentation, supervised semantic labeling using segmentation, and unsupervised object categorization. We explore this relationship further by showing how shape helps deep neural networks semantically label indoor environments. We also explore how jointly estimating shape and labels improves both results, and how both can be estimated with little added model capacity.

 

This proposal aims to demonstrate how 3D cues may be used to improve semantic labeling and object classification. Specifically, we consider depth, surface normals, object classification, and pixel-wise semantic labeling. The works outlined aim to validate the following thesis statement: Shape is used as an additional context that improves segmentation, unsupervised clustering, object classification, and semantic labeling with little computational overhead.

 

The proposed work will show:

Combining shape and object labels improves classification, with (1) few extra parameters required, (2) surface normals being a more closely related shared task to labeling than depth, and (3) improved accuracy on each task when shape and labels are combined. We describe various methods for combining shape and object classification, and then discuss our extensions of the proposed work, which focus specifically on surface normal prediction and semantic labeling.

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 12/06/2019
  • Modified By: Tatianna Richardson
  • Modified: 12/06/2019
