
PhD Defense by Steven Hickson


Title: Encoding 3D Contextual Information For Dynamic Scene Understanding

 

Steven Hickson

Ph.D. Student in Computer Science

School of Interactive Computing

College of Computing

Georgia Institute of Technology

 

Date: Wednesday, March 4, 2020

Time: 10:00am - 12:00pm (EST)

Location: Coda C114 (1st floor room)

Bluejeans: https://bluejeans.com/775408514

Food will be provided.

 

 

Committee:

------------

Dr. Irfan Essa (Advisor), Senior Associate Dean, School of Interactive Computing, Georgia Institute of Technology

Dr. Frank Dellaert, School of Interactive Computing, Georgia Institute of Technology

Dr. Zsolt Kira, School of Interactive Computing, Georgia Institute of Technology

Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology

Dr. Rahul Sukthankar, Principal Scientist/Director at Google AI Perception / Robotics Institute, Carnegie Mellon University

 

Abstract:

-----------

 

Humans have an inherent understanding of the shape of their environment and the objects contained in it. Given a description of a room, a person can form a reasonable approximation of the space and the objects in it. However, current methods lack this type of contextual understanding (e.g., that a chair has a particular shape, and that this shape indicates you can sit on it). This work is motivated by the idea that there is an inherent relationship between 3D information, such as shape, and scene understanding and object classification. Objects such as tables, chairs, and cups have specific shapes, and our models should learn and leverage that information. Depth and surface normals have frequently been used as additional signals in semantic labeling work; however, there is still limited understanding of how to use and learn shape and labels jointly. Our work examines 3D cues in both unsupervised and supervised approaches to segmentation and semantic labeling. We show how to use 3D information for robust unsupervised segmentation, supervised semantic labeling using segmentation, and unsupervised object categorization. We explore this relationship further by showing how shape helps deep neural networks semantically label indoor environments. We also explore how jointly estimating shape and labels improves both results, and how both can be estimated with little added model capacity.

 

This defense aims to demonstrate how 3D cues may be used to improve semantic labeling and object classification. Specifically, we will consider depth, surface normals, object classification, and pixel-wise semantic labeling. The work outlined aims to validate the following thesis statement: Shape is used as an additional context that improves segmentation, unsupervised clustering, object classification, and semantic labeling with little computational overhead.

 

The presented work will show:

Combining shape and object labels (1) requires few extra parameters, (2) improves results more with surface normals than with depth, and (3) improves accuracy for each task. We describe various methods for combining shape and object classification, and then discuss our extensions of the proposed work, which focus specifically on surface normal prediction and semantic labeling.

 

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 02/27/2020
  • Modified By: Tatianna Richardson
  • Modified: 02/27/2020
