PhD Defense by Abhishek Das

Title: Building Agents That Can See, Talk, and Act

----------------

 

Abhishek Das

Ph.D. Candidate in Computer Science

School of Interactive Computing

Georgia Institute of Technology

https://abhishekdas.com

 

Date: Monday, March 9th, 2020

Time: 02:00 pm to 04:00 pm (ET)

Location: Coda 1215 Midtown

BlueJeans: https://bluejeans.com/54039458580

 

Committee:

----------------

Dr. Dhruv Batra (Advisor; School of Interactive Computing, Georgia Institute of Technology & Facebook AI Research)

Dr. Devi Parikh (School of Interactive Computing, Georgia Institute of Technology & Facebook AI Research)

Dr. James Hays (School of Interactive Computing, Georgia Institute of Technology & Argo AI)

Dr. Joelle Pineau (McGill University & Facebook AI Research)

Dr. Jitendra Malik (University of California, Berkeley & Facebook AI Research)

Abstract:

----------------

My research goal is to build intelligent agents that can perceive the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and execute actions in a physical environment. Even a small advance towards such agents can fundamentally change our lives – from assistive chatbots for the visually impaired to natural language interaction with self-driving cars!

Towards this grand goal, in this thesis, I will present my work on

1) visual dialog — agents capable of holding free-form conversations about images, and reinforcement learning-based algorithms that train these agents via self-play rather than by exhaustively collecting human-annotated datasets,

2) embodied question answering — agents with hierarchical and modular navigation architectures that can move around, actively perceive, and answer questions in simulated environments,

3) targeted multi-agent communication — teams of agents that communicate with each other in a targeted manner for cooperative tasks, learning both what messages to send and whom to communicate with solely from downstream task-specific reward, without any communication supervision (see the sketch after this list),

4) question-answering as a general-purpose, task-agnostic probe for asking a self-supervised embodied agent what it knows about its physical world, and for quantifying differences in the internal representations agents develop when trained with various auxiliary generative and contrastive objectives.
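
As a companion to item 3 above, here is a minimal, illustrative sketch of one way targeted communication can be realized with soft attention: each agent emits a key and a value (its message), each receiver forms a query, and attention over keys decides how strongly each sender's value is integrated by each receiver. The dimensions, projection matrices, and variable names are assumptions made for illustration, not the system defended in the thesis; in a real system the projections would be trained end-to-end from the cooperative task's reward.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: N agents, hidden size H, key/value sizes Dk/Dv.
N, H, Dk, Dv = 3, 16, 8, 8
rng = np.random.default_rng(0)

# Each agent's hidden state (e.g. from its policy network at this timestep).
hidden = rng.standard_normal((N, H))

# Learned projections (random placeholders here; during training these would
# be optimized from the task reward alone, with no communication labels).
W_query = rng.standard_normal((H, Dk))
W_key = rng.standard_normal((H, Dk))
W_value = rng.standard_normal((H, Dv))

queries = hidden @ W_query  # what each receiver is "listening" for
keys = hidden @ W_key       # signature attached to each sender's message
values = hidden @ W_value   # content of each sender's message

# Receiver i weights sender j by how well query_i matches key_j:
# the weights decide *whom* to listen to, the values decide *what* is sent.
scores = queries @ keys.T / np.sqrt(Dk)  # (N, N)
weights = softmax(scores)                # each row sums to 1
incoming = weights @ values              # (N, Dv): aggregated message per agent

print(weights.round(2))  # soft "who talks to whom" pattern between agents
print(incoming.shape)    # each agent's received, aggregated message
```

Because the attention weights are differentiable, an objective driven only by the downstream task reward (e.g. a policy-gradient loss) can shape both the message contents and the communication pattern, with no explicit supervision on who should talk to whom.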
