PhD Proposal by Abhishek Das

Title: Building Agents That Can See, Talk, and Act

----------------

Date: Monday, December 17th, 2018

Time: 11:00am to 12:30pm (ET)

Location: CCB 247

BlueJeans: https://bluejeans.com/6506900315

Abhishek Das

Computer Science Ph.D. Student

School of Interactive Computing

Georgia Institute of Technology

abhishekdas.com

Committee:

----------------

Dr. Dhruv Batra (Advisor; School of Interactive Computing, Georgia Institute of Technology & Facebook AI Research)

Dr. Devi Parikh (School of Interactive Computing, Georgia Institute of Technology & Facebook AI Research)

Dr. Joelle Pineau (McGill University & Facebook AI Research, Montréal)

Dr. James Hays (School of Interactive Computing, Georgia Institute of Technology & Argo AI)

Dr. Jitendra Malik (University of California, Berkeley & Facebook AI Research)

Abstract:

----------------

My research goal is to build intelligent agents (the next generation of Cortana, Alexa, Siri, etc.) that can perceive the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and execute actions in a physical environment. Even a small advance toward such agents could fundamentally change our lives, with applications ranging from assistive chatbots for the visually impaired to natural-language interaction with self-driving cars and in-home mobile robots.

Toward this goal, in this thesis I will present my work on:

1) visual dialog (see+talk): agents capable of holding free-form conversations about images, and reinforcement-learning-based algorithms that train these visual dialog agents through self-play, without exhaustively collecting human-annotated datasets;

2) embodied question answering (see+talk+act): agents with hierarchical and modular navigation architectures that can move around, actively perceive, and answer questions in simulated environments; and

3) targeted multi-agent communication (multi-agent see+talk+act): agents that communicate with each other in a targeted manner on cooperative tasks, learning both what messages to send and whom to send them to, solely from the downstream task-specific reward and without any communication supervision.

In the proposed work, I will develop architectures for multi-agent embodied question answering, in which agents must answer complex 3D visual reasoning questions (e.g., "What size is the cylinder that is left of the brown metal thing that is left of the big sphere?") by appropriately combining first-person active perception with navigation actions and inter-agent communication.

