
PhD Proposal by Himanshu Sahni


Title: Hallucinating agent experience to speed up reinforcement learning

 

Himanshu Sahni

Ph.D. student in Computer Science

School of Interactive Computing

College of Computing

Georgia Institute of Technology

 

Date: Tuesday, March 17, 2020

Time: 12:45-2:30 PM EST

Location: https://bluejeans.com/536486204

Meeting ID: 536 486 204

**Note: this proposal is remote-only due to the institute's guidelines on COVID-19**

---

Committee:

Dr. Charles Isbell (Advisor), School of Interactive Computing, Georgia Institute of Technology

Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology

Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology

Dr. Dhruv Batra, School of Interactive Computing, Georgia Institute of Technology

---

 

Summary:

Reinforcement learning (RL) has seen widespread success recently. Yet, training RL agents remains prohibitively expensive in terms of the number of environment interactions required. The overall aim of this research is to significantly reduce the sample complexity of training RL agents, making it easier to deploy them in the real world and have them learn quickly from experience. This proposal focuses on learning how to alter the experience collected by the agent during exploration, rather than altering the learning algorithm itself. We define hallucinations as realistic alterations to an agent's trajectory, i.e., alterations permitted by the environment's state space and dynamics. I will demonstrate that by presenting hallucinated data to off-the-shelf RL algorithms, we can significantly improve their sample efficiency.
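
For intuition, here is a minimal sketch of what presenting hallucinated data to an off-the-shelf RL algorithm can look like: hallucinated transitions are simply mixed into the replay buffer the learner already samples from. The `hallucinate` function below is a hypothetical stand-in for any learned model that produces alterations consistent with the environment's state space and dynamics; this is an illustration under those assumptions, not the proposal's actual implementation.

```python
import random
from collections import deque

class HallucinationAugmentedBuffer:
    """Replay buffer that mixes real transitions with hallucinated ones.

    `hallucinate` is a hypothetical function standing in for a learned
    model that produces realistic alterations of a real transition.
    """

    def __init__(self, capacity, hallucinate, n_hallucinations=4):
        self.real = deque(maxlen=capacity)
        self.fake = deque(maxlen=capacity)
        self.hallucinate = hallucinate
        self.n_hallucinations = n_hallucinations

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.real.append(transition)
        # Also store several realistic alterations of the same transition.
        for _ in range(self.n_hallucinations):
            self.fake.append(self.hallucinate(transition))

    def sample(self, batch_size, real_fraction=0.5):
        # Mix real and hallucinated experience in each training batch;
        # the downstream RL algorithm samples from it unchanged.
        n_real = int(batch_size * real_fraction)
        batch = random.sample(self.real, min(n_real, len(self.real)))
        batch += random.sample(self.fake, min(batch_size - n_real, len(self.fake)))
        return batch
```

Note that in this arrangement the RL algorithm itself is untouched; only the distribution of experience it trains on changes.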

 

As contributions, I will outline three ways of altering agent experience to benefit learning. The first uses hallucinations to train a representation of the environment's state when the agent has a limited field of view; key components of this system are a short-term memory architecture suited to such environments and an adversarially trained attention controller. The second contribution is a method for altering visual trajectories in hindsight using learned hallucinations of goal images; combined with Hindsight Experience Replay, this significantly speeds up reinforcement learning, as shown in two navigation-based domains (see the sketch below). The third proposed contribution outlines how to hallucinate realistic subgoals using state-based value functions.
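
As a rough illustration of the second contribution, the sketch below performs hindsight relabeling in the spirit of Hindsight Experience Replay, but on raw image observations. Both `hallucinate_goal` (a learned generative model that renders a goal image into an observation) and `compute_reward` (a goal-conditioned reward function) are hypothetical names used here for illustration only.

```python
from collections import namedtuple

# A single step of agent experience; fields are illustrative.
Step = namedtuple("Step", ["observation", "action", "next_observation"])

def hindsight_relabel(trajectory, hallucinate_goal, compute_reward):
    """Relabel a (possibly failed) visual trajectory as a successful
    attempt at a goal the agent actually reached."""
    # Treat the final observation as the goal that was achieved.
    achieved_goal = trajectory[-1].next_observation
    relabeled = []
    for step in trajectory:
        # Hallucinate the achieved goal into each raw observation so the
        # trajectory looks like it was pursuing that goal all along.
        obs = hallucinate_goal(step.observation, achieved_goal)
        next_obs = hallucinate_goal(step.next_observation, achieved_goal)
        reward = compute_reward(step.next_observation, achieved_goal)
        relabeled.append((obs, step.action, reward, next_obs))
    return relabeled
```

The reason a generative model is needed at all is that, with image observations, the pursued goal is part of the picture the agent sees; relabeling a trajectory with a different goal therefore requires hallucinating that goal into every frame, not merely swapping a goal vector.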

 

The contributions above serve to support the thesis statement: we can alter the distribution of an agent's future experiences by hallucinating realistic alterations to the trajectories it has already collected, and presenting this hallucinated data to off-the-shelf RL algorithms significantly improves their sample efficiency.
