
PhD Defense by Himanshu Sahni


Title: Task Generalized MDPs for Multi-Task Reinforcement Learning

 

Date: December 8th, 2021 (Wednesday)

Time: 3:00-4:30 PM Eastern Time (12:00-1:30 PM Pacific Time)

Location: Coda C1115 Druid Hills and https://bluejeans.com/556574054/8997

 

Himanshu Sahni

Computer Science PhD Candidate

School of Interactive Computing
Georgia Institute of Technology

 

Committee

1. Dr. Charles Isbell (Advisor), School of Interactive Computing, John P. Imlay, Jr. Dean of the College of Computing, Georgia Institute of Technology

2. Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology

3. Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology

4. Dr. Dhruv Batra, School of Interactive Computing, Georgia Institute of Technology

5. Dr. Volodymyr Mnih, DeepMind

 

Abstract

 

Reinforcement learning (RL) has seen widespread success in creating intelligent agents in several challenging domains. Yet, training RL agents remains prohibitively expensive in terms of the number of environment interactions required. One of the reasons for this inefficiency is that every new task is usually learned from scratch, instead of leveraging information from similar tasks.

 

In this talk, I will describe task-generalized Markov Decision Processes, which are built from a distribution of tasks: MDPs that differ only in their reward functions. This thesis demonstrates that task-generalized MDPs can significantly speed up reinforcement learning in multi-task settings. Specifically, I claim that by first building a task-generalized MDP from a set of training tasks, one can learn subsequent tasks drawn from the same distribution much faster.
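To fix notation, one standard way to write such a task family is sketched below; the symbols are illustrative assumptions, not necessarily the thesis's exact notation.

```latex
% A family of tasks sharing states S, actions A, dynamics P, and discount
% \gamma, differing only in the reward function R_i drawn from a task
% distribution \mathcal{T}:
\mathcal{M} = \{\, M_i = (S, A, P, R_i, \gamma) \mid R_i \sim \mathcal{T} \,\}
```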

 

This work makes three key contributions:

 

1. I introduce the idea of combining attention, short-term memory, and unsupervised rewards to build a state representation in a limited field-of-view environment. By altering the underlying MDP's state space, we can enable reinforcement learning of tasks within it (see the first sketch after this list).

 

2. I present HALGAN, which retroactively inserts realistic goals at desired locations along the agent's trajectory while respecting the environment's dynamics. This work extends the idea of Hindsight Experience Replay to visual environments, thereby speeding up reinforcement learning in them (see the second sketch below).

 

3. I develop a framework for task-distribution-biased unsupervised reinforcement learning. This framework allows for learning skills that are biased toward a task distribution while remaining distinct from one another. Skills learned in this manner generalize better to downstream tasks than skills from methods that do not incorporate this bias (see the third sketch below).
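The first sketch illustrates the kind of architecture contribution 1 describes: a state representation built from an attention-selected glimpse and a recurrent short-term memory. All names and dimensions here are illustrative assumptions, not the thesis's code; in the thesis, unsupervised rewards would additionally shape where the agent attends.

```python
# A minimal sketch, assuming a glimpse-based agent: the state is built from
# an embedded glimpse of the limited field of view plus an LSTM short-term
# memory over past glimpses; the model also proposes where to look next.
import torch
import torch.nn as nn

class AttentiveStateBuilder(nn.Module):
    def __init__(self, glimpse_dim=64, hidden_dim=128):
        super().__init__()
        self.encode = nn.Linear(glimpse_dim, hidden_dim)   # embed current glimpse
        self.memory = nn.LSTMCell(hidden_dim, hidden_dim)  # short-term memory
        self.attend = nn.Linear(hidden_dim, 2)             # next attention location

    def forward(self, glimpse, hc):
        h, c = self.memory(torch.relu(self.encode(glimpse)), hc)
        next_loc = torch.tanh(self.attend(h))  # (x, y) in [-1, 1], where to look next
        return h, (h, c), next_loc             # h serves as the state representation

builder = AttentiveStateBuilder()
h0 = torch.zeros(1, 128)
state, hc, loc = builder(torch.randn(1, 64), (h0, h0.clone()))
```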
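The second sketch shows the core relabeling step of Hindsight Experience Replay in plain Python; per the abstract, HALGAN's extension is to make such relabeling possible in visual environments by inserting realistic goal images into past observations. The function and tuple layout are assumptions for illustration.

```python
# A minimal sketch of hindsight relabeling: failed episodes are reused by
# pretending that goals the agent actually achieved were the desired ones.
import random

def relabel_with_hindsight(trajectory, reward_fn, k=4):
    """trajectory: list of (obs, action, achieved_goal) tuples, where
    achieved_goal is the goal state this step actually reached."""
    relabeled = []
    for t, (obs, action, achieved) in enumerate(trajectory):
        # Sample up to k goals achieved at this step or later in the episode.
        candidates = range(t, len(trajectory))
        for f in random.sample(candidates, k=min(k, len(candidates))):
            new_goal = trajectory[f][2]             # a goal known to be reachable
            reward = reward_fn(achieved, new_goal)  # succeeds when f == t below
            relabeled.append((obs, action, new_goal, reward))
    return relabeled

# Illustrative use with a sparse goal-matching reward:
traj = [((0, 0), "right", (1, 0)), ((1, 0), "up", (1, 1))]
extra = relabel_with_hindsight(traj, lambda a, g: 0.0 if a == g else -1.0, k=2)
```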
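The third sketch shows one way contribution 3's bias could enter a DIAYN-style skill-discovery reward, log q(z|s) - log p(z): reweight it toward states that matter under the task distribution. The reweighting is an illustrative assumption, not the thesis's exact objective.

```python
# A minimal sketch: the standard diversity reward makes skills distinguishable
# by a discriminator q(z|s); the task_weight factor (assumed here) biases
# skill learning toward states relevant to the training task distribution.
import math

def skill_reward(log_q_z_given_s, log_p_z, task_weight):
    diversity = log_q_z_given_s - log_p_z  # high when skill z is identifiable from s
    return task_weight * diversity         # down-weight task-irrelevant states

# Example: 8 skills with a uniform prior; discriminator fairly sure of z.
r = skill_reward(math.log(0.7), math.log(1 / 8), task_weight=0.9)
```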
