
PhD Defense by Himanshu Sahni


Title: Task Generalized MDPs for Multi-Task Reinforcement Learning

 

Date: December 8th, 2021 (Wednesday)

Time: 3:00-4:30 PM Eastern Time (12:00-1:30 PM Pacific Time)

Location: Coda C1115 Druid Hills and https://bluejeans.com/556574054/8997

 

Himanshu Sahni

Computer Science PhD Candidate

School of Interactive Computing
Georgia Institute of Technology

 

Committee

1. Dr. Charles Isbell (Advisor), School of Interactive Computing, John P. Imlay, Jr. Dean of the College of Computing, Georgia Institute of Technology

2. Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology

3. Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology

4. Dr. Dhruv Batra, School of Interactive Computing, Georgia Institute of Technology

5. Dr. Volodymyr Mnih, DeepMind

 

Abstract

 

Reinforcement learning (RL) has seen widespread success in creating intelligent agents in several challenging domains. Yet, training RL agents remains prohibitively expensive in terms of the number of environment interactions required. One of the reasons for this inefficiency is that every new task is usually learned from scratch, instead of leveraging information from similar tasks.

 

In this talk, I will describe task-generalized Markov Decision Processes, which are built from a distribution of tasks: MDPs that differ only in their reward functions. This thesis demonstrates that task-generalized MDPs can significantly speed up reinforcement learning in multi-task settings. Specifically, I claim that by first building a task-generalized MDP from a set of training tasks, one can learn subsequent tasks drawn from the same distribution much faster.
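To fix notation, one standard way to write such a task family is sketched below; the symbols are illustrative assumptions, not necessarily the thesis's exact notation.

```latex
% A family of tasks sharing states S, actions A, dynamics P, and discount
% \gamma, differing only in the reward function R_i drawn from a task
% distribution \mathcal{T}:
\mathcal{M} = \{\, M_i = (S, A, P, R_i, \gamma) \mid R_i \sim \mathcal{T} \,\}
```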

 

This work makes three key contributions:

 

1. I introduce the idea of combining attention, short-term memory, and unsupervised rewards to build a state representation in a limited field-of-view environment. By altering the underlying MDP's state space, we can enable reinforcement learning of tasks within it (see the first sketch after this list).

 

2. I present HALGAN, which retroactively inserts realistic goals at desired locations along the agent's trajectory while respecting the environment's dynamics. This work extends the idea of Hindsight Experience Replay to visual environments, thereby speeding up reinforcement learning in them (see the second sketch below).

 

3. I develop a framework for task-distribution-biased unsupervised reinforcement learning. This framework allows for learning skills that are biased toward a task distribution while remaining distinct from one another. Skills learned in this manner generalize better to downstream tasks than skills from methods that do not incorporate this bias (see the third sketch below).
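The first sketch illustrates the kind of architecture contribution 1 describes: a state representation built from an attention-selected glimpse and a recurrent short-term memory. All names and dimensions here are illustrative assumptions, not the thesis's code; in the thesis, unsupervised rewards would additionally shape where the agent attends.

```python
# A minimal sketch, assuming a glimpse-based agent: the state is built from
# an embedded glimpse of the limited field of view plus an LSTM short-term
# memory over past glimpses; the model also proposes where to look next.
import torch
import torch.nn as nn

class AttentiveStateBuilder(nn.Module):
    def __init__(self, glimpse_dim=64, hidden_dim=128):
        super().__init__()
        self.encode = nn.Linear(glimpse_dim, hidden_dim)   # embed current glimpse
        self.memory = nn.LSTMCell(hidden_dim, hidden_dim)  # short-term memory
        self.attend = nn.Linear(hidden_dim, 2)             # next attention location

    def forward(self, glimpse, hc):
        h, c = self.memory(torch.relu(self.encode(glimpse)), hc)
        next_loc = torch.tanh(self.attend(h))  # (x, y) in [-1, 1], where to look next
        return h, (h, c), next_loc             # h serves as the state representation

builder = AttentiveStateBuilder()
h0 = torch.zeros(1, 128)
state, hc, loc = builder(torch.randn(1, 64), (h0, h0.clone()))
```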
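The second sketch shows the core relabeling step of Hindsight Experience Replay in plain Python; per the abstract, HALGAN's extension is to make such relabeling possible in visual environments by inserting realistic goal images into past observations. The function and tuple layout are assumptions for illustration.

```python
# A minimal sketch of hindsight relabeling: failed episodes are reused by
# pretending that goals the agent actually achieved were the desired ones.
import random

def relabel_with_hindsight(trajectory, reward_fn, k=4):
    """trajectory: list of (obs, action, achieved_goal) tuples, where
    achieved_goal is the goal state this step actually reached."""
    relabeled = []
    for t, (obs, action, achieved) in enumerate(trajectory):
        # Sample up to k goals achieved at this step or later in the episode.
        candidates = range(t, len(trajectory))
        for f in random.sample(candidates, k=min(k, len(candidates))):
            new_goal = trajectory[f][2]             # a goal known to be reachable
            reward = reward_fn(achieved, new_goal)  # succeeds when f == t below
            relabeled.append((obs, action, new_goal, reward))
    return relabeled

# Illustrative use with a sparse goal-matching reward:
traj = [((0, 0), "right", (1, 0)), ((1, 0), "up", (1, 1))]
extra = relabel_with_hindsight(traj, lambda a, g: 0.0 if a == g else -1.0, k=2)
```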
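The third sketch shows one way contribution 3's bias could enter a DIAYN-style skill-discovery reward, log q(z|s) - log p(z): reweight it toward states that matter under the task distribution. The reweighting is an illustrative assumption, not the thesis's exact objective.

```python
# A minimal sketch: the standard diversity reward makes skills distinguishable
# by a discriminator q(z|s); the task_weight factor (assumed here) biases
# skill learning toward states relevant to the training task distribution.
import math

def skill_reward(log_q_z_given_s, log_p_z, task_weight):
    diversity = log_q_z_given_s - log_p_z  # high when skill z is identifiable from s
    return task_weight * diversity         # down-weight task-irrelevant states

# Example: 8 skills with a uniform prior; discriminator fairly sure of z.
r = skill_reward(math.log(0.7), math.log(1 / 8), task_weight=0.9)
```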
