
PhD Proposal by Nirbhay Modhe


Title: Task-Dependent Models for Reinforcement Learning

Date: Monday, October 25th, 2021

Time: 12:00 PM - 2:00 PM

Location (virtual): https://bluejeans.com/264974579/4014


Nirbhay Modhe

PhD Student in Computer Science

College of Computing

Georgia Institute of Technology


Committee

Dr. Dhruv Batra (Advisor, School of Interactive Computing, Georgia Institute of Technology)

Dr. Zsolt Kira (School of Interactive Computing, Georgia Institute of Technology)

Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)

Dr. Ashwin Kalyan (Allen Institute for AI)

Dr. Dipendra Misra (Microsoft Research)


Abstract

Model-based reinforcement learning (RL) lies at the intersection of planning and learning for sequential decision making in Markov Decision Processes (MDPs). Model-based RL has gained popularity due to its many potential benefits, such as sample/data efficiency, optimization stability, and targeted exploration. However, most research progress in model-based RL has continued to rely on maximum-likelihood estimation for learning an accurate dynamics model of future state transitions in MDPs -- an objective that does not align with the downstream task of using the model to learn an approximately optimal control policy.
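To make the mismatch concrete (an illustrative sketch, not text from the proposal): maximum-likelihood estimation fits a model to observed transitions regardless of how its errors affect downstream decisions, whereas a value-aware objective measures model error through a value function, so that inaccuracies matter only insofar as they change value estimates. Writing \hat{P}_\theta for the learned dynamics, P for the true dynamics, \mathcal{D} for a dataset of transitions (s, a, s'), and V for a value function, the two kinds of objective can be written roughly as

\mathcal{L}_{\mathrm{MLE}}(\theta) = -\,\mathbb{E}_{(s,a,s') \sim \mathcal{D}}\big[\log \hat{P}_\theta(s' \mid s, a)\big]

\mathcal{L}_{\mathrm{value\text{-}aware}}(\theta) = \mathbb{E}_{(s,a) \sim \mathcal{D}}\Big[\big(\mathbb{E}_{s' \sim P(\cdot \mid s,a)}[V(s')] - \mathbb{E}_{s' \sim \hat{P}_\theta(\cdot \mid s,a)}[V(s')]\big)^2\Big]

The second form is one common formulation from the value-aware model learning literature; the proposal's own objective is derived differently, by upper bounding the model-performance difference described in the next paragraph.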


In this thesis, we push the boundaries of task-dependent model learning -- where the model learning objective aligns with the control objective of learning a policy -- and its applications in model-based reinforcement learning for continuous control. We present (1) a novel value-aware model learning objective derived by upper bounding the model-performance difference -- the difference in performance of a policy across two MDPs that differ in their transition dynamics and reward distributions. We study the relationship between the model-performance difference, the generalization gap, and the optimality gap in reinforcement learning, and find that even a sub-optimal policy suffices to rank candidate models approximating the target MDP and to select a good one. Next, (2) we present an algorithm that deploys our proposed objective, as well as existing value-aware model learning objectives, in a model-based reinforcement learning setup, demonstrating for the first time practically significant performance on challenging continuous-control simulation tasks and exceeding the performance and sample efficiency of maximum-likelihood estimation. In the proposed work, we aim to expand our task-dependent model learning framework to incorporate intelligent exploration techniques that further improve sample efficiency in model-based reinforcement learning.
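For readers unfamiliar with the term, the model-performance difference referenced above can be written (in a generic form, not necessarily the exact quantity bounded in the proposal) as the gap in expected return of a fixed policy \pi between the true MDP M and a learned MDP \hat{M}:

\Delta_\pi(M, \hat{M}) \;=\; J_M(\pi) - J_{\hat{M}}(\pi), \qquad J_M(\pi) \;=\; \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t\, r(s_t, a_t) \;\Big|\; \pi, M\Big]

Simulation-lemma-style arguments bound |\Delta_\pi| by per-step reward and transition-model errors accumulated along the states the policy visits; upper bounding this quantity in a value-aware way yields a model learning objective that is aligned with the control objective, which is the flavor of result the abstract describes.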
