PhD Defense by Nirbhay Modhe


Title: Leveraging Value-awareness for Online and Offline Model-based Reinforcement Learning

Date: Thursday, October 27th, 2022

Time: 9:00 AM - 11:00 AM Eastern Time

Location (virtual): https://bluejeans.com/264974579/4014


Nirbhay Modhe

Ph.D. Candidate

School of Interactive Computing

College of Computing

Georgia Institute of Technology



Committee:

Dr. Dhruv Batra (advisor), School of Interactive Computing, Georgia Institute of Technology

Dr. Zsolt Kira, School of Interactive Computing, Georgia Institute of Technology

Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology

Dr. Gaurav Sukhatme, University of Southern California

Dr. Ashwin Kalyan, Allen Institute for AI (AI2)



Model-based Reinforcement Learning (RL) lies at the intersection of planning and learning for sequential decision making. Value-awareness in model learning has recently emerged as a means of incorporating task or reward information into the model-learning objective, so that the model can exploit the specifics of a task. Although value-awareness has been shown in theory to be superior to maximum likelihood estimation in the context of (online) model-based RL, it has remained impractical for most non-trivial tasks.


This thesis aims to bridge the gap between theory and practice by applying the principle of value-awareness to two settings: online RL and offline RL. First, within online RL, this thesis revisits value-aware model learning from the perspective of minimizing performance difference, obtaining a novel value-aware model learning objective as a direct upper bound on that difference. Then, this thesis investigates and remedies the issue of stale value estimates, which has so far held back the practicality of value-aware model learning. Using the proposed remedy, performance improvements are presented over maximum-likelihood-based baselines and existing value-aware objectives on several continuous control tasks, while also enabling existing value-aware objectives to become performant.


In the offline RL setting, this thesis takes a step back from model learning and applies value-awareness to data augmentation instead. Such data augmentation, when applied to model-based offline RL algorithms, makes it possible to leverage unseen states with low epistemic uncertainty that were previously not reachable within the assumptions and limitations of model-based offline RL. Value-aware state augmentations are found to enable better performance on offline RL benchmarks than existing baselines and non-value-aware alternatives.


  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 10/13/2022
  • Modified By: Tatianna Richardson
  • Modified: 10/13/2022