{"651905":{"#nid":"651905","#data":{"type":"event","title":"PhD Proposal by Nirbhay Modhe","body":[{"value":"\u003Cp\u003ETitle: Task-Dependent Models for Reinforcement Learning\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDate: Monday, October 25th, 2021\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETime: 12:00 PM - 2:00 PM\u003C\/p\u003E\r\n\r\n\u003Cp\u003ELocation (virtual): \u003Ca href=\u0022https:\/\/bluejeans.com\/264974579\/4014\u0022\u003Ehttps:\/\/bluejeans.com\/264974579\/4014\u003C\/a\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ENirbhay Modhe\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPhD Student in Computer Science\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECollege of Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003EGeorgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECommittee\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Dhruv Batra (Advisor, School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Zsolt Kira (School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Ashwin Kalyan (Allen Institute for AI)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Dipendra Misra (Microsoft Research)\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAbstract\u003C\/p\u003E\r\n\r\n\u003Cp\u003EModel-based reinforcement learning (RL) is the field that lies at the intersection of planning and learning for sequential decision making in Markov Decision Processes (MDPs). Model-based RL has gained popularity due to its many potential benefits such as sample\/data efficiency, optimization stability and targeted exploration. 
However, most research in model-based RL still relies on maximum-likelihood estimation to learn a correct dynamics model of future state transitions in MDPs -- an objective that does not align with the downstream task of using the model to learn an approximately optimal control policy.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIn this thesis, we push the boundaries of task-dependent model learning -- where the model learning objective aligns with the control objective of learning a policy -- and its applications in model-based reinforcement learning for continuous control. (1) We present a novel value-aware model learning objective derived by upper bounding the model performance difference -- the difference in performance of a policy across two MDPs that differ in their transition dynamics and reward distributions. We study the relationship between model performance difference, generalization gap and optimality gap in reinforcement learning and find that even a suboptimal policy is good enough to rank and select a good model from a list of candidate models that approximate the target MDP. (2) Next, we present an algorithm that deploys both our proposed objective and existing value-aware model learning objectives in a model-based reinforcement learning problem setup, demonstrating the first practically significant performance in challenging continuous control simulation tasks, exceeding the performance and sample efficiency of maximum-likelihood estimation. 
In the proposed work, we aim to expand our task-dependent model learning framework to incorporate intelligent exploration techniques to further improve sample efficiency in model-based reinforcement learning.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Task-Dependent Models for Reinforcement Learning"}],"uid":"27707","created_gmt":"2021-10-21 14:26:52","changed_gmt":"2021-10-21 14:26:52","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2021-10-25T13:00:00-04:00","event_time_end":"2021-10-25T15:00:00-04:00","event_time_end_last":"2021-10-25T15:00:00-04:00","gmt_time_start":"2021-10-25 17:00:00","gmt_time_end":"2021-10-25 19:00:00","gmt_time_end_last":"2021-10-25 19:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"102851","name":"Phd proposal"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78761","name":"Faculty\/Staff"},{"id":"78771","name":"Public"},{"id":"78751","name":"Undergraduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}