{"656963":{"#nid":"656963","#data":{"type":"event","title":"PhD Defense by Zaiwei Chen","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle: \u003C\/strong\u003EA Unified Lyapunov Framework for Finite-Sample Analysis of\u0026nbsp;Reinforcement Learning Algorithms\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\n\u003Cstrong\u003EDate:\u003C\/strong\u003E\u0026nbsp;04\/07\/2022\u003Cbr \/\u003E\r\n\u003Cstrong\u003ETime:\u0026nbsp;\u003C\/strong\u003E1:00 - 2:30 pm EST\u003Cbr \/\u003E\r\n\u003Cstrong\u003ELocation:\u003C\/strong\u003E Groseclose 402, or virtually at\u0026nbsp;\u003Ca href=\u0022https:\/\/gatech.zoom.us\/j\/9849731860?pwd=K29BSStGekgvYkxlK1ZRZVp1QUlLdz09\u0022\u003Ehttps:\/\/gatech.zoom.us\/j\/9849731860?pwd=K29BSStGekgvYkxlK1ZRZVp1QUlLdz09\u003C\/a\u003E\u0026nbsp;(Meeting ID: 984 973 1860, Passcode: 7n46MA).\u003Cbr \/\u003E\r\n\u003Cstrong\u003EStudent Name: \u003C\/strong\u003EZaiwei Chen\u003Cbr \/\u003E\r\nMachine Learning PhD Student\u003Cbr \/\u003E\r\nSchool of Industrial \u0026amp; Systems Engineering\u003Cbr \/\u003E\r\nGeorgia Institute of Technology\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\n\u003Cstrong\u003ECommittee\u003C\/strong\u003E\u003Cbr \/\u003E\r\n1 Dr. Siva Theja Maguluri (Advisor)\u003Cbr \/\u003E\r\n2 Dr. John-Paul Clarke (Co-advisor)\u003Cbr \/\u003E\r\n3 Dr. Justin Romberg\u003Cbr \/\u003E\r\n4 Dr.\u0026nbsp;Ashwin Pananjady\u003Cbr \/\u003E\r\n5 Dr. Benjamin Van Roy\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\n\u003Cstrong\u003EAbstract: \u003C\/strong\u003EIn this thesis, we develop a unified Lyapunov approach for establishing finite-sample guarantees of\u0026nbsp;reinforcement learning (RL) algorithms. Since most RL algorithms can be modeled as stochastic approximation (SA) algorithms under Markovian noise, we first provide a Lyapunov framework for analyzing Markovian SA algorithms. 
The key idea is to construct a novel Lyapunov function (called the generalized Moreau envelope) to capture the dynamics of the corresponding SA algorithm, and to establish a negative drift inequality, which can then be applied repeatedly to derive finite-sample bounds. We use our SA results to design RL algorithms and perform finite-sample analysis. Specifically, for tabular RL, we establish finite-sample bounds for Q-learning, for variants of on-policy TD-learning algorithms such as n-step TD and TD(\\lambda), and for off-policy TD-learning algorithms such as Retrace(\\lambda), Q^\\pi(\\lambda), and V-trace. As by-products, we provide theoretical insight into the efficiency of bootstrapping in on-policy TD-learning, and demonstrate the bias-variance trade-off in off-policy TD-learning. For RL with linear function approximation, we design convergent variants of Q-learning and TD-learning in the presence of the deadly triad, and derive finite-sample guarantees. The TD-learning algorithm was later used in a general policy-based framework (covering approximate policy iteration and natural policy gradient) to find an optimal policy of the RL problem with O(\\epsilon^{-2}) sample complexity.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms"}],"uid":"27707","created_gmt":"2022-04-04 13:56:21","changed_gmt":"2022-04-04 13:56:21","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2022-04-07T14:00:00-04:00","event_time_end":"2022-04-07T16:00:00-04:00","event_time_end_last":"2022-04-07T16:00:00-04:00","gmt_time_start":"2022-04-07 18:00:00","gmt_time_end":"2022-04-07 20:00:00","gmt_time_end_last":"2022-04-07 
20:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78761","name":"Faculty\/Staff"},{"id":"78771","name":"Public"},{"id":"78751","name":"Undergraduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}