{"675345":{"#nid":"675345","#data":{"type":"event","title":"PhD Defense by Yan Li","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003EThesis Title\u003C\/strong\u003E: Theories and Algorithms for Efficient and Robust Sequential Decision Making\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EThesis Committee:\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EDr. Guanghui Lan (advisor), School of Industrial and Systems Engineering, Georgia Tech\u003C\/p\u003E\u003Cp\u003EDr. Tuo Zhao (co-advisor), School of Industrial and Systems Engineering, Georgia Tech\u003C\/p\u003E\u003Cp\u003EDr. Eric Delage, Department of Decision Sciences, HEC Montr\u00e9al\u003C\/p\u003E\u003Cp\u003EDr. Anton Kleywegt, School of Industrial and Systems Engineering, Georgia Tech\u003C\/p\u003E\u003Cp\u003EDr. Arkadi Nemirovski, School of Industrial and Systems Engineering, Georgia Tech\u003C\/p\u003E\u003Cp\u003EDr. Alexander Shapiro, School of Industrial and Systems Engineering, Georgia Tech\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EDate and Time\u003C\/strong\u003E: Friday, July 12, 11:00 AM to 12:30 PM (EDT).\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ELocation:\u0026nbsp;\u003C\/strong\u003EGroseclose 403\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EMeeting Link\u003C\/strong\u003E:\u0026nbsp;\u003Ca href=\u0022https:\/\/gatech.zoom.us\/j\/99192359035\u0022\u003Ehttps:\/\/gatech.zoom.us\/j\/99192359035\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EAbstract: \u003C\/strong\u003EThis thesis aims to develop efficient and scalable first-order methods for solving reinforcement learning problems. 
The algorithms developed here, termed policy gradient methods, directly use the gradient information of the non-convex objective with respect to the policy, yet offer global convergence guarantees and often optimal computational and statistical complexities.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EIn Chapter 2, we design a novel policy gradient method for solving reinforcement learning problems with large state spaces. At each iteration, the computation only involves performing the policy update for a randomly sampled state, and hence its cost is independent of the size of the state space. We further show that with an instance-dependent sampling scheme, the resulting method is capable of achieving substantial acceleration over existing alternatives.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EIn Chapter 3, we develop the first policy gradient method with provable convergence in both value and policy spaces. The developed method adopts a mirror descent-type policy update with a diminishing decomposable convex regularizer. In particular, we reveal the global linear and local superlinear convergence of the optimality gap. This global-to-local phase transition is subsequently exploited by the diminishing regularization to induce convergence in the policy space.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EIn Chapter 4, we proceed to address an important statistical challenge, namely the exploration of the action space. Existing approaches, such as the $\\epsilon$-greedy strategy, offer unsatisfactory patches that yield sub-optimal sample complexities. We instead develop a novel construction of the stochastic policy gradient and subsequently establish an optimal sample complexity, even when there is no explicit exploration over the actions.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EIn Chapter 5, we turn our attention to the problem of learning robust policies. 
To this end, we investigate the formulation of (distributionally) robust MDPs. We introduce a unifying dynamic game formulation that subsumes all existing case-by-case studies of robust MDPs. We also establish the strong duality of the game and the static formulations, and discuss issues associated with history-dependent policies.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EIn Chapter 6, we consider optimizing robust MDPs and subsequently learning robust policies (i.e., robust reinforcement learning). In particular, we design a policy gradient method that performs a mirror descent update to improve the policy at each iteration, with its first-order information constructed by another efficient gradient-based method developed in Chapter 7. We establish linear convergence when the ambiguity is known and sample complexities when the ambiguity is unknown. Notably, the method introduced here appears to be the first in the literature that is applicable to solving large-scale robust MDPs.\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003ETheories and Algorithms for Efficient and Robust Sequential Decision Making\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Theories and Algorithms for Efficient and Robust Sequential Decision Making"}],"uid":"27707","created_gmt":"2024-07-03 17:38:03","changed_gmt":"2024-07-03 17:43:46","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2024-07-12T11:00:00-04:00","event_time_end":"2024-07-12T12:30:00-04:00","event_time_end_last":"2024-07-12T12:30:00-04:00","gmt_time_start":"2024-07-12 15:00:00","gmt_time_end":"2024-07-12 16:30:00","gmt_time_end_last":"2024-07-12 16:30:00","rrule":null,"timezone":"America\/New_York"},"location":"Groseclose 403","extras":[],"groups":[{"id":"221981","name":"Graduate 
Studies"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}