
PhD Defense by Yan Li


Thesis Title: Theories and Algorithms for Efficient and Robust Sequential Decision Making

 

Thesis Committee:

Dr. Guanghui Lan (advisor), School of Industrial and Systems Engineering, Georgia Tech

Dr. Tuo Zhao (co-advisor), School of Industrial and Systems Engineering, Georgia Tech

Dr. Eric Delage, Department of Decision Sciences, HEC Montréal

Dr. Anton Kleywegt, School of Industrial and Systems Engineering, Georgia Tech

Dr. Arkadi Nemirovski, School of Industrial and Systems Engineering, Georgia Tech

Dr. Alexander Shapiro, School of Industrial and Systems Engineering, Georgia Tech

 

Date and Time: Friday, July 12, 11:00 AM to 12:30 PM (ET)

Location: Groseclose 403

Meeting Link: https://gatech.zoom.us/j/99192359035

 

Abstract: This thesis aims to develop efficient and scalable first-order methods for solving reinforcement learning problems. The algorithms developed here, collectively termed policy gradient methods, directly use the gradient information of the non-convex objective with respect to the policy, yet offer global convergence guarantees and oftentimes optimal computational and statistical complexities.
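
To fix ideas for readers outside the area, the sketch below shows a generic REINFORCE-style policy gradient step for a tabular softmax policy. It is only an illustration of the family of methods, not one of the thesis's algorithms, and it assumes a hypothetical environment object with Gym-like reset/step methods.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce(env, n_states, n_actions, gamma=0.99, lr=0.1, episodes=1000):
    """Generic REINFORCE: stochastic gradient ascent on the expected return
    of a tabular softmax policy. Illustration only."""
    theta = np.zeros((n_states, n_actions))          # policy parameters
    for _ in range(episodes):
        s, traj, done = env.reset(), [], False
        while not done:
            probs = softmax(theta[s])
            a = np.random.choice(n_actions, p=probs)
            s_next, r, done = env.step(a)            # hypothetical interface
            traj.append((s, a, r))
            s = s_next
        G = 0.0
        for s, a, r in reversed(traj):               # discounted return-to-go
            G = r + gamma * G
            probs = softmax(theta[s])
            grad_log = -probs
            grad_log[a] += 1.0                       # grad of log pi(a|s) w.r.t. theta[s]
            theta[s] += lr * G * grad_log            # stochastic policy gradient step
    return theta
```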

 

In Chapter 2, we design a novel policy gradient method for solving reinforcement learning problems with large state spaces. At each iteration, the computation involves only performing the policy update for a randomly sampled state and hence is independent of the size of the state space. We further show that with an instance-dependent sampling scheme, the resulting method achieves substantial acceleration over existing alternatives.
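
A toy per-state update of this flavor (my paraphrase under simplifying assumptions, namely that Q-value estimates for the sampled state are available; not the exact scheme of the chapter) could look as follows.

```python
import numpy as np

def sampled_state_md_step(pi, Q, eta, rng=None):
    """One iteration of a per-state policy update: sample a single state and
    apply a KL mirror-descent (multiplicative-weights) step to its action
    distribution, leaving every other state untouched.
    pi: (S, A) array of per-state action distributions (modified in place).
    Q:  (S, A) array of Q-value estimates, assumed available for the sampled state.
    eta: stepsize. Illustration only."""
    rng = rng or np.random.default_rng()
    s = rng.integers(pi.shape[0])        # uniform sampling; an instance-dependent
                                         # distribution could be used instead
    w = pi[s] * np.exp(eta * (Q[s] - Q[s].max()))   # stabilized exponent
    pi[s] = w / w.sum()                  # closed-form KL mirror descent update
    return pi
```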

 

In Chapter 3, we develop the first policy gradient method with provable convergence in both the value and the policy spaces. The method adopts a mirror descent-type policy update with a diminishing, decomposable convex regularizer. In particular, we establish global linear and local superlinear convergence of the optimality gap. This global-to-local phase transition is subsequently exploited, through the diminishing regularization, to induce convergence in the policy space.
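
As a rough template in standard policy mirror descent notation (my notation; the exact update in the chapter may differ), such an update can be written per state as

\[
\pi_{k+1}(\cdot \mid s) \;=\; \operatorname*{arg\,min}_{p \in \Delta(\mathcal{A})} \Big\{ \eta_k \,\big\langle -Q^{\pi_k}(s,\cdot),\, p \big\rangle \;+\; \eta_k \tau_k\, h(p) \;+\; D\big(p,\ \pi_k(\cdot \mid s)\big) \Big\},
\]

where $D$ is a Bregman divergence (e.g., KL), $h$ is the convex regularizer, and the regularization weight $\tau_k$ is driven to zero over the iterations.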

 

In Chapter 4, we proceed to address an important statistical challenge, namely the exploration of the action space. Existing approaches, such as the $\epsilon$-greedy strategy, offer unsatisfactory patches that yield sub-optimal sample complexities. We instead develop a novel construction of the stochastic policy gradient and subsequently establish an optimal sample complexity, even when there is no explicit exploration over the actions.
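
For context, the $\epsilon$-greedy heuristic the chapter moves beyond looks like the following (a standard textbook construction, not the chapter's gradient estimator).

```python
import numpy as np

def epsilon_greedy(Q_s, epsilon, rng=None):
    """Standard epsilon-greedy action selection at a single state:
    explore uniformly with probability epsilon, otherwise act greedily
    with respect to the current Q-value estimates Q_s (a 1-D array)."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(Q_s)))   # uniform exploration
    return int(np.argmax(Q_s))               # greedy exploitation
```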

 

In Chapter 5, we turn our attention to the problem of learning robust policies. To this end, we investigate the formulation of (distributionally) robust MDPs. We introduce a unifying dynamic game formulation that subsumes all existing case-by-case studies of robust MDPs. We also establish strong duality between the game and the static formulations, and discuss issues associated with history-dependent policies.
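
In its static form, and in standard notation not specific to the thesis, a (distributionally) robust MDP seeks a policy maximizing the worst-case discounted value over an ambiguity set $\mathcal{P}$ of transition kernels,

\[
\max_{\pi \in \Pi} \;\min_{P \in \mathcal{P}} \; \mathbb{E}^{\pi, P}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right];
\]

the dynamic game formulation instead lets the adversary choose the kernel adaptively over the stages, and Chapter 5 relates the two through strong duality.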

 

In Chapter 6, we consider optimizing robust MDPs and subsequently learning robust policies (i.e., robust reinforcement learning). In particular, we design a policy gradient method that performs a mirror descent update to improve the policy at each iteration, with its first-order information constructed by another efficient gradient-based method developed in Chapter 7. We establish linear convergence when the ambiguity set is known and sample complexities when it is unknown. Notably, the method introduced here appears to be the first in the literature applicable to solving large-scale robust MDPs.
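
A crude illustrative sketch of such an outer loop is given below. It deliberately simplifies the inner problem to a search over a finite ambiguity set of tabular kernels with exact policy evaluation, whereas the thesis uses the gradient-based inner solver of Chapter 7, so none of the names or choices here should be read as the thesis's method.

```python
import numpy as np

def evaluate(P, R, pi, gamma):
    """Exact evaluation of a tabular policy pi under kernel P (S x A x S) and
    reward R (S x A): solve the Bellman linear system for V, then form Q."""
    S, A = R.shape
    P_pi = np.einsum('sab,sa->sb', P, pi)    # state-to-state kernel induced by pi
    r_pi = (pi * R).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * np.einsum('sab,b->sa', P, V)
    return V, Q

def robust_pmd(P_set, R, gamma=0.9, eta=1.0, iters=100):
    """Toy robust policy mirror descent over a finite ambiguity set P_set:
    at each iteration the adversary picks the kernel that minimizes the current
    policy's (average) value, and the policy takes a closed-form KL mirror
    ascent step against the resulting worst-case Q-values. Illustration only."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)            # start from the uniform policy
    for _ in range(iters):
        evals = [evaluate(P, R, pi, gamma) for P in P_set]
        worst = min(range(len(P_set)), key=lambda i: evals[i][0].mean())
        Q = evals[worst][1]
        w = pi * np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))
        pi = w / w.sum(axis=1, keepdims=True)   # pi_new proportional to pi * exp(eta * Q_worst)
    return pi
```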
