
PhD Defense by Sajad Khodadadian


Thesis Title: Sample Complexity of Reinforcement Learning Algorithms with a Focus on Policy Space Methods


Thesis Committee:

Dr. Siva Theja Maguluri (Advisor), Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Guanghui (George) Lan, Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Ashwin Pananjady, Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Justin Romberg, Electrical and Computer Engineering, Georgia Institute of Technology

Dr. Daniel Russo, Columbia Business School, Columbia University


Date and Time: Monday, November 20th, 2023, 2:00 pm (EST)


In-Person Location: ISyE Executive Boardroom (Groseclose 402)

Meeting Link: https://gatech.zoom.us/j/93338819620


Abstract:

In this thesis, we develop fast reinforcement learning algorithms with finite-sample complexity guarantees. The work is divided into two main parts. In the first part, we study stochastic approximation in two settings in order to establish finite-sample bounds: federated stochastic approximation and two-time-scale linear stochastic approximation with Markovian noise. In the former, we develop FedSAM, an algorithm in which multiple agents collaborate to solve a fixed-point equation via stochastic approximation with Markovian noise. We show that FedSAM achieves a linear speedup with respect to the number of agents while incurring only a constant communication cost. In the latter, we analyze two-time-scale linear stochastic approximation with Markovian noise and establish tight finite-time bounds.
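To illustrate the general template behind federated stochastic approximation, the following minimal Python sketch runs a standard SA recursion x ← x + α(G(x, Y) − x) locally at each agent, driven by that agent's own Markovian samples, and synchronizes by averaging the iterates once per communication round. The operator G, the sampler sample_next, and all hyperparameters are placeholders; this is a generic illustration of the setting, not the exact FedSAM algorithm analyzed in the thesis.

```python
import numpy as np

def federated_sa(G, sample_next, x0, num_agents=10, num_rounds=50,
                 local_steps=20, step_size=0.05):
    """Generic federated stochastic approximation with periodic averaging.

    Each agent i runs local SA steps x <- x + step_size * (G(x, Y) - x),
    where Y is drawn from agent i's Markov chain via sample_next(i, Y);
    iterates are averaged once per round, so the number of communications
    is num_rounds regardless of the total sample count.
    """
    x = [np.array(x0, dtype=float) for _ in range(num_agents)]
    y = [sample_next(i, None) for i in range(num_agents)]  # initial Markov states
    for _ in range(num_rounds):
        for i in range(num_agents):
            for _ in range(local_steps):
                y[i] = sample_next(i, y[i])                        # next Markovian sample
                x[i] = x[i] + step_size * (G(x[i], y[i]) - x[i])   # local SA step
        x_bar = np.mean(x, axis=0)                                 # one round of communication
        x = [x_bar.copy() for _ in range(num_agents)]
    return x_bar
```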


The second part establishes finite-time bounds for reinforcement learning algorithms, with an emphasis on policy-space methods. First, we consider the two-time-scale natural actor-critic algorithm with on-policy data and establish an $\epsilon^{-6}$ sample complexity for convergence to the global optimum. Next, we study the two-loop natural actor-critic with an off-policy sampling strategy and establish an $\epsilon^{-3}$ sample complexity, improving upon the two-time-scale counterpart.
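As a rough illustration of the two-time-scale structure, the sketch below updates a tabular critic with a fast step size and a softmax actor with a slower one; for a tabular softmax policy, a natural policy gradient step amounts to adding the critic's estimates to the logits (shifting all logits of a state by a constant leaves the policy unchanged, so Q can stand in for the advantage). The environment interface env_step(s, a) -> (reward, next_state), the step-size exponents, and the SARSA-style critic are illustrative choices, not the precise recursion whose bounds are established in the thesis.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def two_timescale_nac(env_step, num_states, num_actions, T=100_000, gamma=0.95):
    """Sketch of a two-time-scale natural actor-critic with a tabular softmax
    policy and on-policy samples; env_step(s, a) -> (reward, next_state) is a
    placeholder for the environment."""
    theta = np.zeros((num_states, num_actions))  # policy logits (actor)
    Q = np.zeros((num_states, num_actions))      # action-value estimate (critic)
    s, a = 0, np.random.randint(num_actions)
    for t in range(1, T + 1):
        alpha = 1.0 / t ** 0.6   # fast (critic) step size
        beta = 1.0 / t           # slow (actor) step size
        r, s_next = env_step(s, a)
        a_next = np.random.choice(num_actions, p=softmax(theta[s_next]))
        # critic: on-policy temporal-difference (SARSA-style) update
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
        # actor: natural policy gradient step for the visited state
        theta[s] += beta * Q[s]
        s, a = s_next, a_next
    return theta, Q
```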


To further improve the sample complexity of the natural actor-critic, we study its 'Actor' and 'Critic' components separately. For the Critic, we consider federated TD-learning and TD-learning with Polyak averaging: for the former, we show a linear speedup, and for the latter we establish a tight finite-time bound. Furthermore, we establish a tight finite-time convergence bound for the TDC algorithm. For the Actor, we demonstrate linear and superlinear convergence rates for the natural policy gradient.
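On the Critic side, the sketch below shows plain TD(0) with linear function approximation and Polyak (iterate) averaging, the kind of averaged iterate that tight finite-time bounds are typically stated for. The sampler sample_transition() and the feature map features(s) are placeholders, and the constant step size is an illustrative choice; the federated and TDC variants studied in the thesis are not shown.

```python
import numpy as np

def td_polyak(sample_transition, features, gamma=0.95, num_steps=100_000,
              step_size=0.1):
    """Sketch of TD(0) with linear function approximation and Polyak averaging.

    sample_transition() -> (s, r, s_next) yields on-policy Markovian samples,
    and features(s) returns the feature vector of state s (placeholders).
    """
    dim = len(features(0))
    w = np.zeros(dim)        # current TD iterate
    w_avg = np.zeros(dim)    # running Polyak average of the iterates
    for t in range(1, num_steps + 1):
        s, r, s_next = sample_transition()
        phi, phi_next = features(s), features(s_next)
        td_error = r + gamma * phi_next @ w - phi @ w
        w = w + step_size * td_error * phi    # constant-step TD(0) update
        w_avg = w_avg + (w - w_avg) / t       # incremental Polyak average
    return w_avg
```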

