
PhD Defense by Sajad Khodadadian


Thesis Title: Sample Complexity of Reinforcement Learning Algorithms with a Focus on Policy Space Methods


Thesis Committee:

Dr. Siva Theja Maguluri (Advisor), Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Guanghui (George) Lan, Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Ashwin Pananjady, Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Justin Romberg, Electrical and Computer Engineering, Georgia Institute of Technology

Dr. Daniel Russo, Columbia Business School, Columbia University


Date and Time: Monday, November 20th, 2023, 2:00 pm (EST)


In-Person Location: ISyE Executive Boardroom (Groseclose 402)

Meeting Link: https://gatech.zoom.us/j/93338819620


Abstract:

In this thesis, we develop fast reinforcement learning algorithms with finite-sample complexity guarantees. The work is divided into two main parts. In the first part, we study stochastic approximation in two settings in order to establish finite-sample bounds: federated stochastic approximation and two-time-scale linear stochastic approximation with Markovian noise. In the former, we develop FedSAM, an algorithm in which multiple agents collaborate to solve a fixed-point equation via stochastic approximation with Markovian noise. We show that FedSAM achieves a linear speedup with respect to the number of agents while incurring only a constant communication cost. In the latter, we analyze two-time-scale linear stochastic approximation with Markovian noise and establish tight finite-time bounds.
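To illustrate the general template behind federated stochastic approximation, the following minimal Python sketch runs a standard SA recursion x ← x + α(G(x, Y) − x) locally at each agent, driven by that agent's own Markovian samples, and synchronizes by averaging the iterates once per communication round. The operator G, the sampler sample_next, and all hyperparameters are placeholders; this is a generic illustration of the setting, not the exact FedSAM algorithm analyzed in the thesis.

```python
import numpy as np

def federated_sa(G, sample_next, x0, num_agents=10, num_rounds=50,
                 local_steps=20, step_size=0.05):
    """Generic federated stochastic approximation with periodic averaging.

    Each agent i runs local SA steps x <- x + step_size * (G(x, Y) - x),
    where Y is drawn from agent i's Markov chain via sample_next(i, Y);
    iterates are averaged once per round, so the number of communications
    is num_rounds regardless of the total sample count.
    """
    x = [np.array(x0, dtype=float) for _ in range(num_agents)]
    y = [sample_next(i, None) for i in range(num_agents)]  # initial Markov states
    for _ in range(num_rounds):
        for i in range(num_agents):
            for _ in range(local_steps):
                y[i] = sample_next(i, y[i])                        # next Markovian sample
                x[i] = x[i] + step_size * (G(x[i], y[i]) - x[i])   # local SA step
        x_bar = np.mean(x, axis=0)                                 # one round of communication
        x = [x_bar.copy() for _ in range(num_agents)]
    return x_bar
```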


The second part establishes finite-time bounds for reinforcement learning algorithms, with an emphasis on policy-space methods. First, we consider the two-time-scale natural actor-critic algorithm with on-policy data and establish an $\epsilon^{-6}$ sample complexity for convergence to the global optimum. Next, we study the two-loop natural actor-critic with an off-policy sampling strategy and establish an $\epsilon^{-3}$ sample complexity, improving upon the two-time-scale counterpart.
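As a rough illustration of the two-time-scale structure, the sketch below updates a tabular critic with a fast step size and a softmax actor with a slower one; for a tabular softmax policy, a natural policy gradient step amounts to adding the critic's estimates to the logits (shifting all logits of a state by a constant leaves the policy unchanged, so Q can stand in for the advantage). The environment interface env_step(s, a) -> (reward, next_state), the step-size exponents, and the SARSA-style critic are illustrative choices, not the precise recursion whose bounds are established in the thesis.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def two_timescale_nac(env_step, num_states, num_actions, T=100_000, gamma=0.95):
    """Sketch of a two-time-scale natural actor-critic with a tabular softmax
    policy and on-policy samples; env_step(s, a) -> (reward, next_state) is a
    placeholder for the environment."""
    theta = np.zeros((num_states, num_actions))  # policy logits (actor)
    Q = np.zeros((num_states, num_actions))      # action-value estimate (critic)
    s, a = 0, np.random.randint(num_actions)
    for t in range(1, T + 1):
        alpha = 1.0 / t ** 0.6   # fast (critic) step size
        beta = 1.0 / t           # slow (actor) step size
        r, s_next = env_step(s, a)
        a_next = np.random.choice(num_actions, p=softmax(theta[s_next]))
        # critic: on-policy temporal-difference (SARSA-style) update
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
        # actor: natural policy gradient step for the visited state
        theta[s] += beta * Q[s]
        s, a = s_next, a_next
    return theta, Q
```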


To further improve the sample complexity of the natural actor-critic, we study its 'Actor' and 'Critic' components separately. For the Critic, we consider federated TD-learning and TD-learning with Polyak averaging: for the former, we show a linear speedup, and for the latter we establish a tight finite-time bound. Furthermore, we establish a tight finite-time convergence bound for the TDC algorithm. For the Actor, we demonstrate linear and superlinear convergence rates for the natural policy gradient.
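On the Critic side, the sketch below shows plain TD(0) with linear function approximation and Polyak (iterate) averaging, the kind of averaged iterate that tight finite-time bounds are typically stated for. The sampler sample_transition() and the feature map features(s) are placeholders, and the constant step size is an illustrative choice; the federated and TDC variants studied in the thesis are not shown.

```python
import numpy as np

def td_polyak(sample_transition, features, gamma=0.95, num_steps=100_000,
              step_size=0.1):
    """Sketch of TD(0) with linear function approximation and Polyak averaging.

    sample_transition() -> (s, r, s_next) yields on-policy Markovian samples,
    and features(s) returns the feature vector of state s (placeholders).
    """
    dim = len(features(0))
    w = np.zeros(dim)        # current TD iterate
    w_avg = np.zeros(dim)    # running Polyak average of the iterates
    for t in range(1, num_steps + 1):
        s, r, s_next = sample_transition()
        phi, phi_next = features(s), features(s_next)
        td_error = r + gamma * phi_next @ w - phi @ w
        w = w + step_size * td_error * phi    # constant-step TD(0) update
        w_avg = w_avg + (w - w_avg) / t       # incremental Polyak average
    return w_avg
```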

