PhD Defense by Caleb Ju

Title: Fast and reliable optimization for dynamic decision-making under uncertainty

Date: May 21st, 2026

Time: 3:00 PM – 4:30 PM ET

Location: Groseclose 404 and Zoom

Meeting Link: https://gatech.zoom.us/j/97324269229

 

Caleb Ju

Ph.D. Candidate in Operations Research

School of Industrial and Systems Engineering

Georgia Institute of Technology

 

Committee:

Dr. Guanghui Lan (advisor), School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Yuejie Chi, Department of Statistics and Data Science, Yale University

Dr. Constance Crozier, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Katya Scheinberg, School of Industrial and Systems Engineering, Georgia Institute of Technology

Dr. Alexander Shapiro, School of Industrial and Systems Engineering, Georgia Institute of Technology

 

Abstract:

This thesis focuses on the design and implementation of stochastic optimization methods for dynamic decision-making under uncertainty. This includes problems such as reinforcement learning (RL), multi-stage stochastic programs, and stochastic optimal control, as well as applications in energy and sustainability. A central theme is developing new algorithms with state-of-the-art sample complexity under relaxed assumptions that better match practice.

 

This thesis starts by deriving new convergence guarantees and termination criteria for finite state and action Markov decision processes (MDPs) and RL problems. Chapter 2 introduces a new advantage gap function for these problems. This gap function closely approximates the (unknown) optimality gap and can be easily estimated in a data-driven manner for RL. Moreover, by incorporating the gap function into the design of step size rules, we demonstrate that policy gradient methods can solve MDPs in strongly polynomial time. This result shows that popular gradient-based approaches can efficiently find exact solutions when the model is known, and it matches the strongly polynomial runtimes that Ye proved for the simplex method and Howard's policy iteration. In Chapter 3, we develop a novel framework called auto-exploration for solving RL problems in the online (or single-trajectory) model. We use the framework to derive a new algorithm-independent sample complexity bound under weaker mixing assumptions on the optimal policy. Moreover, our algorithm is parameter-free, since it does not require a priori knowledge of the unknown mixing time. Additionally, the method can easily incorporate linear function approximation.
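
As a concrete illustration of the objects Chapter 2 builds on, the minimal Python sketch below performs standard exact policy evaluation and computes advantages for a small tabular MDP. It is only a hypothetical example; the thesis's advantage gap function and its gap-based step size rules are not reproduced here.

import numpy as np

def evaluate_policy(P, r, pi, gamma):
    """Return V^pi and Q^pi for transition tensor P[s, a, s'], reward r[s, a],
    stochastic policy pi[s, a], and discount factor gamma < 1."""
    n_states = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum("sa,sa->s", pi, r)     # expected one-step reward under pi
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V                   # Q[s, a] = r[s, a] + gamma * E[V(s')]
    return V, Q

def advantage(P, r, pi, gamma):
    """A^pi[s, a] = Q^pi[s, a] - V^pi[s]; its maximum is zero exactly when pi is
    optimal, which is the kind of per-state information a gap function aggregates."""
    V, Q = evaluate_policy(P, r, pi, gamma)
    return Q - V[:, None]

# Example usage on a random 3-state, 2-action MDP (purely illustrative data):
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))  # P[s, a, :] is a probability distribution
r = rng.uniform(size=(3, 2))
pi = np.full((3, 2), 0.5)                   # uniform random policy
print(advantage(P, r, pi, gamma=0.9).max())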

 

After investigating finite state and action problems, the thesis turns to RL and classical control problems over continuous spaces. In Chapter 4, we revisit the linear quadratic regulator in the online model. Despite the non-convexity of the problem in policy space, we design a globally convergent natural policy gradient method paired with a new conditional stochastic primal-dual algorithm. The combined algorithm delivers state-of-the-art sample complexity under a relaxed assumption: only the initial controller needs to be stable, rather than all intermediate controllers, as is commonly posited in prior work. Chapter 5 then introduces a new policy dual averaging (PDA) method for solving RL problems over general state and action spaces. PDA can easily incorporate function approximation (e.g., nonlinear kernels and neural networks) while providing efficient global convergence guarantees. Preliminary numerical results demonstrate the robustness of PDA and show it can be competitive with state-of-the-art RL algorithms. In Chapter 6, we study stationary stochastic programs over an infinite horizon. We introduce a continually exploring variant of explorative dual dynamic programming for the infinite-horizon setting. Compared to the celebrated stochastic dual dynamic programming method, ours explores the feasible region longer and updates the cutting-plane model more frequently. These innovations yield new iteration complexity guarantees while delivering numerically efficient performance on inventory control and hydrothermal planning problems.
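
To make the cutting-plane terminology of Chapter 6 concrete, here is a minimal, hypothetical Python sketch of the piecewise-linear lower model that dual dynamic programming style methods maintain for a stage value function; the explorative sampling scheme and infinite-horizon updates developed in the thesis are not shown.

import numpy as np

class CutModel:
    """Piecewise-linear lower model  V(x) >= max_i (alpha_i + beta_i . x),
    built from (sub)gradient information collected at trial points."""

    def __init__(self):
        self.alphas, self.betas = [], []

    def add_cut(self, x_trial, value, subgrad):
        # Cut from the value and a subgradient of the convex stage problem at x_trial:
        #   V(x) >= value + subgrad . (x - x_trial)
        subgrad = np.asarray(subgrad, dtype=float)
        self.alphas.append(value - subgrad @ np.asarray(x_trial, dtype=float))
        self.betas.append(subgrad)

    def evaluate(self, x):
        # Lower bound at x given the cuts collected so far.
        if not self.alphas:
            return -np.inf
        return max(a + b @ np.asarray(x, dtype=float) for a, b in zip(self.alphas, self.betas))

In SDDP-type methods, each backward pass adds cuts of this form at the trial points visited during the forward pass; the method of Chapter 6 differs in how those trial points are generated and how often the cutting-plane model is refreshed.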

 

The thesis concludes with RL applications in energy and sustainability. Chapter 7 applies RL to the operation of grid-scale batteries co-located with solar generation. Compared to simpler rule-based control, we show that RL has two significant downstream effects: (1) solar energy is shifted more effectively towards high-demand periods, and (2) ramping issues caused by the superposition of many similar battery operations can potentially be reduced. In Chapter 8, we use RL for real-time control of multiple feedstocks in waste biorefining processes. In the short term, RL achieves faster target tracking with increased precision and accuracy; in the long term, it shows adaptive and robust behavior even under additional seasonal supply variability, meeting downstream demand with high probability.
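
For readers unfamiliar with the dispatch setting of Chapter 7, the toy Python sketch below sets up a hypothetical battery-plus-solar environment with a state of charge, synthetic solar and demand profiles, and a ramping penalty; the model, data, and reward design used in the thesis are not reproduced here.

import numpy as np

class BatterySolarEnv:
    """Toy battery-plus-solar dispatch environment (illustrative only).
    Observation: (state of charge in MWh, solar output in MW, demand in MW).
    Action: charging (+) or discharging (-) power in MW.
    Reward: energy served toward demand minus a small ramping penalty."""

    def __init__(self, capacity_mwh=100.0, power_mw=25.0, dt_hours=1.0):
        self.capacity, self.power, self.dt = capacity_mwh, power_mw, dt_hours
        self.reset()

    def reset(self):
        self.soc, self.prev_action, self.t = 0.5 * self.capacity, 0.0, 0
        return self._obs()

    def _obs(self):
        hour = self.t % 24
        solar = max(0.0, 30.0 * np.sin(np.pi * (hour - 6) / 12))    # synthetic daylight profile
        demand = 40.0 + 20.0 * np.exp(-((hour - 18) ** 2) / 8.0)    # synthetic evening peak
        return np.array([self.soc, solar, demand])

    def step(self, action):
        action = float(np.clip(action, -self.power, self.power))    # MW; positive charges the battery
        _, solar, demand = self._obs()
        self.soc = float(np.clip(self.soc + action * self.dt, 0.0, self.capacity))
        delivered = max(solar - max(action, 0.0), 0.0) + max(-action, 0.0)
        reward = min(delivered, demand) - 0.1 * abs(action - self.prev_action)  # penalize rapid ramping
        self.prev_action, self.t = action, self.t + 1
        return self._obs(), reward, self.t >= 24, {}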
