PhD Defense by Caleb Ju
Title: Fast and reliable optimization for dynamic decision-making under uncertainty
Date: May 21st, 2026
Time: 3:00 PM – 4:30 PM ET
Location: Groseclose 404 and Zoom
Meeting Link: https://gatech.zoom.us/j/97324269229
Caleb Ju
Ph.D. Candidate in Operations Research
School of Industrial and Systems Engineering
Georgia Institute of Technology
Committee:
Dr. Guanghui Lan (advisor), School of Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Yuejie Chi, Department of Statistics and Data Science, Yale University
Dr. Constance Crozier, School of Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Katya Scheinberg, School of Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Alexander Shapiro, School of Industrial and Systems Engineering, Georgia Institute of Technology
Abstract:
This thesis focuses on the design and implementation of stochastic optimization methods for dynamic decision-making under uncertainty. Such problems include reinforcement learning (RL), multi-stage stochastic programs, and stochastic optimal control, as well as applications in energy and sustainability. A central theme is the development of new algorithms with state-of-the-art sample complexity under relaxed assumptions that better match practice.
The thesis begins by deriving new convergence guarantees and termination criteria for Markov decision processes (MDPs) and RL problems with finite state and action spaces. Chapter 2 introduces a new advantage gap function for these problems. The gap function closely approximates the (unknown) optimality gap and can be easily estimated in a data-driven manner for RL. Moreover, by incorporating the gap function into the design of step-size rules, we show that policy gradient methods can solve MDPs in strongly polynomial time. This result establishes that popular gradient-based approaches can efficiently find exact solutions when the model is known, matching the strongly polynomial runtimes of the simplex method and Howard's policy iteration proven by Ye. In Chapter 3, we develop a novel framework, called auto-exploration, for solving RL problems in the online (or single-trajectory) model. Using this framework, we derive a new algorithm-independent sample complexity under weaker mixing assumptions on the optimal policy. Moreover, the resulting algorithm is parameter-free: it requires no a priori knowledge of the unknown mixing time. It also readily incorporates linear function approximation.
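For readers unfamiliar with advantage-based optimality certificates, the sketch below shows the standard textbook advantage function on a toy tabular MDP (all numbers are hypothetical, and this is the classical construction, not the thesis's advantage gap function): nonpositive advantages everywhere certify optimality, while a positive entry flags an improving action.

```python
# Toy 2-state, 2-action MDP (hypothetical numbers), discount factor gamma.
# P[s][a] = deterministic next state, R[s][a] = reward.
P = [[0, 1], [0, 1]]
R = [[1.0, 0.0], [0.0, 2.0]]
gamma = 0.9

def evaluate(policy, iters=2000):
    """Fixed-point policy evaluation: V(s) <- R[s][pi(s)] + gamma * V(s')."""
    V = [0.0, 0.0]
    for _ in range(iters):
        V = [R[s][policy[s]] + gamma * V[P[s][policy[s]]] for s in range(2)]
    return V

def advantage(policy):
    """Classical advantage A(s, a) = Q(s, a) - V(s) under the given policy."""
    V = evaluate(policy)
    return [[R[s][a] + gamma * V[P[s][a]] - V[s] for a in range(2)]
            for s in range(2)]

pi = [0, 0]              # policy that always picks action 0
A = advantage(pi)
# A[1][1] > 0, so action 1 in state 1 improves on pi; the max positive
# advantage shrinks to zero as the policy approaches optimality.
```

The quantity max over (s, a) of A(s, a) is the simplest advantage-style optimality measure; the thesis's gap function refines this idea to track the true optimality gap and to remain estimable from data.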
After investigating finite state and action problems, the thesis advances to RL and classical control problems over continuous spaces. In Chapter 4, we revisit the linear quadratic regulator in the online model. Despite the non-convexity of the problem in policy space, we design a globally convergent natural policy gradient method paired with a new conditional stochastic primal-dual algorithm. The combined algorithm delivers state-of-the-art sample complexity under a relaxed assumption on the stability of the initial controller (rather than the stability of all intermediate controllers, which is commonly posited in prior art). Chapter 5 then introduces policy dual averaging (PDA), a new method for solving RL problems over general state and action spaces. PDA readily incorporates function approximation (e.g., nonlinear kernels and neural networks) while providing efficient global convergence guarantees. Preliminary numerical results demonstrate the robustness of PDA and show it can be competitive with state-of-the-art RL algorithms. In Chapter 6, we study stationary stochastic programs over an infinite horizon. We introduce an infinite-horizon explorative dual dynamic programming method. Compared to the celebrated stochastic dual dynamic programming, our method explores the feasible region longer and updates the cutting-plane model more frequently. These innovations yield new iteration complexities while offering numerically efficient performance on inventory control and hydrothermal planning problems.
The thesis concludes with RL applications in energy and sustainability. Chapter 7 applies RL to the operation of grid-scale batteries co-located with solar generation. Compared to simpler rule-based control, we show RL has two significant downstream effects: (1) solar energy is shifted more effectively towards high-demand periods, and (2) ramping issues caused by the superposition of many similar battery operations are potentially reduced. In Chapter 8, we utilize RL for real-time control of multiple feedstocks in waste biorefining processes. In the short term, RL achieves faster target tracking with increased precision and accuracy; in the long term, it exhibits adaptive and robust behavior even under additional seasonal supply variability, meeting downstream demand with high probability.