
PhD Defense by Linyun He


Title: Risk-Aware Data-Driven Simulation: Uncertainty Quantification, Model Validation, and Optimization
Date: June 27, 2025
Time: 8:30 AM – 10:30 AM EDT
Location: Groseclose 118
Zoom link: https://gatech.zoom.us/j/96097476498

Linyun He
Ph.D. Candidate in Operations Research
School of Industrial and Systems Engineering
Georgia Institute of Technology

Committee:
Dr. Eunhye Song (Advisor), H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Roshan Joseph, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Seong-Hee Kim, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Enlu Zhou, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Uday V. Shanbhag, Department of Industrial and Operations Engineering, University of Michigan

Abstract:
Simulation serves as a powerful methodology for analyzing complex stochastic systems, especially when analytical solutions are intractable. A fundamental but often implicit assumption underlying simulation-based analysis is that the simulation model accurately characterizes the behavior of the real-world system it represents. In practice, however, this assumption is rarely satisfied: some components (e.g., parameters) of the simulator are unknown and must be estimated from data. In this dissertation, we adopt "data-driven simulation" as an umbrella term for a class of stochastic simulation models that are calibrated and validated with real-world data. Because the data are finite, analysis based on data-driven simulation is subject to model risk stemming from the discrepancy between the simulation model and the target system. This dissertation develops methodological frameworks for simulation-based inference and optimization in the presence of model risk, with a focus on three core challenges: uncertainty quantification, simulation model validation, and optimization under uncertainty.

Chapter 2 addresses uncertainty quantification for a simulation model whose stochastic input-generating processes are estimated from data. In particular, we focus on performance measures expressed as a ratio of the means of two dependent simulation outputs. We employ the parametric bootstrap to construct percentile confidence intervals (CIs) for the ratio under the correct input models. The standard ratio estimator, which takes the ratio of two sample means, may have large bias and variance when the simulation budget is small, leading to a CI with poor coverage probability. We propose two new estimators: the kNN estimator, which pools information via k-nearest-neighbor regression and performs well in low dimensions, and the kLR estimator, which extends the kNN approach with the likelihood ratio (LR) method to achieve better convergence rates and finite variance even in high-dimensional settings. We analyze the asymptotic efficiency of both estimators and propose finite-sample heuristics to improve the experiment design. Empirical studies demonstrate their superior performance compared to the standard approach.
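To make the parametric bootstrap idea concrete, the following is a minimal toy sketch (our own illustration, not the dissertation's estimators): the input model, the simulator outputs, and all parameter values are hypothetical. Input data are fit to an exponential model, the fitted model is bootstrapped, and a percentile CI is formed for a ratio of two dependent output means using the standard ratio-of-means estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from the dissertation): the input model is
# exponential with unknown rate, estimated from observed data; the simulator
# produces two dependent outputs Y1, Y2, and we target r = E[Y1] / E[Y2].
def simulate_ratio(rate, m, rng):
    x = rng.exponential(1.0 / rate, size=m)          # simulated inputs
    y1 = x + rng.normal(0.0, 0.1, size=m)            # dependent output 1
    y2 = x**2 + rng.normal(0.0, 0.1, size=m)         # dependent output 2
    return y1.mean() / y2.mean()                     # ratio-of-means estimator

data = rng.exponential(1.0, size=200)                # observed input data
rate_hat = 1.0 / data.mean()                         # MLE of the rate

# Parametric bootstrap: resample from the fitted input model, re-estimate
# the parameter, re-run the simulator, and collect the ratio estimates.
boot = []
for _ in range(500):
    resample = rng.exponential(1.0 / rate_hat, size=data.size)
    boot.append(simulate_ratio(1.0 / resample.mean(), m=1000, rng=rng))

lo, hi = np.percentile(boot, [2.5, 97.5])            # percentile CI
print(f"95% percentile CI for the ratio: ({lo:.3f}, {hi:.3f})")
```

With a small simulation budget m, the bias and variance of this plug-in ratio estimator degrade the CI's coverage, which is the issue the chapter's kNN and kLR estimators are designed to mitigate.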

Chapter 3 studies simulation model validation when the simulator's and system's key performance indicators (KPIs) are high-dimensional and evolve over time. We assume that the system KPIs are observed at the end of several finite-length epochs, alongside the system states at the beginning of each epoch. The simulator initializes its own state from the observed system state and generates the KPIs at the end of each epoch. Unlike the system KPIs, which are observed only once per epoch, the simulator can generate multiple sample paths of the simulated KPIs. Merging these two data sets, we propose a two-part hypothesis testing procedure: marginal tests that check the distributional alignment of each KPI across all epochs, and a joint test that assesses whether the dependence structure, modeled by copulas, matches. The approach uses the probability integral transform and nonparametric copula estimation, and establishes asymptotic guarantees on the Type-I error. To enhance finite-sample reliability, we introduce a stepdown multiple testing procedure with bootstrap-derived critical values.
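The probability-integral-transform step behind a marginal test can be sketched as follows. This is a hypothetical one-KPI toy example (our own, not the dissertation's procedure): each epoch's single system observation is ranked against the simulator's replications; under the null hypothesis that the simulator matches the system, these ranks are approximately Uniform(0, 1) across epochs and can be checked with a one-sample Kolmogorov-Smirnov statistic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: E epochs, one scalar KPI; the simulator produces R replications
# per epoch while the system yields exactly one observation per epoch.
E, R = 50, 200
sim = rng.normal(0.0, 1.0, size=(E, R))   # simulated KPI replications
obs = rng.normal(0.0, 1.0, size=E)        # observed system KPI, one per epoch

# Probability integral transform: fraction of simulated replications below
# the system observation. Under the null, u is approximately Uniform(0, 1).
u = np.sort((sim < obs[:, None]).mean(axis=1))

# One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1).
grid = np.arange(1, E + 1) / E
ks = max(np.max(grid - u), np.max(u - (grid - 1 / E)))
print(f"KS statistic: {ks:.3f} (asymptotic 5% cutoff ~ {1.358 / np.sqrt(E):.3f})")
```

The dissertation's procedure goes well beyond this sketch: it handles many KPIs jointly via nonparametric copula estimation and controls the family-wise error across the marginal tests with a stepdown procedure and bootstrap-derived critical values.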

Chapter 4 considers a simulation optimization problem under input uncertainty, where the parameters of the simulator's input models are updated using streaming data that arrive in batches over multiple periods. Since a decision must be made in each period, we formulate a multi-period stochastic approximation (SA) framework that iteratively refines decisions based on increasingly accurate parameter estimates. Assuming strong convexity of the objective function, we propose two algorithms. Re-start SA (ReSA) resets its stepsizes each period and takes just enough steps that the SA error matches the input uncertainty error. Warm-start SA (WaSA) improves the computational efficiency of ReSA by adaptively reducing the stepsizes in later periods. Both are shown to achieve the optimal convergence rate under regularity conditions. We also introduce a regularized version of ReSA that relaxes the assumption of a known strong convexity parameter.
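The restart idea can be illustrated with a deliberately simple toy problem (our own construction; the objective, stepsize schedule, and step counts are illustrative, not the dissertation's): each period a new data batch refines the input-parameter estimate, the stepsize schedule is reset, and SA runs on the updated problem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy strongly convex objective f(x) = E[(x - theta - W)^2], W ~ N(0, 1),
# minimized at x* = theta; theta is unknown and estimated from streaming data.
theta_true = 3.0
x = 0.0                                     # initial decision
data = np.empty(0)

for period in range(1, 6):
    batch = rng.normal(theta_true, 1.0, size=100)   # new data batch
    data = np.concatenate([data, batch])
    theta_hat = data.mean()                 # refined input-parameter estimate

    # Restart: reset the stepsize schedule (here 0.5/k) each period and take
    # enough steps that the SA error is of the same order as the input
    # uncertainty error, which shrinks as the accumulated sample size grows.
    for k in range(1, data.size + 1):
        grad = 2.0 * (x - theta_hat - rng.normal())  # noisy gradient estimate
        x -= (0.5 / k) * grad                        # restarted stepsize

print(f"decision after 5 periods: {x:.3f} (optimum = {theta_true})")
```

Resetting the stepsizes lets each period's SA run take large initial steps toward the new estimated optimum; the warm-start variant instead carries the decision forward with adaptively reduced stepsizes to save simulation effort in later periods.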
 

