PhD Defense by Yujie Zhao

Thesis Title: New Progress in Hot-spots Detection in Spatial-temporal Data, Partial-differential-equation-based Model Identification, and Statistical Computing

  

Advisors:

Dr. Yajun Mei, School of Industrial and Systems Engineering, Georgia Tech

Dr. Xiaoming Huo, School of Industrial and Systems Engineering, Georgia Tech

  

Committee members

Dr. Jianjun Shi, School of Industrial and Systems Engineering, Georgia Tech

Dr. Haomin Zhou, School of Mathematics, Georgia Tech

Dr. Sarah E. Holte, Fred Hutchinson Cancer Research Center

  

Date and Time: 1:00 pm (EST), Friday, April 9th, 2021 

  

Meeting URL:   https://bluejeans.com/101334221

Meeting ID:  101 334 221 (BlueJeans) 

  

Abstract

This thesis contributes to sparse identification problems in spatio-temporal data and to the associated computation. Our study addresses (1) hot-spot detection in multivariate spatio-temporal data, (2) model identification for partial differential equations (PDEs), and (3) optimization for Least Absolute Shrinkage and Selection Operator (Lasso) type problems. The thesis comprises four main works.

 

In Chapter 1, we aim at sparse hot-spot detection in multivariate spatio-temporal data that are non-stationary over time. We propose an efficient statistical method that detects hot-spots through tensor decomposition in three steps. First, we decompose the observed data into three components: a smooth global mean, sparse local anomalies, and random noise. Next, we estimate the parameters with a combination of Lasso and fused Lasso penalties to address spatial sparsity and temporal consistency. Finally, we apply a Cumulative Sum (CUSUM) control chart to monitor the model residuals, which allows us to detect when and where hot-spot events occur. To demonstrate the usefulness of the proposed method, we compare it with several other methods in extensive numerical simulation studies and on a real crime rate dataset.
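The monitoring step can be illustrated with a minimal sketch of a one-sided CUSUM recursion applied to residuals at each location. This is only an illustration under stated assumptions: the residual values, the function names, and the `drift` and `threshold` parameters are hypothetical, and running one univariate chart per location is a simplification rather than the chart used in the thesis.

```python
def cusum_alarm(residuals, drift, threshold):
    """Return the first time index at which the one-sided CUSUM
    statistic exceeds `threshold`, or None if it never does."""
    stat = 0.0
    for t, r in enumerate(residuals):
        # Accumulate evidence of an upward shift; reset at zero.
        stat = max(0.0, stat + r - drift)
        if stat > threshold:
            return t
    return None


def detect_hot_spots(residuals_by_location, drift=0.5, threshold=4.0):
    """Run one CUSUM chart per location and report where (key)
    and when (value) an alarm is first raised."""
    alarms = {}
    for location, series in residuals_by_location.items():
        t = cusum_alarm(series, drift, threshold)
        if t is not None:
            alarms[location] = t
    return alarms


# Hypothetical model residuals: location "B" shifts upward from time 3.
example_residuals = {
    "A": [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, -0.3],
    "B": [0.0, 0.2, -0.1, 2.0, 2.5, 2.2, 2.4, 2.1],
}
alarms = detect_hot_spots(example_residuals)
```

In this toy input only location "B" raises an alarm, a few steps after its residuals shift upward, which gives both the "where" and the "when" of the detection.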

 

In Chapter 2, we improve the methodology of Chapter 1 in two respects. First, we propose a more computationally efficient algorithm for sparse hot-spot detection in high-dimensional spatio-temporal data. Second, we focus on detecting hot-spots with temporal circularity, instead of temporal continuity as in Chapter 1. This lets us handle many bio-surveillance and healthcare applications, where data are measured at many spatial locations repeatedly over time, for example daily, weekly, or monthly. The usefulness of the proposed methodology is validated through numerical simulations and a real-world dataset of the weekly number of gonorrhea cases from 2006 to 2018 in 50 states of the United States.

 

In Chapter 3, we propose a two-stage method called Spline Assisted Partial Differential Equation involved Model Identification (SAPDEMI) to efficiently identify the underlying partial differential equation (PDE) model from noisy data. In the first stage, the functional estimation stage, we employ cubic splines to estimate the unobservable derivatives, which serve as candidate terms for the underlying PDE model. In the second stage, the model identification stage, we apply Lasso to identify the underlying PDE model. The contributions of the proposed SAPDEMI method are: (1) it is computationally efficient in the functional estimation stage because it achieves the lowest possible order of complexity, (2) it focuses on model selection in the model identification stage, whereas the existing literature mostly focuses on parameter estimation, and (3) we develop statistical properties of the method for correct identification.
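As a rough illustration, the model identification stage can be written as a sparse regression. The notation below is an assumption introduced here rather than taken from the thesis: $\widehat{\mathbf{u}}_t$ stacks the estimated temporal derivatives at $N$ space-time grid points, the columns of $\mathbf{X}$ are the $K$ candidate terms $f_k$ evaluated from the spline estimates, and $\lambda > 0$ is a tuning parameter.

```latex
% Assumed candidate model: the temporal derivative is a sparse
% linear combination of K candidate terms.
\frac{\partial u}{\partial t}
  = \sum_{k=1}^{K} \beta_k \,
    f_k\!\left(u, \frac{\partial u}{\partial x},
               \frac{\partial^2 u}{\partial x^2}, \dots\right)

% Lasso estimate over the N grid points.
\widehat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^{K}}
    \frac{1}{2N}
    \left\| \widehat{\mathbf{u}}_t - \mathbf{X}\boldsymbol{\beta} \right\|_2^2
    + \lambda \left\| \boldsymbol{\beta} \right\|_1
```

Under this formulation, the nonzero entries of $\widehat{\boldsymbol{\beta}}$ indicate which candidate terms are selected into the identified PDE model.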

 

In Chapter 4, we focus on developing an algorithm to solve the optimization in Lasso-type problems, whose objective function is not strictly convex when the number of features exceeds the number of samples. To handle this non-strict convexity, we use a homotopic method, i.e., a sequence of surrogate functions that approximate the L1 penalty in the Lasso-type problem. The surrogate functions converge to the L1 penalty in the Lasso estimator. At the same time, each surrogate function is strictly convex, which enables a provably faster numerical rate of convergence. In this chapter, we demonstrate that by carefully defining the surrogate functions, one can prove a faster numerical convergence rate than that of any existing method for computing Lasso-type estimators.
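The homotopy idea can be sketched as follows. This is a minimal illustration, not the algorithm of the thesis: it swaps in the smooth surrogate sqrt(b^2 + t^2) for |b| (strictly convex for t > 0 and converging to |b| as t goes to 0) and uses plain gradient descent with warm starts across a decreasing sequence of t. The function names, the `lipschitz` argument (an assumed upper bound on the largest eigenvalue of X^T X), and the toy data are all hypothetical, and the sketch makes no claim about convergence rates.

```python
import math


def surrogate_grad(b, t):
    """Derivative of the smooth surrogate sqrt(b^2 + t^2) for |b|."""
    return b / math.sqrt(b * b + t * t)


def homotopy_lasso(X, y, lam, t_values, lipschitz, iters_per_stage=500):
    """Approximately minimize 0.5*||y - X b||^2 + lam*||b||_1 by solving
    a sequence of smoothed problems, warm-starting each stage."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for t in t_values:  # decreasing smoothing levels
        # The surrogate's gradient is (lam / t)-Lipschitz, so this
        # step size keeps gradient descent stable at every stage.
        step = 1.0 / (lipschitz + lam / t)
        for _ in range(iters_per_stage):
            resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i]
                     for i in range(n)]
            grad = [sum(X[i][j] * resid[i] for i in range(n))
                    + lam * surrogate_grad(beta[j], t)
                    for j in range(p)]
            beta = [beta[j] - step * grad[j] for j in range(p)]
    return beta


# Toy problem with an identity design, where the exact Lasso solution
# is the soft-thresholding of y at level lam.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.5, -2.0]
beta_hat = homotopy_lasso(X, y, lam=1.0,
                          t_values=[1.0, 0.1, 0.01, 0.001],
                          lipschitz=1.0)
```

With the identity design, the result can be compared against the soft-thresholded values of `y`; the smoothed solution approaches them as t decreases but does not produce exact zeros.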

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 03/10/2021
  • Modified By: Tatianna Richardson
  • Modified: 03/10/2021