PhD Defense by Hongzhen (Jenny) Tian

Thesis Title: Information Extraction from Messy Data, Noisy Spectra, Incomplete Data, and Unlabeled Images

 

Advisors:

Dr. Chuck Zhang, School of Industrial and Systems Engineering, Georgia Tech

Dr. Yajun Mei, School of Industrial and Systems Engineering, Georgia Tech

 

Committee Members:

Dr. Jianjun Shi, School of Industrial and Systems Engineering, Georgia Tech

Dr. Ben Wang, School of Industrial and Systems Engineering, Georgia Tech

Dr. Yan Wang, The George W. Woodruff School of Mechanical Engineering, Georgia Tech

 

Date and Time: Friday, June 18, 2021 @ 10:00am (EST)

 

Meeting URL: https://bluejeans.com/332096290

Meeting ID (BlueJeans): 332 096 290

 

Abstract

 

Data collected in real-world settings are rarely ideal and often messy, because data errors are inevitable and can occur in creative and unexpected ways. There is also always a gap between ideal theory and real-world application. Although the development of data science has produced more and more elegant algorithms validated by rigorous proof, data scientists still spend 50% to 80% of their work time cleaning and organizing data, leaving little time for actual data analysis. This dissertation addresses three statistical modeling scenarios with common data issues: quantifying treatment effects on noisy functional data, multistage decision-making over incomplete data, and unsupervised image segmentation of imperfect engineering images. A methodology is proposed for each scenario.

 

In Chapter 2, a general two-step procedure is proposed to quantify the effects of a treatment on spectral signals that are subject to multiple uncertainties, motivated by an engineering application involving materials treatment for aircraft maintenance. The procedure explicitly addresses two types of uncertainty in the spectral signals: offset shift and multiplicative error. In the first step, a novel optimization problem is formulated to estimate a representative template spectrum. In the second step, another optimization problem is formulated to obtain the pattern of modification g, which reveals how the treatment affects the shape of the spectral signal, together with a vector δ that describes the degree of change caused by different treatment magnitudes. The effectiveness of the proposed method is validated in a simulation study. Furthermore, in a real case study, the method is used to investigate the effect of plasma exposure on FTIR spectra. The method identifies the pattern of modification under the uncertainties of the manufacturing environment, and the identified pattern matches existing knowledge of the chemical components affected by the plasma treatment. The recovered magnitude of modification also provides guidance for selecting the control parameter of the plasma treatment.

 

In Chapter 3, an active learning-based multistage sequential decision-making model is proposed to assist doctors and patients in arriving at cost-effective treatment recommendations when some clinical data are more expensive or time-consuming to collect than other laboratory data. The main idea is to formulate the incomplete clinical data as a multistage decision-making model in which doctors make diagnostic decisions sequentially across the stages and actively collect only the necessary examination data from certain patients rather than from all of them. The parameter estimation in the proposed model has two novel aspects. First, unlike the existing ordinal logistic regression model, which models only a single stage, a multistage model is built by maximizing the joint likelihood function over all samples in all stages. Second, because the data in different stages are nested cumulatively, the coefficients of features common to different stages are assumed to be invariant. Compared with the baseline approach, which models each stage individually and independently, the proposed multistage model with the common-coefficients assumption has clear advantages: it significantly reduces the number of parameters to estimate, improves computational efficiency, and is intuitive for doctors, since newly added features are assumed not to change the weights of existing ones. In a simulation study, the relative efficiency of the proposed method relative to the baseline approach ranges from 162% to 1,938%, demonstrating its efficiency and effectiveness. In a real case study, the proposed method estimates all parameters efficiently and produces reasonable estimates.
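The joint-likelihood idea with shared coefficients can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the proportional-odds form P(Y <= j | x) = sigmoid(c_j - x·beta), the stage-specific cutpoints, the convention that stage k uses the first `p_k` feature columns, and the function names `ordinal_log_lik` and `joint_neg_log_lik`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ordinal_log_lik(X, y, beta, cutpoints):
    """Log-likelihood of a proportional-odds model (assumed form).

    y holds integer labels 0..J-1 and cutpoints holds J-1 increasing values.
    """
    eta = X @ beta
    # Cumulative probabilities P(Y <= j), padded with 0 on the left and 1 on the right.
    cdf = sigmoid(cutpoints[None, :] - eta[:, None])
    n = len(y)
    cdf = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])
    rows = np.arange(n)
    probs = cdf[rows, y + 1] - cdf[rows, y]
    return np.sum(np.log(probs))


def joint_neg_log_lik(beta, stages):
    """Negative joint log-likelihood summed over all stages.

    stages is a list of (X_k, y_k, cutpoints_k, p_k). Stage k uses the first
    p_k features, and the shared vector beta[:p_k] supplies their coefficients,
    so coefficients of common features are identical across stages.
    """
    total = 0.0
    for X_k, y_k, cut_k, p_k in stages:
        total += ordinal_log_lik(X_k[:, :p_k], y_k, beta[:p_k], cut_k)
    return -total


# Toy data (hypothetical): 4 patients observed at stage 1 with one feature,
# 2 patients observed at stage 2 with one additional feature.
X1 = np.array([[0.5], [1.0], [-1.0], [2.0]])
y1 = np.array([0, 1, 1, 0])
X2 = np.array([[0.5, 1.0], [1.0, -2.0]])
y2 = np.array([1, 0])
cut = np.array([0.0])
stages = [(X1, y1, cut, 1), (X2, y2, cut, 2)]
nll_at_zero = joint_neg_log_lik(np.zeros(2), stages)
```

In this sketch a single `beta` vector is shared by all stages, so the parameter count grows with the number of features rather than with features times stages. The function could be handed to a generic optimizer such as `scipy.optimize.minimize` to obtain estimates; the cutpoints are held fixed here only to keep the example short.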

 

In Chapter 4, a simple yet effective unsupervised image segmentation method, called the RG-filter, is proposed to segment engineering images that have no significant contrast between foreground and background, for a material testing application. Facing limited data size, imperfect data quality, and unavailable binary ground-truth labels, we developed the RG-filter, which thresholds pixels according to the relative magnitude of the R channel and the G channel of the RGB image. To compare the performance of existing image segmentation algorithms and the proposed algorithm on our carbon fiber reinforced polymer (CFRP) image data, we conducted a series of experiments on an example specimen. Across all pixel labeling results, the proposed RG-filter outperforms the other methods and is the one we recommend. It is also intuitive and computationally efficient. The RG-filter can help analyze the distribution and proportion of failure modes on the surface of a composite material after destructive double cantilever beam (DCB) testing. The results can help engineers better understand the weak link in the bonding of composite materials, which may provide guidance on how to improve the joining of structures during aircraft maintenance. Moreover, if the failure mode distribution can be predicted from other variables, destructive DCB testing can be avoided, saving considerable time and money.
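Thresholding by the relative magnitude of the R and G channels can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function name `rg_filter`, the direction of the comparison (R greater than G marks foreground), and the `margin` parameter are all assumptions.

```python
import numpy as np


def rg_filter(image, margin=0):
    """Return a boolean mask that is True where R exceeds G by more than margin.

    image is an H x W x 3 array in RGB channel order. The comparison direction
    and the margin are illustrative assumptions.
    """
    # Cast to a signed type so the subtraction cannot wrap around for uint8 input.
    r = image[..., 0].astype(np.int32)
    g = image[..., 1].astype(np.int32)
    return (r - g) > margin


# Hypothetical 2 x 2 RGB image used only to exercise the function.
toy = np.array(
    [
        [[200, 100, 50], [100, 200, 50]],
        [[120, 120, 0], [10, 5, 0]],
    ],
    dtype=np.uint8,
)
mask = rg_filter(toy)
foreground_fraction = mask.mean()
```

The fraction of True pixels in the mask gives the kind of area proportion that a failure mode analysis would report. Because the rule compares two channels of the same pixel, it needs no training data or labels.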

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 06/11/2021
  • Modified By: Tatianna Richardson
  • Modified: 06/11/2021