**Title**: Predictive Analytics for Complex Engineering Systems Using High-Dimensional Signals

**Advisors**: Dr. Nagi Gebraeel, Dr. Kamran Paynabar

**Committee Members**:

Dr. Jianjun Shi

Dr. Tuo Zhao

Dr. Edmond Chow (School of Computational Science and Engineering)

**Date and Time**: Monday, April 2nd, 11:00 AM

**Location**: ISyE Main 126

**Abstract**:

Industrial predictive analytics has multiple facets that range from failure predictability and optimal asset management to high-level operational and managerial insights. Poor failure predictability and asset management can have far-reaching consequences, ranging from significant economic losses to endangering human life. As a result, modern capital-intensive assets are equipped with numerous sensors to monitor their health condition. These sensors often generate massive amounts of complex-structured high-dimensional signals that pose significant analytical, computational, and scalability challenges. To tackle these challenges, this thesis presents new predictive analytics methodologies that extract information from massive and complex-structured data with the goal of predicting, in real time, the future state-of-health of complex engineering systems.

Chapter I of the thesis introduces a multi-stream sensor fusion-based prognostics model for systems with single failure modes. This chapter considers multi-sensor applications and proposes a systematic methodology to reduce the amount of data required to predict the remaining operational lifetime of an asset by identifying the key sensors that are most correlated with the underlying physical degradation process. This is achieved by developing a penalized (log)-location-scale (LLS) functional regression model, which integrates LLS functional regression and group nonnegative garrote (GNNG). The LLS functional model regresses degradation trajectories against times-to-failure, and the coefficient functions are penalized using GNNG. To address the model estimation challenge, functional principal component analysis (FPCA) is employed to transform the penalized LLS functional regression into a penalized LLS regression. The transformed model is then solved using penalized maximum likelihood estimation, and the informative sensors are selected. The informative sensors are then fused using multivariate FPCA to predict remaining operational lifetimes. On multivariate sensor data from an aircraft turbofan engine with 21 sensors, the 4 sensors selected by our approach yielded higher prediction accuracy than the full set of 21 sensors.
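As a rough illustration only, one common way to write a penalized LLS functional regression of this kind is shown below. The notation is an assumption made for this sketch and is not taken from the thesis: $T_i$ is the time-to-failure of unit $i$, $x_{ij}(t)$ is the signal of sensor $j$ on $[0, \tau]$, $\tilde{\beta}_j(t)$ is an initial (unpenalized) estimate of the coefficient function, $c_j \ge 0$ is the garrote shrinkage factor for sensor $j$, and $\ell$ is the log-likelihood implied by the distribution of the error $\varepsilon_i$.

$$
y_i = \beta_0 + \sum_{j=1}^{p} \int_0^{\tau} \beta_j(t)\, x_{ij}(t)\, dt + \sigma \varepsilon_i,
\qquad y_i = T_i \ \text{or} \ \ln T_i
$$

$$
\min_{\beta_0,\ \sigma,\ c_1,\dots,c_p \ge 0} \; -\ell\big(\beta_0, \sigma, c\big) + \lambda \sum_{j=1}^{p} c_j,
\qquad \beta_j(t) = c_j\, \tilde{\beta}_j(t)
$$

In this form, a sensor whose shrinkage factor is driven to $c_j = 0$ drops out of the model entirely, which is how a garrote-type penalty performs sensor selection. Expanding each $x_{ij}(t)$ and $\beta_j(t)$ in a finite FPCA basis turns the integrals into finite sums over FPC scores, which gives an ordinary penalized LLS regression.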

Chapter II presents a scalable prognostic model for large-scale condition monitoring applications. This chapter focuses on the computational scalability of the functional data analysis-based prognostic framework, which uses multivariate FPCA to fuse the multi-stream high-dimensional degradation signals and then uses the resulting features to predict the time-to-failure. Classic multivariate FPCA typically involves some form of decomposition or factorization of a matrix (or covariance matrix) constructed from multi-stream signals. Such a decomposition is often computationally infeasible because the matrix is usually extremely large, given the size and high dimensionality of the data. This chapter addresses the challenge by integrating randomized low-rank matrix approximation with multivariate FPCA computations. Randomized low-rank matrix approximation computes the leading singular values and vectors of the signal matrix via randomized sampling. This is achieved by first computing an approximation to the range (also known as the column space) of the matrix via randomized sampling. The signal matrix is then projected onto the approximated range, and a factorization of the resulting low-rank matrix is computed. Using a numerical study, we show that predicting the remaining lifetime distribution of 100 units with 1,000 sensors per unit took 24 minutes with best-in-class models, compared with 10 seconds using the proposed approach (without loss of accuracy).
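The two-stage procedure described above (sample the range, then factorize the projected matrix) can be sketched in NumPy as follows. This is a minimal illustration of the general randomized SVD idea, not the thesis's implementation. The function name `randomized_svd` and the parameters `oversample` and `n_power_iter` are assumptions introduced for this example.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate the leading `rank` singular triplets of A.

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)

    # Stage 1: approximate the range (column space) of A by random sampling.
    omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations sharpen the range estimate when the
    # singular values decay slowly; re-orthonormalize for stability.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Stage 2: project A onto the approximate range and factorize the
    # small k-by-n matrix instead of the full m-by-n matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

# Usage on a synthetic low-rank "signal matrix" (rows: units, columns: samples).
rng = np.random.default_rng(1)
signals = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 2000))
U, s, Vt = randomized_svd(signals, rank=8)
scores = U * s  # low-dimensional features, one row per unit
```

The expensive factorization is applied only to the small projected matrix `B`, which is where the savings over a full decomposition come from when the target rank is much smaller than the matrix dimensions.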

Chapter III proposes a robust prognostic model for poor data quality settings. Most industrial predictive analytics are based on the premise that sensor data are observed and collected continuously with no interruptions. In reality, industrial assets operate in harsh environments that generate errors in data acquisition, communication, read/write operations, etc., which result in poor-quality data. This chapter develops two algorithms that utilize matrix completion to address the missing data challenge for multi-sensor applications. Matrix completion focuses on finding the lowest-rank matrix that best matches the observed entries of the original signal matrix (with missing data) by solving a nuclear norm-based optimization problem. The first algorithm, the subspace detection method, uses matrix completion techniques to compute a set of basis vectors that span the column space of the original signal matrix. The basis vectors are then used by a newly developed algorithm to extract signal features. The second algorithm, the signal recovery method, involves two steps: conventional matrix completion followed by feature extraction. Matrix completion techniques are employed to recover the missing degradation data of each sensor individually. The recovered signals are then used to extract signal features via a newly developed incremental SVD algorithm, which significantly reduces computational complexity and memory requirements. The proposed methodologies were evaluated through an extensive numerical study and real-world data. The results demonstrated that the proposed approaches are robust to significant levels of missing data and can maintain reasonable prediction accuracy even if the signals are highly incomplete.
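To make the nuclear norm idea concrete, the sketch below fills in missing entries with a soft-impute style iteration (repeated SVD with soft-thresholded singular values), which is one standard solver for nuclear-norm-regularized completion. It is not necessarily the algorithm used in the thesis. The function name `soft_impute` and the parameters `lam`, `n_iter`, and `tol` are assumptions made for this example.

```python
import numpy as np

def soft_impute(X, mask, lam=0.1, n_iter=500, tol=1e-7):
    """Low-rank completion of X using only entries where mask is True.

    Approximately minimizes
        0.5 * ||P_obs(X - Z)||_F^2 + lam * ||Z||_*
    Hypothetical helper for illustration; parameter names are assumptions.
    """
    Z = np.zeros(X.shape, dtype=float)
    for _ in range(n_iter):
        # Keep observed entries, fill missing ones with the current estimate.
        filled = np.where(mask, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values (proximal step for the nuclear norm).
        s = np.maximum(s - lam, 0.0)
        Z_new = (U * s) @ Vt
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z

# Usage: a synthetic rank-2 signal matrix with roughly 30% of entries missing.
rng = np.random.default_rng(0)
truth = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
observed = rng.random(truth.shape) < 0.7
incomplete = np.where(observed, truth, np.nan)  # NaN marks missing readings
recovered = soft_impute(incomplete, observed)
```

Larger values of `lam` push the estimate toward lower rank at the cost of more shrinkage on the recovered values, so in practice it would be tuned, for example by holding out some observed entries.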

Chapter IV focuses on developing prognostic models for industrial applications involving image data. Imaging is one of the fastest growing technologies for condition monitoring of industrial assets. Compared to other sensing techniques, industrial imaging devices are generally noncontact and thus easier to use. Moreover, image data contain rich information about the object being monitored. In this chapter, we propose a new methodology to predict and update the residual useful lifetime of a system using a sequence of degradation images. The methodology integrates tensor linear algebra with traditional location-scale regression widely used in reliability and prognostics. To address the high dimensionality challenge, the degradation image streams are first projected onto a low-dimensional tensor subspace that preserves their information. Next, the projected image tensors are regressed against time-to-failure via penalized location-scale tensor regression. The coefficient tensor is then decomposed using CANDECOMP/PARAFAC (CP) and Tucker decompositions, which enables parameter estimation in a high-dimensional setting. Two optimization algorithms with a global convergence property are developed for model estimation. The effectiveness of our models is validated using two simulated datasets and infrared degradation image streams from rotating machinery.
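The following sketch illustrates why a CP-structured coefficient tensor makes estimation tractable: the regression term (the inner product between the coefficient tensor and an image stream) can be evaluated from the factor matrices alone, and the number of free parameters drops from the product of the dimensions to a multiple of their sum. The tensor sizes, the rank, and the function names `cp_to_tensor` and `cp_inner` are assumptions chosen for illustration, not details from the thesis.

```python
import numpy as np

def cp_to_tensor(factors):
    """Rebuild a 3-way tensor from rank-R CP factors A (I x R), B (J x R), C (K x R)."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_inner(factors, X):
    """Inner product <B, X> computed from the CP factors, without forming B."""
    A, B, C = factors
    return np.einsum("ijk,ir,jr,kr->", X, A, B, C)

# Usage: a hypothetical 20 x 20 pixel image stream with 10 frames, rank-3 coefficients.
rng = np.random.default_rng(0)
dims, rank = (20, 20, 10), 3
factors = [rng.standard_normal((d, rank)) for d in dims]
image_stream = rng.standard_normal(dims)

linear_term = cp_inner(factors, image_stream)  # enters the location-scale model
n_full = int(np.prod(dims))                    # parameters in an unstructured tensor
n_cp = rank * sum(dims)                        # parameters under the CP structure
```

With these illustrative sizes, the unstructured coefficient tensor has 4,000 entries while the rank-3 CP form has 150, which is the kind of reduction that makes likelihood-based estimation feasible with a limited number of failure records.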