**Title:** Contributions to the Nonparametric Methods for Computer Experiments

**Advisors:** Dr. C. F. Jeff Wu, Dr. Benjamin Haaland, Dr. Rui Tuo

**Committee Members:**

Dr. Roshan Vengazhiyil (ISyE)

Dr. Jianjun Shi (ISyE)

Dr. Matthew Plumlee (Department of Industrial Engineering and Management Sciences, Northwestern University)

**Date and Time:** Thursday, May 3rd, 11:00 AM

**Location:** Groseclose 226A

**Abstract:**

Kriging, or Gaussian process modeling, is widely used to estimate unknown functions from (possibly noisy) evaluations. Kriging was originally introduced in geostatistics by Matheron (1963) and has seen revived interest in spatial statistics (Cressie, 2015; Diggle, 2013), computer experiments (Fang et al., 2005; Santner et al., 2003), and machine learning (Rasmussen and Williams, 2006; Witten et al., 2016).

The main idea of kriging is to assume that the underlying function is a realization of a Gaussian random field. The accuracy of kriging, or more generally of nonparametric regression, depends strongly on the manner in which data are collected (Staum, 2009; Haaland et al., 2011, 2018) and on the properties of the underlying function, especially its smoothness. This dissertation addresses three related problems: (i) what type of data collection might be expected to enable one to build an accurate model; (ii) given a high-quality design, how accurate is the resulting model; and (iii) whether one can construct estimators that achieve the optimal convergence rate without knowing the true smoothness in advance.
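For orientation, the simple kriging predictor in the noiseless case can be written as follows. The notation is an assumption made here for illustration and is not taken from the dissertation: a zero-mean Gaussian random field with variance $\sigma^2$ and correlation function $\Phi$, design points $x_1, \dots, x_n$, and observations $Y = (f(x_1), \dots, f(x_n))^\top$.

$$
\hat{f}(x) = r(x)^\top K^{-1} Y, \qquad
\operatorname{Var}\bigl(f(x) \mid Y\bigr) = \sigma^2 \bigl(1 - r(x)^\top K^{-1} r(x)\bigr),
$$

where $K = \bigl(\Phi(x_i - x_j)\bigr)_{i,j=1}^{n}$ and $r(x) = \bigl(\Phi(x - x_1), \dots, \Phi(x - x_n)\bigr)^\top$. Both quantities depend on the design only through $K$ and $r(x)$, which is one way to see why the placement of the design points matters for accuracy.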

In Chapter 1 we consider the first problem: what type of data collection might be expected to enable one to build an accurate model? This problem is known as *computer experimental design* in the field of computer experiments. In many situations actual physical experimentation is difficult or impossible, so scientists and engineers use simulations, or *computer experiments*, to study a system of interest. Many simulations are stochastic in the sense that repeated runs with the same input configuration result in different outputs. For expensive or time-consuming simulations, stochastic kriging (Ankenman et al., 2010) is commonly used to generate predictions for simulation model outputs subject to uncertainty due to both function approximation and stochastic variation. In this chapter, we develop and justify a few guidelines for experimental design that ensure the accuracy of stochastic kriging emulators. We decompose the error in stochastic kriging predictions into nominal, numeric, parameter estimation, and parameter estimation numeric components, and provide means to control each in terms of properties of the underlying experimental design. The design properties implied by each source of error are weakly conflicting, and broad principles are proposed. In brief, the space-filling properties "small fill distance" and "large separation distance" should be balanced with replication at distinct input configurations, with the number of replications depending on the relative magnitudes of stochastic and process variability. Non-stationarity implies higher input density in more active regions, while regression functions imply a balance with traditional design properties. A few examples are presented to illustrate the results.
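The two space-filling properties named above can be computed directly for a given design. The following is a minimal sketch, not code from the dissertation. It assumes Euclidean distance, the convention that the separation distance is half the smallest pairwise distance (some authors omit the factor 1/2), and a finite candidate grid standing in for the supremum over the input region. The function names and the two example designs are hypothetical.

```python
import numpy as np


def separation_distance(design):
    """Half the smallest pairwise Euclidean distance among design points."""
    diffs = design[:, None, :] - design[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore zero distance of a point to itself
    return 0.5 * dists.min()


def fill_distance(design, candidates):
    """Largest distance from a candidate point to its nearest design point.

    The candidate set is a finite stand-in for the whole input region,
    so this is an approximation from below of the true fill distance.
    """
    diffs = candidates[:, None, :] - design[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1).max()


# Candidate grid on the unit square (illustrative resolution).
g = np.linspace(0.0, 1.0, 21)
candidates = np.array([(a, b) for a in g for b in g])

# Two hypothetical 9-point designs: a spread-out 3x3 grid and a
# copy of it shrunk into one corner of the square.
h = np.array([1 / 6, 1 / 2, 5 / 6])
grid_design = np.array([(a, b) for a in h for b in h])
clustered_design = 0.3 * grid_design

for name, design in [("grid", grid_design), ("clustered", clustered_design)]:
    print(name,
          "fill:", fill_distance(design, candidates),
          "separation:", separation_distance(design))
```

Under these assumptions the spread-out design has both the smaller fill distance and the larger separation distance, which is the direction the guidelines favor before replication is taken into account.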

In Chapter 2 we derive error bounds for the (simple) kriging predictor under the uniform metric. The kriging method has pointwise predictive distributions that are computationally simple. However, in many applications one would like to predict over a range of untried points simultaneously, which calls for uniform rather than pointwise error bounds. The predictive error is bounded in terms of the maximum pointwise predictive variance of kriging, which can in turn be bounded in terms of the fill distance of the design set. The bounds hold for a scattered set of input points in arbitrary dimension and also cover the case where the covariance function of the Gaussian process is misspecified. These results lead to a better understanding of the rate of convergence of kriging under the Gaussian or the Matérn correlation functions, the relationship between space-filling designs and the accuracy of kriging models, and the robustness of the Matérn correlation functions.
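The link between the maximum pointwise predictive variance and the fill distance can be illustrated numerically. The sketch below is an illustration only and does not reproduce the bounds of the chapter. It assumes a one-dimensional input on [0, 1], noiseless observations, unit process variance, and a Matérn correlation with smoothness 3/2 and a scale of 0.2 chosen arbitrarily. The function names are hypothetical.

```python
import numpy as np


def matern32(x, y, scale=0.2):
    """Matern correlation with smoothness 3/2 between 1-D point sets x and y."""
    r = np.abs(x[:, None] - y[None, :]) / scale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def max_predictive_variance(design, grid):
    """Maximum over `grid` of the simple kriging variance 1 - r(x)^T K^{-1} r(x)."""
    K = matern32(design, design)
    R = matern32(design, grid)          # shape (n_design, n_grid)
    solved = np.linalg.solve(K, R)      # K^{-1} r(x) for every grid point x
    variance = 1.0 - np.sum(R * solved, axis=0)
    return float(variance.max())


grid = np.linspace(0.0, 1.0, 401)       # prediction points (illustrative)
sizes = [5, 9, 17, 33]                  # nested, equally spaced designs
results = []
for n in sizes:
    design = np.linspace(0.0, 1.0, n)
    fill = 0.5 / (n - 1)                # fill distance of an equally spaced design on [0, 1]
    results.append(max_predictive_variance(design, grid))
    print(n, fill, results[-1])
```

Because each design here contains the previous one, the maximum predictive variance shrinks as the fill distance is halved, which is the qualitative behavior the bounds describe.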

In Chapter 3 we consider identifying the smoothness of an underlying function by employing maximum likelihood estimation for the Gaussian process model. Maximum likelihood is widely used in practice to estimate the smoothness parameter, but theoretical studies are lacking. We propose a modified maximum likelihood method to estimate the underlying function as well as its smoothness from noisy evaluations, and we construct a function estimator based on the smoothness estimator. We prove the consistency of the proposed smoothness estimator and show that the function estimator achieves a nearly optimal rate of convergence for all degrees of smoothness.