PhD Defense by Yuanshuo (David) Zhao

Title: Statistical modeling and experimental design with contributions in environment, health care, and e-commerce.

 

Advisors: Dr. C. F. Jeff Wu, Dr. Benjamin Haaland

 

Committee Members

Dr. Roshan Vengazhiyil (ISyE)

Dr. Yajun Mei (ISyE)

Dr. Yu Jeffrey Hu (Scheller College of Business)

 

Date and Time: Friday, March 1st, 12:30 pm

 

Location: ISyE main building 228

 

Abstract: Design of experiments and statistical modeling play an increasingly important role in science and business and have received enormous attention from industry and research institutes. Motivated by real-world examples, this dissertation develops new statistical methodologies in experimental design and causal inference. The first two chapters focus on online experimental design. E-commerce companies such as LinkedIn and Amazon run hundreds of experiments each day to test website functions and designs, with the goal of best serving customers and maximizing profits. We develop new experimental design and testing schemes, based on multi-armed bandits and conditional main effects, that let companies run experiments more efficiently. In chapter three, we develop a new statistical model that combines information from physical experiments and computer experiments. The method is applied to model solar irradiance data for the U.S. provided by IBM. Chapter four extends the linear g-formula method in causal inference to a non-linear setting in order to study the causal relationship between physical activity level and health outcomes.

 

In e-commerce companies, a key step for revenue optimization is designing a website which maximizes conversion rates. This is achieved by first running many conversion experiments on different website settings (i.e., with different combinations of design factors), then using this data to pick an optimal website setting. In real-world scenarios, there are oftentimes many factors of interest, resulting in a large website design space. For such problems, only a small fraction of websites can be run in each experiment round due to budget constraints. This poses a problem for traditional multi-armed bandit methods, which typically assume all website settings (arms) are tested in each experiment round. To address this so-called "arm budget constraint", in chapter 1, we propose a new method called Active Arm SElection using Thompson Sampling (AASETS), which performs active arm selection and traffic allocation in an online setting, under a fixed budget of arms in each experiment round. The key novelty of AASETS is the use of a low-order interaction model to learn dependencies between arms on the factorial design space. This model allows an experimenter to (i) adaptively add good arms and remove bad arms from experimentation, and (ii) leverage conversion data over all arms for effective traffic allocation. We show that AASETS outperforms several industry benchmark methods by a large margin under arm budget constraints, both in simulated examples and a real-world problem. 

 

Chapter 2 proposes a new statistical testing method for conversion rate optimization based on conditional main effects. Conversion rate optimization has become more important with the rapid growth of e-commerce revenue. Traditional approaches, including A/B testing and multivariate testing, tend to isolate factors and treat them identically regardless of their positions in the web system. In this chapter, we discuss a new framework, conditional main-effect based funnel testing, in which factors' effects and level settings are analyzed and optimized according to their position on the webpage. We call the new approach CFO: Conditional effect based Funnel testing for conversion rate optimization. It offers better interpretability of the factorial effects and achieves better results in conversion rate optimization.

 

The Gaussian process is a standard tool for building emulators for computer experiments. However, its limited ability to model large-scale and non-stationary data greatly restricts its use in practice. In chapter 3, we provide a new approach to approximate emulation of large computer experiments. Taking advantage of the learning ability of radial basis functions and their strong tolerance to input noise, we derive a sequential learning scheme that dynamically optimizes each basis function's location, scale, and coefficient. An L1 penalty is used to keep the emulator simple. We apply our method to a solar irradiance computer model and physical measurement data. We demonstrate that the proposed model enjoys a marked advantage over existing emulation tools in both emulation accuracy and the ability to handle non-stationarity and large sample sizes. The final predictor, which combines physical measurement data and computer experiment data, is used to forecast the solar irradiance level in the U.S.

 

TRIPPA (trial of economic incentives to promote physical activity) was a four-arm, 6-month randomized controlled trial with a 6-month post-intervention follow-up period, conducted in 13 organizations spanning industries and sectors of government, to investigate the effects of an activity tracker, with or without cash or charitable incentives, on physical activity and health outcomes among full-time workers in Singapore. In chapter 4, we conduct a follow-up study of TRIPPA to assess the causal effects of physical activity levels on health outcomes, including systolic blood pressure (SBP), BMI, VO2max, and quality of life. We extend the original g-formula framework, which deals with time-varying confounding, to include non-linear models, allowing us to use statistical models that are more robust than linear models.

Status

  • Workflow Status:Published
  • Created By:Tatianna Richardson
  • Created:02/19/2019
  • Modified By:Tatianna Richardson
  • Modified:02/19/2019
