PhD Defense by Zhaowei She


Thesis Title: Healthcare Data Analytics for Social Good



Advisor: Dr. Turgay Ayer, School of Industrial and Systems Engineering, Georgia Tech


Committee members (ordered alphabetically):  

Dr. Atalay Atasu, Technology and Operations Management Area, INSEAD

Dr. Bilal Gokpinar, School of Management, UCL

Dr. Daniel Montanera, Seidman College of Business, GVSU

Dr. Beril Toktay, Scheller College of Business, Georgia Tech

Dr. He Wang, School of Industrial and Systems Engineering, Georgia Tech 


Date and Time: Monday, October 18, 2021, 12:00 pm (ET) 

Location: Groseclose 304 


Meeting URL: https://bluejeans.com/358777355/8544

Meeting ID: 358777355 (BlueJeans)


Abstract:


Healthcare problems, ranging from soaring medical costs to the COVID-19 pandemic, present major challenges to our society. Better solutions to these problems can potentially improve the lives and livelihood of tens of millions of people. This thesis consists of three essays on using healthcare data analytics to address pressing social challenges. Specifically, the first two essays focus on evaluation and improvement of risk adjustment designs in healthcare capitation programs, while the third essay develops a machine learning algorithm to detect county-level COVID-19 outbreaks.


In Chapter 2, we analyze a market design problem in Medicare Advantage (MA), the largest risk-adjusted capitation payment program in the U.S. healthcare market. There is evidence that MA unintentionally incentivizes health plans to cherry-pick profitable patient types, a practice referred to as “risk selection”. The existing literature primarily attributes the observed risk selection in the MA market to data limitations and the low explanatory power (e.g., low R^2) of the current risk adjustment design. With the availability of big data and advances in machine learning (ML) techniques, it is commonly believed that risk selection due to imperfect risk adjustment will gradually disappear from the MA market. To examine this belief, we construct a game-theoretic model. Surprisingly, our study shows that big data and ML alone cannot cure risk selection in the MA capitation program. More specifically, we show that even if the current MA risk adjustment design becomes informationally perfect (e.g., R^2 = 1) through the availability of big data and advanced ML algorithms, health plans still have incentives to conduct risk selection by strategically subsidizing some subgroups of patients using capitation payments collected from other subgroups, which we call “risk selection induced by cross subsidization”.


In Chapter 3, we empirically examine the theoretical model presented in Chapter 2. Specifically, we are interested in two empirical questions. First, can the cross subsidization practice in MA be empirically identified? Second, is there an association between the cross subsidization practice and the risk selection problem in MA? To answer these questions, we use a large commercial insurance database containing claims from more than 2 million MA enrollees. By exploiting an exogenous policy shock to MA capitation payments through a difference-in-differences (DID) design, we identify, for the first time in the literature, a reverse cross subsidization practice in MA. Furthermore, we show that this reverse cross subsidization practice is associated with the risk selection problem in MA, where low-risk patients are more likely than high-risk patients to enroll in MA.
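
For readers unfamiliar with the DID design mentioned above, the basic two-group, two-period estimate can be sketched as follows. This is a minimal illustration only: the function name `did_estimate` and all numbers are hypothetical and are not taken from the thesis, whose actual specification is not described in this abstract.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate.

    Returns (change in treated group mean) - (change in control group mean).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values (illustrative only, not data from the thesis).
treated_pre = [10.0, 12.0, 11.0]
treated_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 10.0, 11.0]
control_post = [11.0, 12.0, 13.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)
```

The control group's change stands in for what would have happened to the treated group without the policy shock, so the estimate is only credible if the two groups would otherwise have followed parallel trends.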


In Chapter 4, we develop a machine learning model to detect county-level COVID-19 outbreaks. Specifically, we address a practical challenge in outbreak detection: balancing the tradeoff between detection speed and accuracy. In particular, while estimation accuracy improves with longer fitting windows, detection speed degrades. This chapter presents a machine learning framework that balances this tradeoff using generalized random forests (GRF) and applies it to detect county-level COVID-19 outbreaks. The algorithm chooses an adaptive fitting window size for each county based on relevant features affecting the disease spread, such as changes in social distancing policies. Experimental results show that our method outperforms every non-adaptive window size choice in 7-day-ahead predictions of COVID-19 outbreak case numbers.
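
The window-size tradeoff can be illustrated with a small sketch. This is not the GRF-based method from the thesis: it is a plain log-linear least-squares fit on synthetic, noise-free case counts, and the function name `growth_rate` and all numbers are assumptions made for illustration. It shows only the speed side of the tradeoff, namely that a long fitting window is slow to reflect a recent change in growth.

```python
import math


def growth_rate(cases, window):
    """Least-squares slope of log(cases) over the last `window` days."""
    ys = [math.log(c) for c in cases[-window:]]
    xs = list(range(window))
    x_bar = sum(xs) / window
    y_bar = sum(ys) / window
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Synthetic daily case counts (illustrative only): flat for 20 days,
# then growing at a log-rate of 0.1 per day for 7 days.
cases = [100.0] * 20 + [100.0 * math.exp(0.1 * t) for t in range(1, 8)]

short_fit = growth_rate(cases, window=7)   # covers only the growth phase
long_fit = growth_rate(cases, window=21)   # mixes flat and growth phases

print(short_fit, long_fit)
```

With noisy real data the short window's estimate would also be less stable, which is the accuracy side of the tradeoff that an adaptive window choice tries to balance.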


  • Created By: Tatianna Richardson
  • Created: 10/04/2021
  • Modified By: Tatianna Richardson
  • Modified: 10/04/2021