TITLE: Propensity Score Estimation with Boosted Regression
SPEAKER: Beth Ann Griffin
The theory of propensity score analysis (PSA) is elegant. Provided strong ignorability holds, the propensity score alone is all that is required to control for pretreatment differences between two treatment groups, or between a treatment and a control group. In practice, using propensity scores is more complicated because the propensity score function and its functional form are unknown and must be estimated from the data. Logistic regression has been the standard approach to estimating propensity scores. In this talk, I will demonstrate the use of generalized boosted models (GBM) as an alternative to logistic regression for estimating propensity scores. GBM is a machine learning approach used primarily for predicting dichotomous outcomes. It combines many simple regression trees to provide a smooth, flexible propensity score model, and it automatically performs variable and feature selection as part of its iterative estimation procedure. Tools for implementing these methods are available in R, SAS, and Stata. I will also summarize recent comparisons of GBM with an alternative approach to propensity score estimation, the covariate balancing propensity score (CBPS) method of Imai and Ratkovic (2014). I will contrast the methods in terms of covariate balance and in terms of the bias and mean squared error of treatment effects estimated by propensity score weighting. CBPS generally outperforms GBM on covariate balance and bias when non-linear transformations of the covariates are not needed. In terms of mean squared error, however, GBM appears advantageous in the commonly encountered situation of building a propensity score model in the presence of many candidate confounders, some of which may not actually be related to the outcomes of interest.
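The workflow described above — fit a boosted classifier to estimate the propensity score, weight by it, and check covariate balance — can be sketched as follows. This is a minimal illustration, not the talk's implementation (the R `twang` package is the reference tool for GBM propensity scores); the simulated data, the scikit-learn `GradientBoostingClassifier` stand-in, and all variable names are my own assumptions.

```python
# Sketch of GBM-style propensity score weighting (ATT weights) on
# simulated data. All names and tuning values here are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000
# Two pretreatment confounders; treatment assignment depends on both.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.8 * x1 - 0.5 * x2)))
t = rng.binomial(1, p_true)
X = np.column_stack([x1, x2])

# Boosted regression trees estimate P(T=1 | X) without a
# pre-specified functional form; shallow trees keep the fit smooth.
gbm = GradientBoostingClassifier(n_estimators=200, max_depth=2,
                                 learning_rate=0.05)
gbm.fit(X, t)
ps = gbm.predict_proba(X)[:, 1]

# ATT weights: treated units get weight 1, controls get ps / (1 - ps).
w = np.where(t == 1, 1.0, ps / (1 - ps))

def smd(x, t, w):
    """Weighted standardized mean difference: a common balance metric."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    return (m1 - m0) / np.std(x[t == 1])

# Balance should improve (SMDs shrink toward 0) after weighting.
print("x1 SMD:", smd(x1, t, np.ones(n)), "->", smd(x1, t, w))
print("x2 SMD:", smd(x2, t, np.ones(n)), "->", smd(x2, t, w))
```

In practice one would tune the number of boosting iterations to optimize a balance criterion (as `twang` does) rather than fix it in advance, and then estimate the treatment effect with a weighted outcome model.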