Title: A modern maximum-likelihood approach for high-dimensional logistic regression

Abstract: Logistic regression is arguably the most widely used and studied non-linear model in statistics. Classical maximum likelihood theory provides asymptotic distributions for the maximum likelihood estimate (MLE) and the likelihood ratio test (LRT), which are universally used for inference. Our findings reveal, however, when the number of features p and the sample size n both diverge, with the ratio p/n converging to a positive constant, classical results are far from accurate. For a certain class of logistic models, we observe, (1) the MLE is biased, (2) variability of the MLE is much higher than classical results and (3) the LRT is not distributed as a Chi-Squared. We develop a new theory that quantifies the asymptotic bias and variance of the MLE, and characterizes asymptotic distribution of the LRT under certain assumptions on the distribution of the covariates. Empirical results demonstrate that our predictions are extremely accurate in finite samples. These novel predictions depend on the underlying regression coefficients through a single scalar, the overall signal strength, which can be estimated efficiently. This is based on joint work with Emmanuel Candes and Yuxin Chen.

My bio:

I am a fifth year Ph.D. candidate in the Dept. of Statistics, Stanford University, advised by Prof. Emmanuel Candes. During my Ph.D. I have specialized on the area of high-dimensional statistical inference.