ISyE Statistics Seminar - Pragya Sur

Event Details
  • Date/Time:
    • Monday October 15, 2018
      2:00 pm - 3:00 pm
  • Location: Groseclose Room 402, ISyE Building Complex
  • Fee(s): N/A
Summaries

Summary Sentence: A modern maximum-likelihood approach for high-dimensional logistic regression

Title: A modern maximum-likelihood approach for high-dimensional logistic regression 

 

Abstract: Logistic regression is arguably the most widely used and studied non-linear model in statistics. Classical maximum likelihood theory provides asymptotic distributions for the maximum likelihood estimate (MLE) and the likelihood ratio test (LRT), which are universally used for inference. Our findings reveal, however, that when the number of features p and the sample size n both diverge, with the ratio p/n converging to a positive constant, classical results are far from accurate. For a certain class of logistic models, we observe that (1) the MLE is biased, (2) the variability of the MLE is much higher than classical results predict, and (3) the LRT is not distributed as a chi-squared. We develop a new theory that quantifies the asymptotic bias and variance of the MLE and characterizes the asymptotic distribution of the LRT under certain assumptions on the distribution of the covariates. Empirical results demonstrate that our predictions are extremely accurate in finite samples. These novel predictions depend on the underlying regression coefficients through a single scalar, the overall signal strength, which can be estimated efficiently. This is based on joint work with Emmanuel Candes and Yuxin Chen.
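
The bias described in point (1) is easy to observe in a small simulation. The sketch below is only an illustration, not the speakers' code or their theory: the sizes n = 2000 and p = 400 (so p/n = 0.2), the Gaussian N(0, 1/n) covariates, the signal strength Var(x'beta) = 5, and the choice of making half the coefficients nonzero are all hypothetical settings picked for the example. The MLE is computed with plain Newton's method (IRLS), and the helper names `sigmoid` and `logistic_mle` are made up for this sketch.

```python
import numpy as np


def sigmoid(z):
    # Logistic function 1 / (1 + exp(-z)), written via tanh to avoid overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def logistic_mle(X, y, max_iter=100, tol=1e-8):
    """Unpenalized logistic-regression MLE (no intercept) via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = sigmoid(X @ beta)
        w = mu * (1.0 - mu)                  # Bernoulli variances (IRLS weights)
        grad = X.T @ (y - mu)                # score vector
        hess = X.T @ (X * w[:, None])        # Fisher information (observed = expected here)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta
    raise RuntimeError("Newton's method did not converge; the MLE may not exist.")


rng = np.random.default_rng(0)
n, p = 2000, 400          # hypothetical sizes: kappa = p / n = 0.2
gamma2 = 5.0              # hypothetical signal strength Var(x_i' beta)
k = p // 2                # hypothetical: first half of the coefficients are nonzero

beta_true = np.zeros(p)
# With X_ij ~ N(0, 1/n), Var(x_i' beta) = ||beta||^2 / n, so this sets it to gamma2.
beta_true[:k] = np.sqrt(gamma2 * n / k)

X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
beta_hat = logistic_mle(X, y)

# Classical theory treats beta_hat as approximately unbiased, i.e. a ratio near 1.
ratio = beta_hat[:k].mean() / beta_true[0]
print(f"kappa = {p / n:.2f}, mean of beta_hat_j / beta_j over nonzero j: {ratio:.3f}")
print(f"mean of beta_hat_j over null j: {beta_hat[k:].mean():.3f}")
```

In this regime one would expect the printed ratio to sit well above 1 (the MLE overshoots the true coefficients), while the null coordinates still average close to 0. The sketch does not compute the bias factor predicted by the new theory; it only shows that the classical "approximately unbiased" picture breaks down.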

 

My bio: 

I am a fifth-year Ph.D. candidate in the Department of Statistics at Stanford University, advised by Prof. Emmanuel Candes. During my Ph.D., I have specialized in high-dimensional statistical inference.

Additional Information

In Campus Calendar
Yes
Groups

H. Milton Stewart School of Industrial and Systems Engineering (ISYE)

Invited Audience
Faculty/Staff, Public, Graduate students, Undergraduate students
Categories
Seminar/Lecture/Colloquium
Keywords
asymptotics, non-linear model
Status
  • Created By: sbryantturner3
  • Created On: Oct 4, 2018 - 2:47pm
  • Last Updated: Oct 10, 2018 - 2:32pm