**TITLE: **UPS
Delivers Optimal Phase Diagram for High Dimensional Variable Selection

**SPEAKER:** Jiashun Jin

**ABSTRACT: **

Consider a linear regression model \begin{equation*}

Y = X \beta + z, \qquad z \sim N(0, I_n), \qquad X = X_{n, p},

\end{equation*} where both $p$ and $n$ are large but $p > n$. The vector $\beta$ is unknown but is sparse in the sense that only a small proportion of its coordinates is nonzero, and we are interested in identifying these nonzero ones. We
model the coordinates of $\beta$ as samples from a two-component
mixture $(1 - \eps) \nu_0 + \eps {\pi}$, and the rows of $X$ as
samples from $N(0, \frac{1}{n}\Omega)$, where $\nu_0$ is the point mass at $0$, $\pi$ is a distribution, and $\Omega$ is a $p$ by $p$ correlation matrix which is unknown but is presumably sparse.

We
propose a two-stage variable selection procedure which we call the {\it
UPS}. This is a Screen and Clean procedure, in which we screen
with the Univariate thresholding, and clean with the Penalized MLE. In many situations, the UPS possesses two important properties: Sure
Screening and Separable After Screening (SAS). These properties enable
us to reduce the original regression problem to many small-size
regression problems that can be fitted separately. As a result, the UPS
is effective both in theory and in computation.

We measure the performance of variable selection procedure by the Hamming distance, and use an asymptotic framework where $p \goto \infty$ and $(\eps, \pi, n, \Omega)$ depend on $p$. We find that in many situations, the UPS achieves the optimal rate of convergence.

We also find that in the $(\eps_p, \pi_p)$ space, there is a three-phase diagram shared by many choices of $\Omega$. In the first phase, it is possible to recover all signals. In the second phase, exact recovery is impossible, but it is possible to recover most of the signals.

In the third phase, successful variable selection is impossible. The UPS partitions the phase space in the same way that the optimal procedures do, and recovers most of the signals

as long as successful variable selection is possible.

The lasso and the subset selection (also known as the $L^1$- and
$L^0$-penalization methods, respectively) are well-known approaches to
variable selection. However,

somewhat surprisingly, there are regions
in the phase space where neither the lasso nor the subset selection is
rate optimal, even for very simple $\Omega$. The lasso is non-optimal
because it is too loose in filtering out fake signals (i.e. noise that
is highly correlated with a signal), and the subset selection is
non-optimal because it tends to kill one or more signals in
correlated pairs, triplets, etc..