{"65151":{"#nid":"65151","#data":{"type":"event","title":"UPS Delivers Optimal Phase Diagram for High Dimensional Variable  Selection","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETITLE:\u0026nbsp;\u0026nbsp; \u003C\/strong\u003EUPS\nDelivers Optimal Phase Diagram for High Dimensional Variable Selection\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ESPEAKER:\u003C\/strong\u003E\u0026nbsp; Jiashun Jin\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EABSTRACT: \u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EConsider a linear\u0026nbsp; regression model\u0026nbsp; \\begin{equation*}\u003Cbr \/\u003EY =\u0026nbsp; X \\beta + z, \\qquad z \\sim N(0, I_n),\u0026nbsp;\u0026nbsp; \\qquad X = X_{n, p},\u003Cbr \/\u003E\\end{equation*} where both $p$ and $n$ are large but $p \u0026gt;\u0026nbsp; n$.\u0026nbsp;\u0026nbsp; The vector $\\beta$ is unknown but is\u0026nbsp; sparse\u0026nbsp; in the sense that only a small proportion of\u0026nbsp;\u0026nbsp; its coordinates is\u0026nbsp; nonzero, and we are interested in\u0026nbsp; identifying these nonzero ones.\u0026nbsp; We\n model the coordinates of $\\beta$ as\u0026nbsp; samples from a two-component \nmixture $(1 - \\eps) \\nu_0 + \\eps\u0026nbsp; {\\pi}$,\u0026nbsp;\u0026nbsp; and the rows of $X$ as\u0026nbsp; \nsamples from $N(0, \\frac{1}{n}\\Omega)$, where $\\nu_0$ is the point mass at $0$,\u0026nbsp; $\\pi$ is a\u0026nbsp; distribution, and $\\Omega$ is a $p$ by $p$ correlation matrix which is unknown but is presumably sparse.\u003Cbr \/\u003E\u003Cbr \/\u003EWe\n propose a two-stage variable selection procedure which we call the {\\it\n UPS}.\u0026nbsp;\u0026nbsp; This\u0026nbsp; is a Screen and Clean procedure,\u0026nbsp; in which\u0026nbsp;\u0026nbsp; we screen \nwith the\u0026nbsp; Univariate thresholding, and clean with the Penalized MLE. In many situations,\u0026nbsp; the UPS possesses two important properties: Sure \nScreening and Separable After Screening (SAS). 
These properties enable \nus to reduce\u0026nbsp; the original regression problem to many small-size \nregression problems that can be fitted separately.\u0026nbsp; As a result, the UPS\n is effective both in theory and in computation.\u003Cbr \/\u003E\n\u003Cbr \/\u003EWe measure the performance of a\u0026nbsp; variable selection procedure by the Hamming distance,\u0026nbsp; and use an asymptotic framework where $p \\goto \\infty$ and $(\\eps, \\pi, n, \\Omega)$ depend on $p$. We find that\u0026nbsp; in many situations, the UPS achieves the optimal rate of convergence.\u003Cbr \/\u003E\nWe also find that in the $(\\eps_p, \\pi_p)$ space, there is a\u0026nbsp; three-phase diagram shared\u0026nbsp;\u0026nbsp;\u0026nbsp; by many choices of $\\Omega$.\u0026nbsp; In the first phase, it is possible to\u0026nbsp;\u0026nbsp; recover\u0026nbsp; all\u0026nbsp;\u0026nbsp; signals. In the second phase, exact recovery is impossible, but it is possible to recover most of the signals.\u003Cbr \/\u003E\nIn the third phase, successful variable selection is impossible. The UPS partitions the phase space\u0026nbsp; in the same way that\u0026nbsp; the optimal procedures do, and recovers most of the signals\u003Cbr \/\u003Eas long as successful variable selection is possible.\u003Cbr \/\u003E\n\u003Cbr \/\u003EThe lasso and the subset selection (also known as the $L^1$- and \n$L^0$-penalization methods, respectively) are well-known approaches to \nvariable selection. However,\u003Cbr \/\u003Esomewhat surprisingly, there are regions\n in the phase space where\u0026nbsp; neither the lasso nor the subset selection is\n rate optimal, even\u0026nbsp; for very simple $\\Omega$. The lasso is non-optimal \nbecause it is too loose in\u0026nbsp; filtering out\u0026nbsp; fake signals (i.e. 
noise that\n is highly correlated with a signal), and the subset selection is \nnon-optimal\u0026nbsp; because it tends to kill one or more signals\u0026nbsp;\u0026nbsp; in \ncorrelated pairs, triplets, etc.\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003E\u003Cbr \/\u003E\u003C\/strong\u003E\u003C\/p\u003E\u003Ctable border=\u00220\u0022 cellspacing=\u00220\u0022 cellpadding=\u00220\u0022 width=\u0022563\u0022\u003E\u003Ctbody\u003E\u003Ctr\u003E\u003Ctd align=\u0022left\u0022 valign=\u0022top\u0022\u003E\n  \u003C\/td\u003E\n \u003C\/tr\u003E\n\u003C\/tbody\u003E\u003C\/table\u003E","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"UPS Delivers Optimal Phase Diagram for High Dimensional Variable  Selection"}],"uid":"27187","created_gmt":"2011-03-24 14:27:18","changed_gmt":"2016-10-08 01:54:38","author":"Anita Race","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2011-04-07T12:00:00-04:00","event_time_end":"2011-04-07T13:00:00-04:00","event_time_end_last":"2011-04-07T13:00:00-04:00","gmt_time_start":"2011-04-07 16:00:00","gmt_time_end":"2011-04-07 17:00:00","gmt_time_end_last":"2011-04-07 17:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"1242","name":"School of Industrial and Systems Engineering (ISYE)"}],"categories":[],"keywords":[],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}