{"122601":{"#nid":"122601","#data":{"type":"event","title":"Ph.D. Defense of Dissertation:  Dongryeol Lee","body":[{"value":"\u003Cp\u003EPh.D. Defense of Dissertation Announcement\u003Cbr \/\u003E------------------------------------------------------------------\u003Cbr \/\u003EDongryeol Lee\u003Cbr \/\u003ESchool of Computational Science and Engineering\u003Cbr \/\u003ECollege of Computing\u003Cbr \/\u003EGeorgia Institute of Technology\u003Cbr \/\u003E\u003Ca href=\u0022mailto:dongryel@cc.gatech.edu\u0022\u003Edongryel@cc.gatech.edu\u003C\/a\u003E\u003Cbr \/\u003E\u003Cbr \/\u003ETitle: \u003Cstrong\u003EA Distributed Kernel Summation Framework for Machine Learning\u003C\/strong\u003E\u003Cbr \/\u003E\u003Cbr \/\u003EDate: Friday, May 4, 2012\u003Cbr \/\u003ETime: 10 AM - 12 PM EST\u003Cbr \/\u003ELocation: KACB 1212\u003Cbr \/\u003E\u003Cbr \/\u003E\u003Cstrong\u003ECommittee:\u003C\/strong\u003E\u003C\/p\u003E\u003Cul\u003E\u003Cli\u003EProfessor Alexander Gray (Advisor, School of Computational Science and Engineering, Georgia Tech)\u003C\/li\u003E\u003Cli\u003EProfessor Edmond Chow (School of Computational Science and Engineering, Georgia Tech)\u003C\/li\u003E\u003Cli\u003EProfessor Christos Faloutsos (School of Computer Science, Carnegie Mellon University)\u003C\/li\u003E\u003Cli\u003EProfessor Haesun Park (School of Computational Science and Engineering, Georgia Tech)\u003C\/li\u003E\u003Cli\u003EProfessor Richard Vuduc (School of Computational Science and Engineering, Georgia Tech)\u003C\/li\u003E\u003C\/ul\u003E\u003Cp\u003E\u003Cbr \/\u003E\u003Cstrong\u003EAbstract:\u003C\/strong\u003E\u003Cbr \/\u003EThe computational problems I consider in my thesis share the common trait of requiring consideration of pairs (or higher-order tuples) of data points. 
For problems modeling pairwise interactions, we consider accelerating the operations on N by N matrices of the form: $K = { k(x_i, x_j) }_{i,j}$ where k(\u2022, \u2022) is a function that outputs a real value given $x_i$ and $x_j$ from the data set. I focus on the problem of kernel summation operations ubiquitous in many data mining and scientific algorithms.\u003Cbr \/\u003E\u003Cbr \/\u003EIn machine learning, kernel summations appear in popular kernel methods which can model nonlinear structures in data. Kernel methods include many non-parametric methods such as kernel density estimation, kernel regression, Gaussian process regression, kernel PCA, and kernel support vector machines (SVM). In computational physics, the kernel summation appears as the classical N-body problem for simulating positions of a set of celestial bodies or atoms.\u003Cbr \/\u003E\u003Cbr \/\u003EMy thesis attempts to marry, for the first time, the best relevant techniques in parallel computing, where kernel summations are in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide a unified, efficient parallel kernel summation framework that can utilize:\u003C\/p\u003E\u003Col\u003E\u003Cli\u003EVarious types of deterministic and probabilistic approximations that may be suitable for both low and high-dimensional problems with a large number of data points.\u003C\/li\u003E\u003Cli\u003EIndexing the data using any multi-dimensional binary tree with both distributed memory (MPI) and shared memory (OpenMP\/Intel TBB) parallelism.\u003C\/li\u003E\u003Cli\u003EA dynamic load balancing scheme to adjust work imbalances during the computation.\u003C\/li\u003E\u003C\/ol\u003E\u003Cp\u003EI will first summarize my previous research in serial kernel summation algorithms. 
This work started from Greengard\/Rokhlin\u0027s earlier work on fast multipole methods for the purpose of approximating potential sums of many particles. The contributions of this part of my thesis include the following: (1) reinterpretation of Greengard\/Rokhlin\u0027s work for the computer science community; (2) the extension of the algorithms to use a larger class of approximation strategies, i.e., probabilistic error bounds via Monte Carlo techniques; (3) the multibody series expansion: the generalization of the theory of fast multipole methods to handle interactions of more than two entities; (4) the first $O(N)$ proof of the batch approximate kernel summation using a notion of intrinsic dimensionality. \u003Cbr \/\u003EThen I move on to the problem of parallelizing the kernel summations.\u003Cbr \/\u003E\u003Cbr \/\u003EThe artifact of this thesis has contributed to an open-source machine learning package called MLPACK, which was first demonstrated at NIPS 2008 and subsequently at the NIPS 2011 Big Learning Workshop. 
Completing a portion of this thesis involved the use of high-performance computing resources at XSEDE (eXtreme Science and Engineering Discovery Environment) and NERSC (National Energy Research Scientific Computing Center).\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"A Distributed Kernel Summation Framework for Machine Learning"}],"uid":"1","created_gmt":"2012-04-06 10:37:22","changed_gmt":"2016-10-08 01:58:41","author":"Jupiter","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2012-05-04T15:00:00-04:00","event_time_end":"2012-05-04T15:00:00-04:00","event_time_end_last":"2012-05-04T15:00:00-04:00","gmt_time_start":"2012-05-04 19:00:00","gmt_time_end":"2012-05-04 19:00:00","gmt_time_end_last":"2012-05-04 19:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"47223","name":"College of Computing"},{"id":"50877","name":"School of Computational Science and Engineering"}],"categories":[],"keywords":[],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003E\u003Ca href=\u0022mailto:dongryel@cc.gatech.edu\u0022\u003EDongryeol Lee\u003C\/a\u003E\u003C\/p\u003E","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}