<node id="122601">
  <nid>122601</nid>
  <type>event</type>
  <uid>
    <user id="1"><![CDATA[1]]></user>
  </uid>
  <created>1333708642</created>
  <changed>1475891921</changed>
  <title><![CDATA[Ph.D. Defense of Dissertation: Dongryeol Lee]]></title>
  <body><![CDATA[<p>Ph.D. Defense of Dissertation Announcement<br />------------------------------------------------------------------<br />Dongryeol Lee<br />School of Computational Science and Engineering<br />College of Computing<br />Georgia Institute of Technology<br /><a href="mailto:dongryel@cc.gatech.edu">dongryel@cc.gatech.edu</a><br /><br />Title: <strong>A Distributed Kernel Summation Framework for Machine Learning</strong><br /><br />Date: Friday, May 4, 2012<br />Time: 10 AM - 12 PM EDT<br />Location: KACB 1212<br /><br /><strong>Committee:</strong></p><ul><li>Professor Alexander Gray (Advisor, School of Computational Science and Engineering, Georgia Tech)</li><li>Professor Edmond Chow (School of Computational Science and Engineering, Georgia Tech)</li><li>Professor Christos Faloutsos (School of Computer Science, Carnegie Mellon University)</li><li>Professor Haesun Park (School of Computational Science and Engineering, Georgia Tech)</li><li>Professor Richard Vuduc (School of Computational Science and Engineering, Georgia Tech)</li></ul><p><br /><strong>Abstract:</strong><br />The class of computational problems I consider in my thesis shares the common trait of requiring consideration of pairs (or higher-order tuples) of data points. For problems modeling pairwise interactions, we consider accelerating operations on N by N matrices of the form $K = \{ k(x_i, x_j) \}_{i,j}$, where $k(\cdot, \cdot)$ is a function that outputs a real value given $x_i$ and $x_j$ from the data set. I focus on kernel summation operations, which are ubiquitous in many data mining and scientific algorithms.<br /><br />In machine learning, kernel summations appear in popular kernel methods, which can model nonlinear structures in data. Kernel methods include many non-parametric methods such as kernel density estimation, kernel regression, Gaussian process regression, kernel PCA, and kernel support vector machines (SVM). 
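Concretely, the brute-force kernel summation that these methods rely on can be sketched as follows (a minimal NumPy illustration assuming a Gaussian kernel; the function name and parameters are illustrative, not part of the thesis software):

```python
import numpy as np

def kernel_summation(X, queries, bandwidth=1.0):
    """Brute-force kernel summation: for each query point q_i, compute
    sum_j k(q_i, x_j) with a Gaussian kernel -- the O(N^2) baseline that
    tree-based and distributed methods accelerate."""
    # Pairwise squared distances between query and reference points
    d2 = ((queries[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Kernel matrix K = {k(q_i, x_j)}_{i,j}
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return K.sum(axis=1)  # one summation per query point

X = np.random.default_rng(0).normal(size=(1000, 3))
sums = kernel_summation(X, X[:10])  # first 10 points as queries
```

Evaluating all N summations this way costs quadratic time, which is exactly what motivates the approximation and parallelization techniques below.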
In computational physics, the kernel summation appears as the classical N-body problem for simulating the positions of a set of celestial bodies or atoms.<br /><br />My thesis attempts to marry, for the first time, the best relevant techniques in parallel computing, where kernel summations arise in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide a unified, efficient parallel kernel summation framework that can utilize:</p><ol><li>Various types of deterministic and probabilistic approximations that may be suitable for both low- and high-dimensional problems with a large number of data points.</li><li>Indexing of the data using any multi-dimensional binary tree with both distributed-memory (MPI) and shared-memory (OpenMP/Intel TBB) parallelism.</li><li>A dynamic load balancing scheme to adjust work imbalances during the computation.</li></ol><p>I will first summarize my previous research in serial kernel summation algorithms. This work started from Greengard and Rokhlin's earlier work on fast multipole methods for approximating potential sums over many particles. The contributions of this part of my thesis include the following: <br />(1) reinterpretation of Greengard and Rokhlin's work for the computer science community; (2) the extension of the algorithms to use a larger class of approximation strategies, i.e., probabilistic error bounds via Monte Carlo techniques; (3) the multibody series expansion: the generalization of the theory of fast multipole methods to handle interactions of more than two entities; (4) the first $O(N)$ proof of the batch approximate kernel summation using a notion of intrinsic dimensionality. 
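Contribution (2) above replaces deterministic truncation bounds with probabilistic ones. The idea can be sketched as follows, assuming a Gaussian kernel and a CLT-style confidence interval (function and parameter names here are hypothetical, not the thesis implementation): sample a small number of reference points, scale the sample mean by N, and report a confidence half-width as the error bound.

```python
import numpy as np

def mc_kernel_sum(q, X, m=200, z=1.96, seed=0):
    """Monte Carlo estimate of S = sum_j k(q, x_j): sample m reference
    points with replacement, scale the sample mean by N, and report a
    CLT-based confidence half-width as the probabilistic error bound."""
    rng = np.random.default_rng(seed)
    N = len(X)
    idx = rng.integers(0, N, size=m)  # uniform sample with replacement
    vals = np.exp(-((X[idx] - q) ** 2).sum(axis=1) / 2.0)  # sampled k(q, x_j)
    estimate = N * vals.mean()
    half_width = z * N * vals.std(ddof=1) / np.sqrt(m)
    return estimate, half_width

X = np.random.default_rng(1).normal(size=(2000, 2))
estimate, half_width = mc_kernel_sum(X[0], X)
```

The appeal of such bounds is that the number of samples m, and hence the cost, does not grow with the dimension of the data, which is why they suit high-dimensional problems where series-expansion bounds degrade.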
<br />Then I move on to the problem of parallelizing the kernel summations.<br /><br />The software artifact of this thesis has been contributed to an open-source machine learning package called MLPACK, which was first demonstrated at NIPS 2008 and subsequently at the NIPS 2011 Big Learning Workshop. Completing a portion of this thesis involved the use of high-performance computing resources at XSEDE (eXtreme Science and Engineering Discovery Environment) and NERSC (National Energy Research Scientific Computing Center).</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[A Distributed Kernel Summation Framework for Machine Learning]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2012-05-04T15:00:00-04:00]]></value>
      <value2><![CDATA[2012-05-04T15:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[<p><a href="mailto:dongryel@cc.gatech.edu">Dongryeol Lee</a></p>]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>47223</item>
          <item>50877</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[College of Computing]]></item>
          <item><![CDATA[School of Computational Science and Engineering]]></item>
      </og_groups_both>
  <field_categories>
      </field_categories>
  <field_keywords>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
