{"390781":{"#nid":"390781","#data":{"type":"event","title":"CSIP Seminar","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ESpeaker:\u003C\/strong\u003E Ryan Curtin (CSIP)\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ETitle:\u003C\/strong\u003E\u0026nbsp;\u003Cem\u003EDual-tree k-means with bounded single-iteration runtime\u003C\/em\u003E\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EAbstract:\u003Cbr \/\u003E\u003C\/strong\u003Ek-means is a widely used clustering algorithm, but for k clusters and a dataset size of N, each iteration of Lloyd\u0027s algorithm costs O(kN) time.\u0026nbsp; Although there are tree-based techniques to accelerate single Lloyd iterations, none of these techniques are tailored to the case of large k, which is increasingly common as dataset sizes grow. We propose a dual-tree algorithm that gives the exact result of Lloyd iterations; when using cover trees, we are able to bound the worst-case runtime of each algorithm as O(N + k log k) (this bound also depends on dataset-dependent quantities). To our knowledge these are the first sub-O(kN) bounds for exact Lloyd iterations. We then show that these theoretically favorable algorithms significantly outperform other approaches in practice, especially for large N and k.\u003Cstrong\u003E\u003Cbr \/\u003E\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ESpeaker Bio:\u003C\/strong\u003E\u003Cbr \/\u003ERyan Curtin is a nearly-finished Ph.D. student in the School of Electrical and Computer Engineering at Georgia Tech; his research focuses on the acceleration of core primitives that make up machine learning algorithms, generally via the use of trees and other hierarchical structures.\u0026nbsp; He is also the primary developer and maintainer of the mlpack machine learning library (\u003Ca href=\u0022http:\/\/www.mlpack.org\/\u0022\u003Ehttp:\/\/www.mlpack.org\/\u003C\/a\u003E).\u0026nbsp; He doesn\u0027t enjoy writing biographies, and also is a pinball wizard.\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Dual-tree k-means with bounded single-iteration runtime"}],"uid":"27842","created_gmt":"2015-03-26 09:07:48","changed_gmt":"2017-04-13 21:19:39","author":"Ashlee Gardner","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2015-04-03T16:00:00-04:00","event_time_end":"2015-04-03T17:00:00-04:00","event_time_end_last":"2015-04-03T17:00:00-04:00","gmt_time_start":"2015-04-03 20:00:00","gmt_time_end":"2015-04-03 21:00:00","gmt_time_end_last":"2015-04-03 21:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"1255","name":"School of Electrical and Computer Engineering"}],"categories":[],"keywords":[],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1795","name":"Seminar\/Lecture\/Colloquium"}],"invited_audience":[{"id":"78751","name":"Undergraduate students"},{"id":"78761","name":"Faculty\/Staff"},{"id":"174045","name":"Graduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EAndrew Massimino\u003C\/p\u003E\u003Cp\u003E\u003Ca href=\u0022mailto:massimino@gatech.edu\u0022\u003Emassimino@gatech.edu\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}