{"676890":{"#nid":"676890","#data":{"type":"event","title":"PhD Proposal by William Jonghoon Won","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle:\u003C\/strong\u003E Software-Hardware Optimizations for Efficient Collective Communications in Distributed Machine Learning Platforms\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EDate\u003C\/strong\u003E: Monday, September 23, 2024\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ETime:\u003C\/strong\u003E 9:00 AM \u2013 11:00 AM ET\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ELocation:\u003C\/strong\u003E Klaus 1212, (hybrid)\u0026nbsp;\u003Ca href=\u0022https:\/\/gatech.zoom.us\/j\/94843770067?pwd=1kRevvLZLDTxm0N59mBoW70EdL1fbw.1\u0022\u003Ehttps:\/\/gatech.zoom.us\/j\/94843770067?pwd=1kRevvLZLDTxm0N59mBoW70EdL1fbw.1\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EWilliam Jonghoon Won\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EPh.D. Student\u003C\/p\u003E\u003Cp\u003ESchool of Computer Science\u003C\/p\u003E\u003Cp\u003ECollege of Computing\u003C\/p\u003E\u003Cp\u003EGeorgia Institute of Technology\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ECommittee:\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EDr. Tushar Krishna (advisor) - School of Electrical and Computer Engineering \u0026amp; School of Computer Science, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Yingyan (Celine) Lin - School of Computer Science, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Divya Mahajan - School of Computer Science \u0026amp; School of Electrical and Computer Engineering, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Manya Ghobadi - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. 
Bradford Beckmann - Research and Advanced Development, Advanced Micro Devices\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EAbstract:\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EThe advancement of large-scale Machine Learning (ML) models and their massive resource requirements have driven the development of specialized, distributed High-Performance Computing (HPC) platforms tailored to ML workloads. These platforms integrate multiple Neural Processing Units (NPUs) interconnected through custom network fabrics. Since ML models and data are distributed, frequent synchronization of activations and gradients among NPUs is required. This synchronization presents a major bottleneck in distributed ML, making efficient collective communication a pivotal research challenge.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EGiven the tightly coupled co-design space of distributed ML, judicious software-hardware optimization approaches are essential. To address this, I first present (i) ASTRA-sim2.0, an end-to-end simulation and modeling framework that facilitates design space exploration of the distributed ML stack. Next, I present (ii) LIBRA, an analytical modeling framework that captures the end-to-end execution time of distributed ML on multi-dimensional networks. Through integration with optimizers, LIBRA identifies optimal multi-dimensional network design points. 
Finally, I introduce (iii) TACOS, an autonomous topology-aware collective algorithm synthesizer that leverages time-expanded network representation and link-chunk matching algorithms to automatically generate optimized collective algorithms for arbitrary target topologies.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003ESoftware-Hardware Optimizations for Efficient Collective Communications in Distributed Machine Learning Platforms\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Software-Hardware Optimizations for Efficient Collective Communications in Distributed Machine Learning Platforms"}],"uid":"27707","created_gmt":"2024-09-17 16:40:50","changed_gmt":"2024-09-17 16:41:38","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2024-09-23T09:00:00-04:00","event_time_end":"2024-09-23T11:00:00-04:00","event_time_end_last":"2024-09-23T11:00:00-04:00","gmt_time_start":"2024-09-23 13:00:00","gmt_time_end":"2024-09-23 15:00:00","gmt_time_end_last":"2024-09-23 15:00:00","rrule":null,"timezone":"America\/New_York"},"location":"Klaus 1212","extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}