{"618980":{"#nid":"618980","#data":{"type":"event","title":"Ph.D. Dissertation Defense - Hardik Sharma","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle\u003C\/strong\u003E\u003Cem\u003E:\u0026nbsp; \u003C\/em\u003E\u003Cem\u003EAccelerate Deep Learning for the Edge-to-cloud Continuum: A Specialized Full Stack Derived from Algorithms\u003C\/em\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ECommittee:\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Hadi Esmaeilzadeh, ECE, Chair, Advisor\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Hyesoon Kim, CoC\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Milos Prvulovic, CoC\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Tushar Krishna, ECE\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Vikas Chandra, Facebook\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EAbstract: \u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAdvances in high-performance computer architecture design have been a major driver for the rapid evolution of Deep Neural Networks (DNNs). Due to their insatiable demand for compute power, both the research community and industry have naturally turned to accelerators to accommodate modern DNN computation. Furthermore, DNNs are gaining prevalence and have found applications across a wide spectrum of devices, from commodity smartphones to enterprise cloud platforms. 
However, there is no one-size-fits-all solution for this continuum of devices that can meet both the strict energy\/power\/chip-area budgets of edge devices and the high performance requirements of enterprise-grade servers. This thesis designs a specialized compute stack for DNN acceleration across the edge-to-cloud continuum that flexibly matches the varying constraints of different devices and simultaneously exploits algorithmic properties to maximize the benefits of acceleration. To this end, this thesis first explores a tight integration of Neural Network (NN) accelerators within massively parallel GPUs with minimal area overhead. We show that a tight coupling of NN accelerators and GPUs can provide a significant gain in performance and energy efficiency across a diverse set of applications through neural acceleration, that is, by approximating regions of approximation-amenable code using neural networks. Next, this thesis develops a full stack for accelerating DNN inference on FPGAs that encompasses (1) high-level algorithmic abstractions, (2) a flexible template accelerator architecture, and (3) a compiler that automatically and efficiently optimizes the template architecture to maximize DNN performance using the limited resources available on the FPGA die. Next, this thesis explores scale-out acceleration of training using cloud-scale FPGAs for a wide range of machine learning algorithms, including neural networks. The challenge here is to design an accelerator architecture that can scale up to efficiently use the large pool of compute resources available on modern cloud-grade FPGAs. 
To tackle this challenge, this thesis explores multi-threading to maximize the efficiency of FPGA acceleration by running multiple parallel threads of training. Then, this thesis builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, the bitwidth required to prevent loss of accuracy varies significantly across DNNs, and it may even be adjusted for each layer individually. To alleviate these deficiencies, the second thrust introduces dynamic bit-level fusion\/decomposition as a new dimension in the design of DNN accelerators. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. Finally, this thesis explores mixed-signal acceleration to push accelerator efficiency to its limits. While mixed-signal circuitry promises significant efficiency benefits, it suffers from a limited range for information encoding, susceptibility to noise, and analog-to-digital (A\/D) conversion overheads. This thesis addresses these challenges by offering and leveraging the insight that a vector dot-product (the basic operation in DNNs) can be bit-partitioned into groups of spatially parallel low-bitwidth operations, and interleaved across multiple elements of the vectors. As such, the building blocks of our accelerator become a group of wide, yet low-bitwidth multiply-accumulate units that operate in the analog domain and share a single A\/D converter. 
Using this bit-partitioned building block, we design a 3D-stacked accelerator architecture that can provide significant gains in efficiency over a purely digital state-of-the-art 3D-stacked accelerator, without losing any classification accuracy.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Accelerate Deep Learning for the Edge-to-cloud Continuum: A Specialized Full Stack Derived from Algorithms "}],"uid":"28475","created_gmt":"2019-03-07 22:26:23","changed_gmt":"2019-03-07 22:26:23","author":"Daniela Staiculescu","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2019-03-15T11:00:00-04:00","event_time_end":"2019-03-15T13:00:00-04:00","event_time_end_last":"2019-03-15T13:00:00-04:00","gmt_time_start":"2019-03-15 15:00:00","gmt_time_end":"2019-03-15 17:00:00","gmt_time_end_last":"2019-03-15 17:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"434381","name":"ECE Ph.D. Dissertation Defenses"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"},{"id":"1808","name":"graduate students"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}