<node id="618980">
  <nid>618980</nid>
  <type>event</type>
  <uid>
    <user id="28475"><![CDATA[28475]]></user>
  </uid>
  <created>1551997583</created>
  <changed>1551997583</changed>
  <title><![CDATA[Ph.D. Dissertation Defense - Hardik Sharma]]></title>
  <body><![CDATA[<p><strong>Title:</strong> <em>Accelerate Deep Learning for the Edge-to-cloud Continuum: A Specialized Full Stack Derived from Algorithms</em></p>

<p><strong>Committee:</strong></p>

<p>Dr. Hadi Esmaeilzadeh, ECE, Chair, Advisor</p>

<p>Dr. Hyesoon Kim, CoC</p>

<p>Dr. Milos Prvulovic, CoC</p>

<p>Dr. Tushar Krishna, ECE</p>

<p>Dr. Vikas Chandra, Facebook</p>

<p><strong>Abstract: </strong></p>

<p>Advances in high-performance computer architecture design have been a major driver of the rapid evolution of Deep Neural Networks (DNNs). Because of their insatiable demand for compute power, both the research community and industry have turned to accelerators to accommodate modern DNN computation. Furthermore, DNNs are gaining prevalence and have found applications across a wide spectrum of devices, from commodity smartphones to enterprise cloud platforms. However, there is no one-size-fits-all solution for this continuum of devices that can meet both the strict energy, power, and chip-area budgets of edge devices and the high performance requirements of enterprise-grade servers. This thesis designs a specialized compute stack for DNN acceleration across the edge-to-cloud continuum that flexibly matches the varying constraints of different devices and simultaneously exploits algorithmic properties to maximize the benefits of acceleration. To this end, this thesis first explores a tight integration of Neural Network (NN) accelerators within massively parallel GPUs with minimal area overhead. We show that a tight coupling of NN accelerators and GPUs can provide a significant gain in performance and energy efficiency across a diverse set of applications through neural acceleration, that is, by approximating regions of approximation-amenable code using neural networks. Next, this thesis develops a full stack for accelerating DNN inference on FPGAs that encompasses (1) high-level algorithmic abstractions, (2) a flexible template accelerator architecture, and (3) a compiler that automatically and efficiently optimizes the template architecture to maximize DNN performance using the limited resources available on the FPGA die.
Next, this thesis explores scale-out acceleration of training using cloud-scale FPGAs for a wide range of machine learning algorithms, including neural networks. The challenge here is to design an accelerator architecture that can scale up to efficiently use the large pool of compute resources available on modern cloud-grade FPGAs. To tackle this challenge, this thesis explores multi-threading to maximize the efficiency of FPGA acceleration by running multiple parallel threads of training. Then, this thesis builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent loss of accuracy, the bitwidth varies significantly across DNNs and may even be adjusted for each layer individually. To address this variability, this thesis introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. This flexibility in the architecture enables minimizing computation and communication at the finest granularity possible with no loss in accuracy. Finally, this thesis explores mixed-signal acceleration to push accelerator efficiency to its limits. While mixed-signal circuitry promises significant efficiency benefits, it suffers from a limited range for information encoding, susceptibility to noise, and Analog-to-Digital (A/D) conversion overheads. This thesis addresses these challenges by offering and leveraging the insight that a vector dot-product (the basic operation in DNNs) can be bit-partitioned into groups of spatially parallel low-bitwidth operations that are interleaved across multiple elements of the vectors.
As such, the building blocks of our accelerator become a group of wide, yet low-bitwidth, multiply-accumulate units that operate in the analog domain and share a single A/D converter. Using this bit-partitioned building block, we design a 3D-stacked accelerator architecture that can provide significant gains in efficiency over a purely digital state-of-the-art 3D-stacked accelerator, without losing any classification accuracy.</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Accelerate Deep Learning for the Edge-to-cloud Continuum: A Specialized Full Stack Derived from Algorithms]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2019-03-15T11:00:00-04:00]]></value>
      <value2><![CDATA[2019-03-15T13:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>434381</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[ECE Ph.D. Dissertation Defenses]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[Phd Defense]]></value>
      </item>
          <item>
        <tid>1808</tid>
        <value><![CDATA[graduate students]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
