<node id="682722">
  <nid>682722</nid>
  <type>event</type>
  <uid>
    <user id="28475"><![CDATA[28475]]></user>
  </uid>
  <created>1749246798</created>
  <changed>1749246868</changed>
  <title><![CDATA[Ph.D. Dissertation Defense - Woohong Byun]]></title>
  <body><![CDATA[<p><strong>Title:</strong> <em>Energy-Efficient Hardware Acceleration of Transformer-Based Models</em></p><p><strong>Committee:</strong></p><p>Dr. Saibal Mukhopadhyay, ECE, Chair, Advisor</p><p>Dr. Shimeng Yu, ECE</p><p>Dr. Visvesh Sathe, ECE</p><p>Dr. Callie Hao, ECE</p><p>Dr. Hyesoon Kim, CoC</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Energy-Efficient Hardware Acceleration of Transformer-Based Models]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>The objective of this research is to develop a software-hardware co-optimization framework for the energy-efficient deployment of transformer-based language models, such as BERT and generative LLMs, on resource-constrained platforms like FPGAs. This work addresses memory and computation challenges through novel quantization algorithms and custom accelerator designs. For BERT, a Hessian-based parameter-wise mixed-precision quantization method is proposed, assigning optimal precision to each parameter based on second-order sensitivity. To enhance hardware efficiency, a Hessian-driven row-wise weight quantization scheme is introduced, enabling mixed-precision matrices to be separated into two uniform-precision matrices so that all parameters fit on-chip with the proposed FPGA accelerator. For generative LLMs, where memory demands scale with sequence length, a Weight-Hessian-aware KV cache quantization strategy is presented, applying intra-layer mixed precision using precomputed Hessians to eliminate runtime overhead. To further reduce hardware complexity, a Query-Key coupled activation quantization method aligns the bit precision of outer-product pairs through Query-Key coupled Hessian analysis. A concurrent quantization approach jointly optimizes row-wise weight and Query-Key activation precision using multi-precision formats, improving compression and energy efficiency. These techniques are supported by a novel multi-precision FPGA accelerator for BERT and GPT-2, capable of handling both power-of-two and non-power-of-two bit widths. With optimized dataflow, the design minimizes off-chip memory access and significantly outperforms existing solutions in energy efficiency and inference performance.</p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2025-06-16T11:00:00-04:00]]></value>
      <value2><![CDATA[2025-06-16T13:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
  </field_extras>
  <field_audience>
    <item>
      <value><![CDATA[Public]]></value>
    </item>
  </field_audience>
  <field_media>
  </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Online]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
      <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
    <item>
      <url><![CDATA[https://teams.microsoft.com/l/meetup-join/19%3ameeting_OTM3MWZjZmMtY2UxMS00MzBkLWFiYTgtOWE2MjhiMDdhMjlj%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%224f74ada8-7c29-4bba-a4ad-2cf7214f2aa0%22%7d]]></url>
      <link_title><![CDATA[Microsoft Teams Meeting link]]></link_title>
    </item>
  </links_related>
  <files>
  </files>
  <og_groups>
    <item>434381</item>
  </og_groups>
  <og_groups_both>
    <item><![CDATA[ECE Ph.D. Dissertation Defenses]]></item>
  </og_groups_both>
  <field_categories>
    <item>
      <tid>1788</tid>
      <value><![CDATA[Other/Miscellaneous]]></value>
    </item>
  </field_categories>
  <field_keywords>
    <item>
      <tid>100811</tid>
      <value><![CDATA[Phd Defense]]></value>
    </item>
    <item>
      <tid>1808</tid>
      <value><![CDATA[graduate students]]></value>
    </item>
  </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
