<node id="681481">
  <nid>681481</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1743447935</created>
  <changed>1743448696</changed>
  <title><![CDATA[PhD Defense by Zhongzhi Yu]]></title>
  <body><![CDATA[<p>Title: Enhancing Foundation Models with Self-Guided Techniques: From Attention to Adapters to Agents</p>
<p>Date: Thursday, April 10th<br>Time: 10:00 AM – 11:30 AM (Eastern Time)<br>Location: Klaus 1212, Klaus Advanced Computing Building<br>Zoom Link: https://gatech.zoom.us/j/9960405372</p>
<p>Zhongzhi Yu<br>Ph.D. Student<br>School of Computer Science<br>Georgia Institute of Technology</p>
<p>Committee:<br>Dr. Yingyan (Celine) Lin (Advisor, School of Computer Science, Georgia Tech)<br>Dr. Chao Zhang (School of Computational Science &amp; Engineering, Georgia Tech)<br>Dr. Haoxing (Mark) Ren (Nvidia Corporation)<br>Dr. Pavlo Molchanov (Nvidia Corporation)<br>Dr. Zsolt Kira (School of Interactive Computing, Georgia Tech)</p>
<p>Abstract:</p>
<p>Foundation models, a class of large-scale transformers pretrained on massive datasets, have achieved remarkable performance across a wide range of applications. However, the growing demand to deploy foundation models in real-world applications with diverse resource and capability requirements highlights three critical challenges hindering their broader adoption: (1) the accuracy-efficiency trade-off, where improving accuracy through scaling incurs prohibitive computational costs; (2) inefficient adaptation strategies that require heavy supervision and resources, limiting their use in resource-constrained environments; and (3) limited capabilities in handling complex tasks, such as automated hardware code generation and multi-agent collaboration.</p>
<p>This thesis addresses these challenges by leveraging our insight that foundation models encode rich representations, which, if effectively extracted, can enable self-guided optimization. Specifically, we introduce a set of techniques across three complementary levels, each targeting one of the aforementioned challenges: (1) At the attention level, addressing the accuracy-efficiency trade-off, we introduce the Attention Calibration Technique (ACT), which refines suboptimal attention distributions to improve performance without training, and SpotVLM, which reduces visual token redundancy in video-language models through attention-based selection. (2) At the adapter level, targeting adaptation efficiency, we present Master-ASR, which enables dynamic selection and composition of adapters for efficient model adaptation. (3) At the agent level, targeting complex tasks that require knowledge retrieval and reasoning, we propose Instant-RAG, a retrieval-augmented generation system that hides retrieval overhead within the standard generation workflow to enable efficient knowledge access, and Spec2RTL-Agent, which tackles the challenging task of generating Register Transfer Level (RTL) code directly from specification documents by coordinating multiple foundation models to achieve advanced reasoning capabilities. Together, these techniques form a comprehensive framework for self-guided optimization that addresses key challenges limiting the broader deployment of foundation models, enabling more accessible and capable models in real-world scenarios.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Enhancing Foundation Models with Self-Guided Techniques: From Attention to Adapters to Agents]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>Enhancing Foundation Models with Self-Guided Techniques: From Attention to Adapters to Agents</p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2025-04-10T10:00:00-04:00]]></value>
      <value2><![CDATA[2025-04-10T12:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
  </field_extras>
  <field_audience>
    <item>
      <value><![CDATA[Public]]></value>
    </item>
  </field_audience>
  <field_media>
  </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Klaus 1212, Klaus Advanced Computing Building]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
      <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
  </links_related>
  <files>
  </files>
  <og_groups>
    <item>221981</item>
  </og_groups>
  <og_groups_both>
    <item><![CDATA[Graduate Studies]]></item>
  </og_groups_both>
  <field_categories>
    <item>
      <tid>1788</tid>
      <value><![CDATA[Other/Miscellaneous]]></value>
    </item>
  </field_categories>
  <field_keywords>
    <item>
      <tid>100811</tid>
      <value><![CDATA[PhD Defense]]></value>
    </item>
  </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
