<node id="675584">
  <nid>675584</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1721850618</created>
  <changed>1721850646</changed>
  <title><![CDATA[PhD Proposal by Zhongzhi Yu]]></title>
  <body><![CDATA[<p><strong>Title: Improving Large-Scale Foundation Models Via Attention-Aware Techniques</strong></p><p><strong>&nbsp;</strong></p><p><strong>Date:&nbsp;</strong>Thursday, August 1st, 2024</p><p><strong>Time:&nbsp;</strong>2:00 pm - 3:00 pm ET</p><p><strong>Location:&nbsp;</strong>Virtually via Zoom (<a href="https://gatech.zoom.us/j/9960405372?pwd=bzhIbVdWRkxweW9naUh0aUt4ci9WZz09" target="_blank">https://gatech.zoom.us/j/9960405372?pwd=bzhIbVdWRkxweW9naUh0aUt4ci9WZz09</a>)</p><p><strong>&nbsp;</strong></p><p><strong>PhD Student:</strong></p><p>Zhongzhi Yu, School of Computer Science, Georgia Institute of Technology</p><p><strong>&nbsp;</strong></p><p><strong>Committee Members:</strong></p><p>Dr. Yingyan (Celine) Lin (Advisor) – School of Computer Science, Georgia Institute of Technology</p><p>Dr. Chao Zhang&nbsp;– School of Computational Science and Engineering, Georgia Institute of Technology</p><p>Dr. Judy Hoffman&nbsp;– School of Interactive Computing, Georgia Institute of Technology</p><p>Dr. Pavlo Molchanov&nbsp;– Nvidia Corporation</p><p><strong>&nbsp;</strong></p><p><strong>Abstract:</strong></p><p>Foundation models, a family of large-scale transformer models, have shown impressive performance across a diverse range of applications, from natural language processing to computer vision. The key enabler behind their success is the attention module, which controls how these models extract relationships among input tokens. However, despite the importance of the attention module, our understanding of its role during the inference and fine-tuning stages remains limited, leading to challenges such as potentially suboptimal model performance and a lack of interpretability.</p><p>&nbsp;</p><p>My thesis research focuses on understanding the potentially suboptimal attention distributions generated by foundation models and developing attention-aware techniques to improve their performance. 
The primary insight from my research is that certain high-attention tokens can degrade foundation model performance during both fine-tuning and inference. Building on this insight, my research presents state-of-the-art solutions for improving foundation models, including an attention-aware data augmentation technique that increases the data efficiency of fine-tuning and an attention calibration technique that improves inference accuracy.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Improving Large-Scale Foundation Models Via Attention-Aware Techniques]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p><strong>Improving Large-Scale Foundation Models Via Attention-Aware Techniques</strong></p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2024-08-01T14:00:00-04:00]]></value>
      <value2><![CDATA[2024-08-01T15:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[ZOOM]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>102851</tid>
        <value><![CDATA[Phd proposal]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
