<node id="681745">
  <nid>681745</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1744390998</created>
  <changed>1744391072</changed>
  <title><![CDATA[PhD Defense by Alex Bukharin]]></title>
  <body><![CDATA[<p>Title: Robust and Flexible Reward Modeling for LLM Alignment<br><br>Date: April 21st, 2025<br>Time: 11:00 am – 1:00 pm (EDT)<br>Location: ISyE Main 224<br>Zoom link: https://gatech.zoom.us/j/91835542508<br><br>Alexander Bukharin<br>Machine Learning PhD Candidate<br>H. Milton Stewart School of Industrial and Systems Engineering<br>Georgia Institute of Technology<br><br>Committee<br>1. Dr. Tuo Zhao (ISyE, Georgia Tech) (Advisor)<br>2. Dr. Chao Zhang (CSE, Georgia Tech)<br>3. Dr. Bo Dai (CSE, Georgia Tech)<br>4. Dr. Sen Na (ISyE, Georgia Tech)<br>5. Dr. Olivier Delalleau Liu (NVIDIA)<br><br>Abstract<br>As large language models grow increasingly capable, ensuring their alignment with human values is of utmost importance. One of the most promising ways to align language models is to design a reward function that measures alignment with human values and then train the language model to maximize this reward. In this thesis, we focus on two approaches to reward design: reward design from external feedback signals and reward learning from human-annotated datasets. In the first chapter, we develop a reward design framework, HERON, that eases reward function design by exploiting hierarchical relationships between feedback signals. In the second chapter, we propose an algorithm to learn reward functions from datasets with corrupted human annotations. In the last chapter, we develop an adversarial attack approach that automatically discovers flaws in state-of-the-art reward functions, and then use these attacks to train more robust reward models. Altogether, these contributions advance the scalability and robustness of reward modeling.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Robust and Flexible Reward Modeling for LLM Alignment]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p><strong>Robust and Flexible Reward Modeling for LLM Alignment</strong></p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2025-04-21T11:00:00-04:00]]></value>
      <value2><![CDATA[2025-04-21T13:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[ISyE Main 224 ]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[Phd Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
