<node id="652985">
  <nid>652985</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1637245043</created>
  <changed>1637245043</changed>
  <title><![CDATA[PhD Defense by Jiachen Yang]]></title>
  <body><![CDATA[<p><strong>Title:&nbsp;</strong>Cooperation in Multi-Agent Reinforcement Learning</p>

<p>&nbsp;</p>

<p><strong>Date:&nbsp;</strong>Tuesday, November 30th, 2021</p>

<p><strong>Time:&nbsp;</strong>7:00-9:00 PM Eastern Time (4:00-6:00 PM Pacific Time)</p>

<p><strong>Location</strong>:&nbsp;<a href="https://bluejeans.com/773753749/7843">https://bluejeans.com/773753749/7843</a></p>

<p>&nbsp;</p>

<p><strong>Jiachen Yang</strong></p>

<p>Machine Learning PhD Candidate</p>

<p>School of Computational Science and Engineering<br />
Georgia Institute of Technology</p>

<p>&nbsp;</p>

<p><strong>Committee</strong></p>

<p>1. Dr. Hongyuan Zha (Advisor),&nbsp;School of Computational Science and Engineering, Georgia Institute of Technology |&nbsp;Executive Dean of the School of Data Science, The Chinese University of Hong Kong, Shenzhen</p>

<p>2. Dr. Tuo Zhao (Co-Advisor), School of Industrial and Systems Engineering, Georgia Institute of Technology</p>

<p>3. Dr. Charles Isbell, Dean of the College of Computing, School of Interactive Computing, Georgia Institute of Technology</p>

<p>4. Dr. Matthew Gombolay, School of Interactive Computing, Georgia Institute of Technology</p>

<p>5. Dr. Daniel Faissol, Computational Engineering Division, Lawrence Livermore National Laboratory</p>

<p>&nbsp;</p>

<p><strong>Abstract</strong></p>

<p>As progress in reinforcement learning (RL) gives rise to increasingly general and powerful artificial intelligence, society needs to anticipate a possible future in which multiple RL agents learn and interact in a shared multi-agent environment. When a single principal has oversight of the multi-agent system, how should agents learn to cooperate via centralized training to achieve individual and global objectives? When agents belong to self-interested principals with imperfectly aligned objectives, how can cooperation emerge from fully decentralized learning? This dissertation addresses both questions by proposing novel methods for multi-agent reinforcement learning (MARL) and demonstrating their empirical effectiveness in high-dimensional simulated environments.</p>

<p>&nbsp;</p>

<p>To address the first case, we propose new algorithms for fully cooperative MARL in the paradigm of centralized training with decentralized execution. First, we propose a method based on multi-agent curriculum learning and multi-agent credit assignment for the setting where global optimality is defined as the attainment of all individual goals. Second, we propose a hierarchical MARL algorithm that discovers and learns interpretable, useful skills for a multi-agent team to optimize a single team objective. Extensive experiments with ablations show the strengths of our approaches over state-of-the-art baselines.</p>

<p>&nbsp;</p>

<p>To address the second case, we propose learning algorithms that attain cooperation within a population of self-interested RL agents. We design a new agent that is equipped with the ability to incentivize other RL agents and that explicitly accounts for the other agents&#39; learning processes. This agent overcomes a key limitation of fully decentralized training and generates emergent cooperation in difficult social dilemmas. We then extend and apply this technique to the problem of incentive design, in which a central incentive designer optimizes a global objective solely by intervening on the rewards of a population of independent RL agents. Experiments on the problem of optimal taxation in a simulated market economy demonstrate the effectiveness of this approach.</p>

<p>&nbsp;</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Cooperation in Multi-Agent Reinforcement Learning]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2021-11-30T19:00:00-05:00]]></value>
      <value2><![CDATA[2021-11-30T21:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[https://bluejeans.com/773753749/7843]]></url>
      <title><![CDATA[Bluejeans]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
<value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
