<node id="656963">
  <nid>656963</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1649080581</created>
  <changed>1649080581</changed>
  <title><![CDATA[PhD Defense by Zaiwei Chen]]></title>
  <body><![CDATA[<p><strong>Title: </strong>A Unified Lyapunov Framework for Finite-Sample Analysis of&nbsp;Reinforcement Learning Algorithms<br />
<br />
<strong>Date:</strong>&nbsp;04/07/2022<br />
<strong>Time:&nbsp;</strong>1:00 - 2:30 pm EDT<br />
<strong>Location:</strong> Groseclose 402, or virtually at&nbsp;<a href="https://gatech.zoom.us/j/9849731860?pwd=K29BSStGekgvYkxlK1ZRZVp1QUlLdz09">https://gatech.zoom.us/j/9849731860?pwd=K29BSStGekgvYkxlK1ZRZVp1QUlLdz09</a>&nbsp;(Meeting ID: 984 973 1860 Passcode: 7n46MA).<br />
<strong>Student Name: </strong>Zaiwei Chen<br />
Machine Learning PhD Student<br />
School of Industrial &amp; Systems Engineering<br />
Georgia Institute of Technology<br />
<br />
<strong>Committee</strong><br />
1 Dr. Siva Theja Maguluri (Advisor)<br />
2 Dr. John-Paul Clarke (Co-advisor)<br />
3 Dr. Justin Romberg<br />
4 Dr.&nbsp;Ashwin Pananjady<br />
5 Dr. Benjamin Van Roy<br />
<br />
<strong>Abstract: </strong>In this thesis, we develop a unified Lyapunov approach for establishing finite-sample guarantees of reinforcement learning (RL) algorithms. Since most RL algorithms can be modeled as stochastic approximation (SA) algorithms under Markovian noise, we first provide a Lyapunov framework for analyzing Markovian SA algorithms. The key idea is to construct a novel Lyapunov function (called the generalized Moreau envelope) to capture the dynamics of the corresponding SA algorithm and to establish a negative drift inequality, which can then be used repeatedly to derive finite-sample bounds. We use our SA results to design RL algorithms and perform finite-sample analysis. Specifically, for tabular RL, we establish finite-sample bounds for Q-learning, for variants of on-policy TD-learning algorithms such as n-step TD and TD(\lambda), and for off-policy TD-learning algorithms such as Retrace(\lambda), Q^\pi(\lambda), and V-trace. As by-products, we provide theoretical insight into the efficiency of bootstrapping in on-policy TD-learning and demonstrate the bias-variance trade-off in off-policy TD-learning. For RL with linear function approximation, we design convergent variants of Q-learning and TD-learning in the presence of the deadly triad and derive finite-sample guarantees. The TD-learning algorithm is further used in a general policy-based framework (including approximate policy iteration and natural policy gradient) to find an optimal policy of the RL problem with an O(\epsilon^{-2}) sample complexity.</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2022-04-07T14:00:00-04:00]]></value>
      <value2><![CDATA[2022-04-07T16:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[https://gatech.zoom.us/j/9849731860?pwd=K29BSStGekgvYkxlK1ZRZVp1QUlLdz09]]></url>
      <title><![CDATA[ZOOM]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
