<node id="634556">
  <nid>634556</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1587404386</created>
  <changed>1587404386</changed>
  <title><![CDATA[PhD Defense by Wenhao Yu]]></title>
  <body><![CDATA[<p><strong>Title</strong>: Learning to Walk using Deep Reinforcement Learning and Transfer Learning</p>

<p>&nbsp;</p>

<p>Wenhao&nbsp;Yu</p>

<p>School of Interactive Computing</p>

<p>College of Computing</p>

<p>Georgia Institute of Technology</p>

<p>&nbsp;</p>

<p><strong>Date</strong>: Tuesday, April 28, 2020</p>

<p><strong>Time</strong>: 2:00 PM-4:00 PM (EST)</p>

<p><strong>BlueJeans</strong>: <a href="https://bluejeans.com/206491397">https://bluejeans.com/206491397</a></p>

<p><strong>Note: this defense is remote-only due to the Institute&#39;s guidelines on COVID-19</strong></p>

<p>&nbsp;</p>

<p><strong>Committee:</strong></p>

<p>Dr. Greg Turk (Advisor, School of Interactive Computing, Georgia Tech)</p>

<p>Dr. C. Karen Liu (Advisor, School of Engineering, Stanford University / School of Interactive Computing, Georgia Tech)</p>

<p>Dr. Charlie Kemp (Department of Biomedical Engineering / School of Interactive Computing, Georgia Tech)</p>

<p>Dr. Sergey Levine (Department of Electrical Engineering and Computer Sciences, University of California, Berkeley)</p>

<p>Dr. Michiel van de Panne (Department of Computer Science, University of British Columbia)</p>

<p>&nbsp;</p>

<p><strong>Abstract:</strong></p>

<p>In this dissertation, we seek to develop computational tools that reproduce the locomotion of humans and animals in complex and unpredictable environments. Such tools can have a significant impact on computer graphics, robotics, machine learning, and biomechanics. However, there are two main hurdles to achieving this goal. First, synthesizing a successful locomotion policy requires precise control of a high-dimensional, under-actuated system while striking a balance among conflicting goals such as walking forward, conserving energy, and maintaining balance. Second, the synthesized locomotion policy needs to generalize to environments that were not present during optimization and training, in order to cope with unexpected situations during execution.</p>

<p>&nbsp;</p>

<p>To achieve this goal, we first introduce a Deep Reinforcement Learning (DRL) approach for learning locomotion controllers for simulated legged creatures without using motion data. We propose a loss term in the DRL objective that encourages the agent to exhibit symmetric behavior, and a curriculum learning approach that provides modulated physical assistance to enable successful training of energy-efficient controllers. We demonstrate that, when the two proposed ideas are combined, a variety of simulated characters achieve low-energy, symmetric locomotion gaits without relying on external motion data.</p>

<p>&nbsp;</p>

<p>Next, we introduce a set of Transfer Learning (TL) algorithms that generalize the learned locomotion controllers to novel environments. Specifically, we focus on transferring a simulation-trained locomotion controller to a real legged robot, also known as the Sim-to-Real transfer problem. We first introduce a transfer learning algorithm that can operate successfully under unknown and changing dynamics that lie within the range of the training dynamics. To enable successful transfer beyond the training environments, we further propose an algorithm that uses a limited number of samples from the testing environment to adapt the simulation-trained policy. We demonstrate Sim-to-Real transfer on a biped robot, the Robotis Darwin OP2, and a quadruped robot, the Ghost Robotics Minitaur.</p>

<p>&nbsp;</p>

<p>Finally, we consider the problem of safety during policy execution and transfer. We propose training a universal safe policy (USP) that steers the robot away from unsafe states starting from a diverse set of initial states, along with an algorithm that combines a USP with a task policy to complete the task while acting safely. We demonstrate that the resulting algorithm allows policies to adapt to notably different simulated dynamics with at most two failed trials, suggesting a promising path toward learning robust and safe control policies for Sim-to-Real transfer.</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Learning to Walk using Deep Reinforcement Learning and Transfer Learning]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2020-04-28T15:00:00-04:00]]></value>
      <value2><![CDATA[2020-04-28T17:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[https://bluejeans.com/206491397]]></url>
      <title><![CDATA[BlueJeans Link]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
