<node id="616081">
  <nid>616081</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1546979869</created>
  <changed>1546979869</changed>
  <title><![CDATA[PhD Defense by Ashley Edwards]]></title>
  <body><![CDATA[<p><strong>Title:</strong> Emulation and Imitation via Perceptual Goal<br />
Specifications</p>

<p>&nbsp;</p>

<p>Ashley D. Edwards</p>

<p>Ph.D. Student</p>

<p>School of Interactive Computing</p>

<p>College of Computing</p>

<p>Georgia Institute of Technology</p>

<p>&nbsp;</p>

<p>Date: Monday, January 14th, 2019</p>

<p>Time: 12:30 PM to 2:30 PM (EST)</p>

<p>Location: TBA, College of Computing Building</p>

<p>&nbsp;</p>

<p><strong>Committee:</strong></p>


<p>Dr. Charles Isbell (Advisor), School of Interactive Computing, Georgia Institute of Technology</p>

<p>Dr. Tucker Balch, School of Interactive Computing, Georgia Institute of Technology</p>

<p>Dr. Sonia Chernova, School of Interactive Computing, Georgia Institute of Technology</p>

<p>Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology</p>

<p>Dr. Pieter Abbeel, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley</p>

<p>&nbsp;</p>

<p><strong>Summary:</strong></p>


<p>Much of the power behind reinforcement learning is that we can use a single signal, known as the reward, to indicate desired behavior. However, defining these rewards can often be difficult. This dissertation introduces an alternative to the typical reward design mechanism. In particular, we introduce four methods that allow one to focus on specifying perceptual goals, rather than scalar rewards. By removing domain-specific aspects of the problem, we demonstrate that goals can be expressed in a way that is agnostic to the reward function, action space, or state space of the agent&rsquo;s environment.</p>

<p>&nbsp;</p>

<p>First, we will introduce perceptual reward functions and describe how we can utilize a hand-defined similarity metric to enable learning from goals that look different from the agent&rsquo;s own observations. We show how we can use this method to train a simulated robot to learn from videos of humans.</p>
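
<p>As an illustrative sketch only (not necessarily the exact formulation presented in this work), a perceptual reward can be computed as a similarity score between a feature embedding of the agent&rsquo;s current observation and an embedding of the goal image. The <code>embed</code> feature extractor and the Gaussian similarity below are assumptions made for the example:</p>

<pre>
import numpy as np

def embed(image):
    """Illustrative stand-in for a perceptual feature extractor
    (e.g., activations from a pretrained vision network)."""
    return np.asarray(image, dtype=np.float64).ravel()

def perceptual_reward(observation, goal_image, sigma=1.0):
    """Hand-defined similarity metric used as a reward: a Gaussian
    kernel over the distance between the two embeddings, so no
    task-specific scalar reward has to be engineered."""
    distance = np.linalg.norm(embed(observation) - embed(goal_image))
    return float(np.exp(-(distance ** 2) / (2.0 * sigma ** 2)))
</pre>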

<p>&nbsp;</p>

<p>Next, we will introduce cross-domain perceptual reward functions and describe how we can learn a reward function for cross-domain goal specifications. We show how we can use this method to train an agent in a maze to reach goals specified through speech and hand gestures.</p>
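
<p>A minimal sketch of how such a cross-domain reward might be learned (the linear encoders and the hinge-style contrastive objective below are assumptions for illustration, not necessarily the method presented here): two encoders map goal specifications (e.g., speech or gesture features) and agent states into a shared embedding space, and the reward is the negative squared distance between the two embeddings.</p>

<pre>
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: goal specifications and agent states
# live in different spaces.
SPEC_DIM, STATE_DIM, EMB_DIM = 16, 8, 4

W_spec = rng.normal(scale=0.1, size=(EMB_DIM, SPEC_DIM))
W_state = rng.normal(scale=0.1, size=(EMB_DIM, STATE_DIM))

def reward(spec, state):
    """Learned cross-domain reward: similarity in the shared space."""
    d = W_spec @ spec - W_state @ state
    return -np.dot(d, d)

def train_step(spec, pos_state, neg_state, lr=0.01, margin=1.0):
    """One hinge-style contrastive update: pull the matching
    (spec, state) pair together, push a mismatched pair apart."""
    global W_spec, W_state
    e_s = W_spec @ spec
    d_pos = e_s - W_state @ pos_state
    d_neg = e_s - W_state @ neg_state
    # Only update when the mismatched pair is not yet far enough away.
    if np.dot(d_pos, d_pos) - np.dot(d_neg, d_neg) + margin > 0:
        # Gradients of ||d_pos||^2 - ||d_neg||^2 w.r.t. each encoder.
        W_spec -= lr * 2 * np.outer(d_pos - d_neg, spec)
        W_state -= lr * 2 * (np.outer(-d_pos, pos_state)
                             + np.outer(d_neg, neg_state))
</pre>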

<p>&nbsp;</p>

<p>Next, we will introduce perceptual value functions and describe how we can learn a value function from sequences of expert observations without access to ground-truth actions. We show how we can use this method to infer values from observation for a maze and pouring task, and to train an agent to solve unseen levels within a platform game.</p>

<p>&nbsp;</p>

<p>Finally, we will introduce latent policy networks and describe how we can learn a policy from sequences of expert observations without access to ground-truth actions. We show how we can use this method to infer a policy from observation and train an agent to solve classic control tasks and a platform game.</p>
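
<p>To give a flavor of the observation-only setting (the clustering step below is an assumption made for this sketch, not necessarily the algorithm presented here): consecutive expert observations can be grouped into a small number of latent actions by clustering the state changes they induce, without ever seeing a ground-truth action.</p>

<pre>
import numpy as np

K = 4  # assumed number of latent actions

def fit_latent_dynamics(obs_pairs, k=K, iters=20, seed=0):
    """Tiny k-means over observation deltas (s2 - s1): each centroid
    approximates one latent action's average effect on the state.
    Only expert observations are used; no ground-truth actions."""
    deltas = np.array([s2 - s1 for s1, s2 in obs_pairs])
    rng = np.random.default_rng(seed)
    centroids = deltas[rng.choice(len(deltas), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            ((deltas[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = deltas[labels == j].mean(axis=0)
    # Predicted next state for latent action z: s + centroids[z].
    return centroids
</pre>

<p>A complete method would additionally learn a policy over these latent actions and align each latent action with a real environment action using a small amount of environment interaction.</p>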
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Emulation and Imitation via Perceptual Goal Specifications]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2019-01-14T12:30:00-05:00]]></value>
      <value2><![CDATA[2019-01-14T14:30:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
<value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
