<node id="616081">
  <nid>616081</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1546979869</created>
  <changed>1546979869</changed>
  <title><![CDATA[PhD Defense by Ashley Edwards]]></title>
  <body><![CDATA[<p><strong>Title:</strong> Emulation and Imitation via Perceptual Goal<br />
Specifications</p>

<p>&nbsp;</p>

<p>Ashley D. Edwards</p>

<p>Ph.D. Student</p>

<p>School of Interactive Computing</p>

<p>College of Computing</p>

<p>Georgia Institute of Technology</p>

<p>&nbsp;</p>

<p>Date: Monday, January 14th, 2019</p>

<p>Time: 12:30 PM to 2:30 PM (EST)</p>

<p>Location: TBA, College of Computing Building</p>

<p>&nbsp;</p>

<p><strong>Committee:</strong></p>


<p>Dr. Charles Isbell (Advisor), School of Interactive Computing, Georgia Institute of Technology</p>

<p>Dr. Tucker Balch, School of Interactive Computing, Georgia Institute of Technology</p>

<p>Dr. Sonia Chernova, School of Interactive Computing, Georgia Institute of Technology</p>

<p>Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology</p>

<p>Dr. Pieter Abbeel, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley</p>

<p>&nbsp;</p>

<p><strong>Summary:</strong></p>


<p>Much of the power behind reinforcement learning is that we can use a single signal, known as the reward, to indicate desired behavior. However, defining these rewards can often be difficult. This dissertation introduces an alternative to the typical reward design mechanism. In particular, we introduce four methods that allow one to focus on specifying perceptual goals, rather than scalar rewards. By removing domain-specific aspects of the problem, we demonstrate that goals can be expressed in a way that is agnostic to the reward function, action space, or state space of the agent&rsquo;s environment.</p>

<p>&nbsp;</p>

<p>First, we will introduce perceptual reward functions and describe how we can utilize a hand-defined similarity metric to enable learning from goals that look different from the agent&rsquo;s own observations. We show how we can use this method to train a simulated robot to learn from videos of humans.</p>
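
<p>As an illustrative sketch only (not necessarily the exact formulation presented in this work), a perceptual reward can be computed as a similarity score between a feature embedding of the agent&rsquo;s current observation and an embedding of the goal image. The <code>embed</code> feature extractor and the Gaussian similarity below are assumptions made for the example:</p>

<pre>
import numpy as np

def embed(image):
    """Illustrative stand-in for a perceptual feature extractor
    (e.g., activations from a pretrained vision network)."""
    return np.asarray(image, dtype=np.float64).ravel()

def perceptual_reward(observation, goal_image, sigma=1.0):
    """Hand-defined similarity metric used as a reward: a Gaussian
    kernel over the distance between the two embeddings, so no
    task-specific scalar reward has to be engineered."""
    distance = np.linalg.norm(embed(observation) - embed(goal_image))
    return float(np.exp(-(distance ** 2) / (2.0 * sigma ** 2)))
</pre>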

<p>&nbsp;</p>

<p>Next, we will introduce cross-domain perceptual reward functions and describe how we can learn a reward function for cross-domain goal specifications. We show how we can use this method to train an agent in a maze to reach goals specified through speech and hand gestures.</p>
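
<p>A minimal sketch of how such a cross-domain reward might be learned (the linear encoders and the hinge-style contrastive objective below are assumptions for illustration, not necessarily the method presented here): two encoders map goal specifications (e.g., speech or gesture features) and agent states into a shared embedding space, and the reward is the negative squared distance between the two embeddings.</p>

<pre>
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: goal specifications and agent states
# live in different spaces.
SPEC_DIM, STATE_DIM, EMB_DIM = 16, 8, 4

W_spec = rng.normal(scale=0.1, size=(EMB_DIM, SPEC_DIM))
W_state = rng.normal(scale=0.1, size=(EMB_DIM, STATE_DIM))

def reward(spec, state):
    """Learned cross-domain reward: similarity in the shared space."""
    d = W_spec @ spec - W_state @ state
    return -np.dot(d, d)

def train_step(spec, pos_state, neg_state, lr=0.01, margin=1.0):
    """One hinge-style contrastive update: pull the matching
    (spec, state) pair together, push a mismatched pair apart."""
    global W_spec, W_state
    e_s = W_spec @ spec
    d_pos = e_s - W_state @ pos_state
    d_neg = e_s - W_state @ neg_state
    # Only update when the mismatched pair is not yet far enough away.
    if np.dot(d_pos, d_pos) - np.dot(d_neg, d_neg) + margin > 0:
        # Gradients of ||d_pos||^2 - ||d_neg||^2 w.r.t. each encoder.
        W_spec -= lr * 2 * np.outer(d_pos - d_neg, spec)
        W_state -= lr * 2 * (np.outer(-d_pos, pos_state)
                             + np.outer(d_neg, neg_state))
</pre>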

<p>&nbsp;</p>

<p>Next, we will introduce perceptual value functions and describe how we can learn a value function from sequences of expert observations without access to ground-truth actions. We show how we can use this method to infer values from observation for a maze and pouring task, and to train an agent to solve unseen levels within a platform game.</p>

<p>&nbsp;</p>

<p>Finally, we will introduce latent policy networks and describe how we can learn a policy from sequences of expert observations without access to ground-truth actions. We show how we can use this method to infer a policy from observation and train an agent to solve classic control tasks and a platform game.</p>
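
<p>To give a flavor of the observation-only setting (the clustering step below is an assumption made for this sketch, not necessarily the algorithm presented here): consecutive expert observations can be grouped into a small number of latent actions by clustering the state changes they induce, without ever seeing a ground-truth action.</p>

<pre>
import numpy as np

K = 4  # assumed number of latent actions

def fit_latent_dynamics(obs_pairs, k=K, iters=20, seed=0):
    """Tiny k-means over observation deltas (s2 - s1): each centroid
    approximates one latent action's average effect on the state.
    Only expert observations are used; no ground-truth actions."""
    deltas = np.array([s2 - s1 for s1, s2 in obs_pairs])
    rng = np.random.default_rng(seed)
    centroids = deltas[rng.choice(len(deltas), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            ((deltas[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = deltas[labels == j].mean(axis=0)
    # Predicted next state for latent action z: s + centroids[z].
    return centroids
</pre>

<p>A complete method would additionally learn a policy over these latent actions and align each latent action with a real environment action using a small amount of environment interaction.</p>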
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Emulation and Imitation via Perceptual Goal Specifications]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2019-01-14T12:30:00-05:00]]></value>
      <value2><![CDATA[2019-01-14T14:30:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
<value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
