Ph.D. Defense by Arya Irani

Ph.D. Dissertation Defense Announcement

Title: Utilizing Negative Policy Information to Accelerate Reinforcement Learning

Arya Irani
School of Interactive Computing
College of Computing
Georgia Institute of Technology

Date: Monday, November 10, 2014
Time: 12:30pm - 2:30pm EST
Location: CCB 345

Committee:

Dr. Charles Isbell (Advisor; School of Interactive Computing, Georgia Institute of Technology)
Dr. Andrea Thomaz (School of Interactive Computing, Georgia Institute of Technology)
Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)
Dr. Karen Feigh (School of Aerospace Engineering, Georgia Institute of Technology)
Dr. Doina Precup (School of Computer Science, McGill University)


A pilot study on Markov Decision Problem (MDP) task decomposition by humans revealed that participants break tasks down into both short-term subgoals (with a defined end-condition) and long-term considerations and invariants (with no end-condition).  In the context of MDPs, behaviors with clear start and end conditions are well-modeled by options (Precup, 2000), but no abstraction exists in the literature for continuous requirements imposed on an agent's behavior.  Modeling such policy restrictions and incorporating this information into an agent's exploration can accelerate learning.  Two proposed representations for such continuous requirements are the state constraint (a set or predicate identifying states the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken).
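As a hedged illustration of how these two representations might filter an agent's exploration, consider the minimal Q-learning sketch below. The grid world, constraint contents, and all parameters are illustrative assumptions, not details from the dissertation:

```python
import random

# Illustrative sketch only: a 5x5 grid world where exploration is filtered by
# a state constraint (states to avoid) and a state-action constraint
# (state-action pairs not to take). All names and values are assumptions.

GRID = 5
GOAL = (GRID - 1, GRID - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

state_constraint = {(2, 2)}                    # states the agent should avoid
state_action_constraint = {((0, 0), (0, -1))}  # pairs the agent should not take

def allowed(s, a):
    """An action is allowed if it stays on the grid, the state-action pair is
    not constrained, and it does not enter a constrained state."""
    ns = (s[0] + a[0], s[1] + a[1])
    if not (0 <= ns[0] < GRID and 0 <= ns[1] < GRID):
        return False
    if (s, a) in state_action_constraint:
        return False
    return ns not in state_constraint

def step(s, a):
    ns = (s[0] + a[0], s[1] + a[1])
    return ns, (1.0 if ns == GOAL else -0.01)

def q_learn(episodes=300, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Epsilon-greedy Q-learning that only ever considers allowed actions,
    so constrained states and pairs are never explored."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):
            if s == GOAL:
                break
            acts = [a for a in ACTIONS if allowed(s, a)]
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q.get((s, x), 0.0))
            ns, r = step(s, a)
            best = max((Q.get((ns, b), 0.0)
                        for b in ACTIONS if allowed(ns, b)), default=0.0)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best - Q.get((s, a), 0.0))
            s = ns
    return Q
```

Because constrained states and pairs are pruned before action selection, the learned Q-table never contains them, which is one plausible way such information could shrink the effective search space.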

We will demonstrate that the composition of options with constraints forms a powerful combination: a naïve option designed to perform well in a best-case scenario can still provide benefit in domains where the best-case scenario is not guaranteed.  This separation of concerns simplifies both design and learning.  We present the results of a study, set in two domains inspired by classic video games, in which participants with no AI experience construct and record examples of states to avoid; these examples are used to train predictors that implement a state constraint.  We also demonstrate that constraints can in many cases be formulated by software engineers and supplied as modules to the RL system, eliminating one machine learning layer.  We will discuss schemes for overcoming imperfectly defined constraints that would otherwise prevent an optimal policy, considerations in creating domain-appropriate schemes, and several future directions.
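One way to picture the claimed separation of concerns is a hypothetical sketch in which a naïve option policy is wrapped by a constraint: the option handles the best case, and the constraint overrides it wherever its chosen action is disallowed. The composition scheme, names, and fallback rule below are assumptions for illustration, not the dissertation's mechanism:

```python
# Hypothetical sketch: composing a naive option with a state-action constraint.
# The option is designed for the best case; the constraint handles hazards the
# option ignores. Names and the fallback scheme are illustrative assumptions.

def compose(option_policy, constraint, actions):
    """Return a policy that follows the option unless its chosen action is
    constrained in the current state, falling back to any allowed action."""
    def policy(state):
        a = option_policy(state)
        if (state, a) not in constraint:
            return a
        for b in actions:           # first allowed fallback action
            if (state, b) not in constraint:
                return b
        return None                 # no allowed action in this state
    return policy

# A naive "always move right" option, adequate only in the best case:
always_right = lambda s: "right"
hazard = {(3, "right")}  # e.g., moving right from state 3 is unsafe
safe_policy = compose(always_right, hazard, ["right", "up", "down"])
```

Here `safe_policy(0)` still returns "right", while `safe_policy(3)` falls back to "up": the option and the constraint are designed independently, then combined.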

