Ph.D. Defense by Arya Irani

Ph.D. Dissertation Defense Announcement

Title: Utilizing Negative Policy Information to Accelerate Reinforcement Learning

Arya Irani
School of Interactive Computing
College of Computing
Georgia Institute of Technology

Date: Monday, November 10, 2014
Time: 12:30pm - 2:30pm EST
Location: CCB 345

Committee:

Dr. Charles Isbell (Advisor; School of Interactive Computing, Georgia Institute of Technology)
Dr. Andrea Thomaz (School of Interactive Computing, Georgia Institute of Technology)
Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)
Dr. Karen Feigh (School of Aerospace Engineering, Georgia Institute of Technology)
Dr. Doina Precup (School of Computer Science, McGill University)


A pilot study on Markov Decision Problem (MDP) task decomposition by humans revealed that participants break tasks down into both short-term subgoals (with a defined end-condition) and long-term considerations and invariants (with no end-condition).  In the context of MDPs, behaviors with clear start and end conditions are well-modeled by options (Precup, 2000), but no abstraction exists in the literature for continuous requirements imposed on an agent's behavior.  Modeling such policy restrictions and incorporating this information into an agent's exploration can accelerate learning.  Two proposed representations for such continuous requirements are the state constraint (a set or predicate identifying states the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken).
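As a hedged illustration of how these two representations might filter an agent's exploration, consider the minimal Q-learning sketch below. The grid world, constraint contents, and all parameters are illustrative assumptions, not details from the dissertation:

```python
import random

# Illustrative sketch only: a 5x5 grid world where exploration is filtered by
# a state constraint (states to avoid) and a state-action constraint
# (state-action pairs not to take). All names and values are assumptions.

GRID = 5
GOAL = (GRID - 1, GRID - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

state_constraint = {(2, 2)}                    # states the agent should avoid
state_action_constraint = {((0, 0), (0, -1))}  # pairs the agent should not take

def allowed(s, a):
    """An action is allowed if it stays on the grid, the state-action pair is
    not constrained, and it does not enter a constrained state."""
    ns = (s[0] + a[0], s[1] + a[1])
    if not (0 <= ns[0] < GRID and 0 <= ns[1] < GRID):
        return False
    if (s, a) in state_action_constraint:
        return False
    return ns not in state_constraint

def step(s, a):
    ns = (s[0] + a[0], s[1] + a[1])
    return ns, (1.0 if ns == GOAL else -0.01)

def q_learn(episodes=300, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Epsilon-greedy Q-learning that only ever considers allowed actions,
    so constrained states and pairs are never explored."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):
            if s == GOAL:
                break
            acts = [a for a in ACTIONS if allowed(s, a)]
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q.get((s, x), 0.0))
            ns, r = step(s, a)
            best = max((Q.get((ns, b), 0.0)
                        for b in ACTIONS if allowed(ns, b)), default=0.0)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best - Q.get((s, a), 0.0))
            s = ns
    return Q
```

Because constrained states and pairs are pruned before action selection, the learned Q-table never contains them, which is one plausible way such information could shrink the effective search space.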

We will demonstrate that the composition of options with constraints forms a powerful combination: a naïve option designed to perform well in a best-case scenario can still provide benefit in domains where the best-case scenario is not guaranteed.  This separation of concerns simplifies both design and learning.  We present the results of a study, set in two domains inspired by classic video games, in which participants with no AI experience construct and record examples of states to avoid; these examples are used to train predictors that implement a state constraint.  We also demonstrate that constraints can in many cases be formulated by software engineers and supplied as modules to the RL system, eliminating one machine learning layer.  We will discuss schemes for overcoming imperfectly defined constraints that would otherwise prevent an optimal policy, considerations in creating domain-appropriate schemes, and several future directions.
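One way to picture the claimed separation of concerns is a hypothetical sketch in which a naïve option policy is wrapped by a constraint: the option handles the best case, and the constraint overrides it wherever its chosen action is disallowed. The composition scheme, names, and fallback rule below are assumptions for illustration, not the dissertation's mechanism:

```python
# Hypothetical sketch: composing a naive option with a state-action constraint.
# The option is designed for the best case; the constraint handles hazards the
# option ignores. Names and the fallback scheme are illustrative assumptions.

def compose(option_policy, constraint, actions):
    """Return a policy that follows the option unless its chosen action is
    constrained in the current state, falling back to any allowed action."""
    def policy(state):
        a = option_policy(state)
        if (state, a) not in constraint:
            return a
        for b in actions:           # first allowed fallback action
            if (state, b) not in constraint:
                return b
        return None                 # no allowed action in this state
    return policy

# A naive "always move right" option, adequate only in the best case:
always_right = lambda s: "right"
hazard = {(3, "right")}  # e.g., moving right from state 3 is unsafe
safe_policy = compose(always_right, hazard, ["right", "up", "down"])
```

Here `safe_policy(0)` still returns "right", while `safe_policy(3)` falls back to "up": the option and the constraint are designed independently, then combined.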

