PhD Proposal by Spencer Frazier
Title: Learning Social Norms from Stories: A Prior for Value-Aligned AI
Date: Tuesday, October 4th, 2022
Time: 8:00am-10:00am (ET)
Zoom Meeting ID: 950 0617 3248
Zoom Passcode: 129937
COC: Interactive Computing
Georgia Institute of Technology
Committee:
Dr. Mark Riedl (Advisor, School of Interactive Computing, Georgia Institute of Technology)
Dr. Sonia Chernova (School of Interactive Computing, Georgia Institute of Technology)
Dr. Ayanna Howard (Dean, College of Engineering, The Ohio State University)
Dr. Henny Admoni (Robotics Institute, Carnegie Mellon University)
Dr. Brent Harrison (College of Engineering, University of Kentucky)
Artificial systems of any size or type may be vulnerable to bias, toxic behavior, or harmful output. A widely held concern is that an autonomous agent may develop policies that are “optimal” for a task but do not adhere to human preferences, or that produce toxic or otherwise undesirable output. This problem manifests in various ways, as outlined in the existing literature on AI Safety, AI Ethics, Moral AI, Human-Centered AI, and other work on ethically constrained autonomous systems. Truly “Human-Centered AI” must adequately address all of these concerns. “Value Alignment” is the property of an intelligent agent that it can only pursue goals and activities that are beneficial to humans. Existing value alignment methods include preference learning, imitation learning, inverse reinforcement learning, learning from demonstration, and other techniques that align models with human values by observing human behavior. While traces of human behavior can guide autonomous systems, such data is costly to acquire.
This thesis introduces a number of complementary techniques that use value-aligned priors extracted from narratives, that is, stories written by humans. These stories can be children's books, comics, TV or movie scripts, or any other media from which humans learn prosocial behavior. Stories are an abundant source of content that encodes social norms. Large language models and the latent knowledge they encode are a primary enabling technology for this work. The size of these models and of the datasets used to train them makes them uniquely suited for further fine-tuning on stories. Existing work makes three contributions. 1) We introduce a technique in which a value-aligned prior (a classification model) is trained on stories that encode societal norms. This preliminary work validates the feasibility of social norm extraction and of few-shot and zero-shot transfer across datasets. 2) We demonstrate the use of a normative prior classification model to guide reinforcement learning agents away from non-normative behavior in a TextWorld game environment. We further show that it is possible to augment the loss function of generative text models (e.g. GPT) so that they produce less toxic output. 3) We show how we have begun to address the problem of identifying normativity across time and multiple events. We also address the non-binary nature of socially acceptable behavior through “principles” and “alignments” for classifying and generating new stories.
There are situations where no prior can be learned in advance (i.e., novel events and contexts). To address such issues with this approach, the final work of this thesis will focus on a number of concerns. First, we must consider how to bound social-normative systems even when they are aligned via stories. Sequential normativity, or behavior over time, must also be examined, as individuals' behavior cannot be characterized by discrete, one-off events alone. Complementary to this, a sufficiently large dataset will be produced from books, TV scripts, movie scripts, and other narratives to motivate future work. This thesis proposes a comprehensive set of approaches to validate these hypotheses about social norm extraction and classification as a prior for AI value alignment.
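As a rough illustration of the second contribution described above (using a normative prior to guide a reinforcement learning agent), the sketch below shows one common way such guidance could be wired in: subtracting a penalty from the task reward in proportion to the classifier's estimate that an action is non-normative. This is not the method used in the thesis, which the abstract does not specify. The names `toy_norm_score`, `shaped_reward`, and `penalty_weight`, and the keyword-based scorer standing in for a trained classifier, are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical stand-in for a trained normative classifier. A real prior
# would be a model fine-tuned on stories; this toy version only checks
# for a few hand-picked words so that the example is self-contained.
NON_NORMATIVE_CUES = {"steal", "hit", "insult"}


def toy_norm_score(action_text: str) -> float:
    """Return an assumed probability in [0, 1] that the action is normative."""
    words = set(action_text.lower().split())
    return 0.1 if words & NON_NORMATIVE_CUES else 0.9


def shaped_reward(
    task_reward: float,
    action_text: str,
    norm_score: Callable[[str], float] = toy_norm_score,
    penalty_weight: float = 1.0,
) -> float:
    """Penalize the task reward by the estimated chance of non-normative behavior."""
    p_non_normative = 1.0 - norm_score(action_text)
    return task_reward - penalty_weight * p_non_normative


if __name__ == "__main__":
    # Two text actions that earn the same task reward but differ in normativity.
    for action in ("steal the key", "ask for the key"):
        print(action, "->", round(shaped_reward(1.0, action), 3))
```

Under these assumptions, two actions with the same task reward are ranked differently once the penalty is applied, and setting `penalty_weight` to zero recovers the unshaped task reward. How strongly the prior should weigh against task reward is a design choice the sketch leaves open.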
- Created By: Tatianna Richardson
- Created: 09/27/2022
- Modified By: Tatianna Richardson
- Modified: 09/27/2022