
PhD Defense by Sarah Wiegreffe


Title: Interpreting Neural Networks for and with Natural Language

Date: Monday, May 9th, 2022

Time: 2:00-4:00pm (ET)

Location (hybrid): CODA C1108 Brookhaven and Zoom


Sarah Wiegreffe

PhD Candidate in Computer Science

School of Interactive Computing
College of Computing
Georgia Institute of Technology

Committee:
Dr. Mark Riedl (advisor, School of Interactive Computing, Georgia Institute of Technology)
Dr. Alan Ritter (School of Interactive Computing, Georgia Institute of Technology)
Dr. Wei Xu (School of Interactive Computing, Georgia Institute of Technology)
Dr. Noah Smith (Paul G. Allen School of Computer Science & Engineering, University of Washington)
Dr. Sameer Singh (Bren School of Information and Computer Sciences, University of California at Irvine)

Abstract:

In the last decade, real-world applications of NLP technologies have become more widespread and more useful than ever before, in large part thanks to advances in deep learning. The increasing size and nonlinearity of these models result in an opacity that hinders efforts by machine learning practitioners and lay users alike to understand model internals and derive meaning or trust from their predictions.


The fields of explainable artificial intelligence and, more specifically, explainable NLP have emerged as active areas for remedying this opacity and for ensuring models' reliability and trustworthiness in high-stakes scenarios. Models that produce justifications can be inspected for the purposes of debugging, quantifying bias and fairness, understanding model behavior, and ascertaining robustness and privacy. Textual explanations, such as highlights and free-text explanations, are uniquely valuable because of the natural communicative affordances language provides over other modalities.


In this dissertation, I propose test suites for evaluating the quality of model explanations under two definitions of meaning: faithfulness and human acceptability. I introduce new ways of evaluating the faithfulness of highlight explanations with model-based adversarial search and non-contextual probing models, and of free-text explanations with robustness equivalence and feature importance agreement. I show that a natural language bottleneck increases the likelihood of faithful highlights in neural architectures. I introduce new ways of evaluating human acceptability with crowdsourcing methods inspired by the psychology of explanation, and show that an overgeneration-plus-filtration system improves the acceptability of model-generated free-text explanations. This work strives to increase the likelihood of positive use and outcomes when AI systems are deployed in practice.
