PhD Defense by Terrance Law
Title: Exploring User Perception of Causality in Automated Data Insights
PhD Candidate in Computer Science
School of Interactive Computing
Georgia Institute of Technology
Date: April 9th, 2021 (Friday)
Time: 4:00 PM - 6:00 PM (ET)
Dr. Alex Endert (advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. John Stasko (advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. Duen Horng (Polo) Chau - School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Enrico Bertini - NYU Tandon School of Engineering, New York University
Dr. Jian Zhao - Cheriton School of Computer Science, University of Waterloo
To facilitate data exploration and analysis, researchers have studied and developed systems that aim to automatically communicate data insights to users. Given a dataset of cars, for example, these systems may proactively recommend a bar chart that depicts interesting data relationships and generate a description of the chart such as "US cars have a higher average horsepower than Japanese cars." Such automated insight functionality has emerged in commercial visualization platforms, enabling thousands of analysts to tap into its power.
Despite the wide deployment of these automated insight systems, we lack an understanding of their side effects on users during data analysis. The use of these systems is concerning because automated analysis can be unreliable: these systems may generate questionable claims about the data due to poor data quality, violation of model assumptions, causal inference from observational data, lack of domain knowledge, and sampling variability. This unreliability could lead to misinterpretation of data and associated consequences (e.g., poor decisions and financial losses).
This dissertation intends to advance our knowledge about the misleading side effects of automated insight systems. First, I define automated insights by reviewing the prevailing definitions of insight, and I highlight the importance of creating automated insight systems with ethical considerations in mind. To understand the landscape of automated insight systems, I conducted a literature survey, identified the types of statistical information these systems provide, and summarized the sources of unreliability when this statistical information is automatically generated.
With an understanding of what automated insights are and why they can be unreliable, I interviewed 23 professional users of visualization systems to learn about their concerns with the use of automated insight systems. Some interviewees worried about being misled by these systems. Motivated by this finding, I studied whether automated insights can mislead users in practice and, if so, how to promote correct interpretation. Using automated insights about causation as a case study, I conducted crowdsourced studies with more than 400 participants to investigate the scenarios in which misinterpretation occurs and the effectiveness of warnings in preventing it.
More broadly, this dissertation pertains to the conceptualization of human-centered artificial intelligence in data analysis systems. Throughout the dissertation, I highlight the potential ethical consequences that arise as system developers attempt to automate aspects of data analysis. My dissertation provides empirical evidence from interviews and controlled experiments illustrating that automating data analysis can produce harmful consequences such as data misinterpretation. Based on these empirical findings, it offers guidance on designing more usable and safer artificial intelligence in data analysis systems.