PhD Proposal by Annabel Rothschild

Safe from the Start: Developing Pro-Social AI Training Datasets Through Data Workers' Critical Perspectives

 

Annabel Rothschild

Ph.D. Student in Human-centered Computing

School of Interactive Computing

Georgia Institute of Technology

 

Date: 01 April 2024

Time: 09-11am EST

Location: TSRB #223, virtual link

 

Committee:

Dr. Betsy DiSalvo (advisor), School of Interactive Computing, Georgia Institute of Technology

Dr. Carl DiSalvo (advisor), School of Interactive Computing, Georgia Institute of Technology

Dr. Shaowen Bardzell, School of Interactive Computing, Georgia Institute of Technology

Dr. Ellen Zegura, School of Computer Science, Georgia Institute of Technology

Dr. Richmond Wong, School of Literature, Media, and Communication, Georgia Institute of Technology

Dr. Lauren Klein, Department of English, Emory University

 

Abstract:

 

AI and ML systems are increasingly ubiquitous, with recent advances in LLMs and image generators, such as OpenAI’s ChatGPT and DALL·E, creating new urgency in future-of-work conversations. My work explores how the massive datasets used to train these systems, collected and curated by a global workforce of data workers, come into being. Specifically, I examine what the perspective and lived experience of a data worker contribute to the data labors they perform.

 

In my research, I build ways to integrate data workers' perspectives into dataset development at two levels. At the micro level, I trace the impact of CDL on worker perspective and dataset development, and propose a tool to help solidify that process in spreadsheet-based data work. At the macro level, I advocate for more pro-social treatment of data workers on digital task platforms, such as Amazon MTurk, emphasizing that the benefits of CDL cannot be felt unless data workers are made full partners in the AI and ML system development process. My past work includes defining the terms of pro-social task building for data work requesters using platforms like Amazon MTurk, and understanding how requesters conceptualize workers on these platforms. My proposed work for this macro thread is an investigation of the current communication infrastructure of these platforms, and how it can be leveraged to support the inclusion of data work observation and reflection on tasks completed.

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 03/18/2024
  • Modified By: Tatianna Richardson
  • Modified: 03/18/2024
