{"681071":{"#nid":"681071","#data":{"type":"event","title":"PhD Defense by Annabel Rothschild","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle:\u003C\/strong\u003E\u0026nbsp;SAFE FROM THE START: DEVELOPING PRO-SOCIAL AI TRAINING DATASETS THROUGH DATA WORKERS\u2019 CRITICAL PERSPECTIVES\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EDate:\u003C\/strong\u003E\u0026nbsp;Monday, March 24, 2025\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ETime:\u003C\/strong\u003E\u0026nbsp;10.00-13.00 ET\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EIn-person Location:\u0026nbsp;\u003C\/strong\u003ETSRB 223 (Spark Studio)\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EZoom link:\u003C\/strong\u003E\u0026nbsp;\u003Ca href=\u0022https:\/\/gatech.zoom.us\/j\/91909341662?pwd=7YtWXcSPZXuiXlwgkiJkuObAjrOeWz.1\u0022\u003Ehttps:\/\/gatech.zoom.us\/j\/91909341662?pwd=7YtWXcSPZXuiXlwgkiJkuObAjrOeWz.1\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EAnnabel Rothschild\u003C\/p\u003E\u003Cp\u003EPh.D. Candidate, Human-Centered Computing\u003C\/p\u003E\u003Cp\u003ESchool of Interactive Computing\u003C\/p\u003E\u003Cp\u003ECollege of Computing\u003C\/p\u003E\u003Cp\u003EGeorgia Institute of Technology\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ECommittee:\u0026nbsp;\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EDr. Betsy DiSalvo (advisor),\u0026nbsp;College of Computing, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Carl DiSalvo (co-advisor),\u0026nbsp;College of Computing, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Shaowen Bardzell,\u0026nbsp;College of Computing, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Ellen Zegura,\u0026nbsp;College of Computing,\u0026nbsp;Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Richmond Wong, Ivan Allen College of Liberal Arts, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. 
Lauren Klein, Department of Quantitative Theory \u0026amp; Methods and English, Emory University\u003C\/p\u003E\u003Cp\u003EDr. Ding Wang, Google Research\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ESummary:\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EAI and ML systems are increasingly ubiquitous, with recent advances in LLMs and image generators, such as OpenAI\u2019s ChatGPT and DALL\u00b7E, creating new urgency in future-of-work conversations [1, 2, 3, 4, 5]. My work explores how the massive datasets used to train these systems, collected and curated by a global workforce of data workers, come into being. Specifically, I examine what the perspective and lived experience of a data worker contribute to the data labors they perform.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u2002\u2002\u2002\u2002\u2002\u2002The perspectives of data workers who build the datasets for data-intensive systems, such as AI and ML systems, frequently go unappreciated. Data workers have a unique on-the-ground view of the dataset and how it has been designed and developed, given that they are the executors of this work. Many of the problems we see with \u201cbiased\u201d AI and ML systems can be traced back to issues with the dataset on which the system was trained. Consider the case of ImageNet, one of the most impactful computer vision (CV) benchmarking datasets to have been developed, facilitated by the labor of Amazon Mechanical Turk (AMT) workers (Turkers) [6]. The labels offered to Turkers for annotating images were based on WordNet [7], which has been in wide circulation since 2011. These labels, as demonstrated by Prabhu \u0026amp; Birhane, included terms that are offensive and not safe for work (NSFW), along with a host of nonconsensual pornographic terms [8]. Did the Turkers who annotated ImageNet\u2019s entries come across these terms? 
Could they have alerted the ImageNet designers to problems with the use of WordNet labels before ImageNet became a critical benchmark dataset for CV systems?\u003C\/p\u003E\u003Cp\u003E\u2002\u2002\u2002\u2002\u2002\u2002My work is motivated by the role that data worker perspective can play when data workers are empowered to practice critical data literacy (CDL), as I observed during my ethnographic fieldwork with DataWorks, a combined work-training program, data services provider, and research platform [9]. Having seen the role that data workers equipped with CDL can play in positively shaping datasets, in both technical detail and sociocultural premise, I believe that building healthier, more pro-social AI and ML systems begins with intellectual partnership with data workers in dataset creation and development. CDL goes a step beyond regular data literacy, which refers to a skillset for reading and understanding data statistics and data visualizations [10]. In addition to those skills, practicing CDL requires developing a critical consciousness [11], in the tradition of Paulo Freire [12], which means being able to question how these data summaries were arrived at, what might be behind the motivation for their creation, and whom they benefit. Finally, practicing CDL also requires a workplace that supports this critical practice, namely in the form of encouraging workers to speak up and out about problems or concerns they have with dataset development.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u2002\u2002\u2002\u2002\u2002\u2002My overarching research question (RQ) is: what is the role of perspective in data work, and how can we incorporate the perspective of data workers as partners in dataset contextualization? 
Consequently, my work answers three subquestions:\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u2022 RQ1: why do we need better contextualization practices in data work, and what is the current state of data work annotation practices?\u003C\/p\u003E\u003Cp\u003E\u2022 RQ2: what is the relationship between critical data literacy and properly localized AI and ML systems?\u003C\/p\u003E\u003Cp\u003E\u2022 RQ3: how can we collect and integrate more varied perspectives to relocate our AI and ML systems?\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EContributions: My work facilitates the development of healthier, more pro-social AI and ML systems. Situated within critical data studies, the work described in this dissertation builds out approaches to integrating worker perspective into datasets at large-scale dataset development sites.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E--\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EAnnabel Rothschild, she\/her\u003C\/p\u003E\u003Cp\u003EPhD candidate, Georgia Tech\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003ESAFE FROM THE START: DEVELOPING PRO-SOCIAL AI TRAINING DATASETS THROUGH DATA WORKERS\u2019 CRITICAL PERSPECTIVES\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"SAFE FROM THE START: DEVELOPING PRO-SOCIAL AI TRAINING DATASETS THROUGH DATA WORKERS\u2019 CRITICAL PERSPECTIVES"}],"uid":"27707","created_gmt":"2025-03-11 16:22:51","changed_gmt":"2025-03-11 16:23:32","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2025-03-24T10:00:00-04:00","event_time_end":"2025-03-24T13:00:00-04:00","event_time_end_last":"2025-03-24T13:00:00-04:00","gmt_time_start":"2025-03-24 14:00:00","gmt_time_end":"2025-03-24 17:00:00","gmt_time_end_last":"2025-03-24 
17:00:00","rrule":null,"timezone":"America\/New_York"},"location":"TSRB 223 (Spark Studio)","extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}