PhD Proposal by Hantian Zhang

Title:

Data-Centric Bias Mitigation in Machine Learning

Hantian Zhang

Ph.D. Candidate in Computer Science

School of Computer Science

Georgia Institute of Technology

Date/Time: Nov 16, 2023, 8:00 AM to 10:00 AM Eastern Time (US and Canada)

Location: Klaus 3100, or join via Zoom: https://gatech.zoom.us/j/98209258105?pwd=VWp1ZmhIdlN2dWMzV2EwVnJjc0xmUT09


Committee:

Dr. Xu Chu (co-advisor), School of Computer Science, Georgia Institute of Technology

Dr. Kexin Rong (co-advisor), School of Computer Science, Georgia Institute of Technology

Dr. Joy Arulraj, School of Computer Science, Georgia Institute of Technology

Dr. Shamkant Navathe, School of Computer Science, Georgia Institute of Technology

Dr. Steven Whang, School of Electrical Engineering, KAIST

Abstract:

As Machine Learning (ML) becomes increasingly central to decision-making in our society, it is crucial to recognize that ML models can inadvertently perpetuate biases, disproportionately harming certain demographic groups and individuals. For instance, some ML models used in judicial systems have exhibited bias against African Americans when predicting recidivism rates. Addressing these inherent biases and ensuring fairness in ML models is therefore imperative. While fairness can be improved by changing the ML models directly, I argue that a more foundational solution lies in correcting the data, since biased data is often the root cause of unfairness.

In my proposed thesis, I aim to systematically understand and mitigate bias in ML models across the full ML life-cycle, from data preparation (pre-processing) to model training (in-processing) and model validation (post-processing). First, I develop a pioneering system, iFlipper, that optimizes for individual fairness in ML. iFlipper improves the training data during data preparation by adjusting labels, mitigating the inconsistencies that arise when similar individuals receive different outcomes. Second, I introduce OmniFair, a declarative system for group fairness in ML. OmniFair lets users specify group fairness constraints and adjusts the weight of each training sample during training to satisfy those constraints. Finally, I propose to discover and explain semantically coherent subsets (slices) of unstructured data on which trained ML models underperform. With a clear picture of where a model does poorly, we can improve it by augmenting the dataset with more examples from that specific slice.
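To give a flavor of the sample-reweighting idea, the following is a minimal illustrative sketch in Python. It is not OmniFair's actual algorithm or API: the function names, the demographic-parity metric, and the multiplicative weight-update rule are all hypothetical simplifications. The sketch repeatedly trains a scikit-learn classifier with per-sample weights and upweights the group with the lower positive-prediction rate until the fairness gap falls below a user-chosen threshold.

# Illustrative sketch of group-fairness-aware sample reweighting.
# NOT OmniFair's actual algorithm or API; names and the update rule
# are hypothetical, for intuition only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def demographic_parity_gap(y_pred, group):
    # |P(y_hat = 1 | group = 0) - P(y_hat = 1 | group = 1)|
    # `group` is a binary NumPy array marking group membership.
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def reweigh_until_fair(X, y, group, eps=0.05, step=0.1, max_iter=50):
    w = np.ones(len(y), dtype=float)  # start with uniform sample weights
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        clf.fit(X, y, sample_weight=w)
        y_pred = clf.predict(X)
        if demographic_parity_gap(y_pred, group) <= eps:
            break  # fairness constraint satisfied
        # Upweight the group currently receiving fewer positive predictions.
        rates = [y_pred[group == g].mean() for g in (0, 1)]
        disadvantaged = int(np.argmin(rates))
        w[group == disadvantaged] *= (1.0 + step)
    return clf, w

# Example usage (X_train, y_train, group_train are the user's data):
# clf, weights = reweigh_until_fair(X_train, y_train, group_train)

A declarative system like OmniFair generalizes this loop: the user states the fairness metric and threshold as constraints, and the system searches for sample weights that satisfy them, without the user hand-coding the update.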
