New Training Data Labeling System for Machine Learning Helps Developers


Tess Malone, Communications Officer




Machine learning (ML) has become one of the most prominent forms of data analysis for everything from fraud detection to visual quality control. Yet the analytic results can often suffer from insufficiently labeled training data.

A team of Georgia Tech researchers has created a system that allows users to more effectively label a training dataset with higher accuracy than current methods.

“We are looking at the problem from a data management perspective,” said School of Computer Science (SCS) Assistant Professor Xu Chu. “In contrast to a lot of ML research that tries to tackle the lack of sufficient training data from an ML algorithm design perspective, we aim at building a system that helps users effectively label a dataset.”

The system, called GOGGLES, labels datasets using affinity coding, a paradigm that lets ML engineers supply various affinity functions, each of which takes two unlabeled examples as input and outputs a real-valued score.

“You can think of affinity as similarity,” said Chu. “The core premise of the work is that two examples share the same label if they are similar according to some affinity functions (or similarity functions).”
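To make the idea of an affinity function concrete, here is a minimal Python sketch. It is an illustration only, not code from GOGGLES: the function name `cosine_affinity` is hypothetical, and it assumes each example has already been reduced to a numeric feature vector, which the article does not specify.

```python
import math
from typing import Sequence


def cosine_affinity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical affinity function: cosine similarity of two feature vectors.

    Takes two unlabeled examples and returns a real-valued score,
    where a higher score means the examples are more alike.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # No direction to compare, so report no affinity.
        return 0.0
    return dot / (norm_a * norm_b)
```

Under the premise Chu describes, two examples that score highly on functions like this one would be treated as likely to share a label.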

The benefits of affinity coding

GOGGLES uses a set of affinity functions that capture various affinities found in images. Given a new unlabeled dataset, GOGGLES applies these functions to construct an affinity matrix, from which it can assign classes to the unlabeled images. Unlike previous approaches, this requires no metadata or developer intervention.
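The flow from affinity functions to an affinity matrix to class assignments can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not the method from the paper: the names `neg_distance`, `affinity_matrix`, and `two_group_assign` are hypothetical, examples are assumed to be numeric feature vectors, scores from several functions are simply averaged, and the grouping step is a naive two-seed heuristic chosen only to show how classes might be read off the matrix.

```python
import math
from typing import Callable, List, Sequence

Example = Sequence[float]
AffinityFn = Callable[[Example, Example], float]


def neg_distance(a: Example, b: Example) -> float:
    """Hypothetical affinity: negative Euclidean distance (higher = more similar)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def affinity_matrix(examples: List[Example], fns: List[AffinityFn]) -> List[List[float]]:
    """Entry [i][j] is the average score of all affinity functions on examples i and j."""
    n = len(examples)
    return [
        [sum(f(examples[i], examples[j]) for f in fns) / len(fns) for j in range(n)]
        for i in range(n)
    ]


def two_group_assign(matrix: List[List[float]]) -> List[int]:
    """Naive stand-in for class inference (assumes at least two examples).

    Picks the least similar pair as seeds, then puts every example in the
    group of the seed it has the higher affinity with.
    """
    n = len(matrix)
    seed_a, seed_b = min(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: matrix[pair[0]][pair[1]],
    )
    return [0 if matrix[i][seed_a] >= matrix[i][seed_b] else 1 for i in range(n)]
```

Calling `two_group_assign(affinity_matrix(examples, [neg_distance]))` returns one group index per example. The indices are arbitrary group identifiers rather than named classes, and this toy heuristic handles only two groups.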

For each new dataset, users can potentially reuse many of the existing affinity functions already in the library, making GOGGLES a domain-agnostic labeling system. Users and developers can always add more affinity functions to increase the labeling power of GOGGLES.

On five common image classification tasks, GOGGLES reaches up to 98 percent accuracy without requiring extensive developer effort. It also outperforms other well-known data programming systems by up to 21 percent.

Chu co-wrote the paper, GOGGLES: Automatic Image Labeling with Affinity Coding, with Ph.D. students Nilaksh Das and Renzhi Wu, master’s alumni Sanya Chaba and Sakshi Gandhi, and School of Computational Science and Engineering Professor Polo Chau. They presented it at the Association for Computing Machinery’s Special Interest Group on Management of Data (SIGMOD) and Symposium on Principles of Database Systems (PODS) conference, held virtually from June 14 to 19.


Additional Information


College of Computing, School of Computational Science and Engineering, School of Computer Science

  • Created By: Tess Malone
  • Created On: Jun 29, 2020 - 5:17pm
  • Last Updated: Jun 29, 2020 - 5:19pm