{"677155":{"#nid":"677155","#data":{"type":"news","title":"The Impact of Data Augmentation: Georgia Tech Researchers Lead NSF Study","body":[{"value":"\u003Cp\u003EIn the past year, Georgia Tech researchers\u0026nbsp;\u003Ca href=\u0022https:\/\/vmuthukumar.ece.gatech.edu\/\u0022\u003E\u003Cstrong\u003EVidya Muthukumar\u003C\/strong\u003E\u003C\/a\u003E\u0026nbsp;and\u0026nbsp;\u003Ca href=\u0022https:\/\/bme.gatech.edu\/bme\/faculty\/Eva-Dyer\u0022\u003E\u003Cstrong\u003EEva Dyer\u003C\/strong\u003E\u003C\/a\u003E\u0026nbsp;have made a powerful impression on the National Science Foundation (NSF), forging partnerships between their labs and the foundation that may ultimately lead to more efficient, equitable, human-centered, and human-like artificial intelligence, or AI.\u003C\/p\u003E\u003Cp\u003EWorking at the forefront of research in AI and machine learning, the two are both recent\u0026nbsp;\u003Ca href=\u0022https:\/\/coe.gatech.edu\/news\/2023\/03\/nsf-awards-sought-after-career-funding-5-engineering-faculty\u0022\u003E\u003Cstrong\u003ENSF CAREER Award winners\u003C\/strong\u003E\u003C\/a\u003E\u0026nbsp;\u2013 and are collaborators in a multi-institutional, three-year, $1.2 million effort supported by the NSF\u2019s Division of Information and Intelligent Systems.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cOur goal is to provide a precise understanding of the impact of data augmentation on generalization,\u201d said Muthukumar, assistant professor in the\u0026nbsp;\u003Ca href=\u0022https:\/\/ece.gatech.edu\/\u0022\u003E\u003Cstrong\u003ESchool of Electrical and Computer Engineering\u003C\/strong\u003E\u003C\/a\u003E, and the\u0026nbsp;\u003Ca href=\u0022https:\/\/www.isye.gatech.edu\/\u0022\u003E\u003Cstrong\u003ESchool of Industrial and Systems Engineering\u003C\/strong\u003E\u003C\/a\u003E. 
She\u2019s also principal investigator of the NSF project called,\u0026nbsp;\u003Ca href=\u0022https:\/\/www.nsf.gov\/awardsearch\/showAward?AWD_ID=2212182\u0026amp;HistoricalAwards=false\u0022\u003E\u003Cstrong\u003E\u201cDesign principles and theory for data augmentation.\u201d\u003C\/strong\u003E\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003EGeneralization is a hallmark of basic human intelligence \u2013 if you eat a food that makes you sick, you\u2019ll likely avoid foods that look or smell like that food in the future. That\u2019s generalization at work, something that we do naturally but that takes greater effort to do efficiently in artificial intelligence.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ETo build more generalizable AI, developers use data augmentation (DA), in which new data samples are generated from existing datasets to improve the performance of machine learning models. For example, data augmentation is often used in computer vision \u2013 existing image data is augmented through techniques like rotation, cropping, flipping, resizing, and so forth.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EBasically, data augmentation artificially increases the amount of training data used in machine learning models. The idea is, a machine learning model trained on augmented images of dogs is better equipped to recognize dogs in different environments, poses, and angles, even if they differ from those seen during initial model training.\u003C\/p\u003E\u003Cp\u003E\u201cBut data augmentation procedures are currently done in an ad-hoc manner,\u201d said Muthukumar. \u201cIt\u2019s like, let\u2019s apply this and see if it works.\u201d\u003C\/p\u003E\u003Cp\u003EThey are designed and tested on a dataset-by-dataset basis, which isn\u2019t very efficient. Also, augmented data does not always have the desired effects \u2013 it can do more harm than good. 
So, Muthukumar, Dyer, and their collaborators are developing a theory, a set of fundamental principles to understand DA and its impact on machine learning and AI.\u003C\/p\u003E\u003Cp\u003E\u201cOur aim is to leverage what we learn to design novel augmentations that can be used across multiple applications and domains,\u201d said Dyer, assistant professor in the\u0026nbsp;\u003Ca href=\u0022https:\/\/bme.gatech.edu\/bme\/\u0022\u003E\u003Cstrong\u003EWallace H. Coulter Department of Biomedical Engineering at Georgia Tech and Emory University\u003C\/strong\u003E\u003C\/a\u003E.\u003C\/p\u003E\u003Ch3\u003E\u003Cstrong\u003EGood, Bad, and Weird\u003C\/strong\u003E\u003C\/h3\u003E\u003Cp\u003EMuthukumar became interested in data augmentation when she was a graduate student at University of California at Berkeley.\u003C\/p\u003E\u003Cp\u003E\u201cWhat I found intriguing was how everyone seemed to view the role of data augmentation so differently,\u201d she said. During a summer internship she was part of an effort to resolve racial disparities in a machine\u2019s classification of facial images, \u201ca commonly encountered problem in which the computer might perform well with classifying white males, but not so well with dark-skinned females.\u201d\u003C\/p\u003E\u003Cp\u003EThe researchers employed artificial data augmentation techniques \u2013 essentially, boosting their learning model\u2019s dataset by adding virtualized facial images with different skin tones and colors. But to Muthukumar\u2019s surprise, the solution didn\u2019t work very well.\u0026nbsp; \u201cThis was an example of data augmentation not living up to its promise,\u201d she said. 
\u201cWhat we\u2019re finding is, sometimes data augmentation is good, sometimes it\u2019s bad, sometimes it\u2019s just weird.\u201d\u003C\/p\u003E\u003Cp\u003EThat assessment, in fact, is almost the title of a paper Muthukumar and Dyer have submitted to a leading journal: \u201cThe good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective.\u201d Currently under revision before publication, the paper lays out their foundational theory for understanding how DA impacts machine learning.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EThe work is the latest manifestation of a research partnership that began when Muthukumar arrived at Georgia Tech in January 2021 and connected with\u0026nbsp;Dyer,\u0026nbsp;whose\u0026nbsp;\u003Ca href=\u0022https:\/\/dyerlab.gatech.edu\/\u0022\u003E\u003Cstrong\u003ENerDS Lab\u003C\/strong\u003E\u003C\/a\u003E\u0026nbsp;has a wide-angled focus, spanning the areas of machine learning, neuroscience, and neuro AI (her work is fostering a knowledge loop \u2013 the development of new AI tools for brain decoding and new neuro-inspired AI systems).\u003C\/p\u003E\u003Cp\u003E\u201cWe started talking about how data augmentation does something very subtle to a dataset, changing what the learning model does at a very fundamental level,\u201d Muthukumar said. \u201cWe asked, \u2018what the heck is this data augmentation doing? Why is it working, or why isn\u2019t it? And, what types of augmentation work and what types don\u2019t?\u2019\u201d\u003C\/p\u003E\u003Cp\u003EThose questions led to their current NSF project, supported through September 2025. 
Muthukumar is leading the effort, joined by co-principal investigators Dyer;\u0026nbsp;\u003Ca href=\u0022https:\/\/mdav.ece.gatech.edu\/\u0022\u003E\u003Cstrong\u003EMark Davenport\u003C\/strong\u003E\u003C\/a\u003E, professor in Georgia Tech\u2019s School of Electrical and Computer Engineering; and\u0026nbsp;\u003Ca href=\u0022http:\/\/www.cs.umd.edu\/~tomg\/\u0022\u003E\u003Cstrong\u003ETom Goldstein\u003C\/strong\u003E\u003C\/a\u003E, associate professor in the Department of Computer Science at the University of Maryland.\u003C\/p\u003E\u003Ch3\u003E\u003Cstrong\u003EClever, Informed DA\u003C\/strong\u003E\u003C\/h3\u003E\u003Cp\u003EThe four researchers form a kind of super-team of machine learning experts. Davenport, a member of the\u0026nbsp;\u003Ca href=\u0022https:\/\/ml.gatech.edu\/\u0022\u003E\u003Cstrong\u003ECenter for Machine Learning\u003C\/strong\u003E\u003C\/a\u003E\u0026nbsp;and the\u0026nbsp;\u003Ca href=\u0022https:\/\/csip.ece.gatech.edu\/\u0022\u003E\u003Cstrong\u003ECenter for Signal and Information Processing\u003C\/strong\u003E\u003C\/a\u003E\u0026nbsp;at Georgia Tech, focuses his research on the complex interaction of signal processing, statistical inference, and machine learning. He\u2019s collaborated with both Dyer and Muthukumar on recent research papers.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EGoldstein\u2019s work lies at the intersection of machine learning and optimization. A member of the Institute for Advanced Computer Studies at Maryland, he was part of the research team that recently developed a \u201cwatermark\u201d that can expose text written by artificial intelligence.\u003C\/p\u003E\u003Cp\u003EDyer is a computational neuroscientist whose research has blurred the line between neuroscience and machine learning, and her lab has made advances in neural recording and gathering data. 
Muthukumar is orchestrating all of this expertise to thoroughly characterize data augmentation\u2019s impact on generalization in machine learning.\u003C\/p\u003E\u003Cp\u003E\u201cWe hope to gain a full understanding of its influence on learning \u2013 when it helps and when it hurts,\u201d Muthukumar said. Furthermore, the team aims to broaden the promise of data augmentation, expanding its effective use in other areas, such as neuroscience, graphs, and tabular data.\u003C\/p\u003E\u003Cp\u003E\u201cOverall, there\u2019s promise in being able to do a lot more with data augmentations, if we do it in a clever and informed kind of way,\u201d Dyer said. \u201cWe can build more robust brain-machine interfaces, we can improve fairness and transparency. This work can have tremendous long-range impact, especially regarding neuroscience and biomedical data.\u201d\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EGeorgia Tech\u0027s Vidya Muthukumar and Eva Dyer are spearheading a $1.2 million NSF-funded project to understand how data augmentation (DA) influences generalization in machine learning, a key component of AI\u0027s ability to make human-like decisions. 
Their research seeks to refine DA techniques for broader applications by developing more efficient and reliable methods across various domains.\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Georgia Tech researchers Vidya Muthukumar and Eva Dyer are leading a multi-institutional project to develop a theory for data augmentation, aiming to improve the generalization and fairness of AI systems."}],"uid":"28153","created_gmt":"2024-09-26 18:35:08","changed_gmt":"2024-09-26 18:49:50","author":"Jerry Grillo","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2024-06-15T00:00:00-04:00","iso_date":"2024-06-15T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"675133":{"id":"675133","type":"image","title":"VidyaEva","body":"\u003Cp\u003EVidya Muthukumar and Eva Dyer have formed a research partnership that may lead to human-centered and human-like artificial intelligence. \u0026nbsp; \u0026nbsp; Photo by Jerry Grillo\u003C\/p\u003E","created":"1727375152","gmt_created":"2024-09-26 18:25:52","changed":"1727375300","gmt_changed":"2024-09-26 18:28:20","alt":"Vidya Muthukumar and Eva Dyer","file":{"fid":"258727","name":"VidyaEva.jpg","image_path":"\/sites\/default\/files\/2024\/09\/26\/VidyaEva.jpg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2024\/09\/26\/VidyaEva.jpg","mime":"image\/jpeg","size":3617213,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2024\/09\/26\/VidyaEva.jpg?itok=SwMY48HG"}},"675134":{"id":"675134","type":"image","title":"EvaVidya","body":"\u003Cp\u003EEva Dyer and Vidya Muthukumar\u003C\/p\u003E","created":"1727375315","gmt_created":"2024-09-26 18:28:35","changed":"1727375360","gmt_changed":"2024-09-26 18:29:20","alt":"Eva Dyer and Vidya 
Muthukumar","file":{"fid":"258729","name":"EvaVidya.jpg","image_path":"\/sites\/default\/files\/2024\/09\/26\/EvaVidya.jpg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2024\/09\/26\/EvaVidya.jpg","mime":"image\/jpeg","size":4246920,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2024\/09\/26\/EvaVidya.jpg?itok=CrMhbIPq"}}},"media_ids":["675133","675134"],"groups":[{"id":"1292","name":"Parker H. Petit Institute for Bioengineering and Bioscience (IBB)"},{"id":"1188","name":"Research Horizons"}],"categories":[{"id":"138","name":"Biotechnology, Health, Bioengineering, Genetics"},{"id":"153","name":"Computer Science\/Information Technology and Security"},{"id":"146","name":"Life Sciences and Biology"}],"keywords":[{"id":"187915","name":"go-researchnews"},{"id":"187423","name":"go-bio"},{"id":"193860","name":"Artifical Intelligence"},{"id":"192783","name":"data augmentation"},{"id":"177339","name":"AI machine learning"},{"id":"175946","name":"Eva Dyer"},{"id":"186736","name":"Vidya Muthukumar"}],"core_research_areas":[{"id":"193655","name":"Artificial Intelligence at Georgia Tech"}],"news_room_topics":[{"id":"71891","name":"Health and Medicine"},{"id":"71881","name":"Science and Technology"}],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003E\u003Ca href=\u0022mailto:jerry.grillo@ibb.gatech.edu\u0022\u003EJerry Grillo\u003C\/a\u003E\u003C\/p\u003E","format":"limited_html"}],"email":["jerry.grillo@ibb.gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}