{"678594":{"#nid":"678594","#data":{"type":"news","title":"Researchers Say AI Copyright Cases Could Have Negative Impact on Academic Research","body":[{"value":"\u003Cp\u003EDeven Desai and Mark Riedl have seen the signs for a while.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ETwo years since OpenAI introduced ChatGPT, dozens of lawsuits have been filed alleging technology companies have infringed copyright by using published works to train artificial intelligence (AI) models.\u003C\/p\u003E\u003Cp\u003EAcademic AI research efforts could be significantly hindered if courts rule in the plaintiffs\u0027 favor.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EDesai and Riedl are Georgia Tech researchers raising awareness about how these court rulings could force academic researchers to construct new AI models with limited training data. The two collaborated on a benchmark academic paper that examines the landscape of the ethical issues surrounding AI and copyright in industry and academic spaces.\u003C\/p\u003E\u003Cp\u003E\u201cThere are scenarios where courts may overreact to having a book corpus on your computer, and you didn\u2019t pay for it,\u201d Riedl said. \u201cIf you trained a model for an academic paper, as my students often do, that\u2019s not a problem right now. The courts could deem training is not fair use. 
That would have huge implications for academia.\u003C\/p\u003E\u003Cp\u003E\u201cWe want academics to be free to do their research without fear of repercussions in the marketplace because they\u2019re not competing in the marketplace,\u201d Riedl said.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Ca href=\u0022https:\/\/www.scheller.gatech.edu\/directory\/faculty\/desai\/index.html\u0022\u003E\u003Cstrong\u003EDesai\u003C\/strong\u003E\u003C\/a\u003E is the Sue and John Stanton Professor of Business Law and Ethics at the \u003Ca href=\u0022https:\/\/www.scheller.gatech.edu\/index.html\u0022\u003E\u003Cstrong\u003EScheller College of Business\u003C\/strong\u003E\u003C\/a\u003E. He researches how business interests and new technology shape privacy, intellectual property, and competition law. \u003Ca href=\u0022https:\/\/eilab.gatech.edu\/mark-riedl.html\u0022\u003E\u003Cstrong\u003ERiedl\u003C\/strong\u003E\u003C\/a\u003E is a professor at the College of Computing\u2019s \u003Ca href=\u0022https:\/\/ic.gatech.edu\/\u0022\u003E\u003Cstrong\u003ESchool of Interactive Computing\u003C\/strong\u003E\u003C\/a\u003E, researching human-centered AI, generative AI, explainable AI, and gaming AI.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ETheir paper, \u003Cem\u003EBetween Copyright and Computer Science: The Law and Ethics of Generative AI\u003C\/em\u003E, was published in the \u003Ca href=\u0022https:\/\/scholarlycommons.law.northwestern.edu\/njtip\/vol22\/iss1\/2\/\u0022\u003E\u003Cstrong\u003ENorthwestern Journal of Technology and Intellectual Property\u003C\/strong\u003E\u003C\/a\u003E on Monday.\u003C\/p\u003E\u003Cp\u003EDesai and Riedl say they want to offer solutions that balance the interests of various stakeholders. But that requires compromise from all sides.\u003C\/p\u003E\u003Cp\u003EResearchers should accept they may have to pay for the data they use to train AI models. 
Content creators, on the other hand, should receive compensation, but they may need to accept less money to ensure data remains affordable for academic researchers to acquire.\u003C\/p\u003E\u003Ch4\u003E\u003Cstrong\u003EWho Benefits?\u003C\/strong\u003E\u003C\/h4\u003E\u003Cp\u003EThe doctrine of fair use is at the center of every copyright debate. According to the U.S. Copyright Office, fair use permits the unlicensed use of copyright-protected works in certain circumstances, such as distributing information for the public good, including teaching and research.\u003C\/p\u003E\u003Cp\u003EFair use is often challenged when one or more parties profit from published works without compensating the authors.\u003C\/p\u003E\u003Cp\u003EAny original published content, including a personal website on the internet, is protected by copyright. However, copyrighted material is republished on websites or posted on social media innumerable times every day without the consent of the original authors.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EIn most cases, it\u2019s unlikely copyright violators gained financially from their infringement.\u003C\/p\u003E\u003Cp\u003EBut Desai said business-to-business cases are different. \u003Ca href=\u0022https:\/\/www.nytimes.com\/2023\/12\/27\/business\/media\/new-york-times-open-ai-microsoft-lawsuit.html\u0022\u003E\u003Cstrong\u003EThe New York Times\u003C\/strong\u003E\u003C\/a\u003E is one of many daily newspapers and media companies that have sued OpenAI for using its content as training data. Microsoft is also a defendant in The New York Times\u2019 suit because it invested billions of dollars into OpenAI\u2019s development of AI tools like ChatGPT.\u003C\/p\u003E\u003Cp\u003E\u201cYou can take a copyrighted photo and put it in your Twitter post or whatever you want,\u201d Desai said. \u201cThat\u2019s probably annoying to the owner. Economically, they probably wanted to be paid. But that\u2019s not business to business. 
What\u2019s happening with OpenAI and The New York Times is business to business. That\u2019s big money.\u201d\u003C\/p\u003E\u003Cp\u003EOpenAI started as a nonprofit dedicated to the safe development of artificial general intelligence (AGI) \u2014 AI that, in theory, can rival human thinking and possess autonomy.\u003C\/p\u003E\u003Cp\u003EThese AI models would require massive amounts of data and expensive supercomputers to process that data. OpenAI could not raise enough money to afford such resources, so it created a for-profit arm controlled by its parent nonprofit.\u003C\/p\u003E\u003Cp\u003EDesai, Riedl, and many others argue that OpenAI ceased its research mission for the public good and began developing consumer products.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cIf you\u2019re doing basic research that you\u2019re not releasing to the world, it doesn\u2019t matter if every so often it plagiarizes The New York Times,\u201d Riedl said. \u201cNo one is economically benefiting from that. When they became a for-profit and produced a product, now they were making money from plagiarized text.\u201d\u003C\/p\u003E\u003Cp\u003EOpenAI\u2019s for-profit arm is valued at $80 billion, but content creators have not received a dime, even though the company has scraped massive amounts of copyrighted material as training data.\u003C\/p\u003E\u003Cp\u003EThe New York Times has posted warnings on its sites that its content cannot be used to train AI models. Many other websites use a robots.txt file that contains instructions for bots about which pages can and cannot be accessed.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ENeither of these measures is legally binding, and both are often ignored.\u003C\/p\u003E\u003Ch4\u003E\u003Cstrong\u003ESolutions\u003C\/strong\u003E\u003C\/h4\u003E\u003Cp\u003EDesai and Riedl offer a few options for companies to show good faith in rectifying the situation.\u003C\/p\u003E\u003Cul\u003E\u003Cli\u003ESpend the money. 
Desai says OpenAI and Microsoft could have afforded their training data and avoided the hassle of legal consequences.\u003Cbr\u003E\u003Cbr\u003E\u201cIf you do the math on the costs to buy the books and copy them, they could have paid for them,\u201d he said. \u201cIt would\u2019ve been a multi-million dollar investment, but they\u2019re a multi-billion dollar company.\u201d\u003Cbr\u003E\u0026nbsp;\u003C\/li\u003E\u003Cli\u003EBe selective. Models can be trained on randomly selected texts from published works, allowing the model to understand the writing style without plagiarizing.\u0026nbsp;\u003Cbr\u003E\u003Cbr\u003E\u201cI don\u2019t need the entire text of War and Peace,\u201d Desai said. \u201cTo capture the way authors express themselves, I might only need a hundred pages. I\u2019ve also reduced the chance that my model will cough up entire texts.\u201d\u003Cbr\u003E\u0026nbsp;\u003C\/li\u003E\u003Cli\u003ELeverage libraries. The authors agree libraries could serve as an ideal middle ground, storing published works and compensating authors for access to those works, though the amount may be less than desired.\u003Cbr\u003E\u003Cbr\u003E\u201cMost of the objections you could raise are taken care of,\u201d Desai said. \u201cThey are legitimate access copies that are secure. You get access to only as much as you need. Libraries at universities have already become schools of information.\u201d\u003C\/li\u003E\u003C\/ul\u003E\u003Cp\u003EDesai and Riedl hope the legal action taken by publications like The New York Times will send a message to companies that develop AI tools to pump the brakes. 
If they don\u2019t, researchers uninterested in profit could pay the steepest price.\u003C\/p\u003E\u003Cp\u003EThe authors say it\u2019s not a new problem but is reaching a boiling point.\u003C\/p\u003E\u003Cp\u003E\u201cIn the history of copyright, there are ways that society has dealt with the problem of compensating creators and technology that copies or reduces your ability to extract money from your creation,\u201d Desai said. \u201cWe wanted to point out there\u2019s a way to get there.\u201d\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003ETwo years since OpenAI introduced ChatGPT, dozens of lawsuits have been filed alleging technology companies have infringed copyright by using published works to train artificial intelligence (AI) models.\u003C\/p\u003E\u003Cp\u003EAcademic AI research efforts could be significantly hindered if courts rule in the plaintiffs\u0027 favor.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EDesai and Riedl are Georgia Tech researchers raising awareness about how these court rulings could force academic researchers to construct new AI models with limited training data. 
The two collaborated on a benchmark academic paper that examines the landscape of the ethical issues surrounding AI and copyright in industry and academic spaces.\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Deven Desai and Mark Riedl are Georgia Tech researchers raising awareness about how court rulings for AI copyright cases could force academic researchers to construct new AI models with limited training data."}],"uid":"36530","created_gmt":"2024-11-21 18:41:45","changed_gmt":"2024-12-11 18:51:23","author":"Nathan Deen","boilerplate_text":"","field_publication":"","field_article_url":"","location":"Atlanta, GA","dateline":{"date":"2024-11-21T00:00:00-05:00","iso_date":"2024-11-21T00:00:00-05:00","tz":"America\/New_York"},"extras":[],"hg_media":{"675713":{"id":"675713","type":"image","title":"006_Deven Desai + Mark Riedl_86A8863.jpg","body":null,"created":"1732214565","gmt_created":"2024-11-21 18:42:45","changed":"1732214565","gmt_changed":"2024-11-21 18:42:45","alt":"Deven Desai and Mark Riedl","file":{"fid":"259369","name":"006_Deven Desai + Mark Riedl_86A8863.jpg","image_path":"\/sites\/default\/files\/2024\/11\/21\/006_Deven%20Desai%20%2B%20Mark%20Riedl_86A8863.jpg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2024\/11\/21\/006_Deven%20Desai%20%2B%20Mark%20Riedl_86A8863.jpg","mime":"image\/jpeg","size":101688,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2024\/11\/21\/006_Deven%20Desai%20%2B%20Mark%20Riedl_86A8863.jpg?itok=il8z2cMB"}}},"media_ids":["675713"],"groups":[{"id":"47223","name":"College of Computing"},{"id":"1188","name":"Research Horizons"},{"id":"50876","name":"School of Interactive Computing"}],"categories":[{"id":"153","name":"Computer Science\/Information Technology and Security"},{"id":"151","name":"Policy, Social Sciences, and Liberal Arts"},{"id":"135","name":"Research"}],"keywords":[{"id":"192863","name":"go-ai"},{"id":"9153","name":"Research 
Horizons"},{"id":"187812","name":"artificial intelligence (AI)"},{"id":"193860","name":"Artifical Intelligence"},{"id":"10828","name":"copyright"},{"id":"190302","name":"copyright law"},{"id":"38031","name":"copyright lawsuits"},{"id":"43101","name":"Georgia Tech Scheller College of Business"},{"id":"187915","name":"go-researchnews"}],"core_research_areas":[{"id":"193655","name":"Artificial Intelligence at Georgia Tech"},{"id":"39501","name":"People and Technology"},{"id":"39511","name":"Public Service, Leadership, and Policy"}],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003ENathan Deen\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ECommunications Officer\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ESchool of Interactive Computing\u003C\/p\u003E","format":"limited_html"}],"email":["ndeen6@gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}