{"685554":{"#nid":"685554","#data":{"type":"news","title":"From Socrates to ChatGPT: The Ancient Lesson AI-powered Language Models Have Yet to Learn","body":[{"value":"\u003Cp\u003EAlthough developed by some of the brightest minds of the 21st century, AI-powered large language models (LLMs) could learn something from one of the greatest minds of the 1st century BCE.\u003C\/p\u003E\u003Cp\u003ESocrates, widely regarded as the founder of Western philosophy, declared, \u0022I know that I know nothing.\u0022 This simple statement highlights the wisdom of acknowledging the limits of one\u0027s own knowledge.\u003C\/p\u003E\u003Cp\u003EA simple statement, yes, but like some people, LLMs struggle with saying \u201cI don\u2019t know.\u201d In fact, LLMs often can\u0027t admit that they don\u0027t know something because of the way they are trained, according to a research team that includes a Georgia Tech computer science (CS) professor.\u003C\/p\u003E\u003Cp\u003EPre-training LLMs involves them learning to predict the next word correctly by training on massive datasets of text, images, or other data. Models are evaluated and adjusted based on their performance against standard benchmarks, which are \u0022rewarded\u0022 for preferred outputs or answers.\u003C\/p\u003E\u003Cp\u003ECurrent evaluation protocols, however, penalize non-responses the same as incorrect answers and do not include an \u0022I don\u0027t know\u0022 option.\u003C\/p\u003E\u003Cp\u003EAccording to CS Professor \u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/people\/santosh-vempala\u0022\u003E\u003Cstrong\u003ESantosh Vempala\u003C\/strong\u003E\u003C\/a\u003E, these pre- and post-training shortcomings are what lead LLMs to provide seemingly plausible but false responses known as hallucinations.\u003C\/p\u003E\u003Cp\u003EVempala is a co-author of \u003Ca href=\u0022https:\/\/arxiv.org\/abs\/2509.04664\u0022\u003E\u003Cem\u003E\u003Cstrong\u003EWhy Language Models Hallucinate\u003C\/strong\u003E\u003C\/em\u003E\u003C\/a\u003E, a research study from OpenAI and Georgia Tech, released in September. He says that there is a direct correlation between an LLM\u0027s hallucination rate and its misclassification rate regarding the validity of a given response.\u003C\/p\u003E\u003Cp\u003E\u0022This means that if the model can\u0027t tell fact from fiction, it will hallucinate,\u0022 Vempala said.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u0022The problem persists in modern post-training methods for alignment, which are based on evaluation benchmarks that penalize \u0027I don\u0027t know\u0027 as much as wrong answers.\u0022\u003C\/p\u003E\u003Cp\u003EBecause of the penalties for knowing that it knows nothing \u2013 to paraphrase Socrates \u2013 guessing is a more rewarding option for current LLMs than admitting uncertainty or ignorance.\u003C\/p\u003E\u003Cp\u003EThe research incorporates and builds on prior work from Vempala and \u003Ca href=\u0022https:\/\/kal.ai\/\u0022\u003E\u003Cstrong\u003EAdam Kalai\u003C\/strong\u003E\u003C\/a\u003E, an OpenAI researcher and lead author of the current paper. \u003Ca href=\u0022https:\/\/dl.acm.org\/doi\/10.1145\/3618260.3649777\u0022\u003E\u003Cstrong\u003ETheir earlier work found that LLM hallucinations are mathematically unavoidable for arbitrary facts, given current training methodologies\u003C\/strong\u003E\u003C\/a\u003E.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u0022We\u0027ve been talking about this for about two years. 
One corollary of our paper is that, for arbitrary facts, despite being trained only on valid data, the hallucination rate is determined by the fraction of missing facts in the training data,\u0022 said Vempala, Frederick Storey II Chair of Computing and professor in the \u003Ca href=\u0022https:\/\/scs.gatech.edu\/\u0022\u003E\u003Cstrong\u003ESchool of CS\u003C\/strong\u003E\u003C\/a\u003E.\u003C\/p\u003E\u003Cp\u003ETo illustrate this point, imagine you have a huge Pok\u00e9mon card collection. Pikachu is so familiar that you can confidently describe its moves and abilities. However, accurately remembering facts about Pikachu Libre, an extremely rare card, would likely be more difficult.\u003C\/p\u003E\u003Cp\u003E\u201cMore to the point, if your collection has a large fraction of unique cards, then it is likely that you are still missing a large fraction of the overall set of cards. This is known as the Good-Turing estimate,\u201d Vempala said.\u003C\/p\u003E\u003Ch6\u003E\u003Ca href=\u0022https:\/\/openai.com\/index\/why-language-models-hallucinate\/\u0022\u003E\u003Cstrong\u003E[OpenAI Blog: Why Language Models Hallucinate]\u003C\/strong\u003E\u003C\/a\u003E\u003C\/h6\u003E\u003Cp\u003EAccording to Kalai and Vempala, the same is true for LLMs under current training protocols.\u003C\/p\u003E\u003Cp\u003E\u201cThink about country capitals,\u201d Kalai said. \u201cThey all appear many times in the training data, so language models don\u2019t tend to hallucinate on those.\u003C\/p\u003E\u003Cp\u003E\u201cOn the other hand, think about the birthdays of people\u2019s pets. When those are mentioned in the training data, it may just be once.\u003C\/p\u003E\u003Cp\u003E\u201cSo, pre-trained language models will hallucinate on those. However, post-training can and should teach the model not to guess randomly on facts like those.\u201d\u003C\/p\u003E\u003Cp\u003EVempala thinks tinkering with pre-training methods could be risky because, overall, they work well and deliver accurate results. However, he and his co-authors offered suggestions for reducing hallucinations through changes to the evaluation and post-training process.\u003C\/p\u003E\u003Cp\u003EAmong the team\u0027s recommended changes is that more value be placed on the accuracy of an LLM\u0027s responses rather than on how comprehensive they are. The team also suggests implementing what it refers to as \u201cbehavioral calibration.\u201d\u003C\/p\u003E\u003Cp\u003EUsing this methodology, LLMs would only answer if their confidence exceeds a target threshold, with thresholds tuned for different user domains and prompts. Evaluations would also reduce penalties for \u201cI don\u2019t know\u201d responses and other appropriate expressions of uncertainty, so they are no longer scored as harshly as wrong answers.\u003C\/p\u003E\u003Cp\u003EVempala believes that implementing some of these modifications could result in LLMs that are trained to be more cautious and truthful. This shift could lead to more intelligent systems in the future that can handle nuanced, real-world conversations more effectively.\u003C\/p\u003E\u003Cp\u003E\u0022We hope our recommendations will lead to more trustworthy AI,\u0022 said Vempala. 
\u0022However, implementing these modifications to how LLMs are currently evaluated will require acceptance and support from AI companies and users.\u0022\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EIn an effort to build more trustworthy AI, Georgia Tech CS Professor \u003Cstrong\u003ESantosh Vempala\u003C\/strong\u003E is a co-author of a new OpenAI study that examines why large language models struggle to say, \u0027I don\u0027t know.\u0027\u0026nbsp;\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"A Georgia Tech CS professor is a co-author of a new OpenAI study that examines why large language models struggle to say, \u0027I don\u0027t know.\u0027"}],"uid":"32045","created_gmt":"2025-10-06 15:06:14","changed_gmt":"2025-10-09 01:29:46","author":"Ben Snedeker","boilerplate_text":"","field_publication":"","field_article_url":"","location":"Atlanta, GA","dateline":{"date":"2025-10-06T00:00:00-04:00","iso_date":"2025-10-06T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"678273":{"id":"678273","type":"image","title":"AI-generated image of Socrates, sculpted in marble, looking contemplatively at a laptop.","body":"\u003Cp\u003EAn Adobe Stock AI-generated image of Socrates, sculpted in marble, looking contemplatively at a laptop.\u003C\/p\u003E","created":"1759763189","gmt_created":"2025-10-06 15:06:29","changed":"1759763189","gmt_changed":"2025-10-06 15:06:29","alt":"AI-generated image of Socrates, sculpted in marble, looking contemplatively at a laptop.","file":{"fid":"262277","name":"AdobeStock_622388016.jpeg","image_path":"\/sites\/default\/files\/2025\/10\/06\/AdobeStock_622388016.jpeg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2025\/10\/06\/AdobeStock_622388016.jpeg","mime":"image\/jpeg","size":67628,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2025\/10\/06\/AdobeStock_622388016.jpeg?itok=ZHGUtAFf"}},"678281":{"id":"678281","type":"image","title":"CS Professor Santosh Vempala is a co-author of a recent research study that explores the role current training and evaluation protocols play in causing LLMs to hallucinate.","body":"\u003Cp\u003E\u003Cem\u003ECS Professor Santosh Vempala is a co-author of a recent research study that explores the role current training and evaluation protocols play in causing LLMs to hallucinate. 
Photo by Terence Rushin\/College of Computing\u003C\/em\u003E\u003C\/p\u003E","created":"1759770095","gmt_created":"2025-10-06 17:01:35","changed":"1759770095","gmt_changed":"2025-10-06 17:01:35","alt":"CS Professor Santosh Vempala is a co-author of a recent research study that explores the role current training and evaluation protocols play in causing LLMs to hallucinate.","file":{"fid":"262286","name":"CRNCH-Summit-2024_86A0053.jpg","image_path":"\/sites\/default\/files\/2025\/10\/06\/CRNCH-Summit-2024_86A0053_0.jpg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2025\/10\/06\/CRNCH-Summit-2024_86A0053_0.jpg","mime":"image\/jpeg","size":53430,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2025\/10\/06\/CRNCH-Summit-2024_86A0053_0.jpg?itok=CAr5_wtm"}}},"media_ids":["678273","678281"],"groups":[{"id":"47223","name":"College of Computing"},{"id":"1188","name":"Research Horizons"},{"id":"50875","name":"School of Computer Science"}],"categories":[{"id":"194606","name":"Artificial Intelligence"}],"keywords":[{"id":"10199","name":"Daily Digest"},{"id":"187915","name":"go-researchnews"},{"id":"2556","name":"artificial intelligence"},{"id":"187812","name":"artificial intelligence (AI)"},{"id":"181991","name":"Georgia Tech News Center"}],"core_research_areas":[],"news_room_topics":[{"id":"71881","name":"Science and Technology"}],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EBen Snedeker, Comms. Mgr. II\u003Cbr\u003EGeorgia Tech College of Computing\u003Cbr\u003E\u003Ca href=\u0022mailto:albert.snedeker@cc.gatech.edu\u0022\u003Ealbert.snedeker@cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}