From Socrates to ChatGPT: The Ancient Lesson AI-powered Language Models Have Yet to Learn
Although developed by some of the brightest minds of the 21st century, AI-powered large language models (LLMs) could learn something from one of the greatest minds of the 5th century BCE.
Socrates, widely regarded as the founder of Western philosophy, declared, "I know that I know nothing." This simple statement highlights the wisdom of acknowledging the limits of one's own knowledge.
A simple statement, yes, but like some people, LLMs struggle with saying “I don’t know.” In fact, LLMs often can't admit that they don't know something because of the way they are trained, according to a research team that includes a Georgia Tech computer science (CS) professor.
During pre-training, LLMs learn to predict the next word by training on massive datasets of text, images, or other data. The models are then evaluated and adjusted based on their performance against standard benchmarks and are "rewarded" for preferred outputs or answers.
Current evaluation protocols, however, penalize non-responses the same as incorrect answers and do not include an "I don't know" option.
According to CS Professor Santosh Vempala, these pre- and post-training shortcomings are what lead LLMs to provide seemingly plausible but false responses known as hallucinations.
Vempala is a co-author of Why Language Models Hallucinate, a research study from OpenAI and Georgia Tech released in September. He says an LLM's hallucination rate is directly tied to how often the model misclassifies whether a given response is valid.
"This means that if the model can't tell fact from fiction, it will hallucinate," Vempala said.
"The problem persists in modern post-training methods for alignment, which are based on evaluation benchmarks that penalize 'I don't know' as much as wrong answers."
Because an LLM is penalized for admitting that it knows nothing – to paraphrase Socrates – guessing is a more rewarding option than acknowledging uncertainty or ignorance.
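A toy scoring calculation makes the incentive concrete. The sketch below is our own illustration rather than anything from the paper, and the function and numbers in it are hypothetical: under binary grading, where a wrong answer and "I don't know" both earn zero, any nonzero chance of being right makes guessing the better bet.

```python
# Illustrative sketch (not from the paper): why binary grading rewards guessing.
# A benchmark that scores 1 for a correct answer and 0 for both a wrong answer
# and "I don't know" makes guessing at least as good as abstaining.

def expected_score(confidence: float, wrong_penalty: float = 0.0) -> float:
    """Expected score if the model guesses, given its confidence in the guess.

    confidence    -- the model's probability that its best guess is correct
    wrong_penalty -- points deducted for a wrong answer (0 under binary grading)
    """
    return confidence * 1.0 - (1.0 - confidence) * wrong_penalty

ABSTAIN_SCORE = 0.0  # "I don't know" earns nothing under binary grading

for p in (0.9, 0.5, 0.1):
    guess = expected_score(p)
    better = "guess" if guess > ABSTAIN_SCORE else "abstain"
    print(f"confidence={p:.1f}: guessing scores {guess:.2f} vs {ABSTAIN_SCORE:.2f} -> {better}")
```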
The research incorporates and builds on prior work from Vempala and Adam Kalai, an OpenAI researcher and lead author of the current paper. Their earlier work found that LLM hallucinations are mathematically unavoidable for arbitrary facts, given current training methodologies.
"We've been talking about this for about two years. One corollary of our paper is that, for arbitrary facts, despite being trained only on valid data, the hallucination rate is determined by the fraction of missing facts in the training data," said Vempala, Frederick Storey II Chair of Computing and professor in the School of CS.
To illustrate this point, imagine you have a huge Pokémon card collection. Pikachu is so familiar that you can confidently describe its moves and abilities. However, accurately remembering facts about Pikachu Libre, an extremely rare card, would likely be more difficult.
“More to the point, if your collection has a large fraction of unique cards, then it is likely that you are still missing a large fraction of the overall set of cards. This is known as the Good-Turing estimate,” Vempala said.
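The Good-Turing idea can be sketched in a few lines of code. The snippet below is a simplified illustration of the estimate, not code from the study, and the sample "collection" is made up: the fraction of observations that occur exactly once approximates the probability mass of things never seen at all, the missing facts.

```python
from collections import Counter

def missing_mass_estimate(observations):
    """Good-Turing estimate of the unseen probability mass.

    Count how many distinct items appear exactly once, then divide by the
    sample size: that ratio approximates the chance that the next draw is
    something the sample has never seen.
    """
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)

# Toy "collection": common cards repeat many times, rare ones appear once.
collection = ["pikachu"] * 8 + ["charizard"] * 6 + ["pikachu_libre", "promo_card_1", "promo_card_2"]
print(missing_mass_estimate(collection))  # 3/17, about 0.18: a sizable share of the full set is likely still missing
```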
[OpenAI Blog: Why Language Models Hallucinate]
According to Kalai and Vempala, the same is true for LLMs based on current training protocols.
“Think about country capitals,” Kalai said. “They all appear many times in the training data, so language models don’t tend to hallucinate on those.
“On the other hand, think about the birthdays of people’s pets. When those are mentioned in the training data, it may just be once.
“So, pre-trained language models will hallucinate on those. However, post-training can and should teach the model not to guess randomly on facts like those.”
Vempala thinks tinkering with pre-training methods could be risky because, overall, they work well and deliver accurate results. However, he and his co-authors offered suggestions for reducing the occurrence of hallucinations with changes to the evaluation and post-training process.
Among the team's recommended changes is that more value be placed on the accuracy of an LLM's responses rather than on how comprehensive they are. The team also suggests implementing what it refers to as “behavioral calibration.”
Under this approach, an LLM would answer only when its confidence exceeds a target threshold, with thresholds tuned for different user domains and prompts. Evaluations would also penalize “I don't know” responses and appropriate expressions of uncertainty less heavily than wrong answers.
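As a rough sketch of how behavioral calibration might look in practice (our own illustration; the threshold, penalty, and example answers are hypothetical, not values from the paper), the model answers only when its confidence clears a stated target, and the grader penalizes a wrong answer more heavily than an "I don't know":

```python
def behaviorally_calibrated_answer(best_guess: str, confidence: float,
                                   threshold: float = 0.75) -> str:
    """Answer only when confidence clears the target threshold; otherwise abstain.

    The 0.75 threshold is a hypothetical example; in practice it would be
    tuned per domain and stated in the prompt or evaluation instructions.
    """
    return best_guess if confidence >= threshold else "I don't know"

def graded_score(answered: bool, correct: bool, wrong_penalty: float = 3.0) -> float:
    """Scoring that no longer treats abstention like a wrong answer.

    A correct answer earns 1 point, "I don't know" earns 0, and a wrong
    answer loses points, so low-confidence guessing stops paying off.
    """
    if not answered:
        return 0.0
    return 1.0 if correct else -wrong_penalty

print(behaviorally_calibrated_answer("Paris", confidence=0.98))    # answers: "Paris"
print(behaviorally_calibrated_answer("March 3", confidence=0.40))  # abstains: "I don't know"
print(graded_score(answered=True, correct=False))                  # -3.0: worse than abstaining
```

Pairing a wrong-answer penalty of t/(1-t) with a confidence target of t (here t = 0.75, so the penalty is 3 points) keeps the two consistent: answering improves the expected score only when the model's confidence actually exceeds the target.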
Vempala believes that implementing some of these modifications could result in LLMs that are trained to be more cautious and truthful. This shift could lead to more intelligent systems in the future that can handle nuanced, real-world conversations more effectively.
"We hope our recommendations will lead to more trustworthy AI," said Vempala. "However, implementing these modifications to how LLMs are currently evaluated will require acceptance and support from AI companies and users."