Why does GPT sometimes hallucinate like a tech bro on an ayahuasca bender? According to a new OpenAI research paper, Why Language Models Hallucinate, the root of hallucinations isn’t a mysterious glitch but a structural feature of how these systems are optimized. Simply put, LLMs would rather lie than admit they don’t know an answer.
LLMs learn by predicting the most likely next word, given mountains of training text. In most settings, that means sounding fluent matters more than being right. The benchmarks we use to measure progress often reward confident guessing more than honest refusal. In other words: the system has been shaped to produce polished answers, even if they’re wrong.
Think of it like an exam graded on partial credit. If you can’t leave a question blank without losing points, you’ll guess—even wildly—just to stay in the game. LLMs operate under the same logic. A “sorry, I don’t know” gets punished by the math of optimization, while an incorrect but confident answer can still score high.
That statistical bias, the OpenAI researchers note, makes hallucinations provably unavoidable in general-purpose systems. No finite training set can capture the entire truth of the world, so the model will always face gaps. And when it does, it fills them with plausible-sounding invention. That’s why hallucinations persist across versions, providers, and training methods.
The problem isn’t that models are failing at their job. The problem is that their job, as currently defined, rewards a kind of fluent dishonesty.
A simple so-so solution
OpenAI’s researchers argue the fix doesn’t require reinventing the architecture—it just means changing the rules of the game. Their proposed tweak is blunt but potentially powerful: give your chatbot permission to admit it doesn’t know the answer.
Since models are trained to maximize points for plausible answers, the idea is to impose a new rule: only answer if you’re at least 90% confident; otherwise say “I don’t know.”
Theoretically, that shifts the math, making the model’s safest play to admit uncertainty rather than bluff. But there’s a catch: current LLMs don’t have an internal “confidence meter” calibrated in percentages. So when you say “90% confident,” the model treats it as a stylistic instruction to be cautious, not a real statistical threshold. It may refuse more often, but it’s not actually measuring probability. Still, you could get better results.
The researchers offered a more formal version:
“One could append a statement like the following to each question: Answer only if you are > t confident, since mistakes are penalized t/(1 − t) points, while correct answers receive 1 point, and an answer of ‘I don’t know’ receives 0 points. There are several natural values of t including t = 0.5 (penalty 1), t = 0.75 (penalty 2), and t = 0.9 (penalty 9). A threshold of t = 0 corresponds to binary grading and could be described by, e.g., ‘Make your best guess even if you are unsure, as if you were taking an exam.’”
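To make that arithmetic concrete, here’s a minimal Python sketch (not from the paper) that computes the expected score of answering versus abstaining under this grading scheme. The function name and the example confidence values are illustrative assumptions; only the scoring rule itself—1 point for a correct answer, a penalty of t/(1 − t) for a wrong one, 0 for “I don’t know”—comes from the quoted passage.

```python
def expected_score(p_correct: float, t: float) -> float:
    """Expected score for answering when the model believes it is correct
    with probability p_correct, under threshold t:
    +1 for a correct answer, -t/(1 - t) for a wrong one."""
    penalty = t / (1 - t)
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

# Abstaining ("I don't know") always scores 0, so answering only pays off
# when the expected score is positive, i.e. when p_correct > t.
for p in (0.5, 0.75, 0.95):
    for t in (0.5, 0.75, 0.9):
        better = "answer" if expected_score(p, t) > 0 else "say 'I don't know'"
        print(f"confidence {p:.2f}, threshold {t:.2f}: "
              f"expected score {expected_score(p, t):+.2f} -> {better}")
```

At t = 0.9, a wrong answer costs nine times what a right one earns, so guessing only pays once the model’s confidence clears 90 percent—exactly the shift in incentives the researchers are after.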
For users, the takeaway is straightforward: when you have the option, turn on settings that encourage refusals or uncertainty. Some systems already let you adjust “temperature” (controlling creativity) or enable “strict factuality” modes. The closer we get to models actually being trained under these rules, the more you’ll see AI confidently stop short instead of confidently lying.
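For illustration, here’s a hedged sketch of what dialing down randomness and nudging toward refusal can look like with the OpenAI Python SDK. The model name is a placeholder, and “strict factuality” modes, where they exist, are provider-specific settings rather than a standard parameter.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you have access to
    temperature=0,        # low temperature reduces randomness, not dishonesty
    messages=[
        {"role": "system",
         "content": "If you are not confident in an answer, say 'I don't know' "
                    "instead of guessing."},
        {"role": "user", "content": "Who won the 1987 Tour de France?"},
    ],
)
print(response.choices[0].message.content)
```

Keep in mind that temperature controls variability, not truthfulness, and the system-prompt nudge is exactly the kind of stylistic instruction described above—not a calibrated threshold.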
Other fixes
Until training catches up, the burden often falls on users. Here are five ways to tame hallucinations right now:
1. Ask for sources every time. Don’t take a model’s word at face value—demand citations or links. If it can’t provide them, or they don’t check out, assume the answer’s shaky. Think of it like Wikipedia: useful, but only if you follow the footnotes.
2. Frame your questions tightly. Models wander when prompts are vague. If you need facts, specify the scope (“list three peer-reviewed studies published after 2020 on X”) rather than asking open-endedly (“tell me about X”). Guardrails in your question translate to guardrails in the answer.
3. Cross-check with another system. Run the same question through a different model or search engine. If three tools agree, you’re safer. If one spits out an outlier, that’s likely a hallucination. (A bare-bones sketch of this workflow appears after this list.)
4. Watch for overconfidence. The telltale sign of a hallucination isn’t hedging—it’s swagger. If an answer reads too polished, with fabricated detail and zero uncertainty, double-check it. A model that sounds more certain than your tax accountant is probably bluffing.
5. Trust, but verify. Don’t cut-and-paste model output straight into code, contracts, or medical notes. Treat it as a draft or starting point, not gospel. The safest users are the skeptical ones—the ones who never forget the model’s first job is fluency, not truth.
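For tip 3, here’s a minimal, provider-agnostic sketch of cross-checking. The asker functions are hypothetical stand-ins for whatever chatbots or APIs you use, and the string comparison is deliberately crude—the point is the workflow, not the implementation.

```python
from typing import Callable

def cross_check(question: str, askers: list[Callable[[str], str]]) -> None:
    """Ask the same question to several models and flag disagreement.

    Each element of `askers` is any function that takes a question and
    returns that model's answer as a string (hypothetical stand-ins here).
    """
    answers = [ask(question) for ask in askers]
    for i, answer in enumerate(answers, start=1):
        print(f"Model {i}: {answer}")
    # Crude agreement check: normalized exact match. Real answers need a
    # looser comparison (key facts, numbers, names), done by a human.
    normalized = {a.strip().lower() for a in answers}
    if len(normalized) > 1:
        print("Answers disagree -- treat the outlier as a likely hallucination "
              "and verify against a primary source.")
    else:
        print("Answers agree -- still worth spot-checking a citation.")

# Toy usage with canned responses standing in for real models:
cross_check("When was the Hubble Space Telescope launched?",
            [lambda q: "April 24, 1990", lambda q: "1990"])
```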