AI Explained · Concept

What is an AI hallucination?

A hallucination is when a model states something false with total confidence. It is not lying; there is no intent. Understanding why it happens tells you when to trust the output and when to check.

What it is

An AI hallucination is output that is fluent, confident, and wrong: a made-up citation, an invented function, a plausible date that never happened. The unsettling part is not that models get things wrong; it is that they get things wrong in the same calm, authoritative voice they use when they are right.

Why it happens

On its own, a language model does not look anything up. It predicts the token most likely to come next, based on patterns in its training data. It is optimizing for plausible, not for true. When it has seen a fact many times, the plausible continuation happens to be correct. When it has not, the model still produces the most plausible-sounding answer rather than stopping, because nothing in it says “I do not actually know this.” Add a fixed training cutoff (no knowledge of anything after it) and no live access to the world, and the gaps get filled with confident guesses.
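A toy sketch can make this concrete. The few lines below are not how a real language model is built; they are a tiny word-counting predictor over a made-up three-sentence corpus, and the names (corpus, follow, predict_next) are illustrative assumptions. The point is only the behavior: it always returns the most plausible continuation, even for a question it has never seen.

```python
from collections import Counter, defaultdict

# Made-up "training data". A real model learns from vastly more text.
corpus = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of italy is rome",
]

# Count which word follows each one-word and two-word context.
follow = defaultdict(Counter)
overall = Counter()
for sentence in corpus:
    words = sentence.split()
    overall.update(words)
    for i in range(1, len(words)):
        for n in (1, 2):
            if i - n >= 0:
                follow[tuple(words[i - n:i])][words[i]] += 1


def predict_next(prompt: str) -> str:
    """Return the most frequent continuation; there is no 'I do not know' branch."""
    words = prompt.split()
    # Try the longer context first, then back off to the last word alone.
    for n in (2, 1):
        context = tuple(words[-n:])
        if context in follow:
            return follow[context].most_common(1)[0][0]
    # Last resort: the most frequent word overall. Still an answer.
    return overall.most_common(1)[0][0]


print(predict_next("the capital of france is"))    # seen often: paris
print(predict_next("the capital of italy is"))     # seen once: rome
print(predict_next("the capital of atlantis is"))  # never seen: paris
```

The last call is the hallucination in miniature: the predictor has never seen the question, so it falls back to whatever most often follows "is" and answers in the same way it answers everything else.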

Where the risk is highest

Hallucination risk is not uniform. It spikes wherever the answer is specific, checkable, and thinly represented in training data: exact quotes, citations and references, API signatures, version numbers, dates, statistics, prices, and anything recent or niche. It is lowest when the model is explaining a well-known idea, transforming text you provided (summarizing, rewriting, translating), or brainstorming, where there is no single fact to get wrong. Same model, very different trust level, depending on the task.

You cannot trust the tone

The single most useful habit: stop reading confidence as correctness. A model writes “the study by Henderson (2019) found…” with the exact same assurance whether that study exists or not. Tone is not a signal. Treat specific, checkable claims as unverified until you have checked them, especially when they are the part you are relying on.
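One way to practice this habit is to pull the specific, checkable claims out of a response so they can be verified one by one. The sketch below is a rough heuristic, not a detector: the regular expressions and the name flag_checkable_claims are assumptions made for illustration, and the sample sentence is invented. It cannot tell whether a claim is true, only that it is the kind of claim worth checking.

```python
import re

# Hypothetical patterns for claims that are specific and checkable.
CHECKABLE_PATTERNS = {
    "citation": re.compile(r"[A-Z][a-z]+(?: et al\.)? \((?:19|20)\d{2}\)"),
    "percentage": re.compile(r"\d+(?:\.\d+)?%"),
    "version": re.compile(r"\bv\d+(?:\.\d+)+\b"),
}


def flag_checkable_claims(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for claims to verify by hand."""
    flagged = []
    for kind, pattern in CHECKABLE_PATTERNS.items():
        for match in pattern.finditer(text):
            flagged.append((kind, match.group(0)))
    return flagged


# Invented sample output, written in a confident tone.
sample = "The study by Henderson (2019) found a 42% drop after upgrading to v2.3.1."
for kind, claim in flag_checkable_claims(sample):
    print(f"unverified {kind}: {claim}")
```

Every flagged item starts as unverified, whatever the surrounding tone sounds like.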

How to reduce it

  • Ground it. Give the model the source material and tell it to answer only from that. This is what RAG does, and it is the biggest single lever.
  • Give it an out. “If the text does not say, answer ‘not stated’” turns a confident guess into an honest gap. It is one of the most effective one-line fixes.
  • Ask for checkable citations, then actually check them. A reference you can click beats a reference that merely looks real.
  • Use tools for facts. Web search or a database (via MCP) gets the model past its cutoff and out of guessing.
  • Use structured outputs for extraction, and have the model return null for missing fields instead of inventing them. See structured outputs.
  • Verify in the workflow. For anything automated, add a real check, the same verify gate that powers loops and evals.
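Several of these levers can be sketched without committing to any particular model API. The example below is a minimal illustration under stated assumptions: build_grounded_prompt and parse_extraction are hypothetical helper names, the prompt wording and the source tags are one possible phrasing, and the model call itself is left out. It shows grounding plus an explicit out, and an extraction step where missing fields come back as null rather than being filled in.

```python
import json

NOT_STATED = "not stated"


def build_grounded_prompt(source_text: str, question: str) -> str:
    """Restrict the model to the supplied text and give it an explicit out."""
    return (
        "Answer using only the text between the <source> tags.\n"
        f"If the text does not say, answer '{NOT_STATED}'.\n\n"
        f"<source>\n{source_text}\n</source>\n\n"
        f"Question: {question}"
    )


def parse_extraction(raw_json: str, fields: list[str]) -> dict:
    """Keep only the expected fields; anything absent becomes None (JSON null)."""
    data = json.loads(raw_json)
    return {field: data.get(field) for field in fields}


prompt = build_grounded_prompt(
    "Payment is due within 30 days.",
    "What is the late fee?",
)
print(prompt)

# Invented model reply for illustration: one field found, one explicitly null.
reply = '{"invoice_number": "A-17", "due_date": null}'
print(parse_extraction(reply, ["invoice_number", "due_date", "total"]))
```

The design choice is that a gap stays visible: an absent field is reported as None so downstream code can handle it, instead of receiving a plausible invented value.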

You reduce hallucinations; you do not switch them off. The goal is a workflow where a confident wrong answer gets caught before it matters.
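A catch of this kind can be very simple. The sketch below assumes a workflow where the model is asked to return supporting quotes alongside its answer; the function names and the sample text are hypothetical. The gate passes only if every quote appears in the source text, which catches invented evidence but does not prove the answer itself is correct.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false mismatches."""
    return " ".join(text.split()).lower()


def unsupported_quotes(source_text: str, quotes: list[str]) -> list[str]:
    """Return the quotes that are empty or do not appear in the source."""
    source = normalize(source_text)
    return [q for q in quotes if not normalize(q) or normalize(q) not in source]


def passes_verify_gate(source_text: str, quotes: list[str]) -> bool:
    """Pass only when there is evidence and all of it is found in the source."""
    # No quotes at all counts as a failure: the claim stays unverified.
    return bool(quotes) and not unsupported_quotes(source_text, quotes)


source = "Payment is due within 30 days.\nLate fees are 2% per month."
print(passes_verify_gate(source, ["Late fees are 2% per month"]))  # supported
print(unsupported_quotes(source, ["Late fees are 5% per month"]))  # not in source
```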
