Your AI is confident and wrong.
Hallucination isn't a model problem. It's a design problem. When production AI gives wrong answers with full confidence, the root cause is almost always a grounding architecture that was never enforced.
Failure type: Grounding / attribution
Where it shows up: Any RAG or retrieval system
User impact: Wrong answers served with confidence
Four signals your team is probably ignoring.
01
Confident fabrication
The model generates plausible-sounding answers that contradict the source data. No hedging. No "I don't know." Wrong with authority.
02
Citation drift
RAG retrieves the right document but the response drifts from it — paraphrasing, inferring, or filling gaps with training data instead of the retrieved text.
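One rough way to surface drift is to score how much of each response sentence is lexically supported by the retrieved text. The sketch below is a crude token-overlap heuristic, not an entailment check. The function names, the sample strings, and the 0.6 threshold are assumptions chosen for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(sentence: str, retrieved_text: str) -> float:
    """Fraction of the sentence's tokens that also appear in the retrieved text."""
    sentence_tokens = tokens(sentence)
    if not sentence_tokens:
        return 1.0  # nothing to support
    return len(sentence_tokens & tokens(retrieved_text)) / len(sentence_tokens)


def drifting_sentences(
    sentences: list[str], retrieved_text: str, threshold: float = 0.6
) -> list[str]:
    """Sentences whose support falls below the (assumed) threshold."""
    return [s for s in sentences if support_score(s, retrieved_text) < threshold]


if __name__ == "__main__":
    retrieved = "Refunds are processed within 7 days."  # made-up sample text
    response = [
        "Refunds are processed within 7 days.",
        "Discounts can also be claimed after travel.",
    ]
    print(drifting_sentences(response, retrieved))
```

Token overlap misses paraphrases and negations, so a score like this is better used to route responses for review than to block them outright.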
03
Guardrail bypass
Behavioural instructions say "only use provided data" but aren't enforced structurally. The model complies in eval and drifts in production.
04
Silent contradiction
Two documents contain conflicting information. The model picks one silently — no flag, no caveat, no audit trail. The wrong one often wins.
Not the model. The architecture.
Weak grounding instructions
Instructions like "answer based on the documents" are advisory. Without structural enforcement — citation requirements, explicit prohibitions on inference — the model treats them as style guidelines.
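As a minimal sketch of what structural enforcement could look like, the check below rejects any response sentence that does not cite a retrieved document. The `[doc:ID]` citation format, the function name, and the sample text are assumptions for illustration, not a standard.

```python
import re

# Assumed citation format: [doc:some_document_id]
CITATION = re.compile(r"\[doc:([A-Za-z0-9_\-]+)\]")


def uncited_sentences(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return sentences with no citation, or with a citation to a document
    that was not actually retrieved."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    failures = []
    for sentence in sentences:
        cited = set(CITATION.findall(sentence))
        if not cited or not cited <= retrieved_ids:
            failures.append(sentence)
    return failures


if __name__ == "__main__":
    answer = (
        "Refunds are processed within 7 days [doc:refund_policy]. "
        "Discounts can be claimed after travel."
    )
    # Any non-empty result means the response should be blocked or regenerated.
    print(uncited_sentences(answer, {"refund_policy"}))
```

Because the check runs outside the model, it holds whether or not the model follows the prompt.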
Middle-zone burial
Relevant context placed in the middle of a long prompt is used less reliably than context at the beginning or end (Liu et al. 2023, "Lost in the Middle"). The model then tends to fill the gap with pre-training knowledge instead of retrieved facts.
No contradiction detection
When the retrieval pipeline surfaces conflicting data, there's no guardrail to flag it. The model resolves the contradiction silently during generation.
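A contradiction guardrail can run before generation. The sketch below assumes an earlier step has already extracted structured facts (document, field, value) from the retrieved documents; the `Fact` type, the field names, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    doc_id: str
    field: str
    value: str


def find_contradictions(facts: list[Fact]) -> dict[str, dict[str, list[str]]]:
    """Map each field that has more than one distinct value to {value: [doc_ids]}."""
    by_field: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for fact in facts:
        # Normalise case and whitespace so "No" and "no " count as the same value.
        by_field[fact.field][fact.value.strip().lower()].append(fact.doc_id)
    return {
        field: dict(values)
        for field, values in by_field.items()
        if len(values) > 1
    }


if __name__ == "__main__":
    facts = [
        Fact("faq_page", "retroactive_claim_allowed", "yes"),
        Fact("policy_page", "retroactive_claim_allowed", "No"),
        Fact("policy_page", "baggage_limit", "23 kg"),
    ]
    print(find_contradictions(facts))
```

A non-empty result can be surfaced to the user or logged for audit, so the conflict is never resolved silently during generation.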
Eval–prod mismatch
Evals use curated queries on clean data. Production surfaces edge cases — unusual queries, sparse retrieval, conflicting documents — that eval never covered.
Three cases where grounding failed publicly.
A passenger asked Air Canada's chatbot about bereavement fares. The chatbot told him the discount could be applied retroactively, directly contradicting the policy on another page of the same website. Air Canada tried to disclaim liability by arguing, in effect, that the chatbot was a "separate legal entity" responsible for its own actions. The tribunal rejected this and held the airline liable. Award: about CA$812 in damages, interest, and tribunal fees.
Root cause
The chatbot's answer conflicted with the published policy, and nothing in the system caught it. No contradiction detection. No attribution requirement. The wrong answer was delivered with full confidence.
Google launched AI Overviews in Search at I/O 2024. Within days, screenshots went viral of the feature advising users to add glue to pizza and to eat at least one small rock per day, among other dangerous misinformation. Google had to manually remove examples and scale back the rollout. The features that shipped had passed internal evals.
Root cause
Generation was not structurally grounded against authoritative sources. The model synthesised answers from the open web, including satirical articles and Reddit posts, with no citation enforcement or source quality filter.
There are over 120 documented cases of lawyers filing court documents that cite AI-generated case law that does not exist. In one case, a $31,100 sanction was imposed. In another, a federal judge ordered the offending attorney to send copies of the sanction order to every judge cited in the fabricated brief.
Root cause
LLMs generate plausible case citations by pattern-matching against training data. Without retrieval grounding against a verified legal database and citation verification, fabricated citations are indistinguishable from real ones in the output.
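Citation verification can be sketched as a lookup against a trusted index. The regular expression below covers only a few U.S. reporter formats and is a simplification, and the `verified` set stands in for a real legal database; both are assumptions, and the sample citation strings are placeholders.

```python
import re

# Simplified pattern: volume, reporter, page (e.g. "347 U.S. 483").
# Real citation formats are far more varied than this.
REPORTER_CITATION = re.compile(r"\b\d{1,4} (?:U\.S\.|F\.2d|F\.3d|F\.4th) \d{1,4}\b")


def unverified_citations(text: str, verified: set[str]) -> list[str]:
    """Return citations found in the text that are absent from the verified set."""
    found = [match.group(0) for match in REPORTER_CITATION.finditer(text)]
    return [citation for citation in found if citation not in verified]


if __name__ == "__main__":
    draft = "See 347 U.S. 483 and 999 F.3d 9999."
    verified = {"347 U.S. 483"}  # stand-in for a lookup in a verified database
    # Any citation printed here must be checked by a human before filing.
    print(unverified_citations(draft, verified))
```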
Clinical AI that cannot afford to be wrong.
Saarthi is a clinical decision support AI used by doctors to review patient data. The grounding requirements are non-negotiable — a fabricated lab value or a missed drug interaction is a patient safety event.
G1 — Source Grounding
Every clinical claim must cite a specific document or data point. Not advisory — structurally enforced in the recency zone where attention is highest.
G2 — Deterministic Grounding
Lab values quoted exactly as they appear in source data. No rounding, no interpretation. "Hb: 9.2 g/dL (document: CBC_2024-11-12)" — not "low haemoglobin."
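A verbatim check of this kind can be sketched as follows. It is not Saarthi's implementation. It assumes quoted values follow the "name: value unit (document: id)" shape from the example above and that the source text stores each value in the same "name: value unit" layout; the function name and the sample data are hypothetical.

```python
import re

# Matches claims shaped like: Hb: 9.2 g/dL (document: CBC_2024-11-12)
QUOTED_VALUE = re.compile(
    r"(?P<name>[A-Za-z][A-Za-z0-9]*): (?P<value>\d+(?:\.\d+)?) (?P<unit>\S+) "
    r"\(document: (?P<doc>[^)]+)\)"
)


def check_quoted_values(answer: str, sources: dict[str, str]) -> list[str]:
    """Return quoted claims whose exact 'name: value unit' text is absent
    from the document they cite."""
    problems = []
    for match in QUOTED_VALUE.finditer(answer):
        literal = f"{match['name']}: {match['value']} {match['unit']}"
        # Plain substring test: a sketch, not a robust parser of lab reports.
        if literal not in sources.get(match["doc"], ""):
            problems.append(match.group(0))
    return problems


if __name__ == "__main__":
    sources = {"CBC_2024-11-12": "Hb: 9.2 g/dL\nWBC: 6.1 x10^9/L"}  # sample data
    print(check_quoted_values("Hb: 9.2 g/dL (document: CBC_2024-11-12)", sources))
    print(check_quoted_values("Hb: 9 g/dL (document: CBC_2024-11-12)", sources))
```

The rounded value in the second call is reported as a problem because the literal string does not occur in the cited document.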
G8 — Contradiction Detection
When two documents contain conflicting data, the model flags it explicitly instead of resolving it silently. The doctor decides — the AI doesn't.
Question Bookending
The doctor's question is injected at both primacy and recency — never buried in the middle. The model's attention stays on what it's supposed to answer, not what it already knows.
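Bookending can be sketched as a prompt builder that states the question first, places the documents in the middle, and repeats the grounding rules and the question at the end. The section labels, the `[doc:ID]` tags, and the function name are assumptions for illustration, not a description of Saarthi's actual prompt.

```python
def build_prompt(question: str, documents: dict[str, str], rules: list[str]) -> str:
    """Assemble a prompt with the question at the start (primacy) and again
    at the end (recency), with grounding rules placed just before the end."""
    context = "\n\n".join(
        f"[doc:{doc_id}]\n{text}" for doc_id, text in documents.items()
    )
    rule_block = "\n".join(f"- {rule}" for rule in rules)
    return "\n\n".join(
        [
            f"QUESTION: {question}",
            "DOCUMENTS:\n" + context,
            "RULES:\n" + rule_block,
            f"QUESTION (repeated): {question}",
        ]
    )


if __name__ == "__main__":
    prompt = build_prompt(
        question="What is the latest haemoglobin value?",
        documents={"CBC_2024-11-12": "Hb: 9.2 g/dL"},
        rules=["Cite a document for every claim.", "Quote values exactly."],
    )
    print(prompt)
```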
If your AI is fabricating answers, we'll find exactly where.
The Cost Diagnostic Sprint includes a full grounding audit: context design, retrieval attribution, guardrail enforcement gaps, and eval coverage. One week. Specific findings.