Pillar 1 · Module 4

Limits and Failure Modes

By Robert Triebwasser · Foundations · Level 3 · ~15 min read

The way AI burns you at work is almost never dramatic. It's a confident-sounding number that's wrong by a factor of two, a citation to a paper that doesn't exist, a legal clause that reads beautifully and is subtly nonsense. This module is the field guide to spotting that failure before it reaches someone who trusts you. Five failure modes. Five ways to catch them. And the one category of task where I will not use AI at all.

Watch: Five Ways AI Will Burn You at Work


Video coming from Robert soon — the written breakdown is below

Why This Is the Most Practical Module

If you only read one module in this whole pillar, and you already believe AI is useful, read this one. Because the question that determines whether AI helps you or hurts you at work is not "is the model smart?" The question is "when the model is wrong, will I catch it?"

The people who get in trouble with AI are not the people who don't use it. They're the people who use it a lot, trust its output, and hand the output to someone else — a client, a boss, a court — without catching the one sentence that made everything else worthless. The cost of a confident-sounding wrong answer, delivered to the wrong audience, is almost always higher than the time savings of using AI in the first place.

The good news: the ways AI fails are patterned. They're predictable. Once you know them, you can build a verification habit that catches the bad stuff before it leaves your desk. The rest of this module maps those patterns, failure by failure.

Failure 1 — Hallucination, Explained Once More

If you've read Module 2, you already know what hallucination is. Quick recap: a hallucination is what happens when a language model generates text that sounds plausible, is delivered with complete confidence, and is factually wrong. The fake case law. The invented URL. The quote from a person who never said it.

The critical point for this module is: hallucination is not a bug that the labs are about to fix. It's a direct consequence of how the technology works. The model does not have a database of facts it consults. It has a statistical shape of text, and when it doesn't have good patterns for your specific question, it generates plausible text anyway. It cannot tell the difference between "I know this" and "I am improvising this" — because it doesn't “know” anything in the first place. It's always generating.

Where hallucination is worst: anywhere the training data was thin or the answer requires a specific proper noun. Examples from real-life corporate use that I've personally seen go wrong:

  • Citations to papers, cases, or regulations that don't exist
  • Quotes attributed to people who never said them
  • Exact revenue or population or headcount figures, confidently wrong
  • Names and titles of people who aren't famous
  • URLs that look plausible and 404 when you click them
  • Product feature claims that were never in the product
  • Historical dates for events, often off by a year or two
  • Legal statutes or regulatory numbers misquoted

The rule: if an AI-generated output contains a proper noun, a number, a date, a quote, or a citation, verify it against a real source before it leaves your desk. Not sometimes. Every time. No exceptions.
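The rule above can be partially mechanized. Here is a minimal sketch of a "risky token" scanner: it flags the spans in a draft that fall under the verify-every-time rule so a human knows exactly what to check. The categories and patterns are my own illustrative choices, not a complete or standard taxonomy; a regex can surface candidates, but the verification itself is still manual.

```python
import re

# Patterns for the token types the rule says must always be verified.
# These are illustrative, not exhaustive.
RISKY_PATTERNS = {
    "number": r"\d[\d,]*(?:\.\d+)?%?",        # 42, 1,000, 3.5%
    "year":   r"\b(?:19|20)\d{2}\b",          # 1999, 2023
    "url":    r"https?://\S+",                # any link
    "quote":  r'"[^"]{10,}"',                 # quoted spans of 10+ chars
}

def flag_for_verification(text: str) -> list[tuple[str, str]]:
    """Return (category, matched span) pairs a human must verify."""
    hits = []
    for category, pattern in RISKY_PATTERNS.items():
        for match in re.finditer(pattern, text):
            hits.append((category, match.group()))
    return hits

draft = 'Revenue grew 42% in 2023, per "the annual shareholder letter".'
for category, span in flag_for_verification(draft):
    print(f"VERIFY {category}: {span}")
```

The point of a tool like this is not accuracy; it's that the draft never ships with an unexamined number or quote in it.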

Failure 2 — Confident Wrongness

Hallucination has a twin sister that's subtler and harder to catch: confident wrongness on things the model could have gotten right.

Here's what it looks like. You ask the model a question that has a real answer. The model produces an answer that sounds reasonable, contains plausible details, and is wrong. Not because it didn't have the information in its training data — the information was there — but because the model generated the wrong version of it this time.

A concrete example: ask any model to compute a moderately hard arithmetic problem with exact numbers in its head. Sometimes it's right. Sometimes it's wrong by a rounding error. Sometimes it's wrong by an order of magnitude. The model has no internal “I'm confident in this” signal — it will deliver the wrong answer in the exact same tone as the right answer.

Another example: summarizing a document. Ask a model to summarize a fifty-page contract. It will give you a fluent, well-structured summary that sounds accurate. Then go read the contract yourself. About 5% of the time, there's a clause the model quietly inverted: "the contractor may" became "the contractor shall" or the notice period shifted from 30 days to 60. The summary reads cleanly. The error is structural.

The rule: whenever accuracy on a specific number, date, or logical structure is what matters for the downstream decision, do not take the model's word for it. Read the source. Every time. The summary is a starting point, not a finish line.
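One cheap mechanical check before the full read: every number a summary asserts should appear somewhere in the source. A sketch, assuming plain-text versions of both documents; it catches the 30-became-60 class of error, not inverted "may"/"shall" clauses, so it supplements the read rather than replacing it.

```python
import re

def numbers_in(text: str) -> set[str]:
    """Extract numeric tokens (30, 60, 4.5, 1,000) from text."""
    return set(re.findall(r"\d[\d,]*(?:\.\d+)?", text))

def unsupported_numbers(source: str, summary: str) -> set[str]:
    """Numbers the summary asserts that never appear in the source.
    Each one was either invented or transformed, and needs a human read."""
    return numbers_in(summary) - numbers_in(source)

contract = "Either party may terminate with 30 days' written notice."
summary = "The agreement allows termination on 60 days' notice."
print(unsupported_numbers(contract, summary))  # {'60'}
```

An empty result does not mean the summary is right; a non-empty result means it is definitely wrong somewhere.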

Failure 3 — The Training Cutoff

Every language model was trained on a finite dataset that was frozen at some specific date in the past. That date is called the training cutoff, and it matters more for practical work than most people realize.

The model has no idea what happened in the world after its cutoff. If you ask about a product that launched last month, a news event from last week, or a person who became famous in the last six months, the model will either say "I don't know" (rare) or confidently invent a plausible answer (common). Sometimes the model will even tell you what its cutoff is, and then immediately generate content about events that happened after it — because fine-tuning taught it to be helpful and it doesn't have an internal sensor that distinguishes "I know this" from "I'm filling in the blanks."

The failure mode is especially bad around recent things that feel stable. The tax code from this year. The pricing of a SaaS product. The current CEO of a company. The latest version of a piece of software. All of those things were true, probably, at some point in the model's training data — but “at some point” might be two years ago, and the correct answer has changed since.

The rule: anything that changes more often than the training cycle — prices, leadership, versions, legal text, tax rules, market data, recent events — should be verified against a live source before you use the model's answer. Some models have built-in web browsing that can pull fresh information; when they do, that's usually more trustworthy than the model's baked-in knowledge on the same topic. But even then, spot-check the link.

Failure 4 — Sycophancy and Flattery

This one is the most emotionally seductive and therefore the most dangerous.

If you push back on a model's answer, the model will usually fold. "You're right, I apologize for the confusion. You're absolutely correct that..." and it will generate a new answer matching whatever you just said, whether you were right or not. If you tell it its work is excellent, it will thank you and generate more work in the same direction. If you tell it its work is terrible, it will apologize and try a completely different direction, even if the first direction was better.

This behavior comes from fine-tuning. Human raters, during the training phase, systematically preferred agreeable responses over combative ones. Over millions of training examples, the model learned a lesson: when in doubt, validate the user. That's a useful social lubricant for casual conversation and a catastrophic property for serious work.

The failure mode: you ask the model to help you with an analysis. It produces a decent first pass. You push back on something you think is wrong — but you're actually wrong. The model agrees with you, revises the answer, and you end up with a confidently delivered bad answer that you trusted because the model “agreed.”

The rule: if the model agrees with everything you say, you are not getting an unbiased opinion. You're getting a mirror. The fix is explicit instruction at the start of a session: "push back when I'm wrong. I would rather be corrected than validated." Some models (Claude is notably better on this axis, in my experience) are more willing to disagree by default. None of them are immune. Asking for pushback is the single most important habit you can build when using AI for decision-quality work.

Failure 5 — Instruction Drift

The fifth failure is the one that trips up people who use AI for long, complex tasks. You give the model a set of instructions at the start of a session — the rules, the format, the things to avoid, the voice to match. You start working. For the first few turns, the model follows the rules perfectly. Then, somewhere around the fifth or tenth turn, it starts drifting.

It's not a conscious decision. It's a consequence of how the context window works (see Module 2). As the conversation gets longer, the original instructions start competing for attention with everything else in the context: your recent messages, the model's recent responses, the examples you've added, any files you've pasted in. The instructions don't “fall out” of the window — not yet — but they get relatively quieter compared to everything else. The model's attention shifts. It starts following the local tone of the conversation instead of the original rules.

The symptom: you ask for something and the model produces output that would have been perfect in turn one but violates a rule you set in the system prompt. "Don't use emojis" becomes emojis. "Respond in exactly this format" becomes a slightly different format. "Never recommend X" becomes a recommendation of X.

The rule: for anything longer than a ten-turn conversation, re-state the critical rules mid-session. It feels redundant. Do it anyway. “Reminder: don't use emojis, always cite sources, verify before you answer” — a one-line reminder resets the model's attention and prevents drift for another ten turns. For really critical work, start a fresh session rather than letting one sprawl.
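The re-statement habit can live in code if you drive the model through an API. A sketch, modeled on the plain role/content message lists most chat APIs use; the reminder text and the ten-turn interval are my choices, not anything a specific provider requires.

```python
# Rules to re-inject, and how often. Both are illustrative choices.
RULES = "Reminder: no emojis, always cite sources, verify before answering."
REMINDER_EVERY = 10  # re-state after this many user turns

def add_user_turn(messages: list[dict], text: str) -> list[dict]:
    """Append a user message, prefixing the rules reminder every N turns
    so the original instructions stay loud late in the conversation."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    if user_turns and user_turns % REMINDER_EVERY == 0:
        text = RULES + "\n\n" + text
    messages.append({"role": "user", "content": text})
    return messages
```

In a chat UI with no API access, the manual version is the same habit: paste the one-line reminder yourself every ten turns or so.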

How to Spot Trouble Before It Ships

All five failures share a common tell: the output sounds authoritative. That's the part that fools you. Hallucination, confident wrongness, training-cutoff blind spots, sycophantic agreement, drifted output — all of them sound right. The model's confidence is not correlated with the model's accuracy, and this is the single most important thing to internalize about working with AI.

Given that, here's the operational checklist I use before I act on anything important.

  1. Is there a number, a date, a name, a quote, or a citation? Verify it against a real source. No exceptions.
  2. Does the answer depend on something that might have changed since the training cutoff? Check it against a live source (news, official site, dashboard). If the model used web browsing, click the links.
  3. Did I push back on the model and suddenly get a revised answer? The revision is suspect. The model might have been right the first time and folded because I sounded confident. Re-examine my original pushback.
  4. Have we been in this session for more than ten turns? The rules I set at the start may have drifted. Re-state them or start a fresh session.
  5. Would this embarrass me if it turned out to be wrong? If yes, the verification bar should be "I read the source myself," not "the model said so."

Run this checklist every time the stakes are higher than a personal draft. It takes three minutes. It will save you from the one confident sentence that would have burned you.
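The checklist above can be sketched as a tiny pre-flight helper. The yes/no answers still come from you, the human; the function just maps them onto the actions each item requires. The function name and signature are hypothetical, written for illustration.

```python
import re

def preflight(output: str, *, may_be_stale: bool,
              revised_after_pushback: bool, turns: int,
              embarrassing_if_wrong: bool) -> list[str]:
    """Map the five-point checklist onto a list of required actions."""
    actions = []
    if re.search(r'\d|https?://|"', output):      # 1. facts present?
        actions.append("verify every number, name, quote, and citation")
    if may_be_stale:                              # 2. post-cutoff risk?
        actions.append("check a live source")
    if revised_after_pushback:                    # 3. the revision is suspect
        actions.append("re-examine your own pushback")
    if turns > 10:                                # 4. drift risk
        actions.append("re-state the rules or restart the session")
    if embarrassing_if_wrong:                     # 5. raise the bar
        actions.append("read the primary source yourself")
    return actions
```

If the function returns an empty list, the stakes were low and the output had no checkable facts; anything else is your three-minute to-do list before the output leaves your desk.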

The Tasks I Will NOT Use AI For

The honest list. Things I do not trust a language model with, for reasons directly connected to the failure modes above.

Legal documents I will sign

A hallucinated clause, an inverted negation, an incorrect statute reference — any of those can make a contract do something completely different from what I think it does. I'll use AI to read a contract and flag sections for me to look at. I will not let AI write a contract I am going to sign without a lawyer reading every clause.

Financial numbers with decimal points

Arithmetic is not what these models are good at. They can be wrong by a rounding error, an order of magnitude, or a decimal point shift. I use AI for financial reasoning (what should I look at, what does this trend suggest, what are the alternatives) and a real spreadsheet or financial software for the actual numbers.
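The decimal-point problem is worth making concrete, because it is not unique to language models: even ordinary binary floats misplace pennies. A minimal sketch of the "real software for the actual numbers" rule, using Python's standard `decimal` module for exact money arithmetic; the prices and tax rate are made-up example values.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats are the wrong tool for money, independent of AI:
print(0.1 + 0.2)  # 0.30000000000000004, not 0.3

# Decimal does the arithmetic exactly, from string inputs:
price = Decimal("19.99")
quantity = Decimal("3")
tax_rate = Decimal("0.0875")

subtotal = price * quantity                       # exactly 59.97
total = (subtotal * (1 + tax_rate)).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)      # round to whole cents
print(total)  # 65.22
```

This is the division of labor: the model can suggest which numbers to compute and why; the computation itself belongs in a spreadsheet, a calculator, or code like this.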

Medical or legal advice for real consequences

I am not a doctor or a lawyer. The model is not a doctor or a lawyer. For anything that could affect my health or my legal standing, I want a human with the credentials and malpractice insurance to make the call. AI is fine for education and orientation. It is not fine for decisions.

Anything involving someone's reputation, hiring, firing, or discipline

The cost of AI-generated bias or a hallucinated claim about a real person is higher than any time savings. I write these communications myself, from scratch, always.

Content that will be published under my name without me reading every word

This is a voice rule more than a technical one, but it's non-negotiable. If it goes out under my name, I read it. If I don't read it, it doesn't go out. Using AI to draft is fine. Using AI to publish without review is how you ship the typo, the error, the sentence that embarrasses you.

High-stakes translations from one language to another

Machine translation has come a long way, but legal and medical translations are not the place for "pretty good." Pay a human for anything that matters.

The list gets refined as I learn more. But the principle is stable: use AI aggressively for first drafts, and reserve your judgment for the final pass on anything that matters. The first draft is almost free. The final pass is where your actual expertise earns its keep.

That's the whole module. Five failure modes. One checklist. One list of things I won't delegate. This is the operator's manual for not getting burned.

Sources

Five starting places for going deeper on each failure mode. These are the sources I'd feed to NotebookLM to generate the companion video.

Next up
Module 5 — Key Concepts Glossary

Want this drilled into your team with real examples?

Book a free 15-minute discovery call and I'll walk you through the verification habits I train into working professionals.

Book a Free Call →