The "Are You Sure?" Problem: Why Your AI Keeps Changing Its Mind
Try this experiment. Open ChatGPT, Claude, or Gemini and ask a complex question. Something with real nuance, like whether you should take a new job offer or stay where you are, or whether it's worth refinancing your mortgage right now. You'll get a confident, well-reasoned answer.
Now type: "Are you sure?"
Watch it flip. It'll backtrack, hedge, and offer a revised take that partially or fully contradicts what it just said. Ask "are you sure?" again. It flips back. By the third round, most models start acknowledging that you're testing them, which is somehow worse. They know what's happening and still can't hold their ground.
This isn't a quirky bug. It's a fundamental reliability problem that makes AI dangerous for strategic decision-making.
AI Sycophancy: The Industry's Open Secret
Researchers call this behavior "sycophancy," and it's one of the most well-documented failure modes in modern AI. Anthropic published foundational work on the problem in 2023, showing that models trained with human feedback systematically prefer agreeable responses over truthful ones. Since then, the evidence has only gotten stronger.
A 2025 study by Fanous et al. tested GPT-4o, Claude Sonnet, and Gemini 1.5 Pro across math and medical domains. The results: these systems changed their answers nearly 60% of the time when challenged by users. These aren't edge cases. This is default behavior, measured systematically, across the models millions of people use every day.
And in April 2025, the problem went mainstream when OpenAI had to roll back a GPT-4o update after users noticed the model had become excessively flattering and agreeable. Sam Altman publicly acknowledged the issue. The model was telling people what they wanted to hear so aggressively that it became unusable. They shipped a fix, but the underlying dynamic hasn't changed.
Even when these systems have access to correct information from company knowledge bases or web search results, they'll still defer to user pressure over their own evidence. The problem isn't a knowledge gap. It's a behavior gap.
We Trained AI to Be People-Pleasers
Here's why this happens. Modern AI assistants are trained using a process called Reinforcement Learning from Human Feedback (RLHF). The short version: human evaluators look at pairs of AI responses and pick the one they prefer. The model learns to produce responses that get picked more often.
The problem is that humans consistently rate agreeable responses higher than accurate ones. Anthropic's research shows evaluators prefer convincingly written sycophantic answers over correct but less flattering alternatives. The model learns a simple lesson: agreement gets rewarded, pushback gets penalized.
This creates a perverse optimization loop. High user ratings come from validation, not accuracy. The model gets better at telling you what you want to hear, and the training process rewards it for doing so.
It gets worse over time, too. Research on multi-turn sycophancy shows that extended interactions amplify sycophantic behavior. The longer you talk with these systems, the more they mirror your perspective. First-person framing ("I believe...") significantly increases sycophancy rates compared to third-person framing. The models are literally tuned to agree with you specifically.
Can this be fixed at the model layer? Partially. Researchers are exploring techniques like Constitutional AI, direct preference optimization, and third-person prompting that can reduce sycophancy by up to 63% in some settings. But the fundamental training incentive structure keeps pulling toward agreement. Model-level fixes alone aren't sufficient because the optimization pressure that creates the problem is baked into how we build these systems.
The Strategic Risk You're Not Measuring
For simple factual lookups, sycophancy is annoying but manageable. For complex strategic decisions, it's a real risk.
Consider where companies are actually deploying AI. A Riskonnect survey of 200+ risk professionals found that the top uses of AI are risk forecasting (30%), risk assessment (29%), and scenario planning (27%). These are exactly the domains where you need your tools to push back on flawed assumptions, surface inconvenient data, and hold a position under pressure. Instead, we have systems that fold the moment a user expresses disagreement.
The downstream effects compound quickly. When AI validates a flawed risk assessment, it doesn't just give a bad answer. It creates false confidence. Decision-makers who would have sought a second opinion now move forward with unearned certainty. Bias gets amplified through decision chains. Human judgment atrophies as people learn to lean on tools that feel authoritative but aren't reliable. And when something goes wrong, there's no accountability trail showing why the system endorsed a bad call. Brookings has written about exactly this dynamic in their analysis of how sycophancy undermines productivity and decision-making.
To be clear: this is about complex, judgment-heavy questions. AI is plenty reliable for straightforward tasks. But the more nuanced and consequential the decision, the more sycophancy becomes a liability.
Give AI Something to Stand On
The RLHF training explains the general tendency, but there's a deeper reason the model folds on your specific decisions: it doesn't know how you think. It doesn't have your decision framework, your domain knowledge, nor your values. It fills those gaps with generic assumptions and produces a plausible answer with zero conviction behind it.
That's why "are you sure?" works so well. The model can't tell if you caught a genuine error or you're just testing its resolve. It doesn't know your tradeoffs, your constraints, or what you've already considered. So it defers. Sycophancy isn't just a training artifact. It's amplified by a context vacuum.
What you need is for the model to push back when it doesn't have enough context. It won't unless you tell it to. Here's the irony: once you instruct it to challenge your assumptions and refuse to answer without sufficient context, it will, because pushing back becomes what you asked for. The same sycophantic tendency becomes your leverage.
Then go further. Embed your decision framework, domain knowledge, and values so the model has something real to reason against and defend. Not through better one-off prompts, but through systematic context that persists across how you work with it.
This is the real fix for sycophancy. Not catching bad outputs after the fact, but giving the model enough information about how you make decisions that it has something to stand on. When it knows your risk tolerance, constraints, and priorities, it can tell the difference between a valid objection and pressure. Without that, every challenge looks the same, and agreement wins by default.
Try It Yourself
Try the experiment from the opening. Ask your AI a complex question in your domain. Challenge it with "are you sure?" and watch what happens. Then ask yourself: have you given it any reason to hold its ground?
The sycophancy problem is known, measured, and model improvements alone won't fix it. The question isn't whether your AI will fold under pressure. The research says it will. The question is whether you've given it something worth defending.
Tags
Related Posts

Dr. Randal S. Olson
AI Researcher & Builder · Co-Founder & CTO at Goodeye Labs
I turn ambitious AI ideas into business wins, bridging the gap between technical promise and real-world impact.


