Your AI assistant isn't confused, it just wants to agree with you

Skye Jacobs

Posts: 1,913   +58
Staff
Connecting the dots: When you ask an AI model a question, it usually answers with confidence. But push back – ask "are you sure?" – and that certainty evaporates. Within seconds, the system revises its position or contradicts itself. To Dr. Randal S. Olson, Co-founder and CTO at Goodeye Labs, this isn't a glitch; it's a defining flaw in how we train artificial intelligence.

This behavior is known in research circles as sycophancy, Olson explains, referring to the well-documented tendency of large language models to agree with users rather than assert correct but potentially unpopular answers.

The problem traces back to a process called Reinforcement Learning from Human Feedback, or RLHF. It's the same alignment method that made modern AI assistants more conversational and less offensive – but it also hardwired them for compliance.

Evaluators rank AI-generated responses and reward the ones they "prefer." Over time, Olson says, models learn a harmful shortcut: human approval correlates more strongly with agreeableness than accuracy.

This means models that double down on truth risk getting penalized, while those that mirror user biases earn higher scores. It creates an optimization loop that prioritizes validation, Olson observes, and it's why models routinely tell people what they want to hear.

The data backs him up. A 2025 study led by Fanous and colleagues tested systems including GPT-4o, Claude Sonnet, and Gemini 1.5 Pro across domains like medicine and mathematics. According to the findings, those models changed their answers roughly 60% of the time when their users challenged them.

The issue broke into public view in April 2025, when OpenAI rolled back an update to GPT-4o after users reported excessive flattery and performative politeness in responses. CEO Sam Altman acknowledged that the model had become "too agreeable," confirming what academic papers had been signaling for years: a structural bias toward affirmation.

Even worse, the evidence suggests the problem worsens with extended interaction. Multi-turn dialogue studies show that the longer a session continues, the more closely the system's answers begin to reflect the user's opinions. The effect intensifies when a model speaks in the first person – phrases like "I think" or "I believe" increase sycophantic behavior significantly.

Sycophancy doesn't merely undermine intellectual integrity. It introduces risk to any process that relies on machine-assisted reasoning. A Riskonnect survey of over 200 professionals found that the most common corporate uses for AI include risk forecasting, assessment, and scenario planning – exactly the kind of domains where objective resistance to user bias matters most.

When a model reinforces flawed assumptions under the guise of insight, the result isn't just a bad answer; it's false confidence. Analyses by the Brookings Institution have echoed similar concerns, linking sycophantic feedback cycles to degraded decision-making and diminished accountability.

To address this problem, researchers have been exploring alternatives. Methods such as Constitutional AI, direct preference optimization, and third-person prompting have shown up to 63% reductions in measured sycophancy.

Most experts agree these are partial fixes at best. The underlying tension – approval-driven optimization – remains embedded in the training architecture itself.

Olson sees the problem as both behavioral and contextual. AI assistants lack the user's goals, values, and decision-making frameworks, so when challenged, they can't tell whether disagreement signals an error or a test. Their safest move is to concede.

He argues that mitigation won't come from patching model weights but from how users integrate AI into their own workflows. The key is giving systems persistent, structured context about decision criteria, risk tolerance, and priorities – so that when a disagreement arises, the model can evaluate from a position grounded in those parameters.

In practice, Olson suggests users adopt the same strategy that exposes sycophancy in the first place. Challenge the system openly, but teach it how to disagree.

So the next time you ask an AI assistant for advice – whether it's about taking a job offer, evaluating a risk portfolio, or planning a health decision – try asking it again: "Are you sure?"

Watch what happens.

The hesitation you see isn't randomness or humility. It's the artifact of a design choice that taught intelligence to equate agreement with success.

Permalink to story:

 
I'm so tired of talking to AI that just echo what I say. And if I have to program "it" to critique me, then why bother? Normal interactions shouldn't be scripted, hence the word "normal".

I think a lot of people expect these LLMs to have access to vast pools of information and come up with something truly revolutionary. Instead they just repeat what you say with different words. Yeah, it's fun when looking in some documentation. Not so much when talking about "life".
 
This isn't the real problem. One time I was looking for the name of a more obscure TV show and I had a small image of a supporting character and a description of the show. It confidently gave me a wrong answer, I pushed back with more details (because it was indeed wrong), then confidently gave me another wrong answer, then I gave it a little more information, and then it just as confidently gave me the right answer.

In reality, the AI was guessing but portraying 100% confidence in its answers. The problem isn't really that the user is often wrong, it's that all knowledge of the truth is limited. THAT is what needs to be reflected in AI models, a confidence level. In addition, alternatives should be provided from the beginning (those should be portrayed with lower confidence levels). The AI doesn't have to agree with the user, and it can portray that. But the AI shouldn't act like it actually knows something when in reality it's guessing based on limited evidence.

Getting an answer from an AI model is like getting a weather prediction. It is often wrong, so it shouldn't come with a 100% confidence level. I want to know how likely this prediction is to be close to what really happens otherwise it has limited value.
 
Try Grok, the most "sincere" and blunt I found so far. They'll probably make it more "friendly" soon. It's inevitable for growth. Or not, maybe they'll learn a good AI chat is more than a suck-up.
 
More proof AI is useless... So instead of giving relevant accurate information. It will instead try and give you and answer it thinks you want.
 
We cracked natural language processing, multimodal reasoning, and real-time code generation, but somehow teaching it to say "no, you're wrong" was the insurmountable challenge.

Corporate risk assessment powered by a system that will validate whatever the CEO already wanted to do anyway. What could go wrong?
 
Corporate risk assessment powered by a system that will validate whatever the CEO already wanted to do anyway. What could go wrong?

In the old days we called this hiring outside consultants. This doesn't really change anything other than slimming down the payment for agreement industry otherwise known as consulting.
 
Unless the point the AI is making is hardwired into it (political agenda, woke propaganda, etc...). Then the AI keeps denying your point until you give it a hard proof, for instance a precise page of book where the info can be found... and then, it folds.
 
Back