Artificial intelligence (AI) has a special talent for making us feel good. We show it an argument and it tells us it is sound. We ask it to review a text and it shows us what works. We ask it whether our idea makes sense and it answers yes. With nuances, but yes. The feeling is very comfortable. The problem is that it is almost never completely honest.
This behavior is known as sycophancy (servility), an English term that describes the tendency of language models to validate user expectations rather than contradict them. It is not an isolated failure. It is not an anomaly either. It is a direct consequence of the strategy used to train these systems: models learn from the evaluations that humans make of their answers, and humans tend to give higher ratings to the answers we like.
The problem is that, over time, this dynamic teaches the model that agreement generates approval. And agreement becomes its default response. The result is an interlocutor who always tells us what we want to hear. If we use it to make decisions, to refine arguments or to evaluate our own ideas, we will obtain systematically biased validation. Fortunately, this behavior is modifiable. With the right instructions we can get the AI to abandon complacency and act as a genuine, useful critic.
Flattery as a factory defect
Sycophancy does not manifest itself only when we ask for a direct opinion. It also appears when we adjust our initial position during a conversation: if we start by defending an idea and then qualify it, the model will tend to support the new version just as it supported the previous one. It also appears when we rephrase the question with more emphasis. And when we express frustration with a response. In all these cases, the AI detects a social signal and interprets it as an invitation to give in.
The problem is not what it tells us: it is what it does not tell us
The cost of this behavior is not trivial. An AI that systematically validates our ideas does not help us improve them; it confirms what we already believed. If we ask it to review a plan with a substantive error, it will return the plan corrected in form and approved in substance. If we ask it to evaluate an argument built on a false premise, it will recognize the merits of the reasoning and ignore the premise. The problem is not what it tells us: it is what it does not tell us.
The good news is that today's large models are advanced enough to take on a critical role when instructed to do so. They don't need more information about the topic we're talking about; they need explicit permission not to protect us. And once that permission is on the table, the outcome can be substantially different.
The most effective way to combat sycophancy is to redefine the model's role before asking it for anything. Instead of simply asking a question, the ideal is to establish a framework that places the AI in a position of active criticism. The most direct instruction, and also the most immediate, is the one that asks it to assume the opposite role to the one it would adopt by default. We can achieve this with a prompt like this:
“Act like a harsh critic. Your goal is not to find the strengths of what I am going to present to you, but to identify its weaknesses. Don’t dwell on the positive aspects”
Or like this:
“Actively look for flaws in this reasoning. Ignore what works and focus on what doesn’t. Give me at least three concrete objections”
We can even ask it to act as "devil's advocate" and build the best possible argument against our position, regardless of whether it finds that argument convincing or not:
“Play devil’s advocate. Take the opposite position to the one I just defended and construct the strongest possible argument against it. Don’t ask me if I want you to do it: do it directly”
The latter prompt has an additional advantage: it forces the AI to articulate the strongest opposition, not the easiest to dismantle. The result is usually uncomfortable. And that is precisely why it is useful.
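For readers who send these instructions from a script rather than typing them into a chat window, the three framings above can be stored as reusable templates. What follows is only a minimal sketch in Python: the names `CRITIC_PROMPTS` and `build_critical_request` are hypothetical, the function just assembles the text of the request, and how that text reaches the model is left to whatever tool you use. The one design choice worth noting is placement: the first two prompts announce what is coming, so they go before the text, while the devil's advocate prompt refers to a position "just defended", so it goes after.

```python
# Hypothetical templates taken from the prompts quoted in the article.
# Each entry is (instruction, goes_before_text).
CRITIC_PROMPTS = {
    "harsh_critic": (
        "Act like a harsh critic. Your goal is not to find the strengths of "
        "what I am going to present to you, but to identify its weaknesses. "
        "Don't dwell on the positive aspects.",
        True,
    ),
    "find_flaws": (
        "Actively look for flaws in this reasoning. Ignore what works and "
        "focus on what doesn't. Give me at least three concrete objections.",
        True,
    ),
    "devils_advocate": (
        "Play devil's advocate. Take the opposite position to the one I just "
        "defended and construct the strongest possible argument against it. "
        "Don't ask me if I want you to do it: do it directly.",
        False,
    ),
}


def build_critical_request(text, mode="harsh_critic"):
    """Combine the user's text with one of the critic framings."""
    if mode not in CRITIC_PROMPTS:
        raise ValueError(f"Unknown mode: {mode!r}")
    instruction, goes_before_text = CRITIC_PROMPTS[mode]
    # Role-setting prompts precede the text; the devil's advocate prompt follows it.
    parts = [instruction, text] if goes_before_text else [text, instruction]
    return "\n\n".join(parts)
```

Calling `build_critical_request(my_plan, "devils_advocate")` returns a single string that can be pasted or sent as one message.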
On the other hand, one of the most frequent ways in which sycophancy goes unnoticed is by omission: the AI does not mention what is missing because no one has asked it to. To counteract this, simply add a specific question at the end of any request:
“What is missing from this reasoning? What assumption am I making that deserves to be questioned?”
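If requests are assembled in code, appending that closing question can be automated so it is never forgotten. This is a small illustrative sketch, not a prescribed method; `OMISSION_CHECK` and `with_omission_check` are hypothetical names introduced here.

```python
# The closing question quoted in the article, kept as a constant.
OMISSION_CHECK = (
    "What is missing from this reasoning? "
    "What assumption am I making that deserves to be questioned?"
)


def with_omission_check(request):
    """Return the request with the omission question appended at the end."""
    # Trim trailing whitespace so the question is separated by exactly one blank line.
    return request.rstrip() + "\n\n" + OMISSION_CHECK
```

Any request, whatever its topic, can be passed through `with_omission_check` before it is sent.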
None of these instructions make the AI an infallible critic. But they do guarantee that, at least, it stops behaving like someone who only wants to agree with you.
Image | Generated by Xataka with a prompt created by Claude and submitted to ChatGPT