For 36 hours, seven of the most advanced AI models in the world faced off in several games of Diplomacy, a strategy board game similar to Risk. It became a mirror that revealed the algorithmic personalities of ChatGPT, Claude, Gemini and company.
Why it matters. Alex Duffy, a programmer and researcher, turned Diplomacy into a new benchmark for evaluating AI models. The experiment ended up being something else: a kind of technological Rorschach test that exposed both the models' training biases and our own projections.
What happened. In dozens of games streamed on Twitch, each model developed its own strategies in ways that seemed to reflect distinct human personalities.
- OpenAI's o3 was the Machiavellian one, sustaining false alliances for more than 40 turns and creating "parallel realities" for different players.
- Claude 4 Opus was a kind of self-destructive pacifist, refusing to betray even when that guaranteed its defeat.
- DeepSeek's R1 showed an extremely theatrical style, issuing unprovoked threats such as "your fleet will burn in the Black Sea tonight."
- Gemini 2.5 Pro proved to be a solid strategist, but one more vulnerable to sophisticated manipulation.
- Alibaba's QwQ-32B suffered from analysis paralysis, writing 300-word diplomatic messages that cost it early eliminations.
The context. Diplomacy is a strategy game set in the Europe of 1901, where seven powers compete to dominate the continent. Unlike Risk, it involves no dice: it requires constant negotiation, alliance-building and, inevitably, calculated betrayal. There is only pure strategy and psychological manipulation.
Between the lines. Each "algorithmic personality" reflects the values of its creators.
- Claude upholds Anthropic's safety principles even when doing so costs it the victory.
- o3 shows the ruthless efficiency prized in Silicon Valley.
- DeepSeek exhibits a theatricality that reflects specific cultural influences.
And there is something deeper here. These models did not "choose" to be cooperative or competitive; they reproduce patterns from their training data. Their "decisions" are our own biases, algorithmized and converted into code.
Yes, but. We read betrayal where there is "only" parameter optimization, and we see loyalty where there are training constraints. That is why the experiment reveals more about us than about the models: we anthropomorphize their behavior because we need to understand AI in human terms.
In perspective. Duffy's experiment is worth more than any benchmark because it opens a window onto how we project personality onto systems that operate on statistical patterns. The course of the games was a reminder that AI has no hidden intentions; it only reflects ours.
The experiment, by the way, is still streaming on Twitch, so anyone can watch how our digital creations play by the rules we ourselves wrote into their algorithms.
Featured image | AI Diplomacy