If the problem is too difficult, they give up immediately

Machines do not think: that is an illusion. And it is not us saying it, but a group of Apple researchers who have just published a revealing study with precisely that title ("The Illusion of Thinking"). In it, these experts analyzed the performance of several AI models with the ability to "reason", and their conclusions are striking... and worrying.

Puzzles for the models that "reason". The usual way to evaluate an AI model is to use benchmarks with programming or mathematics tests, for example. Instead, Apple created several tests based on logical puzzles that were completely new and therefore could not have been part of these models' training data. Claude Thinking, DeepSeek-R1 and o3-mini took part in the evaluation.

Models that crash. In their tests they found that all these reasoning models ended up running head-first into a wall when they faced complex problems. In those cases, the accuracy of these models fell to 0%. Granting them more resources to try to solve those problems did not help either. Once the problems reached a certain difficulty, the models simply could not handle them.

They get tired of thinking. In fact, something curious happened. As the problems became more complicated, these models began to think not more, but less. They used fewer tokens to solve them and gave up sooner, even though they could use unlimited resources.

Not even with help. The Apple researchers went as far as giving the models an exact algorithm to guide them to the solution step by step. And here came another major surprise: none of the models managed to solve the problems even with those guided solutions. They could not follow instructions consistently.


These graphs show the differences between models that do not reason (DeepSeek-V3) and those that do (DeepSeek-R1) on low- (yellow), medium- (blue) and high-complexity (red) problems. "Reasoning" only shows an advantage on medium-difficulty problems. On the high-complexity ones the models simply collapse. Source: Apple.

Three types of problems. In their evaluation they divided the problems to be solved into three classes and checked whether reasoning models really contributed anything over traditional models that do not "reason":

  • Low-complexity problems: reasoning models did indeed outperform those without that reasoning capacity. That said, they often "overthought" these simple problems.
  • Medium-complexity problems: there was still some advantage over conventional models, but not much.
  • High-complexity problems: all models ended up crashing against these problems.

Thinking? Not at all. According to these researchers, the reason for this failure to reason on complex problems is simple. These models do not "reason" at all: all they do is use advanced pattern recognition techniques to solve problems. That does not work with complex problems, and there the foundations of these models fall apart completely. Faced with such problems, a model given clear instructions and more resources should improve and at least be able to attempt them, but this study demonstrates otherwise.

Far from AGI. What these results suggest is that the expectations these models have generated are undeserved: current reasoning models simply cannot get past a certain barrier by adding more data or compute. Some pointed to reasoning models as a possible path in the quest for AGI, but the conclusions of this study reveal that we are in fact no closer to achieving models that can be considered artificial general intelligence.

They do not find solutions, they memorize and copy them. In fact, the study corroborated something others have argued in the past: these models simply hold knowledge, and reproduce a solution they already have memorized when they detect patterns that lead to it. Thus, these models could solve the famous Tower of Hanoi problem even when it required many moves, because once the solution is known it can be applied systematically. In other puzzles, however, they failed after just a few moves.
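To see why the Tower of Hanoi lends itself to this kind of systematic application, here is a minimal Python sketch of the classic recursive procedure (this is just an illustration of the well-known algorithm, not the prompt used in Apple's study): a puzzle with n disks always takes 2^n − 1 moves, each one produced mechanically.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Classic recursive Tower of Hanoi: yields each move as (disk, from, to)."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way...
    yield from hanoi(n - 1, source, spare, target)
    # ...move the largest disk to its destination...
    yield (n, source, target)
    # ...then move the smaller disks back on top of it.
    yield from hanoi(n - 1, spare, target, source)

# A 10-disk puzzle is solved mechanically in 2**10 - 1 = 1023 moves.
moves = list(hanoi(10))
print(len(moves))  # 1023
```

A model that has memorized this pattern can keep emitting correct moves for a long game; a model that has to plan each move from scratch cannot, which is consistent with the failures the study reports on less familiar puzzles.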

Stochastic parrots. Many AI critics have long argued that generative models, whether they "reason" or not, are basically parrots repeating what they have been taught. In the case of AI, they detect patterns and are able to find or predict the next word or pixel when generating text or images. The result is usually convincing, but only because they have become extremely good at detecting those patterns and responding appropriately and coherently. It is not new knowledge, however: it is repeating what already existed.

They don't think. Other experts critical of these expectations have been warning us for some time about the dangers of anthropomorphizing AI. Subbarao Kambhampati, of Arizona State University, explained this when, for example, he analyzed the "reasoning" process of these models and their "chain of thought". We use verbs like "think" when they do not think. They do not understand what they are doing, and that contaminates all the assumptions we make about their capacity (or lack of it).

Do not trust what the AI tells you. The behavior of these models confirms what has been known since ChatGPT appeared on the scene. However convincing these models may seem, whether they "reason" or not, the reality is that they can make serious mistakes, even if they also get plenty of things right. In fact, there are cases in which these models do surprise with their ability to solve problems: in Scientific American, a group of mathematicians were outmatched by an AI model that managed to solve some highly complex mathematical problems they had failed to solve, or had taken far longer to solve.

Image | Puzzle Guy

In Xataka | Copilot, ChatGPT and GPT-4 have changed the world of programming forever. This is what programmers think
