usatoday24

The Swiss psychologist Jean Piaget had an especially interesting definition of intelligence. He said that “intelligence is what you use when you don’t know what to do.” That idea may be a key element of a new trend in measuring the capabilities of artificial intelligence: having AI play Pokémon.

How intelligent is artificial intelligence? There are already tests that assess how well AI solves scientific, mathematical or programming problems. All of these help to “measure” the progress of these models, but alongside them one idea stands out: measuring that ability by having AI play Pokémon.

Claude began the trend. The people at Anthropic had the idea of testing how their AI model, Claude 3.7, would behave when playing Pokémon Red. So they used their automatic navigation tool to see how the model applied its abilities to the video game. They created a Twitch channel, and its progress is even being tracked on Reddit.


And now Gemini Pro picks up the gauntlet. A developer with no affiliation with Google has decided to apply the same idea, this time with Gemini 2.5 Pro Experimental as the model under test. On his Twitch channel he is streaming a playthrough of Pokémon Blue (the version this developer knew best) running in a Game Boy Advance emulator.

Who wins? For the moment, Gemini 2.5 Pro Experimental seems to be doing somewhat better. Claude, for example, got stuck in one part of the game a couple of times, which forced it to restart its runs. Gemini seems to be advancing without as many problems, although it does not play the same way as Claude: for example, it has access to a minimap that, according to its creator, compensates for one of Gemini’s limitations, namely that it lacks automatic navigation tools like Claude’s.

Why Pokémon for Game Boy. The Pokémon version for the Game Boy Advance being used in these experiments is well suited to evaluating the capabilities of LLMs for several reasons. For one, it is a turn-based video game, which gives the model time to “think” about its next move. It is also a graphically simple game, which makes it easier for these models to “see” the screen and understand what is happening at each moment without that being very costly in terms of resources.

A surprisingly useful benchmark. This way of evaluating how intelligent an AI is can be as revealing as programming or mathematics tests, or even more so. Give a 10-year-old child a Nintendo Switch and that child will learn to play any game in minutes. AIs, however, often find this scenario especially difficult and end up making illegal moves.

No memorization. Many of the benchmarks used to measure the ability of AI models rely on their “memory.” When a model solves a problem, it is usually because the solution is part of its training data set, or because a similar solved problem was already there and the model can “replicate” or “regurgitate” it. Here the approach is different, and it demands some ability to adapt from the AI models.

ARC-AGI and the Snake game. In February, the ARC Prize Foundation, which develops an equally striking benchmark for AI models, experimented with another simple video game: a version of the classic Snake in which various AI models faced off to see how they behaved. The reasoning models were the clear winners (78% of victories), and this again showed the relevance of this type of video game for improving AI models in the future.

AIs learn to adapt. As we were saying, benchmarks of this type are especially interesting because they let us check whether an AI model can adapt to new situations and challenges and overcome them. Companies such as DeepMind have been doing this with some of their developments for some time, and it is certainly an interesting alternative for the developers of these models to explore.

