It was 2013 and almost no one had heard of DeepMind, a small artificial intelligence startup. Its researchers set out to make their AI system learn to play (and win) video games, and they trained it on several titles from the old Atari console.
Among them was ‘Breakout’ (released in Spain as ‘Arkanoid’), and a video from the time shows how, after 10 minutes of play, the machine barely knew how to do anything. After two hours of play, though, it was already playing like an expert.
But at the four-hour mark something amazing happened: the machine discovered a “trick” to get the most out of its effort. It had the ball dig “a tunnel” and then sent the ball through that tunnel so that it kept bouncing and cleared almost the entire level effortlessly.
Since then, using video games to train AI models, or to check whether they can adapt to them and complete them, has been common in the industry. That is precisely what Anthropic tried when it launched Claude 3.7 a few weeks ago.
This hybrid AI model has proven to be a notable advance in areas such as programming and reasoning, but Anthropic wanted to put it to an unusual test: playing the ‘Pokémon’ video game.
The AI is stuck
In this experiment, those responsible at Anthropic wanted to evaluate whether AI systems “can face challenges with increasing competence, not only through training but through generalized reasoning.”
Earlier versions of Claude struggled even to get past the game's opening screen, but Claude 3.7 Sonnet’s “extended thinking” allows the new model to “plan ahead, remember its objectives and adapt when initial strategies fail” in a way its predecessors could not.
For those responsible at Anthropic, these improvements will end up helping to solve real-world problems. It is something we are also seeing with the ARC-AGI-2 benchmark, which is aimed precisely at measuring the ability of AIs to do things that are easy for us (controlling a video game, solving a visual puzzle) but especially difficult for these models.

 
Source: Anthropic.
Anthropic's advance here is remarkable, but it is far from being a success. In fact, as Ars Technica reports, thousands of viewers have watched on the Twitch channel created by Anthropic how Claude got completely stuck in Mt. Moon, one of the game's areas.
On that channel you can also watch Claude keep trying to solve the problem and move forward. It “thinks” and “reasons”, and even shows what it is “thinking” and “reasoning”, but the model has still not managed to beat the game.
And despite everything, this is a great achievement for AI
Given that the game is aimed at children, it is easy to dismiss Anthropic's achievement, but these advances deserve to be valued very positively. To begin with, the Claude 3.7 model used was not specifically trained to play the game: it had to learn on the fly and adapt to it.
Here, too, Claude “sees” the screen and what happens on it, and reacts based on that analysis. The problem is that the ‘Pokémon’ graphics are very basic and pixelated, which poses an even greater challenge for Anthropic's model: with better graphics it would probably perform much better, one of those responsible for the experiment explained.
Even so, Claude performs especially well in the parts of the game where text is shown, which lets the model better recognize what it needs to do at that stage of the game.
But if there is one serious problem, it is memory. Claude has trouble remembering everything it has learned: it has a limited “memory” of 200,000 tokens, and when that runs out, Claude resorts to summaries that condense the information, which can end up dropping small details that are important for advancing in the game.
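The way condensing can lose details may be easier to see in a small sketch. This is only an illustration under stated assumptions, not Anthropic's implementation: the `RollingMemory` class, the `summarize` helper and the word-based token count are all hypothetical, and the crude truncation stands in for a summary that a model would actually write.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    return len(text.split())


def summarize(entries: list[str], keep_words: int = 4) -> str:
    # Hypothetical stand-in for a model-written summary: keep only the
    # first few words of each entry, so anything after them is lost.
    return " / ".join(" ".join(e.split()[:keep_words]) for e in entries)


@dataclass
class RollingMemory:
    budget: int  # maximum number of tokens we aim to hold
    entries: list[str] = field(default_factory=list)

    def total(self) -> int:
        return sum(count_tokens(e) for e in self.entries)

    def add(self, note: str) -> None:
        self.entries.append(note)
        # When over budget, condense every older note into one summary
        # and keep the newest note verbatim.
        if self.total() > self.budget and len(self.entries) > 1:
            older, newest = self.entries[:-1], self.entries[-1]
            self.entries = [summarize(older), newest]


memory = RollingMemory(budget=12)
memory.add("The cave exit is past the ladder on the east wall")
memory.add("A trainer blocks the corridor near the entrance")
print(memory.entries)
```

After the second note the budget is exceeded, so the first note is condensed and the location of the exit (the ladder on the east wall) is no longer in memory, which is the kind of small detail the article refers to.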
Be that as it may, Anthropic's achievement remains remarkable, and it points to a future in which these models can play all kinds of games autonomously and exceptionally well. Much as DeepMind already did with that simple version of ‘Arkanoid’, but on a grand scale.
 
					 
		

