Claude's latest upgrade isn't being pitched as "smarter" but as an agent that acts. Sonnet 4.6 doesn't just reason: it also navigates websites, fills out forms and completes procedures with the mouse and keyboard, just as a person would. It's a quantum leap in what AI can do for you, not to you.
The demo Anthropic chose was telling: a user renewing their car registration on the website of the US equivalent of the DGT, Spain's traffic authority. It looks like a simple, functional, well-designed site. We'd like to see how it fares with the Spanish Tax Agency's online portal (Sede Electrónica).
The context. Claude had already taken a big step this month with the release of Opus 4.6 just two weeks ago. Sonnet 4.6 is the mid-tier model, the one most people use, including those on the free plan, and Anthropic has turned it into more than an improved chatbot: its scores on OSWorld, the standard benchmark for measuring how well AI uses a computer, have risen steadily over sixteen months.
According to the company, tasks that previously required its most powerful models (Opus 4.5 and 4.6) can now be handled by Sonnet 4.6, at the same price as before.
Between the lines. There is a very clear market strategy here. Anthropic has just closed a $30 billion funding round and aired its first Super Bowl ad, taking a dig at OpenAI. Now it is bringing agentic capabilities to its free plan. The goal is not only to attract developers: it is to reach the average user and change their everyday relationship with AI.
When chatbots started to have memory, our way of interacting with them changed. They went from tools to relationships. When they start doing things for us for real, like booking appointments, filling out forms or managing hellish paperwork, the change will be of a different magnitude.
Yes, but. The technical and cultural challenge is enormous. AI that operates computers is vulnerable to prompt injection attacks: malicious instructions hidden in web pages that can hijack the agent.
Anthropic has made Sonnet 4.6 more resistant on this front, but the problem isn't solved. And that's before even getting into the ecosystem of European government websites, where the user experience is already a challenge for us humans.
The big question. When does an impressive demo stop being just a demo and become something anyone uses to file their tax return? That gap between the promise of the agent and the reality of digital bureaucracy is where the real game will be played, beyond the hype.
Featured image | Anthropic, Xataka