Anthropic says Claude Sonnet 4.5 can clone a service like Slack in 30 hours. The reality is more complicated

Anthropic has launched Claude Sonnet 4.5, claiming that it put the model to work for 30 hours straight to build a Slack replica. During that time it generated 11,000 lines of code without supervision and only stopped when the task was complete. In May, its Opus 4 model managed to operate for seven hours. The company presents it as "the best model in the world for agents, coding and computer use."

Why it matters. Anthropic, OpenAI and Google are waging a battle to dominate autonomous agents and programming tools. Whoever wins will capture enormous revenue in enterprise licenses. Scott White, product manager, says it operates "at the level of a chief of staff": it coordinates calendars, analyzes data, writes reports... Dianne Penn says she uses it to search for candidates on LinkedIn and generate spreadsheets.

Yes, but. Developers tell a more nuanced story. Miguel Ángel Durán, known as @midudev, sums it up: "Claude Sonnet 4.5 refactored my entire project in one prompt. 20 minutes thinking. 14 new files. 1,500 modified lines. Applied clean architecture. Nothing worked. But how beautiful it was." Other developers report the same: thousands of lines with an impeccable structure that simply don't run. Code that looks professional but breaks when compiled.

Between the lines. Anthropic has not shown the Slack clone working. It has only said that the model built it. Nor has it shown that the code is operational. That is the difference between communicating something and demonstrating it, as Ed Zitron has underlined. The company is indirectly acknowledging the problem: Claude Sonnet 4.5 ships with extra infrastructure for building agents (virtual machine management, memory management, context management, multi-agent support...). Translation: even with the most advanced model, developers need extra tooling for agents to program reliably.

In detail. Penn explained to The Verge that the improvements surprised the internal team.
The model is three times better at using computers than the October version. The team spent the last month working with feedback from GitHub and Cursor. Canva, a beta tester, says it helps with "complex, long-context tasks."

The contrast. There is a huge gap between the marketing and the technical reality. Anthropic promises an AI that operates for 30 hours building complex software. Developers confirm that it generates very well structured but functionally broken code. This pattern repeats across the industry: the models keep improving at generating code that looks professional, and they systematically fail at generating code that actually works without significant human intervention.

And now what. The question remains unanswered: when will we go from AI that generates beautiful but dysfunctional code to AI that generates functional code on its own? Anthropic is betting that its combination of a powerful model and extra infrastructure closes that gap. For now we will have to keep waiting for concrete evidence to arrive, and take nothing for granted without verifiable code.

In Xataka | OpenAI signs with Samsung and SK Hynix for a potential chip demand of 900,000 wafers per month. It is an absurd figure

Featured image | Anthropic

Anthropic launches Claude 3.7 Sonnet, a "hybrid" model that is better than ever. Not only that: it also "reasons"

Anthropic has announced the launch and availability of Claude 3.7 Sonnet, its new foundation model. The jump is promising, but the model stands out especially for one thing: it points toward reasoning models.

It is not Claude 4.0, it is Claude 3.7. The version number confirms once again that the leap in capabilities does not justify a "rounder" number. Many expected Claude 4.0, but Anthropic makes it clear that this is a version far more evolutionary than revolutionary.

A hybrid model. Anthropic boasts of having a hybrid model that does not distinguish between chatting and answering questions quickly, reasoning, or any other application, because everything is based on the Claude 3.7 foundation model, which does it all and behaves in that multidisciplinary way. And since it does everything, it is somewhat more expensive than the competition: its API costs $3 per million input tokens and $15 per million output tokens.

Claude can now "reason". In a separate announcement, Anthropic described its new reasoning mode, called "extended thinking mode", which becomes one more option we can enable when using the model. If we activate it, the model "will think more deeply about complex questions." As the company explains, this mode uses the same AI model, but gives it more time and invests more effort to reach an answer.

How Claude thinks. This reasoning mode offers the possibility of seeing what the model is thinking while it processes those answers. Anthropic warns that this information can be surprising, because we can see the AI "think" incorrect things, and also that showing the process does not mean the answer is based solely on it: "Our results suggest that models often make decisions based on factors that are not explicitly discussed in their reasoning process." It keeps things to itself. That is: the model seems to keep some things to itself while thinking, but it is not clear which ones or why.
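The pricing above lends itself to a quick back-of-the-envelope calculation. A minimal sketch in Python, assuming only the per-token rates quoted in the text ($3 per million input tokens, $15 per million output tokens):

```python
# Cost calculator using the per-token prices quoted above:
# $3 per million input tokens, $15 per million output tokens.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the quoted rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A request with 10,000 input tokens and 2,000 output tokens:
print(round(api_cost(10_000, 2_000), 4))  # 0.06
```

At those rates, a request reading 10,000 tokens and producing 2,000 tokens costs about six cents, which is how the "more expensive than the competition" comparison cashes out in practice.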
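For readers curious about what "one more option" looks like in practice, here is a sketch of a Messages API request body with extended thinking enabled. The field names (`thinking`, `budget_tokens`) follow Anthropic's public API documentation, but the model identifier and token budgets are assumptions for illustration; check the current docs before relying on them:

```python
# Sketch of a request body for Anthropic's Messages API with extended
# thinking enabled. The model id and budgets are assumed values.
payload = {
    "model": "claude-3-7-sonnet-20250219",  # assumed model identifier
    "max_tokens": 16000,                    # total output budget
    "thinking": {
        "type": "enabled",
        "budget_tokens": 8000,              # tokens reserved for reasoning;
                                            # must be below max_tokens
    },
    "messages": [
        {"role": "user", "content": "Why might this test be flaky?"},
    ],
}
```

When the mode is on, responses include "thinking" content blocks alongside the final answer, which is how the visible reasoning described above is surfaced to the user.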
There is another reason not to show everything: doing so raises security problems, since all that information could give bad actors resources to exploit the model in inappropriate ways. Source: Anthropic.

It can play Pokémon on its own. The new Anthropic model is also more "agentic" than ever. It responds better to changes in the environment and keeps acting until an open-ended task has been completed. That makes the "computer use" function, which allows the AI to control our computer, increasingly promising. They demonstrated it with Pokémon: Claude 3.7 got much further than previous models.

Claude Code arrives. The Anthropic model has always stood out in programming, and now the company wants to push that capability with Claude Code, a tool based on Claude 3.7 Sonnet but specifically focused on helping programmers develop their projects.

A programming agent. Claude Code could also be considered Anthropic's first agent, because it is able to complete programming projects autonomously without needing user interaction. Claude can search through codebases to build on, read and edit files, write and run tests, publish code to GitHub repositories and execute commands in a console while keeping developers informed of the whole process. Anthropic's demo video shows some of those functions in action.

Similar to Grok 3 in performance. The new Grok 3, presented these days by xAI, showed one more step up in performance on today's most demanding benchmarks, and Claude 3.7 is in that same line, which means it is somewhat superior in those tests to models such as o1 and o3-mini (from OpenAI) and DeepSeek R1.

In Xataka | I have tried DeepSeek on the web and on my Mac. ChatGPT, Claude and Gemini have a problem
