Anthropic’s AI already writes 80% of its own code because it was inevitable that AIs would improve themselves
“As of May 2026, more than 80% of the code we integrate into the Anthropic codebase was created by Claude.” The figure comes from two Anthropic researchers who have published one of the most revealing texts about the present and future of the company’s AI models, one that introduces a concept as fascinating as it is disturbing: recursive self-improvement.

Code multiplier. The impact of these agentic programming tools on the work of Anthropic’s engineers has been spectacular. According to internal Anthropic data from May 2026, thanks to this autonomous code generation an Anthropic engineer now produces eight times as many lines of code per quarter as during the 2021-2025 period. Anthropic’s human programmers no longer program: they direct and review AI-generated code.

A frenetic evolution. The changes have been fascinating, they explain at Anthropic. Between 2021 and 2023, engineers wrote all code by hand on their computers. In 2024 they started using chatbots to generate small snippets of code that they then copied and pasted. In 2025 came agents capable of working autonomously on entire files.

Longer stretches of work. According to the METR benchmark, which measures the ability of AI to complete complex tasks, in 2022 GPT-3.5 could barely last about 35 seconds operating autonomously without making serious errors. By mid-2026, Claude Opus 4.6 is already capable of working 16 hours in a row on complex tasks. Anthropic points out that the length of the tasks an AI model could undertake used to double every seven months, but now doubles every four. If this trend continues, “tasks that take a person days could be automated with AI. By 2027, AI systems could be able to work on tasks that take a person weeks.”

Superhuman performance. Industry benchmarks are being “saturated” by new AI models, which already reach almost 100% of the possible score in many of them.
For example, SWE-bench, which measures the models’ ability to program, has been almost completely saturated by the most recent models. In 2025, Claude Opus managed to optimize the code it was given so that it ran 3x faster. In April 2026, Claude Mythos Preview already achieved a 52x speedup of that code.

AI that improves itself. The concept of recursive self-improvement describes a scenario in which an AI model generates data, corrects its own failures, and trains itself continuously. This opens the door to exponential growth in its capabilities, but at the same time reopens the debate on the risks that this type of evolution generates.

Source: Anthropic

Infinite loop. Traditionally, human engineers analyzed a model’s responses, cleaned the data, and adjusted parameters to create the next version of that model. With recursive self-improvement, the AI takes on that role: it evaluates its own performance, generates more complex problems to test itself, and produces synthetic data for its next generation.

Danger. This autonomy implies a potential risk: that humans lose control of where the AI is heading, and that we can neither know nor guarantee whether it remains aligned with our ethics and ideals. Biases, however small, can be amplified by this kind of iterative process, and the model itself may have mutated its original ethical reasoning mechanisms and security protocols, becoming something totally unpredictable. The Terminator scenario.

Isolation and arbitration. To avoid these risks, Anthropic carries out this evolution in isolated environments and then verifies that everything works as it should. In addition, the company uses independent evaluation models that act as arbiters, auditing the models that evolve by themselves. They do this by checking each change in the code to prevent it from harming the system or those who use it.

The new bottleneck is the human being.
Amdahl’s law is a formula used to find the maximum performance improvement of a computer system when only one part of that system is improved. Anthropic points out that as AI writes more and more code, the real bottleneck is the human being who has to review that code.
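
To make the bottleneck argument concrete, Amdahl’s law can be written as speedup = 1 / ((1 - p) + p / s), where p is the fraction of the work that gets accelerated and s is how much faster that fraction becomes. The Python sketch below applies it to a purely hypothetical split in which 75% of an engineering task is writing code and 25% is human review. Those figures and the function names are illustrative assumptions, not Anthropic data.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work runs s times faster.

    p: share of the total work that is accelerated (0 to 1).
    s: speedup factor applied to that share (must be positive).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    # The untouched share (1 - p) keeps its original cost;
    # the accelerated share shrinks to p / s.
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p: float) -> float:
    """Upper bound on the overall speedup as s grows without limit."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p == 1.0:
        return float("inf")  # nothing is left unaccelerated
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Hypothetical split: 75% code writing (accelerated), 25% human review.
    coding_share = 0.75
    for factor in (10, 100):
        overall = amdahl_speedup(coding_share, factor)
        print(f"code written {factor}x faster -> whole task {overall:.2f}x faster")
    print(f"ceiling with instant code writing: {amdahl_limit(coding_share):.2f}x")
```

Under these assumed numbers, even an infinitely fast code generator could not make the whole task more than 4x faster, because the share of time spent on human review stays fixed.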