The great revolution of GPT-5.3 Codex and Claude Opus 4.6 is not that they are smarter. It’s that they can improve themselves

Last week, OpenAI and Anthropic almost simultaneously launched their new AI models specialized in programming: GPT-5.3 Codex and Claude Opus 4.6. Beyond the improvements in performance and speed, which are truly remarkable, both companies also stated something that completely changes the rules of the game: their AI models are actively participating in their own development. Put another way: AI is improving itself.

Why this change matters. Generative artificial intelligence tools are reaching a high level of efficiency and precision, going in a few years from being co-workers for simple, specific tasks to being able to take on a good part of a development project. According to OpenAI's technical documentation, GPT-5.3 Codex "was instrumental in its own creation," being used to debug its training, manage its deployment, and diagnose evaluation results. Meanwhile, Dario Amodei, CEO of Anthropic, affirms on his personal blog that AI writes "much of the code" at his company and that the feedback loop between the current generation and the next "gains momentum month by month."

In detail. What this means in practice is that each new generation of AI helps build the next, more capable one, which in turn will build an even better version. Researchers call it the "intelligence explosion," and those developing these systems believe the process has already begun. Amodei has publicly declared that we could be "just 1 or 2 years away from a point where the current generation of AI autonomously builds the next."

Most people use the free language models available to everyone, which are moderately capable at certain tasks. But they are also very limited, and they are not a good reflection of what a cutting-edge AI model is capable of today.
In a brief session with GPT-5.3 Codex I reached the same conclusion: the AI tools that big technology companies use in their own development are nothing like the commercial ones freely available to us in terms of capabilities.

The code-first approach. The initial specialization in programming makes more sense than it seems. The decision by companies like OpenAI, Anthropic and Google to make their systems exceptional at writing code before anything else is tied to the fact that developing AI requires enormous amounts of code. And if AI can write that code, it can help build its own successor. "Making AI great at programming was the strategy that unlocked everything else. That's why they did it first," said Matt Shumer, CEO of OthersideAI, in a post that has been widely discussed on social media these days.

Between the lines. The new models don't just write code: they make decisions, iterate on their own work, test applications as a human developer would, and refine the result until they are satisfied. "I tell the AI what I want to build. It writes tens of thousands of lines of code. Then it opens the app, clicks the buttons, tests the features. If it doesn't like something, it goes back and changes it on its own. Only when it decides it meets its own standards does it come back to me," said Shumer, describing his experience with GPT-5.3 Codex.

What changes with self-improvement. Until now, each improvement depended on human teams spending months training models, adjusting parameters and correcting errors. Now some of that work is performed by the AI itself, accelerating development cycles. As Shumer noted, citing data from METR, an organization that measures the ability of these systems to complete complex tasks autonomously, the time an AI can work without human intervention doubles approximately every seven months, and there are recent indications that the period could be shrinking to four.
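The doubling trend cited above can be put into numbers with a quick back-of-the-envelope extrapolation. This is only an illustrative sketch, not METR's actual methodology, and the two-hour starting horizon used below is a hypothetical placeholder:

```python
# Illustrative extrapolation of a doubling trend: if the autonomous-task
# horizon doubles every `doubling_months`, project the horizon N months out.
# The 2-hour starting value is a hypothetical placeholder, not METR data.

def projected_horizon_hours(current_hours: float,
                            months_ahead: float,
                            doubling_months: float = 7.0) -> float:
    """Exponential growth: horizon * 2^(months / doubling period)."""
    return current_hours * 2 ** (months_ahead / doubling_months)

# With a 7-month doubling period, a 2-hour horizon grows 8x in 21 months.
print(projected_horizon_hours(2.0, 21))        # 16.0 hours
# If the doubling period shrinks to 4 months, growth is much steeper.
print(projected_horizon_hours(2.0, 21, 4.0))   # ~76 hours
```

The point of the sketch is only that a shorter doubling period compounds dramatically: the same 21 months yields roughly five times the horizon.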
And now what. If this trend continues, by 2027 we could see systems capable of working autonomously for weeks on entire projects. Amodei has spoken of models "substantially smarter than almost all humans in almost all tasks" by 2026 or 2027. These are not distant predictions: the technical infrastructure for AI to contribute to its own improvement is already operational. And these capabilities are what are really turning the technology industry on its head.

Cover image | OpenAI and Anthropic

In Xataka | We have a problem with AI. Those who were most enthusiastic at the beginning are starting to get tired of it.

Creating a C compiler used to cost $2 million and take two years. Claude Opus 4.6 did it in two weeks for $20,000

We are facing a technological inflection point. One in which software engineering, one of the most complex and demanding technical disciplines in history, is little by little becoming the "killer app" of AI. Generative AI models are clearly not perfect, but we keep seeing extraordinary progress. The latest example? The C compiler that Claude Opus 4.6 programmed all by itself.

What has happened. Nicholas Carlini, a researcher at Anthropic, recounted yesterday how he has been experimenting with "a new way" of working with language models that they have called "agent teams." What he did was have several programming agents work in parallel using the recently released Claude Opus 4.6, and with 16 of these agents he developed something exceptional: a C compiler.

Hello, CCC. At Anthropic they have called it Claude's C Compiler (CCC), and they have published the code, entirely generated by Opus 4.6, on GitHub. The project consists of 100,000 lines of Rust code that were generated in two weeks at an API cost of $20,000. And it works: with it they have compiled a functional Linux 6.9 kernel on x86, ARM and RISC-V.

Before, it took (at least) two million dollars and two years. What this experiment demonstrates is how much cheaper and faster software development can become thanks to these agents. Although there is no readily available data on how much compilers cost to build in the past, these products were enormous, as in the case of Microsoft Visual C++, for example. It is hard to know exactly what it cost, but it is estimated to have involved 15-20 people working for five years. That is a lot of person-hours and a lot of money to develop and polish a compiler. The estimate of two years and two million dollars may in fact be overly optimistic.

Another example. Historically, building a C compiler from scratch was considered one of the pinnacles of systems engineering.
Not only did it require in-depth knowledge of processor architecture; it also took thousands of person-hours to handle optimization and machine code generation. In the 90s, the company Cygnus Solutions (key in the development of the GCC compiler) came to invest more than 250 million over a decade to maintain and port build tools. The real cost was not just in the final lines of code, but in countless hours analyzing CPU and memory patterns to make the resulting binary efficient.

Far from perfect, but... Carlini himself explained in the post that this compiler has serious limitations: for example, "it does not have a 16-bit x86 compiler, which is essential to boot Linux outside of 'real mode', and it does not have its own assembler or linker." It is probably far from mature compilers, but even so the achievement remains exceptional, and it points to a future in which even very complex developments can be supported by AI. They will be expensive, no doubt, but their total development will probably cost a fraction of what it did a few years ago.

Cursor already demonstrated it. Before Anthropic launched its AI-programmed compiler, Cursor completed a similar project, combining GPT-5.2 agents in its development platform to create a working browser in a week. In total the AI programmed three million (!) lines of Rust code, and although it was again far from perfect or from competing with Chrome, it demonstrated the current capacity of these agentic programming systems.

Turning point (especially for Anthropic). For the experts at SemiAnalysis, Claude Code, the current leading exponent of this new era of AI-driven programming, represents a paradigm shift: "We believe that Claude Code is the turning point for AI agents and is a glimpse into the future of how AI will work." This prestigious newsletter predicts an exceptional 2026 for Anthropic, so much so that they believe it will "dramatically surpass OpenAI."

You ask, the AI programs.
If you have tried vibe coding, I'm sure you agree with me: AI lets you do things you would never have dreamed of. What I did a few weeks ago with Immich made that clear to me, and I keep experimenting with AI, programming "custom" things that solve real problems and needs for me. Yes, for now they are just for me, and therefore they are not the large, complex systems that need to go into production in professional environments. But it is clear to me that this is already happening little by little, and it will happen more. In fact, both OpenAI and Anthropic have highlighted how, in the development of their latest models, part of the work was done, paradoxically, by those same models, feeding back into the process. And the result is in production and used by millions of people. Something is changing. And it's something big.

In Xataka | OpenAI has a problem: Anthropic is succeeding right where the most money is at stake

Programming is the new battleground of AI. OpenAI and Anthropic have made it clear with GPT-5.3-Codex and Claude Opus 4.6

When ChatGPT broke out in November 2022, OpenAI seemed unrivaled. And, to a large extent, it was. That chatbot, despite its errors and limitations, inaugurated a category of its own. However, in the technology sector advantages are rarely permanent and, in 2026, the position of the company led by Sam Altman is a far cry from what it was then. Google has managed to attract the general public with Nano Banana Pro, while Gemini steadily gains ground as an artificial intelligence chatbot. At the same time, ChatGPT's market share has fallen significantly in some markets. Anthropic, for its part, has established itself as a reference in software engineering and has become one of the preferred tools among programmers.

In this race to set the pace of AI, this Thursday we witnessed a curious move: the almost simultaneous arrival of two models focused on programming, GPT-5.3-Codex and Claude Opus 4.6. The coincidence does not seem accidental; it reflects the extent to which the sector's major players compete to define the next step, in a scenario where the main beneficiaries are, once again, the users. With these new models on the table, the question becomes what they really contribute. There are plenty of promises, and comparable benchmarks that help place them are also beginning to appear. So it is time to look in a little more detail at what OpenAI and Anthropic propose for those who use AI as a development tool.

GPT-5.3-Codex and Opus 4.6 enter the scene: what each promises developers

GPT-5.3-Codex is presented as a model focused on programming agents that seeks to expand the scope of what a developer can delegate to AI. OpenAI claims that it combines improvements in coding performance, reasoning and professional knowledge over previous generations, and that it is 25% faster.
With this balance, the system is oriented toward prolonged tasks involving research, tool use and complex execution, while retaining the possibility of intervening and guiding the process in real time without losing the thread of the work. One of the most striking elements OpenAI highlights in this generation is the role Codex itself played in its own development. The team used early versions of the model to debug training, manage deployment, and analyze test and evaluation results, an approach that accelerated research and engineering cycles. Beyond that internal process, GPT-5.3-Codex also shows progress in practical tasks such as autonomously building web applications and games. The company has published two examples that can be tried right now: a racing game with eight maps and a diving game for exploring reefs.

Anthropic's turn comes with Claude Opus 4.6, an update the company presents as a direct improvement in planning, autonomy and reliability within large code bases. The model, they claim, can sustain agentic tasks for longer, reviewing and debugging its own work more accurately. The idea is that these capabilities can be used in tasks such as financial analysis, documentary research or creating presentations. Added to this is a context window of up to one million tokens, in beta, a leap that seeks to reduce information loss in long processes and reinforce the system's usefulness. Beyond the core of the model, Anthropic accompanies Opus 4.6 with a series of changes aimed at prolonging its usefulness in real workflows. Among them are mechanisms such as "adaptive thinking," which lets the system automatically adjust the depth of its reasoning depending on the context. Configurable effort levels and context-compression techniques, designed to sustain long conversations and tasks without exhausting the available limits, also appear on the scene.
Added to this are teams of agents that can be coordinated in parallel within Claude Code, and deeper Excel and PowerPoint integration. While OpenAI's product, GPT-5.3-Codex, is not yet available in the API, Anthropic's is. It maintains the base price of $5 per million input tokens and $25 per million output tokens, with nuances such as a premium cost when prompts exceed 200,000 tokens.

Can a winner be measured with numbers?

When trying to put GPT-5.3-Codex and Claude Opus 4.6 face to face, the main obstacle is not a lack of figures but their difficult correspondence. Each company selects the evaluations that best reflect its progress and, although many belong to similar categories, they differ in methodology, versions or metrics, which prevents a direct reading. In this type of model, this fragmentation of results is part of the state of the technology itself, but it also requires a cautious interpretation that separates technical demonstrations from truly equivalent comparisons. Only with this filter is it possible to identify the few points where both systems can be measured under comparable conditions and draw useful conclusions for developers.

If we restrict the analysis to truly comparable metrics, the common ground between GPT-5.3-Codex and Claude Opus 4.6 is limited to two specific evaluations identified through our own research: Terminal-Bench 2.0 and OSWorld in its verified version. The results show a distribution of strengths rather than a clear supremacy. GPT-5.3-Codex scores 77.3% on Terminal-Bench 2.0 compared to 65.4% for Opus 4.6, which points to greater efficiency in terminal-centric workflows. Conversely, Opus 4.6 reaches 72.7% on OSWorld, surpassing GPT-5.3-Codex's 64.7% in general system-interaction tasks, a contrast that reinforces the idea of specialization according to the environment of use.
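Those two overlapping results can be condensed into a minimal sketch. The scores are the ones reported above; the tiny helper and the labels are purely illustrative:

```python
# The two benchmarks where the published numbers overlap, as reported:
# Terminal-Bench 2.0 and OSWorld (verified). Scores in percent.
scores = {
    "Terminal-Bench 2.0": {"GPT-5.3-Codex": 77.3, "Claude Opus 4.6": 65.4},
    "OSWorld (verified)": {"GPT-5.3-Codex": 64.7, "Claude Opus 4.6": 72.7},
}

def leader(benchmark: str) -> str:
    """Return the model with the higher reported score on a benchmark."""
    results = scores[benchmark]
    return max(results, key=results.get)

for name in scores:
    print(f"{name}: {leader(name)} leads")
# Each model leads on exactly one benchmark: a split, not a clear supremacy.
```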
So we could say that the capabilities described by each manufacturer point to tools that are no longer limited to generating code, but instead seek to participate in prolonged processes of analysis, execution and review within real professional environments. This transition introduces new selection criteria that go beyond one-off performance.

In Xataka | OpenAI has a problem: Anthropic is succeeding right where the most money is at stake

The Opus Dei schools decided to stand up to the government and continue segregating by sex. Their students are fleeing

When the new education law came into force in January 2021, no one missed that its provisions included a direct missile at the waterline of dozens of schools throughout the country: segregation by sex was prohibited; only mixed schools could continue to receive public funding. What we discovered a couple of weeks later is that the missile came with a timer. Five years later, the timer is reaching zero, many centers are preparing to give up their public funding, and a wave of students is trying to leave those schools.

What did the law say? The LOMLOE, as the law is called, required educational centers that receive public funds to "develop the principle of coeducation in all educational stages." That is, they were prohibited from separating students by sex. However, since education is a regional competence and each region has different regulations, many attempts to apply this point have been delayed. In Catalonia, for example, when the ERC-led department tried to eliminate agreements with single-sex schools, the courts halted the measures until the agreements came up for renewal. That period begins at the start of 2026.

And why does it affect Opus Dei? Strictly speaking, talking about "Opus schools" is a bit inaccurate. It is true that there are many centers in that orbit, but the relationships between them are complex, so they are not a uniform whole. However, this group of centers (which in Catalonia number a dozen and receive 35 million euros each year) is the spearhead of the "anti-coeducation" movement. Thus, many Catalan schools linked to the Prelature are doing the math. Keeping public funding would mean losing one of their hallmarks; keeping their hallmark means going private (with the increase in fees that entails).
For this reason, the steps being taken at two schools in the Sant Cugat/Bellaterra area (La Vall, for girls, and La Farga, for boys) were seen as the great privatization experiment. The area is one of the richest and most exclusive in all of Catalonia and, in that sense, it seemed logical to think they would be two of the schools least affected by the jump. But the flight of students has begun. In July, El País requested (through a complaint to the Commission for Guarantees of Access to Public Information) the data from the official pre-registration process, and what the data show is a full-blown exodus: 63 students from La Vall and 96 from La Farga tried to move to other schools. In the end, only 38 of the former and 74 of the latter succeeded, but it is a shot across the bow. Applications for admission also fell (by between 10 and 14%). All this while a group of families tries to keep the schools from abandoning the public funding agreement.

However, the decision seems firm. Last week, two schools in L'Hospitalet de Llobregat also linked to the Prelature (Xaloc, for boys, and Pineda, for girls) announced that they would begin preparing for the more than likely non-renewal of the agreement and the problems this will entail. According to data from El País, those two schools alone (with more than 2,800 students) receive seven million euros from the Generalitat.

And where does all this leave us? In recent years, the debate over single-sex versus mixed education has intensified. In fact, in some countries such as the US, single-sex education has been experiencing a real boom for a decade. However, the current conversation makes clear that research on the topic is the least of it: positions opposed on ideological, economic and social grounds turn those studies into ammunition with which to attack the opponent.
For this reason, what everyone in the sector is wondering is how long the legislature will last and what will happen if, eventually, a government of the opposite sign arrives. Meanwhile, what is clear is that single-sex education is going to test, for the first time in many years, the commitment of its families to the project.

Image | Vazovsky

In Xataka | The generation of parents who feel guilty because their children spend a lot of time looking at screens

Anthropic launches Claude Opus 4 and presents it as the best programming model in the world

After Google displayed all its artificial intelligence artillery, Anthropic did not want to be left behind. The company founded by Dario Amodei has made a strong move: it has presented Claude Opus 4 and Claude Sonnet 4, two new models with which it aspires to leave its mark on the AI race.

The star of the announcement is Claude Opus 4, the most advanced model Anthropic has developed so far. And they do not beat around the bush: they assure that it is "the best programming model in the world." An ambitious claim that, as always, will have to be tested. But the first data place it very well positioned against its main rivals. In the SWE-bench Verified benchmark, which evaluates real software engineering tasks, Opus 4 gets 72.5% in standard conditions and reaches 79.4% with parallel processing. That performance puts it above models such as GPT-4.1 (54.6%), o3 (69.1%) or Google's recent Gemini 2.5 Pro (63.2%). However, in other more demanding tests of multimodal reasoning, such as GPQA Diamond or MMMU, focused on university-level questions and complex scenarios combining text and image, Opus 4 fails to beat o3, which continues to lead in that field.

A model with stamina and autonomy

But beyond the numbers, what Anthropic wants to highlight is this model's stamina and autonomy. Claude Opus 4 is capable of maintaining long work sessions and executing thousands of steps continuously. The company explains that this makes it an ideal foundation for more sophisticated AI agents: systems that make decisions, complete tasks on their own and do not need constant human supervision.

In parallel comes Claude Sonnet 4, an evolution of the model Anthropic launched in February. It is not intended to compete with Opus's power, but it offers a very balanced proposal between performance and efficiency.
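That performance/efficiency balance largely comes down to API price. As a minimal illustrative sketch using the per-million-token prices Anthropic lists for the two models ($15/$75 for Opus 4, $3/$15 for Sonnet 4, input/output), where the model labels and the workload size are hypothetical:

```python
# API cost sketch: dollars per million tokens, as listed for Opus 4
# and Sonnet 4. The dictionary keys are illustrative labels, not
# necessarily the exact API model identifiers.
PRICES = {
    "opus-4":   {"input": 15.0, "output": 75.0},
    "sonnet-4": {"input": 3.0,  "output": 15.0},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a job, given its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 2M input tokens, 500k output tokens.
print(job_cost("opus-4", 2_000_000, 500_000))    # 67.5 dollars
print(job_cost("sonnet-4", 2_000_000, 500_000))  # 13.5 dollars
```

At the listed rates, the same workload on Sonnet 4 costs a fifth of what it costs on Opus 4, which is the efficiency trade-off in a nutshell.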
In coding it also makes a notable leap over its previous version: it goes from 62.3% to 72.7% on SWE-bench Verified, and it improves in reasoning tasks, instruction following and general precision. Both models arrive with interesting new features. For example, they can now alternate between reasoning and tool use within the same process, which allows more complete answers. They have also improved in reliability: according to Anthropic, they are 65% less likely to take shortcuts or make serious mistakes than Sonnet 3.7.

Claude Opus 4 and Sonnet 4 are already available in Anthropic's API, on Amazon Bedrock and on Google Cloud Vertex AI. They are included in the Pro, Max, Team and Enterprise plans. Prices stay in line with the previous models: Opus 4 costs $15 per million input tokens and $75 per million output tokens. Sonnet 4 is more affordable: $3 and $15, respectively. The latter can also be used from free accounts.

Images | Anthropic

In Xataka | We have tried the new Google AI mode: it is a direct bullet aimed at the blue links, and it worries and excites in equal parts
