AI agents have indeed changed work and the economy forever. But for now only in one sector: programming

AI agents are beginning to demonstrate their capabilities, but the only area in which they are really doing so is programming. An Anthropic report reveals that software engineering currently concentrates half of all AI agent activity, and that proves two things. First, that AI can effectively enhance work. Second, that there is a huge opportunity in hundreds of verticals where AI has barely landed.

What has happened. If there is one sector that has embraced AI and AI agents, it is programming. Platforms like Cursor or Windsurf first, and Claude Code, OpenAI Codex or Antigravity today, have made it possible for all kinds of people, whether they know how to program or not, to turn their projects into reality in a remarkably simple way. It is a clear case of how AI can contribute to a field, but there is a problem: it is practically the only case where it has actually done so.

Distribution of requests to AI tools by segment. Software engineering accounts for almost 50% of those calls or requests, at least on the Claude platform. Source: Anthropic.

Verticals with plenty of room. As the graph shows, the presence of AI agents is very limited or practically non-existent in a large number of verticals where there is clearly a notable opportunity to take advantage of these tools. Office task automation is the second main protagonist, with 9.1% of the function calls to Anthropic's AI model in this report. Below it we find segments such as marketing, sales, finance, business analysis and scientific research.

And others that are ignoring AI. There are quite a few sectors in which AI agents seem barely present. The travel, legal, medical, e-commerce and education segments seem perfect candidates to start taking advantage of these tools, but for now that is not happening, and the presence of AI agents in all of them is very, very small.

Claude Code can work for longer and longer stretches. Double what it was three months ago, in fact. Source: Anthropic.
Models can now work autonomously for a long time. It is true that in these scenarios models used to be limited by how long they could function autonomously, chaining actions and self-assessing progress in order to keep going. That is no longer so true. Claude Code, for example, has doubled the length of its longest sessions in just three months: from 25 minutes in October 2025 to 45 minutes in January 2026.

And they need less human intervention. Another revealing data point from the study is that the evolution of these agents not only means they can function autonomously for longer periods, but also that they require fewer human interventions. The situations in which an agent "needs human help" to continue the process are becoming rarer. In August 2025, the average was 5.4 human interventions per session. By December, that average had dropped to 3.3.

We trust AI more and more. At Anthropic they have also noticed a distinctive behavior among users: they increasingly trust AI agents. In programming, novices approve each new step before it is executed, but veterans delegate and intervene only when something goes wrong: they have gone from pre-approving everything to exercising active, ongoing monitoring. As Anthropic puts it: "Users develop confidence as they work with the model, and change their monitoring strategy based on that growing confidence."

From programming to other fields. What has happened with programming could happen in other scenarios. The challenge is to build AI agents that adapt to each segment using the specific data of that vertical. If an AI is to help in the legal segment, it must be specifically trained for that segment. What AI did when trained on thousands of code repositories on GitHub was learn and improve.
Well, the same can be applied to other verticals, although the challenge is certainly notable, because programming was a perfect segment for applying AI: it is very deterministic. It either works or it doesn't, and in either case execution logs let you fine-tune that operation.

The new unicorns await. As entrepreneur Garry Tan points out in his newsletter, in the last two decades SaaS platforms have captured 40% of venture capital investment, and that industry has more than 170 unicorns. "The thesis is simple," Tan concludes: "every one of those unicorns has an equivalent in the form of a vertical AI waiting to be built."

Promises and realities. The AI agent segment therefore promises many changes across a multitude of segments, but the reality is that today the practical success of AI (there is no economic success yet) is limited to the world of programming. Will we be able to transfer it to other segments? The opportunity is there, but saying it is one thing and doing it is quite another... even with AI.

Image | Joshua Reddekopp

In Xataka | Every time Facebook had a competitor, it bought it: it is exactly the same thing that OpenAI is doing

Programming is the new chessboard of AI. OpenAI and Anthropic have made it clear with GPT-5.3-Codex and Claude Opus 4.6

When ChatGPT broke out in November 2022, OpenAI seemed unrivaled. And, to a large extent, that was the case. That chatbot, despite its errors and limitations, inaugurated a category of its own. However, in the technology sector advantages are rarely permanent, and in 2026 the position of the company led by Sam Altman is a far cry from what it held then. Google has managed to attract the general public with Nano Banana Pro, while Gemini steadily gains ground as an artificial intelligence chatbot. At the same time, ChatGPT's market share has fallen significantly in some markets. Anthropic, for its part, has established itself as a reference in software engineering and has become one of programmers' preferred tools.

In this race to set the pace of AI, this Thursday we witnessed a curious movement: the almost simultaneous arrival of two models focused on programming, GPT-5.3-Codex and Claude Opus 4.6. The coincidence does not seem accidental, and it reflects the extent to which the sector's major players compete to define the next step, in a scenario where the main beneficiaries are, once again, the users. With these new models on the table, the question becomes what they really contribute. There are plenty of promises, and comparable benchmarks that help place them are also beginning to appear. So it is time to look in a little more detail at what OpenAI and Anthropic propose for those who use AI as a development tool.

GPT-5.3-Codex and Opus 4.6 enter the scene: what each promises developers

GPT-5.3-Codex is presented as a model focused on coding agents that seeks to expand the scope of what a developer can delegate to AI. OpenAI claims that it combines improvements in coding performance, reasoning and professional knowledge over previous generations, and that it is 25% faster.
With this balance, the system is oriented toward prolonged tasks involving research, tool use and complex execution, while preserving the possibility of intervening and guiding the process in real time without losing the thread of the work. One of the most striking elements OpenAI highlights in this generation is the role that Codex itself played in its development: the team used early versions of the model to debug training, manage deployment, and analyze test and evaluation results, an approach that accelerated research and engineering cycles. Beyond that internal process, GPT-5.3-Codex also shows progress in practical tasks such as the autonomous creation of web applications and games. The company has published two examples that we can try right now by clicking the links: a racing game with eight maps and a diving game for exploring reefs.

Anthropic's turn comes with Claude Opus 4.6, an update the company presents as a direct improvement in planning, autonomy and reliability within large codebases. The model, they claim, can sustain agentic tasks for longer, reviewing and debugging its own work more accurately. The idea is that we can use these capabilities in tasks such as financial analysis, documentary research or creating presentations. Added to this is a context window of up to one million tokens in beta, a leap meant to reduce information loss in long processes and reinforce the system's usefulness.

Beyond the core of the model, Anthropic accompanies Opus 4.6 with a series of changes aimed at prolonging its usefulness in real workflows. Among them are mechanisms such as the so-called "adaptive thinking", which lets the system automatically adjust the depth of its reasoning depending on the context. Configurable effort levels and context-compression techniques, designed to sustain long conversations and tasks without exhausting the available limits, also enter the scene.
Added to this are teams of agents that can be coordinated in parallel within Claude Code, plus deeper Excel and PowerPoint integration. While OpenAI's product, GPT-5.3-Codex, is not yet available in the API, Anthropic's is. It maintains the base price of $5 per million input tokens and $25 per million output tokens, with nuances such as a premium cost when prompts exceed 200,000 tokens.

Can the numbers tell us who wins?

When trying to put GPT-5.3-Codex and Claude Opus 4.6 face to face, the main obstacle is not the lack of figures but their difficult correspondence. Each company selects the evaluations that best reflect its progress and, although many belong to similar categories, they differ in methodology, versions or metrics, which prevents a direct reading. In models of this type, this fragmentation of results is part of the state of the technology itself, but it also demands a cautious interpretation that separates technical demonstrations from truly equivalent comparisons. Only through that filter is it possible to identify the few points where both systems can be measured under comparable conditions and draw conclusions useful to developers.

If we restrict the analysis to truly comparable metrics, the common ground between GPT-5.3-Codex and Claude Opus 4.6 is limited to two specific evaluations identified through our own research: Terminal-Bench 2.0 and OSWorld in its verified version. The results show a distribution of strengths rather than a clear supremacy. GPT-5.3-Codex scores 77.3% on Terminal-Bench 2.0 versus 65.4% for Opus 4.6, which points to greater efficiency in terminal-centric workflows. Conversely, Opus 4.6 reaches 72.7% on OSWorld, surpassing GPT-5.3-Codex's 64.7% in general system-interaction tasks, a contrast that reinforces the idea of specialization by environment of use.
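The pricing figures above translate into per-request costs quite directly. A minimal sketch of that arithmetic follows; the $5/$25 base rates and the 200,000-token threshold come from the article, but the size of the long-context surcharge is not stated, so the multiplier below is a placeholder, not Anthropic's actual rate.

```python
# Back-of-the-envelope cost estimate for the Opus 4.6 API prices quoted
# above ($5 per million input tokens, $25 per million output tokens).
# The premium multiplier for prompts over 200,000 tokens is hypothetical:
# the article only says a surcharge exists, not its size.

BASE_INPUT_PER_M = 5.00            # USD per million input tokens
BASE_OUTPUT_PER_M = 25.00          # USD per million output tokens
LONG_CONTEXT_THRESHOLD = 200_000   # tokens; above this a premium applies
PREMIUM_MULTIPLIER = 2.0           # placeholder surcharge factor

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single API request."""
    in_rate, out_rate = BASE_INPUT_PER_M, BASE_OUTPUT_PER_M
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate *= PREMIUM_MULTIPLIER
        out_rate *= PREMIUM_MULTIPLIER
    return (input_tokens / 1_000_000) * in_rate \
         + (output_tokens / 1_000_000) * out_rate

# A typical coding-agent turn: 30k tokens of context in, 2k tokens out.
# 30k * $5/M + 2k * $25/M = $0.20
print(f"${request_cost(30_000, 2_000):.2f}")
```

At these rates the surcharge matters: the same 2,000-token answer costs ten times more once the prompt crosses the threshold, which is why context compression features like the ones Anthropic describes have a direct economic effect.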
So we could say that the capabilities described by each manufacturer point to tools that are no longer limited to generating code, but instead seek to participate in prolonged processes of analysis, execution and review within real professional environments. This transition introduces new selection criteria that go beyond raw benchmark performance.

In Xataka | OpenAI has a problem: Anthropic is succeeding right where the most money is at stake

We believed that Stack Overflow was essential for programming. AI is proving the opposite

For more than a decade, programming and Stack Overflow were almost synonymous. When faced with an error, a question, or a line of code that didn't work, the gesture was automatic: open the browser, search for the exact question and trust that someone, somewhere in the world, had already gone through the same thing. Today that reflex is beginning to fail. Not because the problems have disappeared, but because the conversation seems to have shifted. The data suggests that the place where millions of developers used to ask new questions in public is falling increasingly silent.

The value of Stack Overflow was not just in accumulating answers, but in how it constructed them. Each question was left open, debated and refined until the community agreed on which solution deserved to be highlighted. This process turned the platform into a technical thermometer: it allowed us to detect which languages were growing, which frameworks generated the most friction, and where the real problems of modern development lay. Over time, that dynamic led many to assume that the software ecosystem as we know it would be hard to understand without this collective repository.

The data that set off the alarms

To understand what is happening, perceptions are not enough. The graph comes from Stack Exchange Data Explorer (SEDE), a public tool that allows you to run SQL queries on historical data from the Stack Exchange network. In this case, what has been measured is the number of new questions posted on Stack Overflow month after month. It is an imperfect metric, but a very revealing one when its evolution over time is analyzed.

The fall of Stack Overflow reflected in a Stack Exchange chart

The data allows the recent history of Stack Overflow to be divided into fairly clear stages. Between 2008 and 2014, the platform experienced a phase of accelerated expansion, coinciding with its adoption as the global reference for resolving programming doubts.
From 2015 until 2021, it entered a long stage of maturity, with high and relatively stable volumes of new questions. The turning point came in 2022, when the trend reversed and the number of queries began to fall steadily, a moment that coincides in time with the public emergence of tools such as ChatGPT, a change of context that helps interpret the chronology, although it does not explain it on its own.

A historic low: The fall not only continues; it accelerates in the final stretch. Data from that series shows a decline from around 17,000 questions per month at the beginning of 2025 to approximately 3,800 in January 2026, the lowest level reflected in the graph. This fall marks a before and after, because it no longer speaks of progressive wear, but of an abrupt change in how the platform is used.

The need for help does not disappear, but it changes location. Compared to Stack Overflow's open model, AI offers immediate responses adapted to the context the user provides, with results that may vary in quality and precision. You don't have to formulate the question well for a broad audience or expose yourself to public corrections. You just ask. That convenience does not in itself prove a direct causal relationship, but it fits with the moment when public participation began to fade.

AI enters the workflow: The internal X-ray reinforces what the graph suggests. According to Stack Overflow's 2025 Developer Survey, conducted among more than 49,000 developers around the world, the use of AI tools already reaches 84% of respondents, up from 76% the previous year. GPT models lead that adoption, followed by Claude Sonnet and Gemini Flash. It is not a marginal technology, but a layer integrated into everyday work, which helps contextualize why fewer and fewer doubts are raised in public.
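The questions-per-month metric behind the chart is simple to reproduce yourself. A minimal sketch, assuming you have exported question creation dates from SEDE to a CSV file (the `CreationDate` column name matches the Stack Exchange schema; the export file itself and its path are hypothetical):

```python
import csv
from collections import Counter

def questions_per_month(csv_path: str) -> dict[str, int]:
    """Count new questions per month from a SEDE CSV export whose
    CreationDate column is formatted like '2025-01-17 10:32:00'."""
    counts: Counter[str] = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["CreationDate"][:7]] += 1  # 'YYYY-MM' prefix
    return dict(sorted(counts.items()))
```

Plotting the resulting series is what produces the curve the article describes: expansion to 2014, a plateau to 2021, and the post-2022 drop.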
OverflowAI and the product pivot: Far from ignoring the change, Stack Overflow has begun to integrate artificial intelligence into its own proposal. OverflowAI is a suite designed to enable semantic search and AI-generated responses that summarize knowledge already validated by the community. The idea is not to replace human answers, but to reorganize the enormous archive accumulated over the years and make it more accessible. In a context of falling new questions, the platform is trying to remain useful as a point of consultation, even though the interaction no longer takes the traditional form of a forum.

Integrating into the AI ecosystem: In parallel with the collapse of new questions, Stack Overflow has signed agreements with OpenAI and Google Cloud between 2024 and 2025 that place its content within the flow of development and improvement of language models. These agreements allow the platform's technical archive to be used as a reference to increase the accuracy of responses. In practice, references to Stack Overflow may appear in some technical answers generated by AI assistants, although this does not in itself imply a stable return of direct participation by developers.

With this panorama, the question is no longer whether Stack Overflow has lost centrality, but what it means today for a platform like this to "continue to exist". The data shows that public questions have dropped to historic lows, while the accumulated knowledge continues to have value on and off the site. Stack Overflow may stop being the place where you ask questions and become, above all, a silent layer that feeds other systems. What remains up in the air is whether that transformation is compatible with the open spirit that made it essential.

Images | Xataka with Gemini 3 Pro

In Xataka | As Google enters the AI race, Samsung has opted for a more intriguing move

It is surely the best model for programming, but it still has a big problem

Anthropic has announced Claude Opus 4.5, its most advanced AI model to date. The company claims it is the best in the world for programming, intelligent agents and computer use, beating OpenAI's GPT-5.1 Codex-Max and Google's Gemini 3 Pro. It has also arrived just days after both of them, as well as Grok 4.1.

The general overview. The new model has achieved 80.9% accuracy on SWE-Bench Verified, the reference benchmark for evaluating software engineering capabilities. Anthropic has also put it through its own hiring test for engineers, notoriously difficult and with a two-hour limit, and the model outperformed every human candidate who has taken it.

Why it matters. This release solidifies Anthropic as a leader in AI tools for programming. Even Meta uses Claude for its internal Devmate code assistant, despite competing directly with the company in other areas. The improvements are not limited to code. Opus 4.5 stands out in:

Creation of documents, spreadsheets and professional presentations.
Deep research tasks with multiple sources.
Advanced visual and mathematical reasoning.
Management of subagent teams for complex multi-agent systems.

In figures. Additionally, Anthropic has drastically reduced the price of its API: from $15/$75 per million input/output tokens to $5/$25. And the model is more efficient than its predecessors: in medium effort mode, it equals the performance of Sonnet 4.5 while consuming 76% fewer tokens. In high mode, it beats Sonnet 4.5 by 4.3 percentage points using 48% fewer tokens.

The context. The company has introduced an "effort" parameter (low, medium, high) that lets developers control how much time and how many tokens the model invests in solving a problem. It is a trend that OpenAI has also adopted in its latest models, seeking efficiency without sacrificing quality.

In detail.
Along with the model, Anthropic has updated its development platform and consumer applications:

Claude Code improves its planning mode: it asks clarifying questions before creating an editable execution-plan file, much in the style of the Deep Research features we have seen elsewhere.
Claude for Chrome is now available to all Max users (around $100-$200 per month depending on limits), allowing the AI to manage tasks across multiple browser tabs.
Claude for Excel opens to Max, Team, and Enterprise users, with support for charts, pivot tables, and file uploads.
Endless conversations: long conversations no longer run into context-window limits thanks to automatic summaries.

Yes, but. The big problem with Opus 4.5, and with Claude in general, is its usage limit. Even for Pro and first-tier Max subscribers, tokens run out quickly, and quotas take five hours to reset from the first message sent. The Opus model, being the most powerful, is also the one that burns through quotas fastest. This is the main source of frustration for users paying $20 or even $100 a month. Anthropic has slightly increased the limits for Max and Team Premium, but the experience is still far from what you would expect of a service in this category.

Between the lines. The release of Opus 4.5 restores balance to the Anthropic model family. For the past two months, Sonnet 4.5 was outperforming the older Opus 4.1, leaving little reason to use the more expensive model. Now, with three clearly differentiated models (Haiku, Sonnet and Opus), each one has a specific purpose in terms of cost, speed and capacity.

And now what. Anthropic is following a clear strategy: position itself as the premium provider for knowledge professionals and developers, competing directly with OpenAI and Google in the field where accuracy and reliability matter most. But if it doesn't solve the problem of usage limits, it risks frustrating the very users who could get the most value from the model.
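The price cut and the efficiency gains compound, and it is worth seeing by how much. A rough sketch of that arithmetic, assuming a made-up task size, and with one simplification flagged in the comments: the article's 76% figure refers to overall token consumption, while this sketch applies it only to output tokens.

```python
# Rough illustration of how the two announced changes compound:
# the API price cut ($15/$75 -> $5/$25 per million input/output tokens)
# and medium-effort mode using 76% fewer tokens than Sonnet-4.5-level
# consumption for equal performance. Task sizes are made-up round numbers,
# and the 76% reduction is applied to output tokens only as a simplification.

OLD_PRICE = (15.0, 75.0)   # USD per million (input, output) tokens, old Opus pricing
NEW_PRICE = (5.0, 25.0)    # USD per million tokens, Opus 4.5
TOKEN_REDUCTION = 0.76     # token savings reported for medium effort mode

def task_cost(input_tok: float, output_tok: float, price: tuple[float, float]) -> float:
    """USD cost of one task at the given (input, output) per-million rates."""
    in_rate, out_rate = price
    return input_tok / 1e6 * in_rate + output_tok / 1e6 * out_rate

# Hypothetical task: 50k input tokens, 10k output tokens at full effort.
baseline = task_cost(50_000, 10_000, OLD_PRICE)   # 0.75 + 0.75 = $1.50
# Same task under the new pricing, with medium effort trimming the output.
efficient = task_cost(50_000, 10_000 * (1 - TOKEN_REDUCTION), NEW_PRICE)
print(round(baseline, 2), round(efficient, 2))
```

Under these assumptions the same task drops from $1.50 to about $0.31, which helps explain why the usage-limit complaint stings: the per-token economics improved far faster than the subscription quotas did.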
In Xataka | AI is transforming the relationship we have with our own ideas: we no longer create, we just "edit" ourselves

Featured image | Anthropic

Anthropic wants to be unbeatable in programming, although its ambition goes further

Anthropic has just presented Claude Sonnet 4.5, an evolution that the company defines as its most precise model to date. The focus is on agents, programming and computer use, with the idea of expanding what previous versions of the Sonnet series already offered. Its arrival comes amid an increasingly tight contest: OpenAI has launched GPT-5 with different capability tiers and Google continues to bet on Gemini, setting up a chessboard where each advance generates new expectations.

The family's trajectory helps in understanding the place this new version occupies. With Sonnet 3.7, Anthropic introduced a hybrid reasoning model that marked a remarkable leap in coding, content generation and data analysis. The subsequent arrival of Sonnet 4 consolidated that bet, reinforcing its position as a practical option for assistants. These improvements made Sonnet an outstanding alternative for programmers, and it is from that base that expectations are now raised about what 4.5 can contribute.

What Anthropic promises with its new model

Sonnet 4.5 introduces improvements designed for agents that need to maintain attention over long periods. According to Anthropic, it is able to sustain focus for more than 30 hours on complex tasks and supports outputs of up to 64,000 tokens, which expands its capacity to plan and generate code in large blocks. Developers get finer controls over how long the model "thinks" before responding, which opens up room to balance speed and detail based on the needs of each project.

Another area where Sonnet 4.5 seeks to differentiate itself is computer and browser use. Anthropic points out that the model has reached 61.4% on OSWorld, a benchmark that measures the ability to complete real tasks in a desktop environment. This is a considerable leap compared to the 42.2% obtained by Sonnet 4 just a few months earlier.
The company shows practical examples with its Chrome extension, where Claude is capable of navigating websites, filling in spreadsheets or performing competitive analysis without constant supervision. Programming is the terrain where Sonnet 4.5 wants to consolidate its leadership. Anthropic claims the model can cover the entire software development cycle: from initial planning to refactoring large projects, through maintenance and bug fixing. With the support of Claude Code, it seeks to become a stable assistant for technical teams.

The range of Sonnet 4.5 extends to a wide set of applications that, according to Anthropic, make it a model designed for corporate and research environments. The examples most repeated in its presentation include:

Cybersecurity: deploying agents that fix flaws without human intervention.
Finance: constant monitoring of regulatory changes and risk management.
Productivity: editing and creating office files in different formats.
Research: integrating internal and external data to prepare reports.
Content: writing with mathematical understanding and deep semantic analysis.

The company adds that Sonnet 4.5 has passed reviews with external experts to validate its safety and reliability. Sonnet 4.5 is now available to any user on Claude.ai, both on the web and in the iOS and Android apps. In parallel, developers can integrate it through the Claude Developer Platform, as well as services such as Amazon Bedrock and Google Cloud Vertex AI. The free plan works with a session limit that resets every five hours and a variable number of messages depending on demand. As for prices, it starts at $3 per million input tokens and $15 per million output tokens.

Images | Anthropic | Xataka with Gemini 2.5

In Xataka | "Humanoid robots are pure fantasy": iRobot's co-founder believes there is a robotics bubble

AI is turning software programming into an assembly line

AI is not only transforming the labor market; it is also changing the nature of work itself and the tasks that make up each position. What was previously highly specialized and well-paid work, like that of a software engineer, now runs the risk of becoming much more routine, segmented and, above all, accelerated. Something closer to an operator on an assembly line. According to The New York Times, some Amazon engineers are already beginning to feel that way.

From the industrial revolution to digital automation

The impact of AI is already starting to be felt among tech workforces. Not so much through the destruction of jobs as through the transformation of those that already existed. The arrival of AI is often equated with the Industrial Revolution because of the change of technological paradigm it brought. History has shown that, in the medium term, automation was not linked to the destruction of employment, but to a displacement and transformation of labor tasks. During the Industrial Revolution, and later with the shift to the 'Fordist' model of mass production with assembly lines, jobs were not destroyed. Their nature was changed. With Ford's assembly line, the artisanal manufacture of cars in workshops was replaced by workers positioned along an assembly line, each specialized in a very specific task. That was the key to increasing productivity and reducing costs. Automation allowed workers to focus on specific functions, while machines took on the heaviest or most repetitive work. According to a study conducted by researchers at MIT and the University of Pennsylvania, Amazon, Google and Microsoft are using their AI tools in a similar way.

Software operators

With AI as a copilot, the big tech companies are automating certain tasks of their software engineers that previously required hours of work. The time freed up is dedicated to improving existing products or shipping new products in less development time.
In other words, this process is reaching a new level, in which AI begins to play a role equivalent to the industrial machinery of the nineteenth century. Andy Jassy, CEO of Amazon, acknowledged in a LinkedIn post that thanks to this automation with AI, the company had "saved the equivalent of 4,500 years of development work."

Amazon's robotic warehouses

Amazon's logistics centers can serve as an example of what is now happening in its software development offices. A few years ago, the employees who prepared Amazon shipments had to walk the long aisles of its warehouses to gather the products for each order. Now, however, technology has robotized that whole process, eliminating the search task. Automating its centers has not done away with those jobs, but workers are now dedicated to specific packaging tasks, the number of orders each employee can process has grown, and their work has been devalued. That warehouse staff now say their work is monotonous, repetitive and poorly paid. The same thing that happened to assembly-line operators.

Software engineers are beginning to notice the same symptoms, with an acceleration of their workflows and greater automation. "If they tell you that you have to review code, that is never the fun part of the job. When you work with these (AI assistance) tools, it is most of the work," said Simon Willison, veteran programmer and blogger. The consequences of this automation are already being felt in junior positions across different sectors, and programming is no exception. Code-generating AI agents now do the tasks that juniors used to perform and from which they gained the experience that allowed them to move up. Without that learning, there will be no trained senior engineers tomorrow who can review AI-generated code and catch errors, or set out sound development strategies.
In Xataka | No one considers themselves working class in Spain: the percentage has fallen from 50% to 16% in twenty years

Image | Amazon, Unsplash (Adrian Sulyok, ProCreator Global UI UX Design Agency, Kseniia Ilinykh)

They are algebra courses and programming tutorials

Pornhub is known worldwide as an adult video site, but in recent years educational material has been appearing among its pornographic content: algebra classes, mathematics tutorials, educational videos on programming, explainers about neural networks... Why Pornhub and not YouTube? What does this channel have that attracts these educators?

Changhsu. The first case to be publicized is that of this 34-year-old Taiwanese mathematics teacher who gives classes (in Chinese) in advanced mathematics. He has published more than 200 educational videos on the platform and has revealed in interviews how much he earns with them: more than $250,000 a year. In his case it is his only content, but there are others who are more notable for their intermediate position between the educational and the erotic.

Zara Dar. This user, for example, drew attention with this type of content because she is also an erotic model who has built her career on OnlyFans and who on Pornhub has nonetheless opted for videos with such unerotic titles as 'What is a loss function in machine learning?' or 'What is a convex function? – Gradient descent, part 1'. Although most hover around 50,000 views, some exceed 350,000. She told 404Media that her success may be due to the contrast between her videos and the rest of the page (obviously; take a look at her main page, and although her videos are safe for work, everything surrounding them is extremely NSFW).

Why Pornhub. The question is what explains the preference for this platform for this type of content, and according to Changhsu, it is very simple. Pornhub is one of the most visited websites in the world, with more than 100 million visits per day. That huge audience presents a unique opportunity for creators of educational content to reach a much broader public than on traditional platforms.
And since the platform is not segmented by the usual thematic interests, the content can reach users who would not otherwise seek out tutorials or classes.

Fewer restrictions. But there is more: Pornhub does not place any restrictions on this kind of content, which of course makes no sense in terms of censorship, but it benefits creators, since it does not make them victims of restrictive algorithms, of what are also known as "filter bubbles", as would happen on platforms such as YouTube, which are prone to showing less specialized content. In this way, creators can experiment with formats without needing to play the game of studying the algorithm to "convince" it to show them to more viewers.

Brutal contrast. According to Changhsu, the contrast with what the Pornhub visitor expects is essential: there is a deliberate breaking of social norms and expectations. The appearance of mathematics and science videos defies those expectations and redefines the meaning of the virtual space, since it is inevitable to ask: why should education be restricted to "formal" platforms?

They are not the only ones. Although these are the most famous cases, there is a whole subculture of "safe" videos on Pornhub. They are almost metalinguistic games with expectations and go beyond the merely educational: for example, 'Minecraft' gameplays, fetish videos without nudity, conventional interviews with porn actresses, non-erotic parodies using pornographic terms such as hook... Unfortunately, many of them were deleted when the terms of use changed and 10 million videos from unregistered users were removed. In that purge, carried out with legitimate intentions (pursuing child pornography), one of the most curiously ironic phenomena of the modern internet unfortunately disappeared.

Header | Pornhub

In Xataka | The Japanese are ceasing to consume paper pornography. And that has had a direct effect on their streets

Anthropic launches Claude Opus 4 and presents it as the best programming model in the world

After Google displayed all its artificial intelligence artillery, Anthropic did not want to be left behind. The company founded by Dario Amodei has made a strong move: it has presented Claude Opus 4 and Claude Sonnet 4, two new models with which it aspires to leave its mark on the AI race.

The star of the announcement is Claude Opus 4, the most advanced model Anthropic has developed so far. And they do not beat around the bush: they claim it is "the best programming model in the world". An ambitious statement that, as always, will have to be tested. But the first data places it very well positioned against its main rivals.

On the SWE-bench Verified benchmark, which evaluates real software engineering tasks, Opus 4 scores 72.5% in standard conditions and reaches 79.4% with parallel processing. That performance puts it above models such as GPT-4.1 (54.6%), o3 (69.1%) or Google's recent Gemini 2.5 Pro (63.2%). However, on other more demanding multimodal reasoning tests, such as GPQA Diamond or MMMU, focused on university-level questions and complex scenarios that combine text and image, Opus 4 fails to surpass o3, which continues to lead in that field.

A model with stamina and autonomy. Beyond the numbers, what Anthropic wants to highlight is this model's stamina and autonomy. Claude Opus 4 is capable of sustaining long work sessions and executing thousands of steps continuously. The company explains that this makes it an ideal foundation for more sophisticated AI agents: systems that make decisions, complete tasks on their own and do not need constant human supervision.

In parallel arrives Claude Sonnet 4, an evolution of the model Anthropic launched in February. It is not intended to compete with Opus in raw power, but it offers a very balanced trade-off between performance and efficiency.
In coding it also makes an important leap with respect to its previous version: it goes from 62.3% to 72.7% on SWE-bench Verified, and it improves in reasoning tasks, instruction following and general accuracy.

Both models arrive with interesting new features. For example, they can now alternate between reasoning and tool use within the same process, which allows more complete answers. They have also improved in reliability: according to Anthropic, they are 65% less likely to take shortcuts or make serious mistakes than Sonnet 3.7.

Claude Opus 4 and Sonnet 4 are already available in the Anthropic API, on Amazon Bedrock and on Google Cloud Vertex AI. They are included in the Pro, Max, Team and Enterprise plans. Prices stay in line with the previous models: Opus 4 costs $15 per million input tokens and $75 per million output tokens. Sonnet 4 is more affordable, at $3 and $15 respectively, and can also be used from free accounts.

Images | Anthropic
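As a quick sanity check on those prices, here is a minimal sketch that estimates the cost of a single request from the per-million-token rates quoted above. The model names are informal labels for this example, not official API identifiers:

```python
# Per-million-token prices (USD) as quoted in the article.
PRICES = {
    "opus-4":   {"input": 15.0, "output": 75.0},
    "sonnet-4": {"input": 3.0,  "output": 15.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request for the given model."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] \
         + (output_tokens / 1_000_000) * p["output"]

# A request with a 2,000-token prompt and a 1,000-token reply:
print(request_cost("opus-4", 2_000, 1_000))    # 0.03 + 0.075 = $0.105
print(request_cost("sonnet-4", 2_000, 1_000))  # 0.006 + 0.015 = $0.021
```

At these rates, Sonnet 4 works out to exactly one fifth of the Opus 4 cost for the same traffic.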

OpenAI has just launched its new programming agent. The interesting thing is what it can do when nobody is looking

Artificial intelligence (AI) currently occupies a leading place in the world of programming. More and more software developers turn to AI systems to write code, fix errors or automate repetitive tasks. OpenAI is betting on this area again with Codex, its new agent. It is a preview-phase tool that acts as a virtual collaborator. Its engine is codex-1, a variant of o3 tuned to better understand the needs of modern development. Among its promises are generating cleaner code and following instructions with greater precision, among other advantages.

How does Codex work? Codex is not a simple assistant that suggests code fragments. It is a software agent that operates in the background from the cloud. Once connected to your GitHub account, it can access your repository, read files, propose changes and execute tasks such as writing new functions, fixing errors or running tests. It does all this autonomously and safely, inside an isolated environment (a kind of virtual computer in the cloud) that simulates your development environment. This not only protects your system, but also allows Codex to execute tasks without affecting your local workflow. While it works, you can keep using your computer normally.

The interesting thing is that it does not execute one action at a time: it can take on several tasks simultaneously. For example, you can ask it to review one part of the code or look for bugs in another section of the project. Each task is managed separately, and Codex reports its progress in real time so the user can review it.

The tool is designed to adapt to how developers work. In fact, it can be guided by specific files called AGENTS.md, a kind of instruction manual that lets you indicate what style to follow, how to launch the tests or what practices should be respected within the project. Although Codex has just launched in preview, it is not an untested experiment.
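An AGENTS.md file is plain Markdown. As an illustration only, a hypothetical one for a Python project might look like this (the commands, paths and conventions below are invented for the example, not taken from OpenAI's documentation):

```markdown
# AGENTS.md: guidance for the coding agent

## Style
- Follow PEP 8; format with `black` before committing.
- Type-annotate all public functions.

## Testing
- Run the full suite with `pytest -q` before proposing changes.
- New features must ship with unit tests.

## Project conventions
- Business logic lives in `src/core/`.
- Never edit generated files in `src/proto/`.
```

The agent reads these instructions before starting a task, so project-specific rules do not have to be repeated in every prompt.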
OpenAI's own engineers have been using it for months as part of their workflow. What for? To automate repetitive tasks: renaming variables, writing tests or drafting documentation. The system is also being tested at companies such as Cisco, which is looking to accelerate the development of new ideas, and Temporal, which uses it to debug errors, write automated tests and reorganize large codebases. OpenAI's recommendation after this first round of testing is clear: assign well-defined tasks, launch several in parallel and experiment with different types of requests, because the key is to find the exact point where the agent can deploy its full potential.

Codex works remotely, but it does so within a controlled environment. As mentioned above, each task it executes takes place in an isolated virtual machine, with no direct internet connection or access to external services. It can only interact with the code the user provides and with the tools pre-installed via a configuration script. In addition, OpenAI says it has trained Codex to identify and reject instructions aimed at creating harmful software, such as hacking tools or malware.

In any case, we must not lose sight of the fact that this is still a product in a research phase, with many aspects to improve. And it is still generative AI: it can make mistakes or misinterpret certain instructions if the context is not well defined. OpenAI plans to introduce gradual improvements, such as the ability to interact with the agent during tasks, receive more detailed updates or integrate it with tools such as incident managers.

If you want to start trying Codex today, keep in mind that it is not yet available to all users. For now, OpenAI has begun rolling out the tool to subscribers of the Pro ($200 per month), Enterprise and Team plans. According to the company, users on the Plus and Edu plans will get access "soon".
Images | OpenAI

What is vibe coding, and what advantages and disadvantages does this concept of programming with artificial intelligence offer

Let's explain in simple terms what vibe coding is: a new way of programming using artificial intelligence tools. Or at least it is the term that has been coined to refer to people who use AI to write code and build programs. We will start by explaining what vibe coding is in a simple way so you can understand it. Then we will go over the main advantages and disadvantages of this methodology.

What is vibe coding. The easiest way to summarize the concept is that it has nothing to do with knowing how to program, but with knowing what to program. In other words, it is something like programming without knowing how to program: you take an idea in your head and turn to artificial intelligence. The term was coined by AI expert Andrej Karpathy to refer to using AI tools to create code instead of a person writing it. It is an elegant way of saying that you use artificial intelligence to create the code, just as if there were an elegant word for saying your drawings were made with ChatGPT, or that you are a "composer" of music who uses Suno.

Basically, this concept or working method consists of having an idea and asking an artificial intelligence chat, in natural language, to create the code that does this and that. The AI is in charge of generating the code while you supervise the process, acting as the creative mind through your prompts.

It is a concept that is generating a lot of controversy: some people see it as the future of more accessible programming, where it is no longer so necessary to spend hours banging out code. However, others warn of the dangers of depending too much on artificial intelligence. In the end, as with many other artificial intelligence tools, the important thing will be to find a balance. Perhaps using AI for simple code fragments or sketches to build on.
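As a toy illustration of that loop, a vibe-coding session might start from a plain-language prompt and end with AI-generated code the human only reviews and tests. The prompt and the function below are invented for this example:

```python
# Hypothetical prompt typed into an AI chat:
#   "Write me a function that removes duplicates from a list
#    but keeps the original order."
#
# Code the AI might generate in response; the "vibe coder"
# reviews and tests it instead of writing it by hand.
def dedupe_keep_order(items):
    """Return items without duplicates, preserving first appearance."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(dedupe_keep_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Even for something this small, the review step matters: the human still has to check edge cases (an empty list, unhashable elements) before trusting the generated code.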
Advantages of vibe coding. The main advantage of this method is that it makes programming more accessible to anyone. You would no longer need programming experience to create applications or software, nor even to study one or more languages: artificial intelligence takes care of all that.

Beyond this, vibe coding should also be able to shorten project timelines and improve productivity. Since the raw code is written by an AI, even if you then have to review it, this should reduce development times to some degree.

And finally, even if you want to program a project yourself, you always have the option of quickly creating prototypes and sketches with AI-generated code. From that code you can start working, or start from scratch after seeing how it behaves.

Disadvantages of vibe coding. As is evident, the first disadvantage is that you will produce code that can contain all kinds of errors. AI has not yet reached maximum reliability when writing code, so you will need to invest a lot of time reviewing everything it generates to make sure there are no failures.

There is also the loss of technical knowledge: if a company hires many people who do not know how to program and rely solely on AI to do so, in the end it will not be able to undertake complex tasks. On top of this, there is the problem that if a technical or code failure arises and you do not know how to program, you will not have the knowledge to solve it. In the long run, this can make productivity lower.
