Claude just demonstrated it with Firefox

For years, finding serious vulnerabilities in complex software has been a task reserved for specialized researchers who spend weeks or months examining millions of lines of code. That scenario is beginning to change. Artificial intelligence models are no longer limited to generating code or helping to debug it, they are also beginning to detect security flaws on their own. A recent example has been shown by Anthropic with Claude Opus 4.6its most advanced model, when put to the test with Firefox. The experiment is especially striking because Firefox, managed by Mozilla and used by hundreds of millions of people, is one of the most audited open source projects in the web ecosystem.

Analyze the Firefox browser code. During two weeks of testing, the system identified 22 different vulnerabilities, according to information published by both organizations. Mozilla assessed 14 of them as high severity flaws, meaning they could have served as a basis for attacks if someone had developed the appropriate exploit code. According to those responsible for the project, most of these problems have already been solved in Firefox 148, the version published in February, while the rest will be corrected in future versions.

Inside the experiment. Claude’s work was not a simple automatic search for errors. According to Anthropic, the team first used the model to try to reproduce historical vulnerabilities recorded in Firefox, a way to test if it was able to recognize real failure patterns. Then they moved on to the most interesting part of the experiment: asking it to analyze the current version of the browser to locate problems that had not yet been reported. The process started in the JavaScript engine and then expanded to other areas of the code. In total, the analysis covered thousands of files from the project, including several thousand C++ files, generating a long list of findings that were subsequently reviewed by the researchers.

Firefox
Firefox

A striking fact. Claude found more high-severity bugs in two weeks than the browser usually receives in about two months through its usual investigation channels. During the process, the Anthropic team submitted 112 unique reports to the project’s bug tracking system, although not all were confirmed vulnerabilities. Part of Mozilla’s job was precisely to review, debug and classify those findings before determining which ones had real security implications. The experience ended up becoming a direct collaboration between both organizations to review the results and prioritize corrections.

The other half of the problem. The Anthropic team also wanted to see how far the model could go beyond detecting errors and turning those failures into real attacks. To do this, they asked him to develop exploits capable of taking advantage of the discovered vulnerabilities. The experiment included hundreds of runs with different approaches and cost approximately $4,000 in API credits. Still, the result showed a clear difference between the two capabilities: Claude only managed to generate two working exploits in a simplified test environment, without some of the defenses present in a real browser.

Beyond the specific case of Firefox, the experiment reflects a change that is beginning to worry and interest the security community at the same time. AI-based tools are rapidly improving at detecting vulnerabilities in complex software, which could help developers fix bugs more quickly.

Images | Anthropic | Rubaitul Azad

In Xataka | iPhones were supposed to be the most secure cell phones in the world. It was supposed

Leave your vote

Leave a Comment

GIPHY App Key not set. Please check settings

Log In

Forgot password?

Forgot password?

Enter your account data and we will send you a link to reset your password.

Your password reset link appears to be invalid or expired.

Log in

Privacy Policy

Add to Collection

No Collections

Here you'll find all collections you've created before.