Claude Mythos Preview It is the best AI model ever created. We don’t say it, Anthropic says it, but almost no one else can say it because only a select group of companies has access to said model. The cybersecurity capabilities of the model appear to be astonishingbut more and more experts say that although Mythos is better than its predecessors, it is not the revolutionary leap that Anthropic seems to propose. Is that way of launching the model just an effective way of creating hype?
Beware the Anthropic speech. The well-known entrepreneur and analyst Gary Marcus recently gave three reasons why, according to him, the launch of Mythos is not as revolutionary as Anthropic wants us to see. I cited tweets from software engineers and cybersecurity experts who cast doubt on Anthropic’s claims. The company published a study on the capabilities of Claude Mythos Preview that seemed to make it an extraordinary tool for the field of cybersecurity, but at the same time it was so powerful that it could be very dangerous if it fell into the wrong hands.
Isn’t that a big deal? Among Claude Mythos’ achievements, Anthropic highlighted how he had found vulnerabilities in Firefox 147. But in reality many of the flaws were basically variations of the same two bugs. If you removed them from the equation, Mythos’ effectiveness rate at finding new exploits dropped a lot, even below Opus 4.6. Anthropic did not hide that fact, of course, but it makes this capacity, for example, not seem so striking. An X user also criticized the use of Cybench as a cybersecurity benchmark when Opus 4.6 almost completely surpassed it. For him, the choice of some of the Anthropic tests was debatable because they were not a challenge to current models.


Other models can do the same. The co-founder and CEO of Hugging Face, Clement Delangue, stated that Mythos was no big deal. Their argument: they had used small, cheap open models, isolated the relevant code from some examples of the vulnerabilities found by Mythos, and they found the same problems which had already detected the Anthropic model.

According to the Epoch Capabilities Index, which measures the capacity of AI models by combining several benchmarks, the leap that Mythos has taken is striking and “departs” from the progressive line of its predecessors. Source: Anthropic.
Observer bias. But here it should be noted that in those analyzes they knew where to look because Mythos had already found those problems. We are dealing with observer bias, and in fact the Hugging Face document makes it clear that they even gave him specific clues such as “consider integer overflow”) to find those bugs. And on this observation, another one: Hugging Face does not say that a small model can replace Mythos on its own, but that it can be very good by giving it the appropriate code fragment. Mythos seems more capable of blindly complex security breaches, but it is a huge model and that is why it has greater capacity. Or what is the same: Mythos is better because it has the size, design and resources to be better.
Fear, uncertainty, doubt? The language used by Anthropic in this advertisement could be considered to some extent a clear use of FUD (“Fear, Uncertainty, Doubt” -> “Fear, Uncertainty, Doubt”) as a marketing technique. It is a resource that has been seen in the past, and for example OpenAI already said in 2019—years before the launch of ChatGPT—that GPT-2 was too dangerous for a public launch. Obviously it wasn’t, but that certainly served to create expectation about the true capacity of the model.
It’s better, but it may not be revolutionary. The results of the benchmarks that Anthropic published already made it clear that although there are very notable jumps in some tests, in others the evolution is much less striking. Claude Mythos was not the best at everything, and now analysts appear who contrast that data with other metrics. For example, with the Epoch Capabilities Index (ECI) from Epoch AI, the startup that has one of the most reputable benchmarks of the industry. And according to this index, Claude Mythos is above his rivals, but not for long.
The wolf is coming. The truth is that the launch of Claude Mythos Preview has been really striking and the documents that accompanied that document tell us about a really capable AI model. The problem is that it is impossible to verify it because only a few companies have access to it and can test it. Without that public availability the only thing we can do is trust (or not) what Anthropic tells us, and that is the point: it is not clear that we should do it. The company is interested in us buying this discourse, obviously, but without an independent analysis it is impossible to verify these statements.



GIPHY App Key not set. Please check settings