
Meta's new model scored very well on the benchmarks. Maybe too well.

We had been waiting a long time for Llama 4, the new family of artificial intelligence models. Last weekend the company finally revealed those models, and everything seemed promising. The problem is that the way they were announced is generating controversy and an uncomfortable conversation: perhaps Meta has cheated on the benchmarks.

Llama 4 looks great. As soon as they appeared on the scene, Meta's new Llama 4 models surprised everyone with their excellent benchmark performance. They ranked second in LMArena, only below Gemini 2.5 Pro Experimental. However, suspicions soon appeared, because the Llama 4 version available to the public was not the same one shown in that ranking.

A doctored version? As Meta indicated in the announcement, that Llama 4 version was an "experimental" one that obtained 1,417 points in LMArena, while Gemini 2.5 Pro Experimental had obtained 1,439 points. Some experts pointed out that this experimental Llama 4 version was cheating: it had been specifically trained on the datasets used in benchmarks in order to score well on them.

"We have not cheated." Ahmad Al-Dahle is the head of Meta's generative AI division and is therefore in charge of the Llama 4 launch. He has flatly denied the rumors suggesting that Meta cheated to get better scores on the benchmarks. Those rumors "are false and we would never do that," he said.

But it was "optimized". As TechCrunch points out, in that official announcement Meta did note that the experimental Llama 4 model that had scored so well was "optimized for conversation." LMArena indicated that Meta should have explained more clearly what kind of model it had submitted for inclusion in the ranking.

Llama 4 itself is not that good. Some experts who analyzed Llama 4's performance with synthetic or conventional tests had already warned that it did not seem as good as Meta claims. The publicly available model showed behavior that did not match the quality suggested by its LMArena score.

Not very consistent. Al-Dahle himself acknowledged that some users were seeing "mixed quality" results from Maverick and Scout, the two available Llama 4 versions, depending on the provider. He said he expected it would take a few days for the public implementations to be properly adjusted, and added that they would keep working to correct possible errors.

A strange launch. It is odd that Meta released this model on a Saturday, but when asked about it, Mark Zuckerberg replied that "that's when it was ready." The fact that the model used in LMArena is not the same one people can actually use is also worrying, and it may lead us to start distrusting benchmarks and the companies that use them to promote their products. It is not the first time this has happened, far from it, and it will not be the last.

In Xataka | OpenAI is burning money as if there were no tomorrow. The question is how long it can keep this up.
