Artificial intelligence (AI) is not just evolving: it is taking off. In just two and a half years we have gone from GPT-3.5 to GPT-4o, and anyone who has tried both knows it: the difference in the conversational experience is huge. GPT-3.5 marked a before and after by inaugurating the ChatGPT era, but today hardly anyone would go back to it when more advanced models are available.
Now, what does it mean for a model to be more advanced? The answer is complex. We talk about larger context windows (that is, the ability to read and process more information at once), more elaborate results and, in theory, fewer errors. But one point remains thorny: hallucinations. And things are not always moving in the right direction.
What are hallucinations? In AI, hallucinating means making things up. They are answers that sound good, even convincing, but that are false. The model does not lie on purpose; it simply generates text based on patterns. If it does not have enough data, it fills in the gaps. And that can go unnoticed. That is the risk.
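A toy sketch can make this "filling in the gaps" concrete. The following bigram model is far simpler than a real LLM, and its corpus is made up for illustration, but it shows the core mechanism: when asked about something it has never seen, it still produces a fluent continuation by reusing its likeliest pattern.

```python
from collections import defaultdict, Counter

# Tiny made-up corpus (illustrative only; not how real models are trained).
# The France sentence appears twice so "is -> paris" is the dominant pattern.
corpus = ("the capital of france is paris . " * 2
          + "the capital of italy is rome .").split()

# Count which word tends to follow each word.
nexts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    nexts[a][b] += 1

def complete(prompt, steps=4):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.split()
    for _ in range(steps):
        options = nexts.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

# The corpus says nothing about Germany, yet the model confidently reuses
# the "is paris" pattern: a fluent, plausible-sounding, false answer.
print(complete("the capital of germany is", steps=2))
# -> the capital of germany is paris .
```

The model never "decides" to lie: it has no notion of truth at all, only of which continuation is statistically most likely, which is exactly why a confident-sounding wrong answer can slip past a reader.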
o3 and o4-mini: more reasoning, more errors. In September of last year the so-called reasoning models arrived. They represented an important leap: they introduced a kind of chain of thought that improved their performance on complex tasks. But they were not perfect. o1-pro was more expensive than o3-mini, and not always more effective. Even so, this whole line was presented with a promise: reducing hallucinations.


The problem is that, according to OpenAI's own data, that is not happening. TechCrunch cites a technical report from the company acknowledging that o3 and o4-mini hallucinate more than their predecessors. Literally. In internal tests with PersonQA, o3 hallucinated in 33% of its answers, roughly twice the rate of o1 and o3-mini. o4-mini did even worse: 48%.
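The percentages above are simple ratios over a benchmark's graded answers. A minimal sketch of how such a rate is computed (the graded results below are hypothetical, not OpenAI's actual PersonQA data):

```python
# Hypothetical grading of a model's answers about people: True means the
# answer was factually correct, False means it was hallucinated.
# These six values are illustrative only.
graded_answers = [True, False, True, True, False, True]

def hallucination_rate(results):
    """Fraction of answers flagged as hallucinated (incorrect)."""
    return sum(1 for ok in results if not ok) / len(results)

rate = hallucination_rate(graded_answers)
print(f"Hallucination rate: {rate:.0%}")  # 2 of 6 answers -> 33%
```

The hard part of a benchmark like PersonQA is not this arithmetic but the grading itself: deciding, answer by answer, whether the model's claims match the known facts.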
Other analyses, like that of the independent laboratory Transluce, show that o3 even invents actions: it claimed to have executed code on a MacBook Pro outside ChatGPT and then to have copied the results back. Something it simply cannot do.
A challenge that is still pending. The idea of models that do not hallucinate sounds fantastic. It would be the definitive step toward fully trusting their answers. But, in the meantime, we have to live with this problem, especially when we use AI for delicate tasks: summarizing documents, looking up data, preparing reports. In those cases, everything should be reviewed twice.
Because there have already been serious errors. The most famous was that of a lawyer who submitted documents generated by ChatGPT to a judge. They were convincing, yes, but also fictitious: the model invented several legal cases. AI will keep advancing, but critical judgment, for now, remains our job.
Images | Xataka with ChatGPT | OpenAI
In Xataka | Some users are using OpenAI's o3 and o4-mini to find out where photos were taken: it is a privacy nightmare
In Xataka | If you have ever feared being chased by a robot, China has organized a half marathon to put your mind at ease