When they sold us generative “artificial intelligence,” we did not know it would turn out to be artificial and generative, but not “intelligent”
A few months ago, a group of Spanish researchers put an AI chatbot through a curious test. They uploaded an image of an analog clock and asked the AI a simple question: “What time is it on that clock?” The AI failed disturbingly.

Machine, can you tell me the time? Researchers from the Polytechnic University of Madrid, the University of Valladolid and the Politecnico di Milano published a study a month ago in which they set out to evaluate how intelligent the artificial intelligence of these models really is. To do this, they built a large set of synthetic images of analog clocks (available on Hugging Face) showing 43,000 different times.

A disastrous result. They then asked four generative AI models what time those analog clocks showed. None of them managed to tell the time accurately. The group of models was made up of GPT-4o, Gemma3-12B, Llama3.2-11B and QwenVL-2.5-7B, and all of them had serious problems “reading” the time and differentiating, for example, the hands or the angle and direction of those hands in relation to the numbers on the clock face.

Fine-tuning to improve. After these first tests, the researchers managed to significantly improve the behavior of the models through fine-tuning: they trained them with 5,000 additional images from that data set and then re-evaluated them. The behavior was much better, but still imperfect, and when tested with a different set of analog clock images the models once again failed consistently. That should not happen with an issue so “simple” for humans. The conclusion was clear: they don’t know how to generalize.
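To appreciate why the failure is striking: reading an analog clock is a deterministic mapping from hand angles to a time of day. A minimal Python sketch of that mapping (the function name is illustrative, not taken from the study):

```python
# Angles are measured in degrees, clockwise from the 12 o'clock position.

def angles_to_time(hour_angle: float, minute_angle: float) -> tuple[int, int]:
    """Recover (hour, minute) from the positions of the two hands."""
    minute = round(minute_angle / 6) % 60  # 360 deg / 60 min = 6 deg per minute
    hour = int(hour_angle // 30) % 12      # 360 deg / 12 h = 30 deg per hour
    return hour, minute

# At 3:30 the minute hand points at 180 deg and the hour hand has
# drifted halfway between the 3 and the 4 (105 deg):
print(angles_to_time(105.0, 180.0))  # -> (3, 30)
```

A few lines of trigonometry-free arithmetic solve the task exactly; the models, trained on millions of images, could not.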
What they discovered with this test confirmed what we have been observing with AI models from the beginning: they are good at recognizing data they are familiar with (memorized), but they often fail in scenarios they have never faced and that are not part of their training sets. Or, put another way: they were incapable of generalizing.

Dalí enters the scene. To try to find out the causes of these failures, the researchers created new sets of images in which, for example, they used Dalí’s famous distorted clocks, or clocks with arrows at the ends of the hands. Humans can tell the time on analog clocks even when they are distorted, but for AI models that was a huge problem.

If they do this with clocks, imagine medical scans. The danger of these conclusions is that they reignite the debate about whether generative AI models are indeed artificial and generative, but not very intelligent. If they have these difficulties identifying the hands or their orientations, things get dangerous when what the models have to analyze are medical images or, for example, real-time footage from an autonomous car driving through a city.

AIs are stupid. Although generative AI models are fantastic aids in various scenarios such as programming, the reality is that what they do is “regurgitate” responses that are already part of their training data. As Thomas Wolf, Chief Science Officer of Hugging Face, explained, a generative AI “will never ask questions that no one had thought of or that no one had dared to ask.” Thanks to their enormous memory and training they can retrieve a multitude of data and present it in useful ways, but finding solutions to problems they have not been trained for is very complicated. For experts like Yann LeCun, the reality is clear: generative AI is very stupid and, furthermore, a dead end.

Source: clocks.brianmoore.com

AI doesn’t draw watches very well either.
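Drawing a clock is the inverse of reading one, and it is equally mechanical: convert the time to two hand angles, then convert each angle to an endpoint on the canvas. A hedged sketch of that geometry in Python (the function names and the 200×200 canvas are illustrative assumptions, not any model's actual output):

```python
import math

def hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Degrees clockwise from 12 for the hour and minute hands."""
    minute_angle = minute * 6.0                     # 6 deg per minute
    hour_angle = (hour % 12) * 30.0 + minute * 0.5  # hour hand drifts 0.5 deg/min
    return hour_angle, minute_angle

def hand_tip(angle_deg: float, length: float,
             cx: float = 100.0, cy: float = 100.0) -> tuple[float, float]:
    """(x, y) endpoint of a hand drawn from the center of a 200x200 canvas."""
    rad = math.radians(angle_deg - 90.0)  # screen coords: 0 deg points right
    return cx + length * math.cos(rad), cy + length * math.sin(rad)

print(hand_angles(3, 30))  # -> (105.0, 180.0)
```

Any correct clock-drawing program reduces to some variant of these two conversions plus drawing primitives, which is what makes the failures below notable.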
Added to the experiment of these researchers is another small test that once again calls into question the capacity of generative AI: asking different models to generate the code to display an analog clock showing the current time. A designer named Brian Moore shared the results from several AI models, and the truth is that most of them are terrible, although others, like Kimi K2, achieve a good result. We tested the recent Grok 4.1 and GPT-5.1: after a little insistence, Grok 4.1 drew a perfect, working clock. With GPT-5.1 there was no way, at least in our tests.

A worrying reality. This inability to solve seemingly simple tasks suggests that these models are not in a good place. It is true that a good prompt can help work around some of these limitations, but it is becoming increasingly evident that AI models keep making mistakes despite the passage of time. The theoretical revolution of this technology needs precisely to eradicate those mistakes, and it does not seem we are on the way to achieving it. The models improve, yes, but not enough for us to trust them 100%.

Image | Yaniv Knobel

In Xataka | As if there weren’t enough AI companies, Jeff Bezos has just returned from the shadows to build another one, according to the NYT