Imagine an AI that not only answers questions, but can also picture scenarios, predict consequences, and plan actions before executing them. This is precisely what world models promise: a technology that is attracting attention from the leading artificial intelligence labs and that could radically change how machines understand and interact with their environment.
What exactly are they? World models are AI systems that build an internal representation of their environment, as if they contained a simulation of the real world. Unlike traditional supervised learning, which simply maps inputs to outputs using labeled data, these models learn how an environment works and can predict what will happen next. It is similar to how humans use mental simulations to anticipate outcomes without needing to physically experience each situation.
The example of the batter. Researchers David Ha and Jürgen Schmidhuber explain it with a sports analogy: a baseball batter has just milliseconds to decide how to hit the ball, less time than it takes for the visual signal to reach the brain. What allows the batter to hit a fastball at 100 miles per hour is the ability to instinctively predict where the ball will go. The muscles react reflexively based on the predictions of an internal mental model, without the need to consciously plan every possible scenario.
Why they matter now. Prominent figures such as Yann LeCun (Meta), Demis Hassabis (Google DeepMind) and Yoshua Bengio (Quebec AI Institute) consider these models essential to building truly intelligent systems. World Labs, the startup of Fei-Fei Li, one of the most influential figures in AI, raised 230 million dollars last year to develop them.
Meanwhile, General Intuition, a new AI lab owned by Medal (known for its app for recording and sharing game clips), has just closed a funding round of 133.7 million dollars. The investment came primarily from Khosla Ventures founder Vinod Khosla (one of OpenAI’s early investors), who says that “multiple companies valued at hundreds of billions, potentially even trillions of dollars will be built” in this field.
How they work. These systems have three fundamental capabilities. First, they compress complex sensory data (images, videos, text) into simpler representations. Second, they predict future states of the environment based on past and present information. Third, they use that learned model to simulate different actions and choose the best option. It is as if the AI could “dream” different scenarios before acting.
The case of video games. Ha and Schmidhuber also offer an illustrative example: imagine an AI learning to play a racing game. Instead of memorizing sequences of moves, it first builds an internal model of how the game world behaves: how the car moves, how the road curves, where obstacles appear. It can then imagine future scenarios, testing different driving strategies in its simulated world before applying them in the real game.
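The three capabilities (compress, predict, simulate and choose) can be sketched as a toy driving loop. This is a minimal illustration, not Ha and Schmidhuber's implementation: the names `encode`, `predict`, `imagined_cost` and `plan` are hypothetical, the observation is assumed to be a single row of pixel intensities in which the brightest pixel marks the car, and the transition rule is hard-coded where a real world model would learn it from data.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # steer left, go straight, steer right


def encode(observation):
    """Compress a raw observation (a row of pixel intensities) into one
    number: the car's offset from the centre of the lane."""
    car_column = max(range(len(observation)), key=lambda i: observation[i])
    return car_column - len(observation) // 2


def predict(latent, action):
    """Predict the next compressed state. Assumption: a real world model
    learns this transition; here it is hard-coded for illustration."""
    return latent + action


def imagined_cost(latent, action_sequence):
    """Roll a sequence of actions forward inside the model and add up how
    far the car strays from the centre of the lane."""
    cost = 0
    for action in action_sequence:
        latent = predict(latent, action)
        cost += abs(latent)
    return cost


def plan(observation, horizon=3):
    """Imagine every action sequence of length `horizon` and return the
    first action of the cheapest one."""
    latent = encode(observation)
    best = min(product(ACTIONS, repeat=horizon),
               key=lambda seq: imagined_cost(latent, seq))
    return best[0]
```

Calling `plan([0, 0, 0, 0, 0, 0, 9])`, where the car sits at the far right of a seven-pixel row, returns `-1` (steer left), because the imagined rollouts that move back toward the centre accumulate the lowest cost. None of the candidate strategies is tried in the real game before one is chosen.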
Promising applications. World models are already transforming several fields. In autonomous driving, they allow vehicles to simulate traffic dynamics and pedestrian behavior to make safer decisions. In robotics, robots can imagine different ways to complete a task before executing it, which is especially useful when real-world training is expensive or dangerous. And in video generation, they help create more realistic content: a model that understands why a ball bounces will represent it better than one that has simply memorized patterns.
Beyond video. A better video generation model would be just the beginning. LeCun describes how a world model could help achieve goals through reasoning: given a video of a messy room and the goal of cleaning it, the model could devise a sequence of actions (vacuuming, cleaning the dishes, emptying the trash) not because it has observed that pattern, but because it understands at a deeper level how to go from dirty to clean. “We need machines that understand the world, that can remember things, that have intuition and common sense,” he says.
The obstacles ahead. Training and running world models requires massive computing power, even compared with current generative models. Today's models already need thousands upon thousands of GPUs housed in gigantic, energy-hungry data centers, and training world models is on another level. Furthermore, like all AI models, they risk hallucinating and internalizing biases from their training data.
The industry’s bet. Despite the technical challenges, different strategies are already in play. Google DeepMind and OpenAI are betting that, with enough multimodal training data (video, 3D simulations and other inputs beyond text), a world model will spontaneously emerge within a neural network. LeCun, for his part, believes that a completely new, non-generative AI architecture will be necessary.
What comes next. Several experts also predict that world models will make it possible to create interactive 3D worlds on demand for video games, virtual photography and other applications. According to Justin Johnson, co-founder of World Labs, “we already have the ability to create virtual, interactive worlds, but it costs hundreds of millions of dollars and a lot of development time.” They could also revolutionize robotics by giving robots real awareness of their environment and their own bodies. As Mashrabov sums it up, “with an advanced world model, an AI could develop a personal understanding of any scenario it finds itself in and begin to reason out possible solutions.”
Although LeCun estimates that we are still at least a decade away from the world models he envisions, the industry's high expectations for the next advance in AI and the enormous investment the field is receiving suggest that this technology could be the next great leap towards machines that not only react to the world, but understand and model it.
Cover image | Michael Marais