AI model development is becoming increasingly diverse. Alongside large and small language models, "world models", also known as world simulators, are now regarded as one of the next key directions for AI.
In 2024, World Labs, the spatial intelligence startup founded by AI pioneer and computer scientist Fei-Fei Li, completed two rounds of funding with the goal of building a "large world model" and is currently valued at $1 billion. Google DeepMind, meanwhile, has hired away one of the people who led OpenAI's video generation model Sora to build a world simulator, and OpenAI itself also describes Sora as a world model.
Giving AI an understanding of the real world
AI world models are inspired by the mental models that humans form: the brain takes in information from the senses and turns it into a more concrete understanding of the world around it.
In a paper, AI researchers David Ha and Jürgen Schmidhuber cite the example of baseball batters, who can hit a 100 mph fastball because they "instinctively" predict where the ball will go. This prediction is not deliberate reasoning but happens subconsciously: their muscles reflexively swing the bat at the right time and place based on the predictions of the brain's internal model. Some argue that this kind of mental modeling is a prerequisite for human-level intelligence.
An AI world model follows the same approach. According to AI startup Runway, a world model builds an internal representation of an external environment and uses that representation to simulate future events in it. The ultimate goal of a world model is to simulate situations that closely match the real world.
Why are world models in the spotlight?
In fact, the concept of a world model has been around for more than a decade. One of the reasons for the growing interest is the rise of AI-generated video.
TechCrunch observes that most AI-generated video today still falls into the uncanny valley, for example by showing limbs that twist or fuse into each other. In addition, while a generative AI model trained on years of footage may accurately predict physical phenomena such as the direction in which a basketball bounces, it does not actually know why the basketball bounces.
In contrast, a world model with an understanding of the 3D world can depict a bouncing basketball more convincingly. To gain that kind of insight, a world model needs to be trained on a range of data, including photos, audio, video, and text.
The potential of world models is not limited to generating video. Researchers such as Yann LeCun, Meta's chief AI scientist, say that world models could one day be used for complex forecasting and planning in both the digital and physical realms. Justin Johnson, co-founder of World Labs, says that world models could eventually generate virtual 3D worlds for gaming, virtual photography, and more.
For developers, a powerful world model removes the need to define how each object moves one by one, which is often a tedious and time-consuming task. Alex Mashrabov, former head of AI at Snap and CEO of Higgsfield, told the press that with an advanced world model, an AI can form its own understanding of whatever scenario it is placed in and start reasoning about possible solutions.
Three hurdles world models must overcome
While the concept of a world model is enticing, many technical challenges remain. In a talk in 2024, Yann LeCun acknowledged that it would take at least 10 years to realize the kind of world model he envisions.
According to media analysis, the obstacles facing world models are also a microcosm of the current state of AI model development. First, training and running world models requires enormous computing power: Sora alone, which is considered an early world model, needs thousands of GPUs.
Second, world models also hallucinate and may internalize biases present in their training data. For example, a visual model trained on video of sunny days in European cities may have difficulty understanding or depicting a snowy Korean city, or may simply generate incorrect content.
To address this, the training data for a world model must be broad enough to cover a wide variety of scenarios, yet specific enough for the AI to understand the nuances of each one. However, AI development is currently facing a data scarcity crisis of its own, with Epoch AI predicting that developers will run out of data to train generative AI models sometime between 2026 and 2032.
Nonetheless, world models remain very attractive. Mashrabov says that if these hurdles are overcome, world models could form a "much stronger" connection between AI and the real world, a breakthrough that would bring major advances not only in generating virtual worlds but also in robotics and AI decision-making.