Yume 1.5 - An Interactive World Generation Model Open-Sourced by Shanghai AI Lab and Fudan University
What is Yume 1.5
Yume 1.5 is an open-source interactive world generation model jointly developed by the Shanghai Artificial Intelligence Laboratory, Fudan University, and the Shanghai Innovation Research Institute. It supports real-time interactive rendering at 12 FPS on a single GPU, employs joint temporal-spatial-channel modeling (TSCM) to keep the sampling rate stable even as context length grows, and integrates Self-Forcing to accelerate inference and reduce error accumulation. The model performs well on world generation and editing tasks; the paper and open-source code are available on GitHub.
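The core idea behind keeping the sampling rate stable as the video grows is to bound the history context instead of letting it grow linearly. The sketch below illustrates that idea only: it keeps recent frames intact and compresses older frames along the temporal and spatial axes. The function name, parameters, and pooling scheme are all hypothetical and do not reflect Yume's actual TSCM architecture.

```python
import numpy as np

def compress_history(frames, recent_k=4, t_stride=2, s_pool=2):
    """Illustrative hierarchical compression of history frames.

    Keeps the most recent `recent_k` frames intact, and compresses
    older frames temporally (keeping every `t_stride`-th frame) and
    spatially (average pooling by `s_pool`), so the context size
    stays roughly bounded as the video gets longer. This mirrors the
    *idea* of multi-dimensional history compression, not Yume's
    actual implementation.
    """
    frames = np.asarray(frames)              # (T, H, W, C)
    recent = frames[-recent_k:]              # kept at full resolution
    older = frames[:-recent_k][::t_stride]   # temporal subsampling
    if older.size:
        t, h, w, c = older.shape
        # Crop so spatial dims divide evenly, then average-pool.
        older = older[:, : h - h % s_pool, : w - w % s_pool, :]
        older = older.reshape(t, h // s_pool, s_pool,
                              w // s_pool, s_pool, c).mean(axis=(2, 4))
    return older, recent
```

With 20 history frames of size 8x8, this keeps 4 full-resolution frames and reduces the other 16 to 8 pooled 4x4 frames, so the attention context the next frame sees grows sublinearly in video length.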

Features of Yume 1.5
- Efficient real-time generation: Yume-1.5 runs inference at 12 frames per second at 540p resolution and cuts benchmark generation time to 8 seconds, significantly improving real-time performance and enabling rapid generation of realistic virtual worlds.
- Text control: The model generates dynamic events from natural-language prompts, so users can steer the creation and evolution of the virtual world through text commands and semantically guide the generated content.
- Keyboard interaction: Supports keyboard-driven exploration of the generated world with simulated first-person navigation; users control character and camera movement with the keyboard for a more interactive experience.
- Technological innovation: Joint temporal-spatial-channel modeling (TSCM) addresses the rapid growth of historical context in long-video generation by hierarchically compressing the history-frame context across multiple dimensions.
- Bidirectional attention distillation: Combined with an enhanced text-embedding scheme, it accelerates sampling, reduces error accumulation in autoregressive generation, and significantly improves inference efficiency.
- Mixed-dataset training: A hybrid dataset training strategy, together with an architectural decomposition of event and action descriptions, enables semantic guidance of the generated content and improves generation quality and diversity.
- Broad applicability: Yume-1.5 suits immersive simulation, virtual embodiment, and interactive entertainment, providing users with a more realistic and richer virtual-world experience.
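The keyboard interaction described above amounts to turning per-step input into conditioning for the generator. The snippet below is a minimal conceptual sketch of that loop; the key names, action labels, and the idea of a condition dictionary are assumptions for illustration, not Yume's actual API.

```python
# Hypothetical mapping from keyboard input to per-step action
# conditioning for an interactive world model. All names here are
# illustrative; they do not correspond to Yume's real interface.
KEY_ACTIONS = {
    "w": "move_forward",
    "s": "move_backward",
    "a": "move_left",
    "d": "move_right",
    "arrow_left": "turn_left",
    "arrow_right": "turn_right",
}

def step(condition_stream, pressed_keys, text_prompt=""):
    """One interactive step: map input to a condition the generator
    would consume alongside its compressed history context."""
    actions = [KEY_ACTIONS[k] for k in pressed_keys if k in KEY_ACTIONS]
    condition = {"actions": actions, "text": text_prompt}
    condition_stream.append(condition)  # stands in for the model call
    return condition
```

In a real system each condition would be fed to the autoregressive generator to render the next frame, so the user's keystrokes and text prompt jointly steer camera movement and scene events at every step.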
Core Benefits of Yume 1.5
- Efficient inference: Yume-1.5 reaches 12 frames per second at 540p resolution and cuts benchmark generation time to 8 seconds, significantly improving real-time generation efficiency.
- Text-driven interactivity: Users can trigger dynamic events and reshape the virtual world with natural-language commands, strengthening the interaction between user and world.
- First-person navigation: Keyboard-based interaction simulates first-person navigation, letting users freely control character and camera movement for an immersive exploration experience.
- Innovative modeling: Joint temporal-spatial-channel modeling (TSCM) effectively resolves the fast-growing historical context in long-video generation, improving both generation quality and efficiency.
- Bidirectional attention distillation: Together with an enhanced text-embedding scheme, it speeds up sampling and reduces error accumulation, further improving inference speed.
- Mixed-dataset training: Training on multiple combined datasets improves the model's adaptability to different scenes and events and enhances the diversity and realism of the generated content.
What is the official website of Yume 1.5?
- Project website: https://stdstu12.github.io/YUME-Project/
- GitHub repository: https://github.com/stdstu12/YUME
- HuggingFace model library: https://huggingface.co/stdstu123/Yume-5B-720P
- arXiv technical paper: https://arxiv.org/pdf/2512.22096
Who Yume 1.5 Is For
- Game developers: Yume-1.5 can quickly generate virtual game worlds, reducing development time and cost while giving players a more immersive gaming experience.
- Virtual reality (VR) and augmented reality (AR) developers: The model can create realistic virtual environments, enhancing the realism and interactivity of VR/AR applications.
- Filmmakers: It can generate virtual scenes and special effects for movies, TV dramas, and other productions, saving the cost and time of building physical sets.
- Educators: Virtual teaching environments can be created for historical reenactments, scientific simulations, and other educational scenarios, boosting students' interest and understanding.
- Architectural designers and planners: It can quickly generate virtual scenes of architectural models and urban plans for proposal presentations and client communication, improving design efficiency.
- Entertainment industry practitioners: Designers of theme parks, escape rooms, and similar venues can use Yume-1.5 to generate unique virtual scenes that enrich the entertainment experience.
© Copyright: AI Sharing Circle. Please do not reproduce without permission.




