Lumina-DiMOO - A Multimodal Large Model Open-Sourced by Shanghai AI Lab and Huawei Ascend


What is Lumina-DiMOO?

Lumina-DiMOO is a new-generation unified model for multimodal generation and understanding, launched by Shanghai AI Lab together with Huawei Ascend at the World Artificial Intelligence Conference (WAIC) 2025. It was trained on Huawei's Ascend AI hardware and software platform using the MindSpeed MM multimodal large-model suite: pre-training was completed at 256, 512, and 1024 resolutions, followed by supervised fine-tuning at 1024 resolution. Described as the world's first model built on a unified discrete diffusion architecture, it replaces the traditional diffusion and autoregressive frameworks, and its sampling speed is about 10 times that of its predecessor. Lumina-DiMOO supports a variety of tasks, including text-to-image/video generation, image editing, image-to-image translation, and image restoration, raising the bar for cross-modal generation and understanding. The full training pipeline code has been open-sourced, giving developers a convenient and efficient starting point for multimodal model development.
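To make the sampling-speed claim more concrete, the arithmetic below compares how many model forward passes an autoregressive decoder and a masked discrete diffusion sampler would need at the three training resolutions. This is a back-of-the-envelope sketch, not Lumina-DiMOO's configuration: the 16x tokenizer downsampling factor and the 64-step diffusion schedule are assumptions chosen purely for illustration.

```python
# Hypothetical illustration: the 16x downsampling factor and the 64-step
# schedule are assumptions, not published Lumina-DiMOO settings.


def image_tokens(resolution: int, downsample: int = 16) -> int:
    """Discrete tokens for a square image, assuming a VQ tokenizer that maps
    each downsample x downsample patch to one token."""
    side = resolution // downsample
    return side * side


def forward_passes(num_tokens: int, diffusion_steps: int) -> tuple[int, int]:
    """Return (autoregressive, discrete-diffusion) forward-pass counts.

    Plain autoregressive decoding emits one token per pass; masked discrete
    diffusion updates all masked positions in parallel, so its pass count is
    the number of denoising steps (never more than the number of tokens).
    """
    return num_tokens, min(diffusion_steps, num_tokens)


for res in (256, 512, 1024):
    n = image_tokens(res)
    ar, diff = forward_passes(n, diffusion_steps=64)
    print(f"{res}x{res}: {n} tokens, AR passes={ar}, diffusion passes={diff}")
```

Forward-pass counts are not wall-clock time: an autoregressive decoder with a KV cache does much less work per pass than a diffusion step over the full sequence, which is one reason measured speedups (such as the roughly 10x reported here) are far smaller than the raw pass ratio.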


Lumina-DiMOO's Functional Features

  • Unified discrete diffusion architecture: Uses what is described as the world's first unified discrete diffusion architecture in place of the traditional diffusion and autoregressive frameworks, dramatically increasing sampling speed.
  • Efficient sampling: Sampling is about 10 times faster than with traditional models, greatly improving generation efficiency.
  • Multimodal task support: Supports tasks such as text-to-image/video generation, image editing, image-to-image translation, and image restoration, with strong cross-modal generation and understanding capabilities.
  • Open-source full training pipeline: The complete training code is released, making research and development easier for developers and encouraging wider adoption of multimodal models.
  • Built on the Ascend AI platform: Relying on Huawei's Ascend AI hardware and software platform and the MindSpeed MM multimodal large-model suite, it achieves efficient training and optimization.
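The core idea behind discrete diffusion sampling can be sketched as a loop that starts from a fully masked token sequence and, over a fixed number of steps, commits the most confident predictions in parallel. The sketch below uses MaskGIT-style confidence-based unmasking with a cosine schedule as a generic stand-in; it is not taken from the Lumina-DiMOO code, and the `[MASK]` id, the schedule, and the uniform toy model are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

MASK = -1  # hypothetical id for the [MASK] token

# A model maps the current sequence to one probability vector per position.
Model = Callable[[Sequence[int]], List[List[float]]]


def masked_diffusion_sample(model: Model, length: int, steps: int,
                            rng: random.Random) -> List[int]:
    """Fill a fully masked sequence in `steps` parallel denoising rounds."""
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        probs = model(seq)  # one forward pass per step, not per token
        # Sample a candidate token at every masked slot and keep its probability.
        proposals = {}
        for i in masked:
            tok = rng.choices(range(len(probs[i])), weights=probs[i])[0]
            proposals[i] = (tok, probs[i][tok])
        # Cosine schedule: how many positions should remain masked after this step.
        remaining = math.floor(length * math.cos(math.pi / 2 * (step + 1) / steps))
        num_to_fix = max(1, len(masked) - remaining)
        # Commit the most confident proposals; everything else stays masked.
        ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:num_to_fix]:
            seq[i] = proposals[i][0]
    return seq


def uniform_toy_model(vocab_size: int) -> Model:
    """Hypothetical stand-in for a trained network: uniform predictions."""
    def model(seq: Sequence[int]) -> List[List[float]]:
        return [[1.0 / vocab_size] * vocab_size for _ in seq]
    return model


tokens = masked_diffusion_sample(uniform_toy_model(8), length=256, steps=16,
                                 rng=random.Random(0))
print(MASK in tokens)  # False: the final step commits every remaining position
```

Because each step fills many positions at once, the number of model calls is bounded by `steps` rather than by the sequence length; in a real system `model` would be the trained network conditioned on the prompt.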

Core Benefits of Lumina-DiMOO

  • Innovative architecture: Uses what is described as the world's first unified discrete diffusion architecture, replacing the traditional diffusion and autoregressive frameworks to generate content more efficiently.
  • High performance: Sampling is about 10 times faster than with traditional models, significantly improving generation efficiency and making the model suitable for large-scale applications.
  • Multimodal capability: Supports a wide range of tasks, including text-to-image/video generation, image editing, image-to-image translation, and image restoration, with strong cross-modal generation and understanding capabilities.
  • Open-source friendly: The full training pipeline code is open-sourced, making research and development easier for developers and promoting wider adoption of multimodal technology.
  • Platform advantages: Built on Huawei's Ascend AI hardware and software platform and the MindSpeed MM multimodal large-model suite, which support high-performance, efficient training and optimization.
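One way a single discrete-diffusion model can cover several tasks is to express each task as "fill in the masked positions" of one token sequence drawn from a shared text-and-image vocabulary. The helpers below sketch that framing under loudly hypothetical assumptions: the vocabulary size, the id offset for image codes, the `[MASK]` id, and the absence of special separator tokens are illustrative choices, not Lumina-DiMOO's actual input format.

```python
from typing import AbstractSet, Iterable, List, Sequence

MASK = -1            # hypothetical id of the shared [MASK] token
TEXT_VOCAB = 32_000  # hypothetical text vocabulary size; image ids start here


def image_ids(codes: Iterable[int]) -> List[int]:
    """Shift VQ codebook indices so image tokens follow the text ids."""
    return [TEXT_VOCAB + c for c in codes]


def text_to_image(prompt: Sequence[int], n_image: int) -> List[int]:
    # The prompt is given; every image position starts masked.
    return list(prompt) + [MASK] * n_image


def inpaint(prompt: Sequence[int], codes: Sequence[int],
            region: AbstractSet[int]) -> List[int]:
    # Editing/restoration: known image tokens are kept, only `region` is masked.
    return list(prompt) + [MASK if i in region else tok
                           for i, tok in enumerate(image_ids(codes))]


def understand(codes: Sequence[int], question: Sequence[int],
               answer_len: int) -> List[int]:
    # Image and question are given; the answer slots are masked.
    return image_ids(codes) + list(question) + [MASK] * answer_len


for name, seq in [("text-to-image", text_to_image([11, 12], 4)),
                  ("inpainting", inpaint([11, 12], [0, 1, 2, 3], {1, 2})),
                  ("understanding", understand([0, 1], [21, 22], 3))]:
    print(name, seq, "masked:", seq.count(MASK))
```

Any of these sequences could then be passed to a single masked-token sampler that only predicts the positions holding `MASK`, which is what lets one set of weights serve generation, editing, and understanding in this simplified picture.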

What is Lumina-DiMOO's official website?

  • Project website: https://synbol.github.io/Lumina-DiMOO
  • GitHub repository: https://github.com/Alpha-VLLM/Lumina-DiMOO
  • Hugging Face model hub: https://huggingface.co/Alpha-VLLM/Lumina-DiMOO

Who is Lumina-DiMOO for?

  • AI researchers: Can use the open-source code and novel architecture for cutting-edge research, exploring new applications and optimization methods for multimodal models.
  • Content creators: Video producers, ad designers, game developers, and other creators can use its powerful generation capabilities to produce creative content quickly and work more efficiently.
  • Software developers: Can integrate Lumina-DiMOO into their own applications to offer users multimodal content generation, expanding the functionality and appeal of those applications.
  • Educators and students: Can use it in teaching and learning to help students understand how multimodal models work and where they are applied, and as a new tool for creating educational content.
  • Business users: Especially companies with heavy content-generation and creative-design needs, such as advertising agencies, film and TV production companies, and media organizations, can use the model to improve the quality and speed of content production.