DreamOmni2 - HKUST's open-source multimodal AI image editing and generation model

Latest AI Resources | Released 6 months ago | AI Sharing Circle

What is DreamOmni2

DreamOmni2 is a multimodal AI image editing and generation model open-sourced by Jiaya Jia's team at HKUST. It accepts text and image instructions together and supports multiple reference images, giving creators more flexible ways to work. Its training data is built with a three-stage data synthesis pipeline, and the generation/editing model is trained jointly with a vision-language model (VLM), which helps it preserve the identity of the image subject. DreamOmni2 performs well on multimodal instruction-based editing and generation tasks, outperforming current open-source models and matching or surpassing commercial models in some respects. It can be used in a variety of scenarios, including product photography, design workflows, portrait editing, and creative painting.

Features of DreamOmni2

Multimodal instruction processing: Accepts both text and image instructions and handles concrete objects as well as abstract concepts such as materials, textures, and styles, giving creators richer ways to express what they want.
Multi-reference image capability: Combines multiple reference images for editing and generation, giving creators the flexibility to meet complex and varied creative needs.
Data synthesis and training: A three-stage data synthesis pipeline, using a feature mixing method together with editing and extraction models, generates the training data. An index encoding and position encoding offset scheme is also designed to avoid pixel confusion across multiple input images, improving training effectiveness and generation quality.
Joint training: The generation/editing model is trained jointly with a vision-language model (VLM) to better handle complex instructions, so the model understands and executes the user's multimodal instructions more accurately.
Identity consistency: During editing, the identity features of the image subject are preserved, so the edited image stays consistent with the original subject and the subject's characteristics are not lost or confused.
Performance advantages: In multimodal instruction-based editing and generation tasks, DreamOmni2 significantly outperforms current state-of-the-art open-source models and matches or exceeds commercial models in some respects, giving users higher-quality editing and generation results.
Open source and ease of use: The code, model weights, and training datasets are freely available on GitHub and Hugging Face. Users can run inference locally on a CUDA-compatible GPU with sufficient video memory, which lowers the barrier to entry and makes the model more accessible.
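
The index encoding and position encoding offset idea mentioned under data synthesis and training can be pictured with a small toy example. The sketch below is only an illustration under stated assumptions, not the DreamOmni2 implementation: the function name `assign_encodings`, the token-grid representation, and the diagonal offset rule are all hypothetical. It shows the general principle that each input image gets its own index, and that each later image's positions are shifted past the earlier ones so that no two images share a position.

```python
from typing import List, Tuple

# Each token is described by (image_index, row, col).
Token = Tuple[int, int, int]


def assign_encodings(image_sizes: List[Tuple[int, int]]) -> List[List[Token]]:
    """Assign an index and offset positions to the tokens of several images.

    image_sizes holds (height, width) in tokens for each input image.
    Assumption for illustration: every image is shifted diagonally by the
    total height and width of the images before it.
    """
    encoded: List[List[Token]] = []
    row_offset = 0
    col_offset = 0
    for image_index, (height, width) in enumerate(image_sizes):
        tokens = [
            (image_index, row_offset + r, col_offset + c)
            for r in range(height)
            for c in range(width)
        ]
        encoded.append(tokens)
        # Shift the next image so its positions never overlap earlier ones.
        row_offset += height
        col_offset += width
    return encoded


if __name__ == "__main__":
    # A 2x3-token source image followed by a 2x2-token reference image.
    for tokens in assign_encodings([(2, 3), (2, 2)]):
        print(tokens)
```

In this toy version the image index tells the inputs apart, while the offsets keep their position grids disjoint, which is the intuition behind avoiding pixel confusion between multiple input images.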

Core Benefits of DreamOmni2

Multimodal instruction understanding: Processes both text and image instructions, and accurately understands and carries out complex editing tasks such as changing materials, textures, styles, and other abstract concepts.
Multi-reference image support: Combines multiple reference images for editing and generation, giving creators the flexibility to meet diverse creative needs.
Identity consistency: During editing, the identity features of the image subject are preserved, so the edited image stays highly consistent with the original subject and the subject's characteristics are not lost or confused.
Joint training mechanism: The generation/editing model is trained jointly with a vision-language model, which improves the understanding and execution of complex instructions and produces images that better match user intent.
Superior performance: Significantly outperforms current open-source models in multimodal instruction-based editing and generation tasks, and outperforms commercial models in some respects, delivering high-quality editing and generation results.

Where is DreamOmni2's official website?

Project website: https://pbihao.github.io/projects/DreamOmni2/index.html
GitHub repository: https://github.com/dvlab-research/DreamOmni2
arXiv technical paper: https://arxiv.org/pdf/2510.06679
Online demo: https://huggingface.co/spaces/wcy1122/DreamOmni2-Gen

Who is DreamOmni2 for?

Creative designers: Quickly turn design ideas into drafts in multiple styles and work more efficiently.
Photographers: Post-process product photography to improve the visual appeal of products and meet the needs of different clients.
Artists: Produce quick drawings and paintings, exploring different styles and ideas to spark inspiration.
Advertising agencies: Quickly generate advertising materials that fit different campaign themes and styles.
Individual creators: Easily realize creative ideas and produce personalized image content.