
DiffSynth-Engine: An Open-Source Engine for Low-VRAM Deployment of FLUX and Wan2.1

General Introduction

DiffSynth-Engine is an open-source project launched by ModelScope and hosted on GitHub. Built on diffusion-model technology, it focuses on generating images and videos efficiently and is aimed at developers who deploy AI models in production environments. The project evolved from DiffSynth-Studio through a thorough engineering overhaul that improved computational efficiency and ease of deployment. It supports multiple models (e.g., FLUX, Wan2.1) and provides a clear code structure with flexible memory management. As of March 2025 the project is actively maintained and has drawn broad attention from the open-source community, with the goal of making AI-driven creation more practical.



 

Function List

  • Supports efficient generation of images and videos, covering a wide range of needs from static images to dynamic content.
  • Provides clear and readable code without relying on third-party libraries, making it easy for developers to modify and extend.
  • Compatible with a variety of base models (e.g. FLUX, Wan2.1) and LoRA models to adapt to different scenarios.
  • Built-in flexible memory management with FP8, INT8, and other quantization modes, so models can run even on low-VRAM devices.
  • Optimized inference speed, with tensor-parallel computation to accelerate large-scale generation tasks.
  • Provides cross-platform support and is compatible with Windows, macOS (including Apple Silicon) and Linux.
  • Supports text-to-image, text-to-video, and video stylization, among many other features.

 

Usage Guide

Installation process

Installing DiffSynth-Engine is easy and can be done in a few steps.

  1. Install the core package
    Install via pip3 by entering the following command in the terminal:
pip3 install diffsynth-engine

Make sure your Python version is 3.8 or higher. Using a virtual environment is recommended to avoid dependency conflicts.

  2. Download the model files
    The project does not bundle model weights; they must be downloaded manually or via code. For example, to fetch the FLUX model:
from diffsynth_engine import fetch_model
model_path = fetch_model("muse/flux-with-vae", revision="20240902173035", path="flux1-dev-with-vae.safetensors")

After downloading, fetch_model returns the local path of the model file, which your scripts then pass to the pipeline configuration.

  3. Verify the installation
    After installation, run a simple test script to confirm the environment works:

    from diffsynth_engine import __version__
    print(__version__)
    

    If a version number is printed, the installation succeeded; a combined check of the package and the downloaded model file is sketched below.
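
The two steps above can also be combined into a single sanity check. The sketch below is an illustration rather than part of the official examples: it imports the package, fetches the FLUX weights from step 2, and confirms that the returned path points to a real file before any pipeline is built.

    import os

    from diffsynth_engine import __version__, fetch_model

    print(__version__)  # the package imports and reports its version

    # fetch_model downloads the weights (as in step 2) and returns the local file path.
    model_path = fetch_model("muse/flux-with-vae", revision="20240902173035", path="flux1-dev-with-vae.safetensors")
    print(model_path)
    assert os.path.exists(model_path), "model file not found; check the download step above"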

Main Functions

1. Generating images (FLUX as an example)

  • Procedure
    Use the following code to generate an image:

    from diffsynth_engine.pipelines import FluxImagePipeline, FluxModelConfig
    from diffsynth_engine import fetch_model

    # Download the FLUX weights; fetch_model returns the local file path.
    model_path = fetch_model("muse/flux-with-vae", revision="20240902173035", path="flux1-dev-with-vae.safetensors")
    config = FluxModelConfig(dit_path=model_path)
    pipe = FluxImagePipeline.from_pretrained(config, offload_mode="cpu_offload").eval()
    image = pipe(
        prompt="An astronaut riding a horse on the moon, black-and-white photography, heavy grain, high contrast",
        width=1024,
        height=1024,
        num_inference_steps=30,
        seed=42,
    )
    image.save("flux_txt2img.png")
    
  • Notes
    By default, this configuration requires about 23GB of VRAM. If memory is insufficient, switch to offload_mode="sequential_cpu_offload": generation then runs with as little as 4GB of VRAM, but takes longer (about 91 seconds). Several quantization precisions (e.g., q8_0, q6_k) are also supported, bringing the memory requirement down to roughly 7-12GB. A sketch of choosing the offload mode based on available VRAM follows below.
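
Which offload_mode to use depends on how much VRAM the GPU actually has. The sketch below makes that choice automatically; it assumes PyTorch is available (the engine runs on it) and uses the 23GB / 4GB figures quoted above only as a rough, illustrative threshold.

    import torch
    from diffsynth_engine import fetch_model
    from diffsynth_engine.pipelines import FluxImagePipeline, FluxModelConfig

    model_path = fetch_model("muse/flux-with-vae", revision="20240902173035", path="flux1-dev-with-vae.safetensors")
    config = FluxModelConfig(dit_path=model_path)

    # Total VRAM of GPU 0 in GB; the 24 GB cut-off is illustrative, not an official recommendation.
    total_vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3

    if total_vram_gb >= 24:
        offload_mode = "cpu_offload"             # the ~23 GB configuration shown above
    else:
        offload_mode = "sequential_cpu_offload"  # runs in as little as ~4 GB, but generation is slower

    pipe = FluxImagePipeline.from_pretrained(config, offload_mode=offload_mode).eval()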

2. Video generation (Wan2.1 as an example)

  • Procedure
    Use the following code to generate a video:

    from diffsynth_engine.pipelines import WanVideoPipeline, WanModelConfig
    from diffsynth_engine.utils.download import fetch_model
    from diffsynth_engine.utils.video import save_video

    # Download the Wan2.1 DiT, T5 text encoder, and VAE weights.
    config = WanModelConfig(
        model_path=fetch_model("muse/wan2.1-14b-bf16", path="dit.safetensors"),
        t5_path=fetch_model("muse/wan2.1-umt5", path="umt5.safetensors"),
        vae_path=fetch_model("muse/wan2.1-vae", path="vae.safetensors"),
    )
    pipe = WanVideoPipeline.from_pretrained(config)
    video = pipe(
        prompt="A puppy running across a meadow in bright sunshine, with wildflowers and a blue sky in the background",
        num_frames=41,
        width=848,
        height=480,
        seed=42,
    )
    save_video(video, "wan_t2v.mp4", fps=15)
    
  • Notes
    On a single GPU, generating about 2 seconds of video takes 358 seconds. With 4 A100 GPUs and tensor parallelism enabled (parallelism=4, use_cfg_parallel=True), the time drops to 114 seconds, a 3.14x speedup. A sketch for timing the run on your own hardware follows below.
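
Such timings vary a lot with hardware, so it is worth measuring them on your own machine. The sketch below simply wraps the generation call from the example above in a wall-clock timer; nothing beyond that example and the Python standard library is assumed.

    import time

    # `pipe` is the WanVideoPipeline built in the example above.
    start = time.perf_counter()
    video = pipe(
        prompt="A puppy running across a meadow in bright sunshine, with wildflowers and a blue sky in the background",
        num_frames=41,
        width=848,
        height=480,
        seed=42,
    )
    elapsed = time.perf_counter() - start
    # 41 frames at 15 fps is roughly 2.7 seconds of footage.
    print(f"generated 41 frames in {elapsed:.0f} s")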

3. Low-VRAM optimization

  • Procedure
    In the FLUX example, change offload_mode to sequential_cpu_offload:

    pipe = FluxImagePipeline.from_pretrained(config, offload_mode="sequential_cpu_offload").eval()
    
  • Notes
    This reduces the VRAM requirement from 23GB to 3.52GB, putting it within reach of ordinary devices. Quantization modes (such as q4_k_s) further trade speed and memory against quality; output quality drops slightly but remains usable. A sketch for measuring the actual peak VRAM follows below.
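
To check what a given offload or quantization setting actually uses on your own GPU, PyTorch's peak-memory statistics are enough. A minimal sketch, reusing the sequential_cpu_offload pipeline built in the snippet above:

    import torch

    torch.cuda.reset_peak_memory_stats()

    # `pipe` is the FluxImagePipeline built with offload_mode="sequential_cpu_offload" above.
    image = pipe(
        prompt="An astronaut riding a horse on the moon, black-and-white photography",
        width=1024,
        height=1024,
        num_inference_steps=30,
        seed=42,
    )
    image.save("flux_lowvram.png")

    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"peak VRAM during generation: {peak_gb:.2f} GB")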

4. Multi-GPU parallel inference

  • Procedure
    In the Wan2.1 example, add the parallelism parameter:

    pipe = WanVideoPipeline.from_pretrained(config, parallelism=4, use_cfg_parallel=True)
    
  • Notes
    Multi-GPU parallel computation is supported: 2 GPUs give a 1.97x speedup and 4 GPUs a 3.14x speedup, which suits industrial-scale deployments. A sketch that matches parallelism to the number of visible GPUs follows below.
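
Before enabling parallelism, confirm how many GPUs are actually visible, since the value must match the hardware (see also the caveats and FAQ below). A minimal sketch, assuming PyTorch reports the devices and reusing the Wan2.1 config from the example above; how use_cfg_parallel combines with 2 GPUs is inferred from the 1.97x figure and should be treated as an assumption.

    import torch
    from diffsynth_engine.pipelines import WanVideoPipeline

    n_gpus = torch.cuda.device_count()

    if n_gpus >= 4:
        # The 4-GPU configuration from the example above (3.14x speedup).
        pipe = WanVideoPipeline.from_pretrained(config, parallelism=4, use_cfg_parallel=True)
    elif n_gpus >= 2:
        # Assumed 2-GPU variant of the same flags (the 1.97x figure quoted above).
        pipe = WanVideoPipeline.from_pretrained(config, parallelism=2, use_cfg_parallel=True)
    else:
        pipe = WanVideoPipeline.from_pretrained(config)  # single-GPU fallback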

Caveats

  • Hardware requirements: at least 8GB of VRAM is recommended for image generation, and 24GB, or a multi-GPU setup, for video generation.
  • Model path: make sure the path returned by fetch_model is correct; otherwise specify the local file path manually (see the sketch after this list).
  • Documentation: see the official GitHub page for more usage details: <https://github.com/modelscope/DiffSynth-Engine>
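
If the automatic download location is not suitable (for example, the weights were copied onto the machine by hand), the config classes accept a local file path directly in place of the value returned by fetch_model. A minimal sketch; the path below is a placeholder for wherever you stored the file:

    from diffsynth_engine.pipelines import FluxImagePipeline, FluxModelConfig

    # Point dit_path at a manually downloaded .safetensors file instead of fetch_model's return value.
    config = FluxModelConfig(dit_path="/models/flux/flux1-dev-with-vae.safetensors")
    pipe = FluxImagePipeline.from_pretrained(config, offload_mode="cpu_offload").eval()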

 

Application Scenarios

  1. Personal creation
    Users can generate artistic images with FLUX or short videos with Wan2.1, well suited to sharing on social media.
  2. Industrial deployment
    Enterprises can use multi-GPU parallel inference to quickly generate high-quality video content for advertising or film production.
  3. Technical research
    Developers can modify the code and test different models and quantization strategies, helping to drive further optimization.
  4. Education and training
    Thanks to the simple installation, students can learn the practical use of diffusion models and explore how AI generation works.

 

FAQ

  1. What if the installation fails?
    Check the Python version and network connection. Make sure pip is up to date (pip install --upgrade pip), or download the dependencies manually.
  2. What models are supported?
    Supports base models such as FLUX, Wan2.1, and LoRA-compatible fine-tuned models covering image and video generation.
  3. Will a low-end computer work?
    Yes. Adjust offload_mode and the quantization settings; 4GB of VRAM is enough to run FLUX.
  4. How do I set up multi-GPU parallelism?
    Add the parallelism parameter when initializing the pipeline, and make sure it matches the number of available GPUs.