
DiffSynth-Engine: An Open-Source Engine for Low-VRAM Deployment of FLUX and Wan2.1

General Introduction

DiffSynth-Engine is an open-source project launched by ModelScope and hosted on GitHub. Built on diffusion-model technology, it focuses on generating images and videos efficiently and is aimed at developers who deploy AI models in production environments. The project evolved from DiffSynth-Studio through a thorough engineering overhaul that improved computational efficiency and ease of deployment. It supports multiple models (e.g., FLUX, Wan2.1) and provides a clear code structure with flexible memory management. As of March 2025 the project is actively maintained and has drawn broad attention from the open-source community, with the goal of making AI-driven creation more practical.



 

Function List

  • Supports efficient generation of images and videos, covering a wide range of needs from static images to dynamic content.
  • Provides clear and readable code without relying on third-party libraries, making it easy for developers to modify and extend.
  • Compatible with a variety of base models (e.g. FLUX, Wan2.1) and LoRA models to adapt to different scenarios.
  • Built-in flexible memory management with FP8, INT8, and other quantization modes, so models can run even on low-VRAM devices.
  • Optimized inference speed, with tensor-parallel computation to accelerate large-scale generation tasks.
  • Provides cross-platform support and is compatible with Windows, macOS (including Apple Silicon) and Linux.
  • Supports text-to-image, text-to-video, and video stylization, among many other features.

 

Usage Guide

Installation process

Installing DiffSynth-Engine is easy and can be done in a few steps.

  1. Install the core package
    Install via pip3 by entering the following command in the terminal:
pip3 install diffsynth-engine

Make sure your Python version is 3.8 or higher. Using a virtual environment is recommended to avoid dependency conflicts.

  2. Download the model files
    The project does not bundle model weights; they must be downloaded manually or via code. For example, to fetch the FLUX model:
from diffsynth_engine import fetch_model
model_path = fetch_model("muse/flux-with-vae", revision="20240902173035", path="flux1-dev-with-vae.safetensors")

After downloading, fetch_model returns the local path of the model file, which your scripts then pass to the pipeline configuration.

  3. Verify the installation
    After installation, run a simple test script to confirm the environment works:

    from diffsynth_engine import __version__
    print(__version__)
    

    If a version number is printed, the installation succeeded; a combined check of the package and the downloaded model file is sketched below.
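
The two steps above can also be combined into a single sanity check. The sketch below is an illustration rather than part of the official examples: it imports the package, fetches the FLUX weights from step 2, and confirms that the returned path points to a real file before any pipeline is built.

    import os

    from diffsynth_engine import __version__, fetch_model

    print(__version__)  # the package imports and reports its version

    # fetch_model downloads the weights (as in step 2) and returns the local file path.
    model_path = fetch_model("muse/flux-with-vae", revision="20240902173035", path="flux1-dev-with-vae.safetensors")
    print(model_path)
    assert os.path.exists(model_path), "model file not found; check the download step above"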

Main Functions

1. Generating images (FLUX as an example)

  • Procedure
    Use the following code to generate an image:

    from diffsynth_engine.pipelines import FluxImagePipeline, FluxModelConfig
    from diffsynth_engine import fetch_model

    # Download the FLUX weights; fetch_model returns the local file path.
    model_path = fetch_model("muse/flux-with-vae", revision="20240902173035", path="flux1-dev-with-vae.safetensors")
    config = FluxModelConfig(dit_path=model_path)
    pipe = FluxImagePipeline.from_pretrained(config, offload_mode="cpu_offload").eval()
    image = pipe(
        prompt="An astronaut riding a horse on the moon, black-and-white photography, heavy grain, high contrast",
        width=1024,
        height=1024,
        num_inference_steps=30,
        seed=42,
    )
    image.save("flux_txt2img.png")
    
  • Notes
    By default, this configuration requires about 23GB of VRAM. If memory is insufficient, switch to offload_mode="sequential_cpu_offload": generation then runs with as little as 4GB of VRAM, but takes longer (about 91 seconds). Several quantization precisions (e.g., q8_0, q6_k) are also supported, bringing the memory requirement down to roughly 7-12GB. A sketch of choosing the offload mode based on available VRAM follows below.
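
Which offload_mode to use depends on how much VRAM the GPU actually has. The sketch below makes that choice automatically; it assumes PyTorch is available (the engine runs on it) and uses the 23GB / 4GB figures quoted above only as a rough, illustrative threshold.

    import torch
    from diffsynth_engine import fetch_model
    from diffsynth_engine.pipelines import FluxImagePipeline, FluxModelConfig

    model_path = fetch_model("muse/flux-with-vae", revision="20240902173035", path="flux1-dev-with-vae.safetensors")
    config = FluxModelConfig(dit_path=model_path)

    # Total VRAM of GPU 0 in GB; the 24 GB cut-off is illustrative, not an official recommendation.
    total_vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3

    if total_vram_gb >= 24:
        offload_mode = "cpu_offload"             # the ~23 GB configuration shown above
    else:
        offload_mode = "sequential_cpu_offload"  # runs in as little as ~4 GB, but generation is slower

    pipe = FluxImagePipeline.from_pretrained(config, offload_mode=offload_mode).eval()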

2. Video generation (Wan2.1 as an example)

  • Procedure
    Use the following code to generate a video:

    from diffsynth_engine.pipelines import WanVideoPipeline, WanModelConfig
    from diffsynth_engine.utils.download import fetch_model
    from diffsynth_engine.utils.video import save_video

    # Download the Wan2.1 DiT, T5 text encoder, and VAE weights.
    config = WanModelConfig(
        model_path=fetch_model("muse/wan2.1-14b-bf16", path="dit.safetensors"),
        t5_path=fetch_model("muse/wan2.1-umt5", path="umt5.safetensors"),
        vae_path=fetch_model("muse/wan2.1-vae", path="vae.safetensors"),
    )
    pipe = WanVideoPipeline.from_pretrained(config)
    video = pipe(
        prompt="A puppy running across a meadow in bright sunshine, with wildflowers and a blue sky in the background",
        num_frames=41,
        width=848,
        height=480,
        seed=42,
    )
    save_video(video, "wan_t2v.mp4", fps=15)
    
  • Notes
    On a single GPU, generating about 2 seconds of video takes 358 seconds. With 4 A100 GPUs and tensor parallelism enabled (parallelism=4, use_cfg_parallel=True), the time drops to 114 seconds, a 3.14x speedup. A sketch for timing the run on your own hardware follows below.
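
Such timings vary a lot with hardware, so it is worth measuring them on your own machine. The sketch below simply wraps the generation call from the example above in a wall-clock timer; nothing beyond that example and the Python standard library is assumed.

    import time

    # `pipe` is the WanVideoPipeline built in the example above.
    start = time.perf_counter()
    video = pipe(
        prompt="A puppy running across a meadow in bright sunshine, with wildflowers and a blue sky in the background",
        num_frames=41,
        width=848,
        height=480,
        seed=42,
    )
    elapsed = time.perf_counter() - start
    # 41 frames at 15 fps is roughly 2.7 seconds of footage.
    print(f"generated 41 frames in {elapsed:.0f} s")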

3. Low-VRAM optimization

  • Procedure
    In the FLUX example, change offload_mode to sequential_cpu_offload:

    pipe = FluxImagePipeline.from_pretrained(config, offload_mode="sequential_cpu_offload").eval()
    
  • Notes
    This reduces the VRAM requirement from 23GB to 3.52GB, putting it within reach of ordinary devices. Quantization modes (such as q4_k_s) further trade speed and memory against quality; output quality drops slightly but remains usable. A sketch for measuring the actual peak VRAM follows below.
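
To check what a given offload or quantization setting actually uses on your own GPU, PyTorch's peak-memory statistics are enough. A minimal sketch, reusing the sequential_cpu_offload pipeline built in the snippet above:

    import torch

    torch.cuda.reset_peak_memory_stats()

    # `pipe` is the FluxImagePipeline built with offload_mode="sequential_cpu_offload" above.
    image = pipe(
        prompt="An astronaut riding a horse on the moon, black-and-white photography",
        width=1024,
        height=1024,
        num_inference_steps=30,
        seed=42,
    )
    image.save("flux_lowvram.png")

    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"peak VRAM during generation: {peak_gb:.2f} GB")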

4. Multi-GPU parallel inference

  • Procedure
    In the Wan2.1 example, add the parallelism parameter:

    pipe = WanVideoPipeline.from_pretrained(config, parallelism=4, use_cfg_parallel=True)
    
  • Notes
    Multi-GPU parallel computation is supported: 2 GPUs give a 1.97x speedup and 4 GPUs a 3.14x speedup, which suits industrial-scale deployments. A sketch that matches parallelism to the number of visible GPUs follows below.
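
Before enabling parallelism, confirm how many GPUs are actually visible, since the value must match the hardware (see also the caveats and FAQ below). A minimal sketch, assuming PyTorch reports the devices and reusing the Wan2.1 config from the example above; how use_cfg_parallel combines with 2 GPUs is inferred from the 1.97x figure and should be treated as an assumption.

    import torch
    from diffsynth_engine.pipelines import WanVideoPipeline

    n_gpus = torch.cuda.device_count()

    if n_gpus >= 4:
        # The 4-GPU configuration from the example above (3.14x speedup).
        pipe = WanVideoPipeline.from_pretrained(config, parallelism=4, use_cfg_parallel=True)
    elif n_gpus >= 2:
        # Assumed 2-GPU variant of the same flags (the 1.97x figure quoted above).
        pipe = WanVideoPipeline.from_pretrained(config, parallelism=2, use_cfg_parallel=True)
    else:
        pipe = WanVideoPipeline.from_pretrained(config)  # single-GPU fallback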

Caveats

  • Hardware requirements: at least 8GB of VRAM is recommended for image generation, and 24GB, or a multi-GPU setup, for video generation.
  • Model path: make sure the path returned by fetch_model is correct; otherwise specify the local file path manually (see the sketch after this list).
  • Documentation: see the official GitHub page for more usage details: <https://github.com/modelscope/DiffSynth-Engine>
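
If the automatic download location is not suitable (for example, the weights were copied onto the machine by hand), the config classes accept a local file path directly in place of the value returned by fetch_model. A minimal sketch; the path below is a placeholder for wherever you stored the file:

    from diffsynth_engine.pipelines import FluxImagePipeline, FluxModelConfig

    # Point dit_path at a manually downloaded .safetensors file instead of fetch_model's return value.
    config = FluxModelConfig(dit_path="/models/flux/flux1-dev-with-vae.safetensors")
    pipe = FluxImagePipeline.from_pretrained(config, offload_mode="cpu_offload").eval()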

 

Application Scenarios

  1. Personal creation
    Users can generate artistic images with FLUX or short videos with Wan2.1, well suited to sharing on social media.
  2. Industrial deployment
    Enterprises can use multi-GPU parallel inference to quickly generate high-quality video content for advertising or film production.
  3. Technical research
    Developers can modify the code and test different models and quantization strategies, helping to drive further optimization.
  4. Education and training
    Thanks to the simple installation, students can learn the practical use of diffusion models and explore how AI generation works.

 

FAQ

  1. What if the installation fails?
    Check the Python version and network connection. Make sure pip is up to date (pip install --upgrade pip), or download the dependencies manually.
  2. What models are supported?
    Supports base models such as FLUX, Wan2.1, and LoRA-compatible fine-tuned models covering image and video generation.
  3. Will a low-end computer work?
    Yes. Adjust offload_mode and the quantization settings; 4GB of VRAM is enough to run FLUX.
  4. How do I set up multi-GPU parallelism?
    Add the parallelism parameter when initializing the pipeline, and make sure it matches the number of available GPUs.