General Introduction
ComfyUI-WanVideoWrapper is an open-source plugin created by developer kijai for the ComfyUI platform. It is built on WanVideo's Wan2.1 model and provides powerful video generation and processing functions: users can perform image-to-video (I2V), text-to-video (T2V), and video-to-video (V2V) conversion with it. The plugin is aimed at AI enthusiasts, video creators, and anyone who needs an efficient video tool. The project is hosted on GitHub and, as of March 2025, has received over 1300 stars and has an active community. It is still marked as "Work in Progress" and its features are being improved.
Function List
- Image to video (I2V): Convert still images into motion video, with support for customized frame rates and resolutions.
- Text to video (T2V): Generate videos from text descriptions, with adjustable generation parameters.
- Video to video (V2V): Enhance or restyle existing videos while keeping the motion coherent.
- Wan2.1 model support: Uses Wan2.1's Transformer and VAE models, and is also compatible with ComfyUI's native encoding modules.
- Long video generation: Supports videos of more than 1000 frames through window size and overlap settings.
- Performance optimization: Supports torch.compile to improve generation speed.
Usage Guide
Installation Process
To use ComfyUI-WanVideoWrapper, you first need to install ComfyUI and then add the plugin. The detailed steps are below:
- Install ComfyUI
  - Download the ComfyUI main program from GitHub (https://github.com/comfyanonymous/ComfyUI).
  - Unzip it locally, e.g. to C:\ComfyUI.
  - In the ComfyUI_windows_portable folder, run run_nvidia_gpu.bat to start it (Windows users).
- Install the WanVideoWrapper plugin
  - Go to the custom_nodes folder in the ComfyUI root directory.
  - Clone the plugin with Git: git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git
  - Go to the plugin directory: cd ComfyUI-WanVideoWrapper
  - Install the dependencies: python -m pip install -r requirements.txt
  - If using the portable version, run from the ComfyUI_windows_portable folder instead: python_embedded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\requirements.txt
- Download the Wan2.1 model
  - Visit the Hugging Face model repository (https://huggingface.co/Kijai/WanVideo_comfy).
  - Download the required files:
    - Text encoder into ComfyUI/models/text_encoders
    - Transformer model into ComfyUI/models/diffusion_models
    - VAE model into ComfyUI/models/vae
  - The original model's text encoder and CLIP Vision can also be replaced with ComfyUI's own.
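With three separate model folders involved, it is easy to misplace a file. The snippet below is a minimal sketch that checks each expected ComfyUI model subfolder contains at least one file; the folder names come from the steps above, while any filenames used with it are placeholders, not the actual repository filenames:

```python
import os

# Expected model subfolders inside the ComfyUI installation, per the steps above.
MODEL_DIRS = [
    "models/text_encoders",     # text encoder
    "models/diffusion_models",  # Wan2.1 Transformer model
    "models/vae",               # VAE model
]

def check_model_dirs(comfyui_root):
    """Return the expected model subfolders that are missing or still empty."""
    missing = []
    for rel in MODEL_DIRS:
        path = os.path.join(comfyui_root, rel)
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(rel)
    return missing
```

Running `check_model_dirs` against the ComfyUI root before starting a workflow makes "model not found" errors much easier to diagnose.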
- Start ComfyUI
- Once the installation is complete, restart ComfyUI and the plugin's nodes will load automatically into the interface.
Main Function Workflows
1. Image to video (I2V)
- Preparation: Ensure that the Wan2.1 model and VAE are loaded.
- Steps:
  - In the ComfyUI interface, add a WanVideoModelLoader node and select the Wan2.1 I2V model.
  - Add a WanVideoVAELoader node to load the VAE model.
  - Use a Load Image node to upload an image.
  - Add a WanVideoSampler node and set the number of frames (e.g. 81) and the resolution (e.g. 512x512).
  - Connect a VHS_VideoCombine node and set the frame rate (e.g. 16fps) and the output format (e.g. MP4).
  - Click "Generate"; the result is saved to the ComfyUI/output folder.
- Note: Official tests show that 512x512 at 81 frames takes about 16GB of video memory, which can be reduced by lowering the resolution.
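The node chain above can be sketched in ComfyUI's API workflow format, where each node is keyed by an id with a `class_type` and an `inputs` dict, and links are `[node_id, output_index]` pairs. The class names come from the steps above, but the input socket names and filenames here are illustrative assumptions; check the actual node sockets in the interface:

```python
# Minimal I2V workflow sketch in ComfyUI's API JSON format.
# Node class names are from the plugin; input names and filenames are assumptions.
i2v_workflow = {
    "1": {"class_type": "WanVideoModelLoader",
          "inputs": {"model": "wan2.1_i2v.safetensors"}},    # hypothetical filename
    "2": {"class_type": "WanVideoVAELoader",
          "inputs": {"model_name": "wan_vae.safetensors"}},  # hypothetical filename
    "3": {"class_type": "LoadImage",
          "inputs": {"image": "input.png"}},
    "4": {"class_type": "WanVideoSampler",
          "inputs": {"model": ["1", 0], "vae": ["2", 0], "image": ["3", 0],
                     "num_frames": 81, "width": 512, "height": 512}},
    "5": {"class_type": "VHS_VideoCombine",
          "inputs": {"images": ["4", 0], "frame_rate": 16, "format": "video/h264-mp4"}},
}

def execution_order(workflow):
    """Topologically order nodes so every linked input is produced first."""
    order, seen = [], set()
    def visit(nid):
        if nid in seen:
            return
        seen.add(nid)
        for value in workflow[nid]["inputs"].values():
            if isinstance(value, list):  # [node_id, output_index] link
                visit(value[0])
        order.append(nid)
    for nid in workflow:
        visit(nid)
    return order
```

`execution_order(i2v_workflow)` shows the same dependency chain the steps describe: loaders and the image first, then the sampler, then the video combine node.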
2. Text-to-video (T2V)
- Preparation: Prepare a text description, e.g. "City streets at night".
- Steps:
  - Add a LoadWanVideoT5TextEncoder node (or use the ComfyUI native CLIP model).
  - Add a WanVideoTextEncode node and enter the text.
  - Connect WanVideoModelLoader and WanVideoSampler nodes, and set the number of frames (e.g. 256) and the resolution (e.g. 720p).
  - Add a WanVideoDecode node to decode.
  - Use a VHS_VideoCombine node to output the video.
  - Click "Generate"; the generation time depends on the hardware.
- Note: In the official example, the 1.3B T2V model generates 1025 frames in 10 minutes using 5GB of video memory (RTX 5090).
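To get a feel for the numbers in the official example, a quick back-of-the-envelope calculation (using only the figures quoted above, and assuming a 16fps output as in the I2V section) relates frame count, generation time, and resulting clip length:

```python
def clip_seconds(num_frames, fps):
    """Length of the output clip in seconds at a given playback frame rate."""
    return num_frames / fps

def generation_throughput(num_frames, minutes):
    """Average generation speed in frames produced per second of wall time."""
    return num_frames / (minutes * 60)

# Official example: 1025 frames in about 10 minutes (RTX 5090).
# At 16fps playback that is roughly a one-minute clip,
# generated at under two frames per second of wall time.
clip_len = clip_seconds(1025, 16)
speed = generation_throughput(1025, 10)
```

So a ten-minute generation run yields about 64 seconds of footage in this example; your own hardware will vary.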
3. Video to video (V2V)
- Preparation: Prepare a short video (MP4 format).
- Steps:
  - Use a VHS_LoadVideo node to load the video.
  - Add a WanVideoEncode node to encode the video.
  - Connect a WanVideoSampler node and adjust the enhancement parameters.
  - Add a WanVideoDecode node to decode.
  - Use a VHS_VideoCombine node to output the result.
  - Click "Generate" to complete the enhancement.
- Tip: Official tests show that V2V with the 14B T2V model gives better results.
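Workflows like the chains above can also be queued programmatically: a running ComfyUI instance accepts workflows in its API JSON format via an HTTP endpoint, by default `POST /prompt` on port 8188. Below is a minimal standard-library sketch; the workflow dict you pass in is whatever graph you exported, and the server address is the ComfyUI default:

```python
import json
import urllib.request

def build_prompt_request(workflow, server="127.0.0.1:8188"):
    """Wrap a workflow dict in the request body ComfyUI's /prompt endpoint expects."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def queue_workflow(workflow, server="127.0.0.1:8188"):
    """Send the workflow to a running ComfyUI instance and return its JSON reply."""
    with urllib.request.urlopen(build_prompt_request(workflow, server)) as resp:
        return json.loads(resp.read())
```

`queue_workflow` requires ComfyUI to be running locally; `build_prompt_request` only constructs the request, which makes the payload easy to inspect before sending.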
4. Long-form video generation
- Steps:
  - In the WanVideoSampler node, set the number of frames (e.g. 1025).
  - Set the window size (e.g. 81 frames) and the overlap value (e.g. 16) to keep the motion consistent.
  - The other steps are the same as for T2V or I2V.
- Hardware requirements: A high-VRAM GPU (e.g. 24GB) is recommended; resolution and frame rate can be reduced on lower-end machines.
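The window and overlap settings above slide a fixed-size window over the full frame range, with consecutive windows sharing `overlap` frames so motion stays consistent across the seams. The sketch below shows one plausible way the window start positions could be derived; the actual scheduling inside the sampler may differ:

```python
def window_starts(total_frames, window, overlap):
    """Start index of each window; consecutive windows share `overlap` frames."""
    stride = window - overlap  # new frames contributed by each window
    starts = [0]
    while starts[-1] + window < total_frames:
        starts.append(starts[-1] + stride)
    # Pull the last window back so it ends exactly at the final frame
    # (this only ever increases the overlap at the last seam).
    starts[-1] = min(starts[-1], total_frames - window)
    return starts
```

With the example settings (1025 frames, window 81, overlap 16), this yields 16 windows, the last one ending exactly at frame 1025.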
Featured Functions
- Wan2.1 core support: The plugin is built on the Wan2.1 model and provides efficient video generation.
- Compatibility with ComfyUI native modules: ComfyUI's own text encoder and CLIP Vision can be used without additional models.
- Long video generation: Ultra-long videos are supported via window and overlap settings, with stable performance at 1025 frames in official tests.
- Performance optimization: torch.compile is supported to significantly improve generation speed.
Common Problems
- Nodes not shown: Check that the dependencies installed completely, or restart ComfyUI.
- Insufficient video memory: Reduce the resolution or frame rate; the official advice is to adjust according to your hardware.
- Model path error: Ensure the models are placed in the correct folders; refer to the official instructions.