General Introduction
ComfyUI-WanVideoWrapper is an open-source plugin created by developer kijai for the ComfyUI platform. It is built on WanVideo's Wan2.1 model and provides powerful video generation and processing functions: users can perform image-to-video (I2V), text-to-video (T2V), and video-to-video (V2V) conversion with it. The plugin is aimed at AI enthusiasts, video creators, and anyone who needs an efficient video tool. The project is hosted on GitHub and, as of March 2025, has received over 1300 stars and has an active community. It is still marked as "Work in Progress" and its features are being improved.
Function List
- Image to video (I2V): Convert still images into motion video, with support for customized frame rates and resolutions.
- Text to video (T2V): Generate videos from text descriptions, with adjustable generation parameters.
- Video to video (V2V): Enhance or restyle existing videos while keeping the motion coherent.
- Wan2.1 model support: Uses Wan2.1's Transformer and VAE models, and is also compatible with ComfyUI's native encoding modules.
- Long video generation: Supports videos of more than 1000 frames through window size and overlap settings.
- Performance optimization: Supports torch.compile to improve generation speed.
Usage Guide
Installation Process
To use ComfyUI-WanVideoWrapper, you first need to install ComfyUI and then add the plugin. The detailed steps are below:
- Install ComfyUI
  - Download the ComfyUI main program from GitHub (https://github.com/comfyanonymous/ComfyUI).
  - Unzip it locally, e.g. to C:\ComfyUI.
  - In the ComfyUI_windows_portable folder, run run_nvidia_gpu.bat to start it (Windows users).
- Install the WanVideoWrapper plugin
  - Go to the custom_nodes folder in the ComfyUI root directory.
  - Clone the plugin with Git: git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git
  - Go to the plugin directory: cd ComfyUI-WanVideoWrapper
  - Install the dependencies: python -m pip install -r requirements.txt
  - If using the portable version, run from the ComfyUI_windows_portable folder instead: python_embedded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\requirements.txt
- Download the Wan2.1 model
  - Visit the Hugging Face model repository (https://huggingface.co/Kijai/WanVideo_comfy).
  - Download the required files:
    - Text encoder into ComfyUI/models/text_encoders
    - Transformer model into ComfyUI/models/diffusion_models
    - VAE model into ComfyUI/models/vae
  - The original model's text encoder and CLIP Vision can also be replaced with ComfyUI's own.
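With three separate model folders involved, it is easy to misplace a file. The snippet below is a minimal sketch that checks each expected ComfyUI model subfolder contains at least one file; the folder names come from the steps above, while any filenames used with it are placeholders, not the actual repository filenames:

```python
import os

# Expected model subfolders inside the ComfyUI installation, per the steps above.
MODEL_DIRS = [
    "models/text_encoders",     # text encoder
    "models/diffusion_models",  # Wan2.1 Transformer model
    "models/vae",               # VAE model
]

def check_model_dirs(comfyui_root):
    """Return the expected model subfolders that are missing or still empty."""
    missing = []
    for rel in MODEL_DIRS:
        path = os.path.join(comfyui_root, rel)
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(rel)
    return missing
```

Running `check_model_dirs` against the ComfyUI root before starting a workflow makes "model not found" errors much easier to diagnose.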
- Start ComfyUI
- Once the installation is complete, restart ComfyUI and the plugin's nodes will load automatically into the interface.
Main Function Workflows
1. Image to video (I2V)
- Preparation: Ensure that the Wan2.1 model and VAE are loaded.
- Steps:
  - In the ComfyUI interface, add a WanVideoModelLoader node and select the Wan2.1 I2V model.
  - Add a WanVideoVAELoader node to load the VAE model.
  - Use a Load Image node to upload an image.
  - Add a WanVideoSampler node and set the number of frames (e.g. 81) and the resolution (e.g. 512x512).
  - Connect a VHS_VideoCombine node and set the frame rate (e.g. 16fps) and the output format (e.g. MP4).
  - Click "Generate"; the result is saved to the ComfyUI/output folder.
- Note: Official tests show that 512x512 at 81 frames takes about 16GB of video memory, which can be reduced by lowering the resolution.
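The node chain above can be sketched in ComfyUI's API workflow format, where each node is keyed by an id with a `class_type` and an `inputs` dict, and links are `[node_id, output_index]` pairs. The class names come from the steps above, but the input socket names and filenames here are illustrative assumptions; check the actual node sockets in the interface:

```python
# Minimal I2V workflow sketch in ComfyUI's API JSON format.
# Node class names are from the plugin; input names and filenames are assumptions.
i2v_workflow = {
    "1": {"class_type": "WanVideoModelLoader",
          "inputs": {"model": "wan2.1_i2v.safetensors"}},    # hypothetical filename
    "2": {"class_type": "WanVideoVAELoader",
          "inputs": {"model_name": "wan_vae.safetensors"}},  # hypothetical filename
    "3": {"class_type": "LoadImage",
          "inputs": {"image": "input.png"}},
    "4": {"class_type": "WanVideoSampler",
          "inputs": {"model": ["1", 0], "vae": ["2", 0], "image": ["3", 0],
                     "num_frames": 81, "width": 512, "height": 512}},
    "5": {"class_type": "VHS_VideoCombine",
          "inputs": {"images": ["4", 0], "frame_rate": 16, "format": "video/h264-mp4"}},
}

def execution_order(workflow):
    """Topologically order nodes so every linked input is produced first."""
    order, seen = [], set()
    def visit(nid):
        if nid in seen:
            return
        seen.add(nid)
        for value in workflow[nid]["inputs"].values():
            if isinstance(value, list):  # [node_id, output_index] link
                visit(value[0])
        order.append(nid)
    for nid in workflow:
        visit(nid)
    return order
```

`execution_order(i2v_workflow)` shows the same dependency chain the steps describe: loaders and the image first, then the sampler, then the video combine node.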
2. Text-to-video (T2V)
- Preparation: Prepare a text description, e.g. "City streets at night".
- Steps:
  - Add a LoadWanVideoT5TextEncoder node (or use the ComfyUI native CLIP model).
  - Add a WanVideoTextEncode node and enter the text.
  - Connect WanVideoModelLoader and WanVideoSampler nodes, and set the number of frames (e.g. 256) and the resolution (e.g. 720p).
  - Add a WanVideoDecode node to decode.
  - Use a VHS_VideoCombine node to output the video.
  - Click "Generate"; the generation time depends on the hardware.
- Note: In the official example, the 1.3B T2V model generates 1025 frames in 10 minutes using 5GB of video memory (RTX 5090).
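To get a feel for the numbers in the official example, a quick back-of-the-envelope calculation (using only the figures quoted above, and assuming a 16fps output as in the I2V section) relates frame count, generation time, and resulting clip length:

```python
def clip_seconds(num_frames, fps):
    """Length of the output clip in seconds at a given playback frame rate."""
    return num_frames / fps

def generation_throughput(num_frames, minutes):
    """Average generation speed in frames produced per second of wall time."""
    return num_frames / (minutes * 60)

# Official example: 1025 frames in about 10 minutes (RTX 5090).
# At 16fps playback that is roughly a one-minute clip,
# generated at under two frames per second of wall time.
clip_len = clip_seconds(1025, 16)
speed = generation_throughput(1025, 10)
```

So a ten-minute generation run yields about 64 seconds of footage in this example; your own hardware will vary.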
3. Video to video (V2V)
- Preparation: Prepare a short video (MP4 format).
- Steps:
  - Use a VHS_LoadVideo node to load the video.
  - Add a WanVideoEncode node to encode the video.
  - Connect a WanVideoSampler node and adjust the enhancement parameters.
  - Add a WanVideoDecode node to decode.
  - Use a VHS_VideoCombine node to output the result.
  - Click "Generate" to complete the enhancement.
- Tip: Official tests show that V2V with the 14B T2V model gives better results.
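Workflows like the chains above can also be queued programmatically: a running ComfyUI instance accepts workflows in its API JSON format via an HTTP endpoint, by default `POST /prompt` on port 8188. Below is a minimal standard-library sketch; the workflow dict you pass in is whatever graph you exported, and the server address is the ComfyUI default:

```python
import json
import urllib.request

def build_prompt_request(workflow, server="127.0.0.1:8188"):
    """Wrap a workflow dict in the request body ComfyUI's /prompt endpoint expects."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def queue_workflow(workflow, server="127.0.0.1:8188"):
    """Send the workflow to a running ComfyUI instance and return its JSON reply."""
    with urllib.request.urlopen(build_prompt_request(workflow, server)) as resp:
        return json.loads(resp.read())
```

`queue_workflow` requires ComfyUI to be running locally; `build_prompt_request` only constructs the request, which makes the payload easy to inspect before sending.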
4. Long-form video generation
- Steps:
  - In the WanVideoSampler node, set the number of frames (e.g. 1025).
  - Set the window size (e.g. 81 frames) and the overlap value (e.g. 16) to keep the motion consistent.
  - The other steps are the same as for T2V or I2V.
- Hardware requirements: A high-VRAM GPU (e.g. 24GB) is recommended; resolution and frame rate can be reduced on lower-end machines.
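The window and overlap settings above slide a fixed-size window over the full frame range, with consecutive windows sharing `overlap` frames so motion stays consistent across the seams. The sketch below shows one plausible way the window start positions could be derived; the actual scheduling inside the sampler may differ:

```python
def window_starts(total_frames, window, overlap):
    """Start index of each window; consecutive windows share `overlap` frames."""
    stride = window - overlap  # new frames contributed by each window
    starts = [0]
    while starts[-1] + window < total_frames:
        starts.append(starts[-1] + stride)
    # Pull the last window back so it ends exactly at the final frame
    # (this only ever increases the overlap at the last seam).
    starts[-1] = min(starts[-1], total_frames - window)
    return starts
```

With the example settings (1025 frames, window 81, overlap 16), this yields 16 windows, the last one ending exactly at frame 1025.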
Featured Functions
- Wan2.1 core support: The plugin is built on the Wan2.1 model and provides efficient video generation.
- Compatibility with ComfyUI native modules: ComfyUI's own text encoder and CLIP Vision can be used without additional models.
- Long video generation: Ultra-long videos are supported via window and overlap settings, with stable performance at 1025 frames in official tests.
- Performance optimization: torch.compile is supported to significantly improve generation speed.
Common Problems
- Nodes not shown: Check that the dependencies installed completely, or restart ComfyUI.
- Insufficient video memory: Reduce the resolution or frame rate; the official advice is to adjust according to your hardware.
- Model path error: Ensure the models are placed in the correct folders; refer to the official instructions.