General Introduction
SkyReels-V2 is an open-source video generation model developed by SkyworkAI. Using a Diffusion Forcing technique, it supports generating videos of unlimited length for both text-to-video (T2V) and image-to-video (I2V) tasks, so users can produce high-quality, cinematic video content from text descriptions or input images. The model has a strong track record in the open-source community, with performance comparable to commercial models such as Kling and Runway-Gen4. It offers flexible inference modes suitable for developers, creators, and researchers, and its code and model weights are publicly available on GitHub for easy download and deployment.
Function List
- Unlimited-length video generation: supports videos of arbitrary length, from short clips to full films.
- Text-to-video (T2V): generates video content matching a text prompt.
- Image-to-video (I2V): animates an input image into video while preserving its visual characteristics.
- Multimodal support: combines a multimodal large language model (MLLM) and reinforcement learning to improve generation quality.
- Story generation: automatically produces video storyboards that follow the narrative logic.
- Camera control: provides a director's viewpoint with customizable camera angles and movements.
- Multi-subject consistency: keeps multiple characters visually consistent via the SkyReels-A2 system.
- Efficient inference framework: supports multi-GPU inference to optimize generation speed and resource usage.
Usage Guide
Installation process
SkyReels-V2 is a Python-based open-source project, so you need to set up the environment locally or on a server. The detailed installation steps follow:
- Clone the repository
Open a terminal and run the following commands to fetch the SkyReels-V2 code:
```bash
git clone https://github.com/SkyworkAI/SkyReels-V2
cd SkyReels-V2
```
- Create a virtual environment
Creating a virtual environment with Python 3.10.12 is recommended to avoid dependency conflicts:
```bash
conda create -n skyreels-v2 python=3.10
conda activate skyreels-v2
```
- Install dependencies
Install the Python libraries the project needs:
```bash
pip install -r requirements.txt
```
- Download model weights
The model weights for SkyReels-V2 are hosted on Hugging Face. Download them with:
```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download Skywork/SkyReels-V2 --local-dir ./models
```
Make sure you have enough disk space (model sizes can be tens of gigabytes).
- Hardware requirements
- Minimum configuration: a single RTX 4090 (24 GB VRAM); FP8 quantization can be used to reduce memory requirements.
- Recommended configuration: multiple GPUs (e.g., 4-8 A100s) for efficient parallel inference.
- At least 32 GB of system memory and 100 GB of disk space.
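Before downloading tens of gigabytes of weights, it is worth confirming that the machine meets these requirements. A minimal sanity check, assuming an NVIDIA GPU and that the dependencies above are already installed:
```bash
# Show free disk space in the current directory.
df -h .
# Show the GPU model and total VRAM (requires the NVIDIA driver).
nvidia-smi --query-gpu=name,memory.total --format=csv
# Confirm that PyTorch can see the GPU.
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```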
Usage
SkyReels-V2 provides two main functions: text-to-video (T2V) and image-to-video (I2V). The workflows are as follows:
Text-to-video (T2V)
- Prepare the prompt
Write a text prompt describing the video content, for example:
A serene lake surrounded by towering mountains, with swans gliding across the water.
A negative prompt can be added to suppress unwanted elements:
low quality, deformation, bad composition
- Run the generation script
Edit the `generate_video.py` parameters to set the resolution, frame count, and so on, then run:
```bash
python generate_video.py \
  --model_id "Skywork/SkyReels-V2-T2V-14B-540P" \
  --prompt "A serene lake surrounded by mountains" \
  --num_frames 97 --fps 24 \
  --outdir ./output
```
- `--model_id`: which model to use (e.g., a 540P or 720P variant).
- `--num_frames`: number of frames to generate (default 97).
- `--fps`: frame rate (default 24).
- `--outdir`: directory where the output video is saved.
- View the output
The generated video is saved in MP4 format, e.g. `output/serene_lake_42_0.mp4`.
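To render several prompts in one session, the same script can be driven from a small shell loop. This is only a sketch; the prompts are placeholders, and the flags are the ones documented above:
```bash
# Batch T2V generation over a few placeholder prompts.
prompts=(
  "A serene lake surrounded by mountains"
  "A bustling night market in the rain"
)
for p in "${prompts[@]}"; do
  python generate_video.py \
    --model_id "Skywork/SkyReels-V2-T2V-14B-540P" \
    --prompt "$p" \
    --num_frames 97 --fps 24 \
    --outdir ./output
done
```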
Image-to-video (I2V)
- Prepare the input image
Provide a high-quality image (e.g., PNG or JPG) and make sure its resolution matches the model's default of 960x544 (a preprocessing sketch follows at the end of this section).
- Run the generation script
Specify the image path when invoking `generate_video.py`:
```bash
python generate_video.py \
  --model_id "Skywork/SkyReels-V2-I2V-14B-540P" \
  --prompt "A warrior fighting in a forest" \
  --image ./input_image.jpg \
  --num_frames 97 --fps 24 \
  --outdir ./output
```
- `--image`: path to the input image.
- Other parameters are the same as for T2V.
- Optimization settings
- Use `--guidance_scale` (default 6.0) to adjust how strongly the text prompt steers generation.
- Use `--inference_steps` (default 30) to control generation quality; more steps give higher quality but take longer.
- Enable `--offload` to reduce memory usage on low-VRAM devices.
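If the source image does not already match the model's default 960x544 input size, a generic ffmpeg preprocessing step (not part of SkyReels-V2 itself) can scale and letterbox it first:
```bash
# Scale to fit inside 960x544, then pad to exactly 960x544 (letterbox).
ffmpeg -i input_image.jpg \
  -vf "scale=960:544:force_original_aspect_ratio=decrease,pad=960:544:(ow-iw)/2:(oh-ih)/2" \
  input_960x544.jpg
```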
Featured Functions
- Unlimited-length video
SkyReels-V2 uses Diffusion Forcing to support very long videos. Run the long-video inference script:
```bash
python inference_long_video.py \
  --model_id "Skywork/SkyReels-V2-T2V-14B-720P" \
  --prompt "A sci-fi movie scene" \
  --num_frames 1000
```
- Generating in segments of 97-192 frames each, then stitching them together with a post-processing tool, is recommended (see the ffmpeg sketch at the end of this section).
- Story Generation
Use the story generation feature of the SkyReels-A2 system and supply a plot description, for example:
A hero's journey through a futuristic city, facing challenges.
Then run:
```bash
python story_generate.py --prompt "A hero's journey" --output story_video.mp4
```
The system generates a storyboarded video, automatically arranging scenes and shots.
- Camera control
The `--camera_angle` parameter sets the camera view (e.g., "frontal" or "profile"):
```bash
python generate_video.py --prompt "A car chase" --camera_angle "profile" --outdir ./output
```
- Multi-subject consistency
SkyReels-A2 supports multi-character scenes. Provide multiple reference images and run:
```bash
python multi_subject.py --prompt "Two characters talking" --images "char1.jpg,char2.jpg" --outdir ./output
```
This keeps the characters visually consistent throughout the video.
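For the segment-and-stitch workflow recommended under unlimited-length video, one option is ffmpeg's concat demuxer, which joins clips without re-encoding. The segment filenames here are hypothetical placeholders for whatever your runs produce:
```bash
# List the segment files in playback order (filenames are hypothetical).
printf "file '%s'\n" output/segment_*.mp4 > concat_list.txt
# Join without re-encoding; all segments must share codec, resolution, and fps.
ffmpeg -f concat -safe 0 -i concat_list.txt -c copy output/full_video.mp4
```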
Optimization and Debugging
- Out of memory: enable `--quant` for FP8 quantization, or `--offload` to move some computation to the CPU.
- Generation quality: increase `--inference_steps` (e.g., 50) or adjust `--guidance_scale` (e.g., 8.0).
- Community support: check GitHub Issues for known problems or join the SkyReels community discussion.
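As a sketch, the tips above can be combined into a single invocation on a 24 GB card. The flag names are the ones documented in this guide; verify exact spellings and accepted values against the script's --help output:
```bash
# Low-VRAM run: FP8 quantization plus CPU offload, with more denoising steps
# and stronger guidance for quality (flags as documented above).
python generate_video.py \
  --model_id "Skywork/SkyReels-V2-T2V-14B-540P" \
  --prompt "A serene lake surrounded by mountains" \
  --quant --offload \
  --inference_steps 50 \
  --guidance_scale 8.0 \
  --outdir ./output
```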
Application Scenarios
- Short video creation
Creators can use the T2V feature to quickly generate short clips from text, suitable for social media content production.
- Film pre-production
Directors can use the unlimited-length video and story generation features to create trailers or concept footage while reducing upfront costs.
- Virtual e-commerce showcase
Use the I2V feature to turn product photos into dynamic videos that show the product in use in a virtual scene.
- Educational animation
Teachers can generate instructional animations from text descriptions to visualize complex concepts, such as the steps of a science experiment.
- Game development
Developers can generate game scenes or character animations as material for prototyping or transitions.
FAQ
- What resolutions does SkyReels-V2 support?
Currently 540P (960x544) and 720P (1280x720), with higher resolutions possible in the future.
- How much VRAM do I need to run it?
A single RTX 4090 (24 GB) can handle basic inference; multi-GPU configurations accelerate generation, especially for long videos.
- How can I improve the quality of generated videos?
Increase the number of inference steps (`--inference_steps`), refine the prompt, or use high-quality input images.
- Does it support real-time generation?
Generation is currently offline; real-time generation would require more powerful hardware and may be optimized in the future.
- Are the model weights free?
Yes. SkyReels-V2 is fully open source, and the weights can be downloaded for free from Hugging Face.