General Introduction
Open-Sora is an open-source project designed to let anyone generate high-quality videos efficiently. Developed by the hpcaitech team, it provides tools for generating video from text or images, supporting a variety of resolutions and durations. The project is fully open source: the model weights, code, and training pipeline are all public to encourage community contributions. The latest version, Open-Sora 2.0, approaches the performance of the industry's top models, cost only $200,000 to train, and generates high-quality video quickly. Users can download the code for free and run it locally, or try it through the Gradio demo on Hugging Face. Open-Sora targets creators, developers, and researchers, driving adoption and innovation in video creation, and the team also offers a commercial product, Video Ocean.
Feature List
- Text-to-video generation: enter a text description to generate a matching video.
- Image-to-video generation: generate a dynamic video from a single image.
- Video extension: extend the length of a video or add content.
- Multi-resolution support: output video from 144p to 768p.
- Flexible duration: generate videos from 2 to 16 seconds.
- Multiple aspect ratios: supports 16:9, 9:16, 1:1, and more.
- Open-source models and training: model weights and training code are provided, supporting custom development.
- Efficient inference: optimized algorithms reduce hardware requirements and allow video generation on a single GPU.
- Prompt optimization: supports GPT-4o-based prompt refinement to improve generation quality.
- Motion scoring: adjust how dynamic the video is via a motion score.
Usage Guide
Installation
To use Open-Sora, users need to set up a Python environment and install the dependencies. The detailed steps are below:
- Create a virtual environment
Create a Python 3.10 virtual environment to avoid dependency conflicts:
```bash
conda create -n opensora python=3.10
conda activate opensora
```
- Clone the codebase
Download the Open-Sora project from GitHub:
```bash
git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
```
- Install dependencies
Make sure your PyTorch version is 2.4.0 or higher, then install the package:
```bash
pip install -v .
```
If you need development (editable) mode, use:
```bash
pip install -v -e .
```
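To confirm the PyTorch prerequisite before installing, you can print the installed version (a quick sanity check using PyTorch's standard `torch.__version__` attribute):
```bash
# Should print 2.4.0 or later
python -c "import torch; print(torch.__version__)"
```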
- Install acceleration libraries
Open-Sora uses xformers and flash-attn to boost performance. Install them according to your CUDA version:
```bash
pip install xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu121
pip install flash-attn --no-build-isolation
```
For faster inference, compile flash-attention manually:
```bash
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper
python setup.py install
```
- Download model weights
Open-Sora 2.0 supports 256p and 768p video generation. The models can be downloaded from Hugging Face:
```bash
pip install "huggingface_hub[cli]"
huggingface-cli download hpcai-tech/Open-Sora-v2 --local-dir ./ckpts
```
or from ModelScope:
```bash
pip install modelscope
modelscope download hpcai-tech/Open-Sora-v2 --local_dir ./ckpts
```
- Verify the installation
Check that the environment works:
```bash
python -c "import opensora; print(opensora.__version__)"
```
Usage
Open-Sora supports text-to-video, image-to-video, and other generation modes, and is simple to operate. A detailed guide follows:
Text-to-Video Generation
Open-Sora is optimized for image-to-video, but it also supports direct text-to-video generation; the text-to-image-to-video pipeline is recommended for better quality.
- Prepare a prompt
Write a detailed text description, e.g., "The ocean in a storm, with huge waves crashing against the rocks under dark clouds." The more specific the prompt, the better the results.
- Run the generation command
Generate a 256x256 video:
```bash
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "Stormy ocean with crashing waves"
```
Generate a 768x768 video (requires multiple GPUs):
```bash
torchrun --nproc_per_node 8 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_768px.py --save-dir samples --prompt "Stormy ocean with crashing waves"
```
- Adjust parameters (see the combined example after this list)
  - `--aspect_ratio`: set the aspect ratio, e.g. `16:9` or `1:1`.
  - `--num_frames`: set the number of frames; must be of the form 4k+1, e.g. 33, 65, or 129 (the maximum).
  - `--offload True`: enable memory offloading for lower-end devices.
- View the results
The generated videos are saved as MP4 files in the `samples` folder.
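As a combined example of the flags above (the flag names come from this guide; the specific values and prompt are illustrative, not taken from the project's documentation):
```bash
# Illustrative combination of the documented flags: 16:9 output,
# 129 frames, with memory offloading for lower-end GPUs.
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py \
    configs/diffusion/inference/t2i2v_256px.py \
    --save-dir samples \
    --prompt "Stormy ocean with crashing waves" \
    --aspect_ratio 16:9 \
    --num_frames 129 \
    --offload True
```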
Image-to-Video Generation
- Prepare a reference image
Save the reference image as `input.png`.
- Run the generation command
Generate a 256p video:
```bash
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/256px.py --cond_type i2v_head --prompt "A serene forest with flowing river" --ref input.png
```
Generate a 768p video:
```bash
torchrun --nproc_per_node 8 --standalone scripts/diffusion/inference.py configs/diffusion/inference/768px.py --cond_type i2v_head --prompt "A serene forest with flowing river" --ref input.png
```
- Tune the amount of motion
Use `--motion-score` to adjust how dynamic the output is; the default is 4. Example:
```bash
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/256px.py --cond_type i2v_head --prompt "A running horse" --ref horse.png --motion-score 7
```
Prompt Optimization
Open-Sora can use GPT-4o to refine prompts and improve generation quality:
- Set the OpenAI API key:
```bash
export OPENAI_API_KEY=sk-xxxx
```
- Run generation with prompt refinement enabled:
```bash
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "raining, sea" --refine-prompt True
```
Gradio Interface
Open-Sora provides an interactive Gradio interface:
- Start the interface:
```bash
python scripts/demo.py --model-type v2-768px
```
- Open `http://localhost:7860` in a browser.
- Enter a prompt or upload an image, adjust the resolution and frame rate, and click "Generate" to create a video.
- "Refine prompt" can be enabled to optimize the prompt (requires an OpenAI API key).
Custom Model Training
Users can train models on their own datasets:
- Prepare the dataset: a CSV file containing video paths and descriptions (see the sketch after these steps).
- Modify the configuration file `configs/opensora-v2/train/stage1.py` to set the data path.
- Run training:
```bash
torchrun --nproc_per_node 8 scripts/train.py configs/opensora-v2/train/stage1.py --data-path your_data.csv
```
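A minimal sketch of what such a CSV might look like, assuming a simple path-plus-caption layout; the column names here are an assumption, so check the project's data-preparation docs for the exact schema Open-Sora expects:
```bash
# Hypothetical dataset file; the column names "path" and "text" are
# illustrative, not confirmed by this guide.
cat > your_data.csv <<'EOF'
path,text
/data/videos/ocean_waves.mp4,"Stormy ocean with crashing waves at dusk"
/data/videos/forest_river.mp4,"A serene forest with a flowing river"
EOF
```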
Computational Efficiency
Open-Sora optimizes inference efficiency. Test results (H100 GPU, 50 sampling steps):
- 256x256: 1 GPU, 60 seconds, 52.5 GB of video memory; 4 GPUs, 34 seconds, 44.3 GB of video memory.
- 768x768: 8 GPUs, 276 seconds, 44.3 GB of video memory.
Caveats
- Hardware requirements: an NVIDIA H100 or A100 with at least 24 GB of video memory is recommended; lower resolutions can run on lower-end devices.
- Prompt quality: detailed descriptions dramatically improve video results.
- License: Open-Sora uses the MIT license, which allows commercial use subject to its terms.
Application Scenarios
- Short video creation: bloggers can use Open-Sora to generate promotional videos, such as "city lights twinkling at night", for sharing on social media platforms.
- Educational animation: teachers can generate instructional animations, such as "planets revolving around the sun", to make lessons more engaging.
- Game scene design: developers can generate dynamic scenes from concept art for use in game backgrounds or transitions.
- AI research: researchers can modify the model code to test new algorithms or datasets and advance video generation techniques.
FAQ
- How does Open-Sora 2.0 perform?
In the VBench evaluation, Open-Sora 2.0 narrows the gap with OpenAI's Sora to 0.69% and comes close to HunyuanVideo 11B and Step-Video 30B.
- What resolutions and durations are supported?
Resolutions from 144p to 768p, durations from 2 to 16 seconds, and aspect ratios including 16:9, 9:16, and 1:1.
- How can generation quality be improved?
Use detailed prompts, adjust `--motion-score` (1-7), or enable GPT-4o prompt refinement.
- Is it free to use?
Yes. Open-Sora is completely open source; the models and code are freely available under the MIT license.