
Open Sora: An Open Source Video Generation Tool for Optimizing Face Consistency

General Introduction

Open-Sora is an open-source project designed to let anyone generate high-quality videos efficiently. Developed by the hpcaitech team, it provides tools to generate video from text or images, supporting a variety of resolutions and durations. The project is completely open source, with model weights, code, and training pipeline made public to encourage community contributions. The latest version, Open-Sora 2.0, comes close to the industry's top models in performance, cost only $200,000 to train, and generates high-quality video quickly. Users can download the code for free, run it locally, or try it through a Gradio interface on Hugging Face. Open-Sora targets creators, developers, and researchers, driving accessibility and innovation in video creation, and the team also offers a commercial product, Video Ocean.



 

Function List

  • Text-to-video generation: Enter a text description to generate a matching video.
  • Image-to-video generation: Generate a dynamic video from a single image.
  • Video extension: Extend a video's length or add content.
  • Multi-resolution support: Supports video output from 144p to 768p.
  • Flexible duration: Generates videos from 2 to 16 seconds.
  • Multiple aspect ratios: Supports 16:9, 9:16, 1:1, and more.
  • Open-source models and training: Model weights and training code are public, supporting custom development.
  • Efficient inference: Optimized algorithms reduce hardware requirements and allow video generation on a single GPU.
  • Prompt optimization: Supports GPT-4o to refine prompts and improve generation quality.
  • Motion scoring: Adjust how dynamic the video is via a motion score.

 

Usage Help

Installation Process

To use Open-Sora, users need to configure their Python environment and install dependencies. Below are the detailed steps:

  1. Create a virtual environment
    Use Python 3.10 to avoid dependency conflicts:

    conda create -n opensora python=3.10
    conda activate opensora
  2. Clone the codebase
    Download the Open-Sora project from GitHub:

    git clone https://github.com/hpcaitech/Open-Sora
    cd Open-Sora
    
  3. Install dependencies
    Make sure PyTorch 2.4.0 or later is installed, then install Open-Sora from the project root:

    pip install -v .
    

    For development (editable) mode, use:

    pip install -v -e .
    
  4. Install acceleration libraries
    Open-Sora uses xformers and flash-attn to boost performance. Install them according to your CUDA version:

    pip install xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu121
    pip install flash-attn --no-build-isolation
    

    For faster inference, compile flash-attention manually:

    git clone https://github.com/Dao-AILab/flash-attention
    cd flash-attention/hopper
    python setup.py install
    
  5. Download model weights
    Open-Sora 2.0 supports 256p and 768p video generation. Models can be downloaded from Hugging Face or ModelScope (a Python alternative appears after these steps):

    pip install "huggingface_hub[cli]"
    huggingface-cli download hpcai-tech/Open-Sora-v2 --local-dir ./ckpts
    

    or use ModelScope:

    pip install modelscope
    modelscope download hpcai-tech/Open-Sora-v2 --local_dir ./ckpts
    
  6. Verify the installation
    Check that the environment works:

    python -c "import opensora; print(opensora.__version__)"
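If you prefer to script the setup, the environment check and weight download can also be done from Python. This is a minimal sketch assuming only the torch and huggingface_hub packages installed above; snapshot_download mirrors the huggingface-cli command from step 5:

    # Check the PyTorch/CUDA environment and fetch the weights programmatically.
    import torch
    from huggingface_hub import snapshot_download

    print("torch:", torch.__version__)            # should be 2.4.0 or later
    print("CUDA available:", torch.cuda.is_available())

    # Programmatic equivalent of the huggingface-cli download in step 5.
    snapshot_download(repo_id="hpcai-tech/Open-Sora-v2", local_dir="./ckpts")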
    

Usage

Open-Sora supports text-to-video, image-to-video, and other functions, all with simple operation. Below is a detailed guide:

Text to Video Generation

Open-Sora is optimized for image-to-video generation but also supports direct text-to-video. To improve quality, the text-to-image-to-video pipeline is recommended.

  1. Prepare a prompt
    Write a detailed text description, e.g., "A stormy ocean with huge waves crashing against the rocks under dark clouds." The more specific the prompt, the better the results.
  2. Run the generation command
    Generate a 256x256 video:

    torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "Stormy ocean with crashing waves"
    

    Generate a 768x768 video (requires multiple GPUs):

    torchrun --nproc_per_node 8 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_768px.py --save-dir samples --prompt "Stormy ocean with crashing waves"
    
  3. Adjust parameters
    • --aspect_ratio: Sets the aspect ratio, e.g. 16:9 or 1:1.
    • --num_frames: Sets the number of frames; the value must have the form 4k+1, up to 129 frames (a helper sketch follows these steps).
    • --offload True: Enables memory offloading for lower-end devices.
  4. View Results
    The generated video is saved in the samples folder in MP4 format.
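Since --num_frames must have the form 4k+1, it helps to compute the nearest valid value for a target clip length. A minimal helper sketch; the 24 fps default here is an illustrative assumption, not a documented Open-Sora constant:

    # Pick a valid --num_frames value: frame counts must be 4k + 1, up to 129.
    def valid_num_frames(duration_s: float, fps: int = 24) -> int:
        target = round(duration_s * fps)     # frames wanted at the assumed fps
        k = max(1, round((target - 1) / 4))  # nearest k for the 4k + 1 rule
        return min(4 * k + 1, 129)           # clamp to the 129-frame cap

    for seconds in (2, 4, 5):
        print(seconds, "s ->", valid_num_frames(seconds), "frames")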

Image to Video Generation

  1. Prepare a reference image
    Prepare an image and save it as input.png.
  2. Run the generation command
    Generate a 256p video:

    torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/256px.py --cond_type i2v_head --prompt "A serene forest with flowing river" --ref input.png
    

    Generate a 768p video:

    torchrun --nproc_per_node 8 --standalone scripts/diffusion/inference.py configs/diffusion/inference/768px.py --cond_type i2v_head --prompt "A serene forest with flowing river" --ref input.png
    
  3. Optimize motion
    Use --motion-score to adjust how dynamic the output is; the default value is 4 (a sketch that sweeps several scores follows these steps). Example:

    torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/256px.py --cond_type i2v_head --prompt "A running horse" --ref horse.png --motion-score 7
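To compare how the motion score changes the result, the same command can be swept over several values. A sketch only, assuming the checkout layout above; it shells out to the documented inference script:

    # Sweep --motion-score over low/default/high values for comparison.
    import subprocess

    base = [
        "torchrun", "--nproc_per_node", "1", "--standalone",
        "scripts/diffusion/inference.py", "configs/diffusion/inference/256px.py",
        "--cond_type", "i2v_head",
        "--prompt", "A running horse",
        "--ref", "horse.png",
    ]
    for score in (2, 4, 7):
        subprocess.run(base + ["--motion-score", str(score),
                               "--save-dir", f"samples/motion_{score}"],
                       check=True)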
    

Prompt Optimization

Open-Sora supports using GPT-4o to refine prompts and improve generation quality (a standalone sketch of the idea follows the steps below):

  1. Set the OpenAI API key:
    export OPENAI_API_KEY=sk-xxxx
    
  2. Run the optimize command:
    torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "raining, sea" --refine-prompt True
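The same idea can also be tried by hand: ask GPT-4o to expand a terse prompt before passing it to Open-Sora. A minimal sketch using the OpenAI Python SDK; this is not Open-Sora's internal refiner, just the concept:

    # Expand a terse prompt with GPT-4o before video generation.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's video prompt as one vivid, "
                        "detailed English sentence for a text-to-video model."},
            {"role": "user", "content": "raining, sea"},
        ],
    )
    print(resp.choices[0].message.content)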
    

Gradio Interface Operation

Open-Sora provides an interactive Gradio interface:

  1. Start the interface:
    python scripts/demo.py --model-type v2-768px
    
  2. Open http://localhost:7860 in a browser.
  3. Enter a prompt or upload an image, adjust the resolution and frame rate, then click "Generate" to create a video.
  4. Optionally enable "Refine prompt" (requires an OpenAI API key).
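For scripted use of the demo, the gradio_client package can connect to the running server. Since the endpoint names depend on how the demo is laid out, this sketch only inspects the API rather than hard-coding a call:

    # Connect to the local demo and list its callable endpoints.
    from gradio_client import Client

    client = Client("http://localhost:7860")
    client.view_api()  # prints endpoint names and their parameters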

Custom Model Training

Users can train models based on their own datasets:

  1. Prepare the dataset: a CSV file containing video paths and text descriptions (a sketch for building this file follows these steps).
  2. Modify the configuration file configs/opensora-v2/train/stage1.py to set the data path.
  3. Run training:
    torchrun --nproc_per_node 8 scripts/train.py configs/opensora-v2/train/stage1.py --data-path your_data.csv
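A minimal sketch for building the dataset file; the column names ("path", "text") are assumptions for illustration, so check the Open-Sora data documentation for the exact schema your config expects:

    # Write a training CSV of (video path, caption) rows.
    import csv

    samples = [
        ("videos/clip_0001.mp4", "A serene forest with a flowing river"),
        ("videos/clip_0002.mp4", "City lights twinkling at night"),
    ]

    with open("your_data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "text"])  # assumed header, verify against docs
        writer.writerows(samples)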
    

Computational Efficiency

Open-Sora optimizes inference efficiency. Test results (H100 GPUs, 50 sampling steps):

  • 256x256: single GPU, 60 seconds, 52.5GB VRAM; 4 GPUs, 34 seconds, 44.3GB VRAM.
  • 768x768: 8 GPUs, 276 seconds, 44.3GB VRAM.
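These numbers imply sub-linear multi-GPU scaling at 256x256; a quick check of the arithmetic:

    # Scaling arithmetic from the 256x256 figures above.
    speedup = 60 / 34         # ~1.76x going from 1 GPU to 4 GPUs
    efficiency = speedup / 4  # ~44% parallel efficiency
    print(f"speedup {speedup:.2f}x, parallel efficiency {efficiency:.0%}")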

Caveats

  • Hardware requirements: An NVIDIA H100 or A100 with at least 24GB of VRAM is recommended; lower resolutions work on lower-end devices.
  • Prompt quality: Detailed descriptions dramatically improve video results.
  • License: Open-Sora uses the MIT license, which allows commercial use subject to its terms.

 

Application Scenarios

  1. Short video creation
    Bloggers can use Open-Sora to generate promotional videos, such as "city lights twinkling at night", for sharing on social media platforms.
  2. Educational animation
    Teachers can generate instructional animations, such as "planets revolving around the sun," to enhance the appeal of the classroom.
  3. Game Scene Design
    Developers can generate dynamic scenes from concept art for use in game backgrounds or transitions.
  4. AI Research
    Researchers can modify the model code to test new algorithms or datasets to advance video generation techniques.

 

FAQ

  1. What is the performance of Open-Sora 2.0?
    On the VBench benchmark, Open-Sora 2.0 narrows the gap with OpenAI's Sora to 0.69%, approaching HunyuanVideo 11B and Step-Video 30B.
  2. What resolutions and durations are supported?
    Supports 144p to 768p, video durations from 2 to 16 seconds, and aspect ratios including 16:9, 9:16, and more.
  3. How can generation quality be improved?
    Use detailed prompts, adjust the motion score (1-7), or enable GPT-4o prompt refinement.
  4. Can I use it for free?
    Open-Sora is completely open source, with models and code freely available, and follows the MIT license.