General Introduction
STAR (Spatial-Temporal Augmentation with Text-to-Video Models) is a video super-resolution framework jointly developed by Nanjing University, ByteDance and Southwest University. The project tackles key problems in real-world video super-resolution and achieves high-quality enhancement of video frames by exploiting the prior knowledge of text-to-video (T2V) diffusion models. The distinguishing feature of STAR is its ability to maintain spatial detail fidelity and temporal consistency at the same time, a balance that traditional GAN-based approaches often struggle to achieve. The project ships two implementations: models based on I2VGen-XL for light and heavy degradation, and a model based on CogVideoX-5B for heavy degradation, covering video-enhancement needs in different scenarios.
Feature List
- Supports super-resolution reconstruction for multiple types of video degradation (light and heavy)
- Automatic prompt generation, with support for tools such as Pllava to produce video descriptions
- Online demo platform available (HuggingFace Spaces)
- Supports 720x480 resolution video input
- Provides complete inference code and pre-trained models
- Integrates a Local Information Enhancement Module (LIEM) to improve reconstruction of fine image detail
- Supports batch video processing
- Offers flexible choices of model weights
Usage Guide
1. Environment configuration
First, set up the runtime environment as follows:
- Clone the code repository:
git clone https://github.com/NJU-PCALab/STAR.git
cd STAR
- Create and activate the conda environment:
conda create -n star python=3.10
conda activate star
pip install -r requirements.txt
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
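As an optional sanity check (this assumes requirements.txt installs PyTorch, as the standard setup does), confirm that ffmpeg is on the PATH and that PyTorch can see the GPU:
ffmpeg -version | head -n 1
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"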
2. Model selection and downloading
STAR offers two versions of the model:
- I2VGen-XL-based models:
  - light_deg.pt: for lightly degraded videos
  - heavy_deg.pt: for heavily degraded videos
- CogVideoX-5B-based model:
  - Specialized for heavily degraded videos
  - Supports 720x480 resolution input only
Download the appropriate model weights from HuggingFace and place them in the pretrained_weight/ directory.
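For example, the weights can be fetched with the huggingface_hub CLI. The repository id below is an assumption based on the project's HuggingFace page; verify it against the links in the README:
pip install -U "huggingface_hub[cli]"
# repo id is illustrative; replace it with the one given in the STAR README
huggingface-cli download SherryX/STAR --local-dir pretrained_weight/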
3. Video processing flow
- Prepare test data:
  - Place the videos to be processed in the input/video/ directory
- Prompt preparation (three options; a sample prompt file is sketched at the end of this step):
  - No prompt
  - Automatic generation with a tool such as Pllava
  - Manually written video descriptions
- Configure processing parameters:
  - Modify the path configuration in video_super_resolution/scripts/inference_sr.sh (see the sketch after this list):
    - video_folder_path: input video path
    - txt_file_path: prompt file path
    - model_path: model weight path
    - save_dir: output save path
- Start inference:
bash video_super_resolution/scripts/inference_sr.sh
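The four variables above are ordinary shell variables set near the top of inference_sr.sh. A configured block might look like the following sketch (the variable names come from the script; the paths are illustrative only):
video_folder_path='input/video'
txt_file_path='input/text/prompt.txt'
model_path='pretrained_weight/light_deg.pt'
save_dir='results'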
Note: If you run into an out-of-memory (OOM) error, reduce the frame_length parameter in inference_sr.sh.
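If you write prompts manually, the file referenced by txt_file_path is plain text; the sketch below assumes a one-description-per-video layout, which this guide does not confirm — check the repository's sample inputs for the exact format expected:
a man walking along a rainy street at night, neon reflections on the wet pavement
a close-up of ocean waves breaking on a rocky shore in the early morning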
4. CogVideoX-5B-specific configuration
If using the CogVideoX-5B model, additional steps are required:
- Create a dedicated environment:
conda create -n star_cog python=3.10
conda activate star_cog
cd cogvideox-based/sat
pip install -r requirements.txt
- Download additional dependencies:
  - The VAE and the T5 encoder are required (a download sketch follows below)
- Update the path configuration in cogvideox-based/sat/configs/cogvideox_5b/cogvideox_5b_infer_sr.yaml
- Replace the transformer.py file
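One possible source for the VAE and T5 encoder is the THUDM/CogVideoX-5b repository on HuggingFace. The repo id, the folder names, and whether STAR's SAT code accepts this layout are all assumptions here; the download links in the STAR README are authoritative:
# repo id and folder names are illustrative; prefer the links in the STAR README
huggingface-cli download THUDM/CogVideoX-5b --include "vae/*" "text_encoder/*" --local-dir pretrained/cogvideox-5b
After downloading, point the corresponding path entries in cogvideox_5b_infer_sr.yaml at these folders.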