STAR: A Spatial-Temporal Augmentation AI Model for Improving Video Resolution and Sharpness

General Introduction

STAR (Spatial-Temporal Augmentation with Text-to-Video Models) is a video super-resolution framework jointly developed by Nanjing University, ByteDance, and Southwest University. The project targets key problems in real-world video super-resolution and enhances video frames by drawing on the prior knowledge of text-to-video (T2V) diffusion models. Its distinguishing feature is the ability to maintain realistic spatial detail and temporal consistency at the same time, a balance that traditional GAN-based approaches often struggle to strike. The project provides two implementations: models based on I2VGen-XL for light and heavy degradation, and a model based on CogVideoX-5B for heavy degradation, so it can adapt to video enhancement needs in different scenarios.

Feature List

  • Super-resolution reconstruction for videos with different levels of degradation (light and heavy)
  • Automated prompt generation, with support for tools such as Pllava to generate video descriptions
  • An online demo platform (HuggingFace Spaces)
  • Support for 720x480 video input
  • Complete inference code and pre-trained models
  • An integrated Local Information Enhancement Module (LIEM) that improves the reconstruction of fine detail in each frame
  • Batch video processing
  • A choice of model weights

 

Usage Guide

1. Environment setup

First, set up the runtime environment as follows:

  1. Clone the code repository:
git clone https://github.com/NJU-PCALab/STAR.git
cd STAR
  2. Create and activate the conda environment, then install the dependencies:
conda create -n star python=3.10
conda activate star
pip install -r requirements.txt
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
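
Once the steps above are done, a quick check that the tools they rely on are reachable can save a failed run later. The following is a minimal sketch and not part of the STAR repository; the helper name missing_tools is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of STAR): print each requested tool
# that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # `command -v` exits non-zero when the tool is not found.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tools used by the setup steps in this guide.
missing=$(missing_tools git conda pip ffmpeg)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

Run it inside the activated star environment so that pip resolves to the environment's own copy.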

2. Model selection and downloading

STAR offers two versions of the model:

  • I2VGen-XL based models:
    • light_deg.pt: for lightly degraded videos
    • heavy_deg.pt: for heavily degraded videos
  • CogVideoX-5B based model:
    • Specialized for processing heavily degraded videos
    • Supports 720x480 resolution input only

Download the appropriate model weights from HuggingFace and place them in the pretrained_weight/ directory.
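
After downloading, it may be worth confirming that the weight files landed where the inference script will look for them. This is a small sketch that assumes it is run from the repository root; the helper has_weights is hypothetical, and only the file and directory names come from this guide.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named weight file exists
# and is non-empty inside the given directory.
has_weights() {
    weight_dir=$1
    weight_file=$2
    [ -s "$weight_dir/$weight_file" ]
}

# File and directory names as given in this guide.
for name in light_deg.pt heavy_deg.pt; do
    if has_weights pretrained_weight "$name"; then
        echo "found:   pretrained_weight/$name"
    else
        echo "missing: pretrained_weight/$name"
    fi
done
```

Only the weight file for the model you intend to run needs to be present, so a "missing" line for the other file is not an error.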

3. Video processing workflow

  1. Prepare the test data:
    • Place the videos to be processed in the input/video/ directory
    • Prepare prompts (three options):
      • No prompt
      • Automatic generation using Pllava
      • Manually written video descriptions
  2. Configure the processing parameters:
    • Modify the path configuration in video_super_resolution/scripts/inference_sr.sh:
      • video_folder_path: input video path
      • txt_file_path: prompt file path
      • model_path: model weight path
      • save_dir: output save path
  3. Run inference:
bash video_super_resolution/scripts/inference_sr.sh
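
Before launching a long run, the configured paths can be sanity-checked. The sketch below is illustrative only: the four variable names follow the list above, but every value is a placeholder, the helper config_problems is hypothetical, and treating an empty prompt path as valid is an assumption made here to mirror the "no prompt" option, not documented behavior of inference_sr.sh.

```shell
#!/bin/sh
# Placeholder values; replace them with the paths set in inference_sr.sh.
video_folder_path='./input/video'
txt_file_path=''                               # assumption: empty means "no prompt"
model_path='./pretrained_weight/heavy_deg.pt'
save_dir='./results'                           # placeholder output location

# Hypothetical pre-flight check: print one line per configuration problem.
config_problems() {
    video_dir=$1
    prompt_file=$2
    weights=$3
    [ -d "$video_dir" ] || echo "video_folder_path is not a directory: $video_dir"
    # Only check the prompt file when a path was actually given.
    if [ -n "$prompt_file" ] && [ ! -f "$prompt_file" ]; then
        echo "txt_file_path not found: $prompt_file"
    fi
    [ -f "$weights" ] || echo "model_path not found: $weights"
}

problems=$(config_problems "$video_folder_path" "$txt_file_path" "$model_path")
if [ -z "$problems" ]; then
    echo "Configuration looks consistent; results would go to $save_dir"
else
    printf '%s\n' "$problems"
fi
```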

Note: If you run into an out-of-memory (OOM) error, set a smaller frame_length parameter in inference_sr.sh.
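
One simple way to choose a smaller value is to halve it after each failed attempt until a floor is reached. The sketch below only computes the candidate values; it does not edit inference_sr.sh or run inference. The starting value of 32 and the floor of 8 are placeholders rather than defaults taken from STAR, and the helper next_frame_length is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: halve the current frame_length, but never go
# below the given floor.
next_frame_length() {
    current=$1
    floor=$2
    half=$((current / 2))
    if [ "$half" -lt "$floor" ]; then
        half=$floor
    fi
    echo "$half"
}

# Placeholder starting point and floor; adjust to your GPU memory.
frame_length=32
while [ "$frame_length" -gt 8 ]; do
    frame_length=$(next_frame_length "$frame_length" 8)
    echo "next value to try: frame_length=$frame_length"
done
```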

4. Additional configuration for the CogVideoX-5B model

If using the CogVideoX-5B model, additional steps are required:

  1. Create a dedicated environment:
conda create -n star_cog python=3.10
conda activate star_cog
cd cogvideox-based/sat
pip install -r requirements.txt
  2. Download additional dependencies:
    • Download the VAE and T5 Encoder
    • Update the path configuration in cogvideox-based/sat/configs/cogvideox_5b/cogvideox_5b_infer_sr.yaml
    • Replace the transformer.py file
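
Because the CogVideoX-5B based model accepts 720x480 input only, checking a video's resolution before running it can avoid a wasted attempt. This sketch uses ffprobe, which ships with the ffmpeg package installed during environment setup; the helper names and the script name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print WIDTHxHEIGHT of the first video stream, for example "720x480".
video_resolution() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=width,height \
        -of csv=s=x:p=0 "$1"
}

# Succeed only for the resolution this guide lists for the CogVideoX-5B model.
is_supported_resolution() {
    [ "$1" = "720x480" ]
}

# Usage (file names are placeholders):
#   sh check_resolution.sh input/video/clip.mp4
if [ "$#" -gt 0 ]; then
    res=$(video_resolution "$1")
    if is_supported_resolution "$res"; then
        echo "$1: $res (ok)"
    else
        echo "$1: $res (expected 720x480)"
    fi
fi
```

A video that fails the check would need to be resized first, or processed with one of the I2VGen-XL based models instead.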

May not be reproduced without permission: Chief AI Sharing Circle » STAR: A Spatial-Temporal Augmentation AI Model for Improving Video Resolution and Sharpness
