AI Personal Learning
and practical guidance

ConsisID: a portrait reference map to generate character-consistent video, rapid multi-terminal integration

General Introduction

ConsisID is an open source project developed by Yuan Rong's group at Peking University, aiming to achieve identity-consistent text-to-video generation (IPT2V) through frequency decomposition techniques. The core of the project is a model based on DiT (Diffusion Transformer), which is able to maintain the identity consistency of characters when generating videos.The ConsisID project not only provides the complete code and dataset, but also includes detailed installation and usage guidelines to facilitate users to get started quickly. This project is of great significance in the field of video generation, especially in application scenarios where character consistency needs to be maintained, such as film and television production, virtual reality, and so on.


 

 

Function List

  • Identity Consistent Video Generation: A frequency decomposition technique is used to generate videos that are consistent with the description of the input text and maintain the identity of the characters.
  • Open source code and datasets: Complete code and partial datasets are provided to facilitate secondary development and research.
  • Multi-platform support: Support for running on Windows and Linux systems , providing Jupyter Notebook and ComfyUI extensions .
  • Optimization of high-quality prompt words: Optimize the input of text prompt words using GPT-4o to improve the quality of the generated video.
  • GPU memory optimization: Provides multiple GPU memory optimization options to adapt to different hardware configurations.
  • Community Contributions: Support for community-developed plug-ins and extensions that enhance functionality and usage experience.

 

Using Help

Environment Configuration

  1. Clone the project code:
   git clone --depth=1 https://github.com/PKU-YuanGroup/ConsisID.git
cd ConsisID
  1. Create and activate a virtual environment:
   conda create -n consisid python=3.11.0
conda activate consisid
  1. Install the dependencies:
   pip install -r requirements.txt

Download model weights

  1. Download weights from HuggingFace:
   huggingface-cli download --repo-type model BestWishYsh/ConsisID-preview --local-dir ckpts
  1. Or download it from WiseModel:
   git lfs install
git clone https://www.wisemodel.cn/SHYuanBest/ConsisID-Preview.git

running example

  1. Run the Web UI example:
   python app.py
  1. Run command line reasoning:
   python infer.py --model_path BestWishYsh/ConsisID-preview

Cue word optimization

Use GPT-4o to optimize input text prompt words, e.g. Original prompt word: "A man is playing the guitar." Optimized prompt word: "The video shows a man standing next to an airplane, talking on his cell phone. He is wearing sunglasses, a black top, and a serious expression. The plane has a green stripe down the side and a big engine in the back."

GPU memory optimization

If you do not have multiple GPUs or enough GPU memory, you can enable the following options:

pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

Note: Enabling these options increases inference time and may reduce generation quality.

Data preprocessing

Please refer to the data preprocessing guide in the project for the data needed to train ConsisID. If you need to train text-to-image and video generation models, you need to organize the dataset into the following format:

datasets/
├── captions/
│ ├── dataname_1.json
│ ├── dataname_2.json
├─ dataname_1/ ├─ captions/ │ ├─ refine_1.json
│ ├── refine_bbox_jsons/
│ ├── track_masks_data/ ├── track_masks_data/ ├── track_masks_data/
│ ├── videos/
├── dataname_2/ │
│ ├── refine_bbox_jsons/ │ ├── track_masks_data/ ├── videos/
│ ├── track_masks_data/ ├── videos/ ├── videos/
│ ├── videos/
├── ...
├── total_train_data.txt

model training

  1. Set the hyperparameters:
   bash train_single_rank.sh
  1. Initiate training:
   bash train_multi_rank.sh

Community Contributions

Thanks to the community developers for the plugins and extensions:

  • ComfyUI-ConsisIDWrapper
  • Jupyter-ConsisID
  • Windows-ConsisID

 

ConsisID Quick Integration

Online Experience:Hugging Face

Windows Installer:Hugging Face Beginning Intelligence AI

ComfyUI node:ComfyUI-CogVideoXWrapper openart: https://openart.ai/workflows/TxIQ6lwGkRx2zQiYjvE5

May not be reproduced without permission:Chief AI Sharing Circle " ConsisID: a portrait reference map to generate character-consistent video, rapid multi-terminal integration

Chief AI Sharing Circle

Chief AI Sharing Circle specializes in AI learning, providing comprehensive AI learning content, AI tools and hands-on guidance. Our goal is to help users master AI technology and explore the unlimited potential of AI together through high-quality content and practical experience sharing. Whether you are an AI beginner or a senior expert, this is the ideal place for you to gain knowledge, improve your skills and realize innovation.

Contact Us
en_USEnglish