
Hallo2: audio-driven generation of lip-synchronized/expression-synchronized portrait videos (Windows one-click installation)

General Introduction

Hallo2 is an open-source project jointly developed by Fudan University and Baidu that generates high-resolution portrait animations driven by audio. It builds on latent diffusion models and temporal alignment techniques to reach 4K resolution and videos up to 1 hour long. Hallo2 also supports text prompts to improve the diversity and controllability of the generated content.

Hallo3 has since been released. It achieves notably better lip synchronization by introducing a cross-attention mechanism for audio conditioning that captures the complex relationship between audio signals and facial expressions.

Note: Hallo3 has the following simple requirements for the input data used in inference:

  • Reference Image: The reference image must have an aspect ratio of 1:1 or 3:2.
  • Driving audio: The driving audio must be in WAV format.
  • Audio language: the audio must be in English, as the model's training dataset contains only this language.
  • Audio clarity: ensure that vocals are clear in the audio; background music is acceptable.
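
The first two requirements above can be checked before starting a long inference run. The following is a minimal sketch, not part of the Hallo3 code base: the function names are hypothetical, the 3:2 ratio is assumed to mean width:height, and the 1% tolerance is an arbitrary choice. Image dimensions are passed in as numbers so that only the Python standard library is needed.

```python
import wave

# Aspect ratios named in the requirements list (assumed to be width:height).
ALLOWED_RATIOS = (1 / 1, 3 / 2)


def aspect_ratio_ok(width: int, height: int, tolerance: float = 0.01) -> bool:
    """Return True if width:height is within `tolerance` of 1:1 or 3:2."""
    if width <= 0 or height <= 0:
        return False
    ratio = width / height
    return any(abs(ratio - allowed) <= tolerance for allowed in ALLOWED_RATIOS)


def is_readable_wav(path: str) -> bool:
    """Return True if the file opens as a WAV file with at least one frame."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # Not a RIFF/WAVE file, truncated, or missing.
        return False
```

For example, `aspect_ratio_ok(768, 512)` accepts a 3:2 image, while a 16:9 frame such as 1920x1080 is rejected. Language and vocal clarity cannot be verified this way and still need a manual check.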


Feature List

  • Audio Driven Animation Generation: Generate corresponding portrait animations from input audio files.
  • High Resolution Support: Support for generating videos with 4K resolution to ensure clear picture quality.
  • Long video generation: Can generate video content up to 1 hour long.
  • Text prompt enhancement: Control the expressions and actions of the generated portrait with semantic text labels.
  • Open source: Full source code and pre-trained models are provided to facilitate secondary development.
  • Multi-platform support: Supports running on multiple platforms such as Windows, Linux, etc.

 

Usage Guide

Installation process

  1. System requirements::
    • Operating system: Ubuntu 20.04/22.04
    • GPU: Graphics card supporting CUDA 11.8 (e.g. A100)
  2. Create a virtual environment::
    conda create -n hallo python=3.10
    conda activate hallo
    
  3. Install dependencies::
    pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt
    sudo apt-get install ffmpeg
    
  4. Download the pre-trained models::
    git lfs install
    git clone https://huggingface.co/fudan-generative-ai/hallo2 pretrained_models
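
After the four installation steps, a quick preflight check can confirm that the pieces are in place. This is a hedged sketch using only the standard library; the `preflight` function and its check names are hypothetical, and it deliberately does not import `torch`, so it says nothing about whether CUDA works. It assumes the script is run from the repository root, where the `pretrained_models` directory was cloned.

```python
import shutil
import sys
from pathlib import Path


def preflight(model_dir: str = "pretrained_models") -> dict:
    """Collect simple pass/fail checks for the installation steps."""
    return {
        # Step 2 creates the environment with python=3.10.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # Step 3 installs ffmpeg through apt-get.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Step 4 needs the git-lfs binary for `git lfs install`.
        "git_lfs_on_path": shutil.which("git-lfs") is not None,
        # Step 4 clones the model repository into this directory.
        "model_dir_present": Path(model_dir).is_dir(),
    }


if __name__ == "__main__":
    for name, passed in preflight().items():
        print(f"{'ok     ' if passed else 'MISSING'}  {name}")
```

A present `pretrained_models` directory does not prove that the large files were fully fetched through Git LFS, so an incomplete download can still pass this check.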
    

Usage Process

  1. Prepare the input data::
    • Download and prepare the required pre-trained model.
    • Prepare the source image and driving audio files.
  2. Run the inference script::
    python scripts/inference.py --source_image path/to/image --driving_audio path/to/audio
    
  3. View the generated results::
    • The generated video file will be saved in the specified output directory and can be viewed using any video player.
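
When several image and audio pairs need to be processed, the inference command from step 2 can be wrapped in a small helper. The sketch below only reuses the script path and the two flags shown above; the helper names `build_inference_command` and `run_inference` are hypothetical, and the call is assumed to be made from the repository root inside the activated `hallo` environment.

```python
import subprocess
import sys


def build_inference_command(source_image: str, driving_audio: str) -> list:
    """Build the argument list for the inference command shown in step 2."""
    return [
        sys.executable,  # the Python interpreter of the active environment
        "scripts/inference.py",
        "--source_image", source_image,
        "--driving_audio", driving_audio,
    ]


def run_inference(source_image: str, driving_audio: str, repo_dir: str = "."):
    """Run inference from repo_dir and raise CalledProcessError on failure."""
    command = build_inference_command(source_image, driving_audio)
    return subprocess.run(command, cwd=repo_dir, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.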

Detailed steps

  1. Download the code::
    git clone https://github.com/fudan-generative-vision/hallo2
    cd hallo2
    
  2. Create and activate a virtual environment::
    conda create -n hallo python=3.10
    conda activate hallo
    
  3. Install the necessary Python packages::
    pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt
    
  4. Install ffmpeg::
    sudo apt-get install ffmpeg
    
  5. Download the pre-trained models::
    git lfs install
    git clone https://huggingface.co/fudan-generative-ai/hallo2 pretrained_models
    
  6. Run the inference script::
    python scripts/inference.py --source_image path/to/image --driving_audio path/to/audio
    
  7. View the generated results::
    • The generated video file will be saved in the specified output directory and can be viewed using any video player.
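
Because the driving audio must be in WAV format and ffmpeg is already installed in step 4, recordings in other formats can be converted before step 6. The sketch below builds an ffmpeg command for that purpose. The helper name `build_wav_command` is hypothetical, and the mono, 16 kHz settings are assumptions for illustration rather than values stated by the project, so adjust them if the model expects something else.

```python
import subprocess


def build_wav_command(input_path: str, output_path: str,
                      sample_rate: int = 16000) -> list:
    """Build an ffmpeg command that converts an audio file to WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite output_path if it already exists
        "-i", input_path,         # input file in any format ffmpeg can decode
        "-ac", "1",               # mono (assumption)
        "-ar", str(sample_rate),  # output sample rate in Hz (assumption)
        output_path,              # the .wav extension selects the WAV muxer
    ]


def convert_to_wav(input_path: str, output_path: str) -> None:
    """Run the conversion and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(build_wav_command(input_path, output_path), check=True)
```

The converted file can then be passed to `--driving_audio` in step 6.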

 

