
EchoMimic: Generate Talking-Head Videos from Portrait Photos with Audio (EchoMimicV2 Accelerated Installer)

General Introduction

EchoMimic is an open-source project for generating lifelike, audio-driven portrait animations. Developed by Ant Group's Terminal Technology team, it uses editable landmark conditioning, combining audio with facial landmarks, to produce dynamic portrait videos. EchoMimic has been compared comprehensively across multiple public and proprietary datasets, demonstrating superior performance in both quantitative and qualitative evaluations.

EchoMimicV2 optimizes inference speed and adds gesture motion; it is the recommended version.


EchoMimic: audio-driven lifelike portrait animation

Demo (V1): https://www.modelscope.cn/studios/BadToBest/BadToBest
Demo (V2): https://huggingface.co/spaces/fffiloni/echomimic-v2

 

Feature List

  • Audio-driven animation: generate lifelike portrait animations from audio input.
  • Landmark-driven animation: generate stable portrait animations using facial landmarks.
  • Audio + landmark driving: combine audio with selected facial landmarks to generate more natural portrait animations.
  • Multi-language support: accepts audio input in Chinese, English, and other languages.
  • Efficient inference: an optimized model and pipeline significantly improve inference speed.

 

Usage Guide

Installation Process

  1. Download the code:
    git clone https://github.com/BadToBest/EchoMimic
    cd EchoMimic
    
  2. Set up the Python environment:
    • It is recommended to use conda to create a virtual environment:
      conda create -n echomimic python=3.8
      conda activate echomimic
      
    • Install the dependency packages:
      pip install -r requirements.txt
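    • Optionally sanity-check the environment. Assuming the requirements include PyTorch (which the inference scripts rely on), this prints whether a CUDA GPU is visible:
      # Should print True on a correctly configured GPU machine
      python -c "import torch; print(torch.cuda.is_available())"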
      
  3. Download and unzip ffmpeg-static:
    • Download a static ffmpeg build, unpack it, and set the environment variable to the unpacked directory:
      export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
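    • To confirm the variable points at a working binary (assuming the common static-build layout, with the ffmpeg executable at the top level of the unpacked directory):
      # Prints the ffmpeg version banner if the path is correct
      "$FFMPEG_PATH/ffmpeg" -version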
      
  4. Download the pre-trained weights:
    • Download the appropriate pre-trained model weights as described in the project README; one possible route is sketched below.
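    • As a sketch of one common route (the Hugging Face repository path BadToBest/EchoMimic is an assumption; check the project README for the exact location), clone the weights with Git LFS:
      # Requires git-lfs; downloads the model weights into pretrained_weights/
      git lfs install
      git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights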

Usage Process

  1. Run the web interface:
    • Launch the web interface:
      python webgui.py
      
    • Open the local URL printed in the console (Gradio-based UIs typically serve at http://127.0.0.1:7860) and upload an audio file to generate an animation.
  2. Command-line inference:
    • Use the following command for audio-driven portrait animation generation:
      python infer_audio2vid.py --audio_path /path/to/audio --output_path /path/to/output
      
    • Inference combined with facial landmarks:
      python infer_audio2vid_pose.py --audio_path /path/to/audio --landmark_path /path/to/landmark --output_path /path/to/output
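    • To process several clips in one go, the audio-driven command above can be wrapped in a shell loop (a sketch; the directory names are placeholders):
      # Generate one video per .wav file in the input directory
      for f in /path/to/audio/*.wav; do
        python infer_audio2vid.py --audio_path "$f" --output_path /path/to/output
      done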
      
  3. Model optimization:
    • The optimized model and pipeline significantly improve inference speed, e.g. from 7 min to 50 s for 240 frames on a V100 GPU, roughly an 8x speedup.

Caveats

  • Make sure the Python and CUDA versions you use match the project requirements; a quick check is sketched below.
  • If you run into problems, consult the project's README file or open an issue on GitHub.
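  • A quick version check (assuming PyTorch is installed; torch.version.cuda reports the CUDA version the installed wheel was built against):
      python --version
      python -c "import torch; print(torch.__version__, torch.version.cuda)"
      nvidia-smi   # shows the driver-side CUDA version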

 

Windows One-Click Installer

