
LiteAvatar: Audio-Driven 2D Portrait Digital Humans with Real-Time Interaction, Running at 30 fps on CPU

General Introduction

LiteAvatar is an open-source tool developed by the HumanAIGC team (part of Alibaba) that generates 2D avatar facial animation from audio in real time. It runs at 30 frames per second (fps) on a CPU alone, which makes it particularly suitable for low-power scenarios such as real-time 2D video chat or avatar applications on mobile devices. LiteAvatar combines speech recognition (ASR) with mouth-shape prediction to produce synchronized facial expressions and mouth movements from the incoming audio features, with smooth and natural animation. The project is hosted on GitHub, with the full code and documentation freely available for developers to use and extend. Whether for entertainment, education, or virtual hosting, the tool demonstrates how lightweight design and strong performance can be combined.
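The pipeline described above can be pictured as a simple per-frame loop: audio comes in, mouth and expression parameters come out, and a 2D renderer turns them into video frames. The sketch below is only a conceptual illustration of that loop; the three stage functions are hypothetical stubs and are not LiteAvatar's actual API.

    # Conceptual sketch of the audio-to-animation loop described above; NOT LiteAvatar's
    # actual API. The three stage functions are hypothetical stubs standing in for the
    # real ASR feature extractor, mouth-parameter predictor, and 2D renderer.
    import numpy as np

    def extract_audio_features(chunk: np.ndarray) -> np.ndarray:
        return np.abs(np.fft.rfft(chunk))           # stand-in for ASR features

    def predict_mouth_params(features: np.ndarray) -> float:
        return float(features.mean())               # stand-in for mouth openness

    def render_frame(mouth_openness: float) -> np.ndarray:
        return np.full((256, 256), mouth_openness)  # stand-in for a rendered portrait frame

    def animate(audio: np.ndarray, sr: int = 16000, fps: int = 30) -> list:
        samples_per_frame = sr // fps               # ~533 audio samples per video frame at 30 fps
        frames = []
        for start in range(0, len(audio) - samples_per_frame + 1, samples_per_frame):
            chunk = audio[start:start + samples_per_frame]
            frames.append(render_frame(predict_mouth_params(extract_audio_features(chunk))))
        return frames                               # one frame per 1/30 s of audio

    frames = animate(np.random.randn(16000))        # 1 s of dummy audio -> ~30 frames

In the real project each stub corresponds to a trained model; the sketch only shows how the stages chain together frame by frame.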

Deploying LiteAvatar Live Interactive Edition: https://github.com/HumanAIGC-Engineering/OpenAvatarChat


 

Function List

  • Audio-driven animation generation: generates the avatar's facial expressions and mouth animation in real time from input audio.
  • Lightweight operation: smooth 30 fps animation on CPU alone, with no GPU required.
  • Lip-sync prediction: an ASR model extracts audio features and drives mouth movements that match the speech content.
  • Mobile device support: the optimized model is adapted to low-power devices such as phones and tablets.
  • Open source: the full source code is provided, so users can customize features or integrate it into other projects.
  • Real-time processing: low-latency handling of audio input keeps the animation tightly synchronized with the sound.

 

Using Help

LiteAvatar is an open-source project hosted on GitHub, and installing and using it requires some technical background. The following installation and usage guide will help you get started quickly with this audio-driven 2D avatar tool.

Installation process

  1. Environment preparation
    • Make sure Python 3.8 or later is installed; you can check the version with python --version.
    • Install Git to download the code from GitHub. Windows users can get it from the official website; Linux or macOS users can install it via a package manager (e.g. sudo apt install git).
    • Prepare a command-line terminal (e.g. CMD or PowerShell on Windows, Terminal on Linux/macOS).
  2. Download the LiteAvatar project
    • Open a terminal and enter the following command to clone the code repository:
      git clone https://github.com/HumanAIGC/lite-avatar.git
      
    • Once the cloning is complete, go to the project directory:
      cd lite-avatar
      
  3. Install dependencies
    • The project depends on several Python libraries. Install them with:
      pip install -r requirements.txt
      
    • If requirements.txt does not list the exact dependencies, refer to the project documentation. Common dependencies include numpy, torch (CPU version), and modelscope. Example of manual installation:
      pip install numpy torch modelscope
      
  4. Verify the installation
    • After installation, run a simple test command (the exact command depends on the project README), for example:
      python demo.py
      
    • If no error is reported, the environment is configured correctly. For an additional environment check, see the sketch after this list.
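As an extra check beyond demo.py, a short script like the following verifies the Python version and that the commonly mentioned libraries import correctly. It is not part of the LiteAvatar repository; it only assumes the dependencies named above (numpy, torch, modelscope).

    # Hypothetical environment check (not shipped with LiteAvatar): verifies the
    # Python version and that the commonly needed libraries can be imported.
    import importlib
    import sys

    assert sys.version_info >= (3, 8), f"Python 3.8+ required, found {sys.version.split()[0]}"

    for name in ("numpy", "torch", "modelscope"):   # dependencies mentioned above
        try:
            module = importlib.import_module(name)
            print(f"{name} {getattr(module, '__version__', 'unknown version')} OK")
        except ImportError as exc:
            print(f"{name} missing ({exc}); try: pip install {name}")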

Usage

The core function of LiteAvatar is generating avatar animation from audio. Here are the detailed steps:

Preparing Audio Files

  • Audio format: common formats such as .wav or .mp3 are supported. Clear mono audio with a sample rate of around 16 kHz is recommended for best results; a conversion sketch follows this list.
  • Audio source: your own recorded voice, or audio extracted from a video. Recommended tool: Audacity (free audio editing software).
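If your source audio is not already 16 kHz mono, a small helper like the one below converts it. It uses librosa and soundfile, which are not LiteAvatar dependencies and must be installed separately; the file names are only examples.

    # Hypothetical helper (not part of LiteAvatar): convert an audio file to the
    # recommended 16 kHz mono WAV. Requires: pip install librosa soundfile
    import librosa
    import soundfile as sf

    def prepare_audio(src_path: str, dst_path: str = "input_16k.wav") -> str:
        audio, sr = librosa.load(src_path, sr=16000, mono=True)  # resample and downmix
        sf.write(dst_path, audio, sr)                            # write a 16 kHz mono WAV
        return dst_path

    print(prepare_audio("my_recording.mp3"))                     # example file name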

Running real-time animations

  1. Start the program
    • In the project directory, run the main script (assumed here to be main.py; the actual file name is given in the README):
      python main.py --audio_path your_audio_file.wav
      
    • Parameter description:
      • --audio_path: specifies the path to the audio file.
      • --output: optional parameter specifying where to save the generated animation video; by default the result may simply be displayed on screen.
  2. Real-time input test
    • If microphone input is supported, try real-time mode (check the README to confirm this feature exists). Example command:
      python main.py --live
      
    • The program listens to the microphone and generates animation in real time. For batch processing of pre-recorded files, see the sketch after this list.
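To run several pre-recorded clips in one go, a thin wrapper around the command line assumed above can help. The script name main.py and the --audio_path/--output flags are the same assumptions made in the steps above, so adjust them to match the actual README.

    # Hypothetical batch wrapper around the assumed CLI (python main.py --audio_path ... --output ...).
    # Adjust the script name and flags to whatever the LiteAvatar README actually uses.
    import subprocess
    from pathlib import Path

    def batch_animate(audio_dir: str, out_dir: str = "outputs") -> None:
        Path(out_dir).mkdir(exist_ok=True)
        for wav in sorted(Path(audio_dir).glob("*.wav")):
            out_file = Path(out_dir) / f"{wav.stem}.mp4"
            # One animation job per audio file; check=True raises if the command fails.
            subprocess.run(
                ["python", "main.py", "--audio_path", str(wav), "--output", str(out_file)],
                check=True,
            )

    batch_animate("my_audio_clips")                              # example input folder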

View Results

  • Animation output: after running, the program displays the avatar animation on screen or generates a video file (such as output.mp4).
  • Parameter adjustment: if the animation is unsatisfactory, refer to the documentation to adjust model parameters such as frame rate or mouth sensitivity (depending on the code implementation); a quick frame-rate check is shown below.
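One quick way to confirm the output frame rate is to probe the generated file with OpenCV (opencv-python, which is not a LiteAvatar dependency). The file name output.mp4 follows the example above.

    # Probe the generated video's frame rate and length with OpenCV.
    # Requires: pip install opencv-python
    import cv2

    cap = cv2.VideoCapture("output.mp4")            # file name from the example above
    if not cap.isOpened():
        raise SystemExit("could not open output.mp4")
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    print(f"{fps:.1f} fps, {frame_count} frames, ~{frame_count / fps:.1f} s")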

Key Feature Walkthrough

Audio Driven Animation Generation

  • Steps:
    1. Prepare the audio file, e.g. test.wav.
    2. Run command:
      python main.py --audio_path test.wav --output result.mp4
      
    3. The program calls ModelScope's ASR model to extract audio features, then generates the animation through the mouth-prediction model (an illustrative feature-extraction sketch follows this list).
  • Effect: the avatar's mouth shape and expression change with the audio; for example, the mouth opens on "hello" and the movement is more rhythmic when singing.
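For intuition about what "audio features" means at this stage, the sketch below computes frame-aligned log-mel features with librosa, one feature column per 30 fps video frame. LiteAvatar's actual front end is a ModelScope ASR model, so treat this only as an illustration of the idea, not as the project's real feature extractor.

    # Illustration only: frame-aligned log-mel features from test.wav, the kind of
    # per-frame audio representation a mouth-prediction model can consume.
    # LiteAvatar's real front end is a ModelScope ASR model, not this librosa code.
    # Requires: pip install librosa
    import librosa
    import numpy as np

    audio, sr = librosa.load("test.wav", sr=16000, mono=True)
    hop = sr // 30                                   # one feature column per 30 fps video frame
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=80, hop_length=hop)
    log_mel = np.log(mel + 1e-6)
    print(log_mel.shape)                             # (80, roughly one column per video frame)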

Mobile device deployment

  • Prerequisites: the model needs to be exported to a lightweight format (e.g. ONNX) and integrated on the mobile side.
  • Steps:
    1. Convert the model locally (the exact script should appear in the project documentation; export.py is assumed here as an example):
      python export.py --model lite_avatar_model.pth --output lite_avatar.onnx
      
    2. Deploy the .onnx file to the mobile device and run it with an ONNX-compatible framework such as NCNN.
  • Result: low-power real-time animation on a phone, suitable for video chat applications. A generic export-and-verify sketch follows below.
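The export.py script above is itself an assumption, but the general PyTorch-to-ONNX pattern it would follow looks like the sketch below. The tiny network, input shape, and output file name are placeholders because LiteAvatar's real model class is not specified here; torch.onnx.export and ONNX Runtime are the only real APIs used.

    # General PyTorch -> ONNX export pattern with a tiny stand-in network; the real
    # LiteAvatar model class and input shape would replace TinyMouthNet and (1, 80).
    # Requires: pip install torch onnxruntime
    import torch
    import onnxruntime as ort

    class TinyMouthNet(torch.nn.Module):             # placeholder for the real model
        def __init__(self):
            super().__init__()
            self.fc = torch.nn.Linear(80, 32)        # e.g. 80 audio features -> 32 mouth params

        def forward(self, x):
            return torch.tanh(self.fc(x))

    model = TinyMouthNet().eval()
    dummy_input = torch.randn(1, 80)                 # must match the real model's input shape
    torch.onnx.export(model, dummy_input, "lite_avatar.onnx",
                      input_names=["audio_features"], output_names=["mouth_params"])

    # Sanity-check the exported file with ONNX Runtime on CPU.
    sess = ort.InferenceSession("lite_avatar.onnx", providers=["CPUExecutionProvider"])
    out = sess.run(None, {"audio_features": dummy_input.numpy()})
    print(out[0].shape)                              # (1, 32)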

Notes

  • Performance optimization: if playback is laggy, reduce the frame rate (e.g. from 30 fps to 15 fps) via the configuration file or command-line parameters.
  • Debugging: if you hit an error, check whether the dependency versions match (a version-report snippet follows this list), or search the GitHub Issues page for community help.
  • Extensibility: to add new features (e.g. expression control), fork the project and modify the code; the HumanAIGC team welcomes contributors to submit Pull Requests.
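When reporting an error or checking for version mismatches, it helps to list the exact versions installed. The snippet below uses only the standard library (importlib.metadata, Python 3.8+); the package names are the ones mentioned earlier in this guide.

    # Print installed versions of the packages most likely to cause mismatches,
    # handy when debugging or when asking for help on the GitHub Issues page.
    from importlib import metadata

    for pkg in ("numpy", "torch", "modelscope", "onnxruntime"):
        try:
            print(f"{pkg}=={metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            print(f"{pkg} not installed")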

With these steps, you can install and use LiteAvatar and experience audio-driven avatar animation for yourself. Whether for development testing or real-world applications, the tool offers an efficient and convenient solution.

