
OpenAvatarChat: a modularly designed digital human conversation tool

General Introduction

OpenAvatarChat is an open-source project developed by the HumanAIGC-Engineering team and hosted on GitHub. It is a modular digital human conversation tool that lets users run the full functionality on a single PC. The project combines real-time video, speech recognition, and digital human technology; its core features are a flexible modular design and low-latency interaction. The audio side uses SenseVoice, qwen-plus, and CosyVoice, while the video side relies on the LiteAvatar algorithm. The code is fully open for developers to study and improve.



 

Function List

  • Modular digital human dialogue: supports real-time voice and video interaction; modules can be freely combined.
  • Real-time audio/video transmission: low-latency audio/video communication via gradio-webrtc.
  • Speech recognition and generation: integrates SenseVoice and CosyVoice to handle speech input and output.
  • Digital human animation: generates smooth digital human expressions and movements with LiteAvatar.
  • Open-source support: the complete code is provided, and users can modify or optimize it as needed.

 

Usage Guide

OpenAvatarChat is an open-source project, so users need to download the code and configure the environment themselves. The following detailed installation and usage steps will help you get started quickly.

Installation process

  1. Checking system requirements
    Before you begin, make sure your device meets the following conditions:

    • Python 3.10 or later.
    • A CUDA-enabled GPU with at least 10 GB of video memory (20 GB or more for unquantized models).
    • A fast CPU (the project reports 30 FPS on an i9-13980HX).
      You can check your Python version with the following command:
python --version
  2. Installing Git LFS
    The project uses Git LFS to manage large files, so install it first:
sudo apt install git-lfs
git lfs install
  3. Downloading the code
    Clone the project locally by entering the following command in the terminal:
git clone https://github.com/HumanAIGC-Engineering/OpenAvatarChat.git
cd OpenAvatarChat
  4. Updating submodules
    The project depends on several submodules, which are updated by running the following command:
git submodule update --init --recursive
  5. Downloading the model
    The multimodal language model MiniCPM-o-2.6 must be downloaded manually; you can get it from Hugging Face or ModelScope. Place the model in the models/ folder, or run a script to download it automatically:
scripts/download_MiniCPM-o_2.6.sh

If the video memory is insufficient, download the int4 quantized version:

scripts/download_MiniCPM-o_2.6-int4.sh
  6. Installing dependencies
    Run the following in the project root directory:
pip install -r requirements.txt
  7. Generating an SSL certificate
    If remote access is required, generate a self-signed certificate:
scripts/create_ssl_certs.sh

Certificates are stored in the ssl_certs/ folder by default.
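If you prefer not to use the project's script, a self-signed certificate can also be generated directly with openssl. This is a generic sketch; the exact file names the project expects are an assumption, so check what scripts/create_ssl_certs.sh actually produces:

```shell
# Generate a self-signed certificate for local HTTPS testing.
# File names (localhost.key / localhost.crt) are illustrative.
mkdir -p ssl_certs
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout ssl_certs/localhost.key \
  -out ssl_certs/localhost.crt \
  -days 365 \
  -subj "/CN=localhost"
```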

  8. Running the program
    There are two ways to start it:
  • Run directly:
    python src/demo.py
    
  • Run in a container (NVIDIA Docker required):
    build_and_run.sh
    

Main Functions

  • Launching the digital human dialogue
    After running the program, open a browser and visit https://localhost:8282 (the port can be changed in configs/sample.yaml). The interface shows the digital human; click "Start" and the program will access the camera and microphone and enter conversation mode.
  • Voice interaction
    Speak into the microphone; the system recognizes your speech with SenseVoice, MiniCPM-o generates a reply, and CosyVoice converts it to speech output. The digital human synchronizes its expressions and mouth movements. Tests show a response latency of about 2.2 seconds (on an i9-13900KF and RTX 4090).
  • Adjusting the configuration
    Edit the configs/sample.yaml file. For example:
  • Modify the port: change service.port to another value.
  • Adjust the frame rate: set Tts2Face.fps to 30.
    Save and restart the program for the configuration to take effect.
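Based on the two examples above, the relevant portion of configs/sample.yaml presumably looks something like the following. The surrounding structure is an assumption; only the service.port and Tts2Face.fps keys come from the text:

```yaml
# Hypothetical excerpt of configs/sample.yaml; only the two keys mentioned
# in the article (service.port and Tts2Face.fps) are taken from the text.
service:
  port: 8282        # change this to move the web UI to another port
Tts2Face:
  fps: 30           # lower this if rendering cannot keep up
```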

Workflow

  1. Start the program and wait for the interface to finish loading.
  2. Check that the camera and microphone are working properly.
  3. Start a conversation and the system automatically processes the voice and video.
  4. To stop, press Ctrl+C to close the terminal or exit the interface.
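The conversation loop in the steps above can be sketched as a simple speech-to-speech pipeline. The function names below are illustrative stubs, not the actual OpenAvatarChat APIs:

```python
# Hypothetical sketch of the ASR -> LLM -> TTS turn described above.
# recognize/generate_reply/synthesize are illustrative stand-ins for the
# roles played by SenseVoice, MiniCPM-o, and CosyVoice respectively.

def recognize(audio_chunk: bytes) -> str:
    # ASR stage (SenseVoice in OpenAvatarChat). Stub: treat audio as text.
    return audio_chunk.decode("utf-8")

def generate_reply(text: str) -> str:
    # LLM stage (MiniCPM-o, or a cloud LLM). Stub reply.
    return f"Echo: {text}"

def synthesize(text: str) -> bytes:
    # TTS stage (CosyVoice). Stub: treat text as audio.
    return text.encode("utf-8")

def conversation_turn(audio_chunk: bytes) -> bytes:
    """One turn: speech in -> text -> reply -> speech out."""
    return synthesize(generate_reply(recognize(audio_chunk)))

print(conversation_turn(b"hello"))  # b'Echo: hello'
```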

Cloud Alternatives

If the local configuration is insufficient, you can replace MiniCPM-o with a cloud LLM:

  1. Modify src/demo.py to enable the ASR, LLM, and TTS processors and comment out the MiniCPM section.
  2. In configs/sample.yaml, configure LLM_Bailian with the API address and key, for example:
LLM_Bailian:
  model_name: "qwen-plus"
  api_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
  api_key: "yourapikey"
  3. Restart the program to use it.
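The config above points at an OpenAI-compatible endpoint, so requests to it carry a standard chat-completion payload. A minimal sketch of what gets sent; the helper below is illustrative, not part of OpenAvatarChat:

```python
import json

def build_chat_request(model: str, user_text: str) -> dict:
    """Build an OpenAI-compatible chat-completion payload, as POSTed to api_url."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }

# With the config above, model_name "qwen-plus" would be used here.
payload = build_chat_request("qwen-plus", "Hello")
print(json.dumps(payload, indent=2))
```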

Caveats

  • Insufficient GPU memory may crash the program; the int4 model or a cloud API is recommended instead.
  • An unstable network will affect real-time transmission; a wired connection is recommended.
  • Configuration paths support relative paths (based on the project root directory) or absolute paths.
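The relative-vs-absolute path rule in the last point can be expressed as follows. This is a sketch; the project root location is an assumption, and the helper is not an actual OpenAvatarChat function:

```python
from pathlib import Path

# Assumed project root; in practice this is wherever OpenAvatarChat was cloned.
PROJECT_ROOT = Path("/opt/OpenAvatarChat")

def resolve_config_path(value: str, root: Path = PROJECT_ROOT) -> Path:
    """Absolute paths pass through; relative ones resolve against the project root."""
    p = Path(value)
    return p if p.is_absolute() else root / p

print(resolve_config_path("ssl_certs/localhost.crt"))  # /opt/OpenAvatarChat/ssl_certs/localhost.crt
print(resolve_config_path("/etc/ssl/site.crt"))        # /etc/ssl/site.crt
```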

 

Application Scenarios

  1. Technical research
    Developers can use it to study digital human conversation technology and analyze how the modular design is implemented.
  2. Personal experimentation
    Users can run the service locally and experience voice-driven digital human interaction.
  3. Education and training
    Students can learn the principles of speech recognition, language models, and digital human animation through the code.

 

FAQ

  1. What if I don't have enough video memory?
    Download the int4 quantized model, or use a cloud LLM API instead of a local model.
  2. Does it support multi-person conversations?
    The current version is designed for one-on-one conversations; multi-person support would need to be developed yourself.
  3. What if it lags?
    Check CPU/GPU performance, reduce the frame rate, or turn off fast mode.
May not be reproduced without permission:Chief AI Sharing Circle " OpenAvatarChat: a modularly designed digital human conversation tool