
VideoChat: a real-time voice-interactive digital human with customizable appearance and voice cloning, supporting end-to-end and cascade voice schemes

General Introduction

VideoChat is a real-time voice-interaction digital human project built on open source technology. It supports an end-to-end voice scheme (GLM-4-Voice - THG) and a cascade scheme (ASR-LLM-TTS-THG). Users can customize the digital human's appearance and timbre, and the project supports timbre cloning, lip synchronization, and video streaming output, with first-packet latency as low as 3 seconds. You can try its features through the online demo, or deploy it locally by following the technical documentation.
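To make the cascade scheme concrete, here is a minimal sketch of how the four stages hand data to each other. It is not VideoChat's code: every function below is a hypothetical stand-in, and THG is read here as talking-head generation. The one design point it illustrates is streaming: each sentence is passed down the chain as soon as it is ready, which is what keeps first-packet latency low.

```python
from typing import Iterator

# All stage functions are hypothetical placeholders, not VideoChat's API.

def asr(audio: bytes) -> str:
    """Speech recognition stand-in: pretend the audio is UTF-8 text."""
    return audio.decode("utf-8")

def llm(prompt: str) -> Iterator[str]:
    """LLM stand-in: stream the reply one sentence at a time."""
    for sentence in ("You said: " + prompt + ".", "How can I help?"):
        yield sentence

def tts(sentence: str) -> bytes:
    """TTS stand-in: encode the sentence as fake audio bytes."""
    return sentence.encode("utf-8")

def thg(speech: bytes) -> str:
    """Talking-head stand-in: describe the video segment for this audio."""
    return f"<video segment, {len(speech)} audio bytes>"

def cascade(audio: bytes) -> Iterator[str]:
    """Run ASR -> LLM -> TTS -> THG, yielding each segment as soon as it exists."""
    text = asr(audio)
    for sentence in llm(text):
        yield thg(tts(sentence))

segments = list(cascade(b"hello"))
first_segment = next(cascade(b"hello"))  # available before the full reply is done
```

In the end-to-end scheme, a single speech model (GLM-4-Voice in the project) would take the place of the `asr`, `llm`, and `tts` stand-ins, leaving only the THG step after it.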

Demo address: https://www.modelscope.cn/studios/AI-ModelScope/video_chat


 

Feature List

  • Real-time voice interaction: supports both the end-to-end voice scheme and the cascade scheme
  • Customizable appearance and timbre: users can change the digital human's appearance and voice to suit their needs
  • Voice cloning: clones the user's voice for a personalized voice experience
  • Low latency: first-packet latency as low as 3 seconds for smooth interaction
  • Open source: built on open source technology, so users can freely modify and extend it

 

Usage Guide

Installation process

  1. Environment Configuration
    • Operating system: Ubuntu 22.04
    • Python version: 3.10
    • CUDA version: 12.2
    • Torch version: 2.1.2
  2. Clone the project
    git lfs install
    git clone https://github.com/Henry-23/VideoChat.git
    cd VideoChat
    
  3. Create a virtual environment and install dependencies
    conda create -n metahuman python=3.10
    conda activate metahuman
    pip install -r requirements.txt
    pip install --upgrade gradio
    
  4. Download the weight files
    • Downloading from the ModelScope Studio is recommended; git lfs is already set up there to track the weight files
    git clone https://www.modelscope.cn/studios/AI-ModelScope/video_chat.git
    
  5. Start the service
    python app.py
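Before starting the service, it can help to confirm that the active environment matches the versions listed in step 1. The script below is a minimal sketch, not part of the project: the helper names are hypothetical, and it only compares Python and Torch. The CUDA value that PyTorch reports is the version the wheel was built against, which may differ from the system's CUDA 12.2, so it is printed for information only.

```python
import platform
from typing import Dict, Optional

# Versions taken from the environment list in step 1.
EXPECTED = {"python": "3.10", "torch": "2.1.2"}

def matches(actual: Optional[str], expected: str) -> bool:
    """True if `actual` equals `expected` or extends it (3.10.14 matches 3.10)."""
    if actual is None:
        return False
    actual = actual.split("+")[0]  # drop local build tags such as "+cu121"
    return actual == expected or actual.startswith(expected + ".")

def detect() -> Dict[str, Optional[str]]:
    """Collect installed versions; torch entries stay None if it is not installed."""
    found: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda_build": None,
    }
    try:
        import torch
        found["torch"] = str(torch.__version__)
        found["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        pass
    return found

if __name__ == "__main__":
    found = detect()
    for name, want in EXPECTED.items():
        status = "ok" if matches(found[name], want) else f"expected {want}"
        print(f"{name}: {found[name]} ({status})")
    print(f"CUDA version PyTorch was built with: {found['torch_cuda_build']}")
```

Run it inside the activated `metahuman` environment, so that it inspects the same interpreter that `python app.py` will use.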
    

Usage Process

  1. Configure the API-KEY:
    • If the local machine's performance is limited, you can use the Qwen API and CosyVoice API provided by Bailian, the large-model service platform of Alibaba Cloud (Aliyun). Configure the API-KEY in app.py.
  2. Local inference:
    • If you are not using an API-KEY, configure local inference in src/llm.py and src/tts.py, and remove the API call code you no longer need.
  3. Start the service:
    • Run python app.py to start the service.
  4. Customize the digital human:
    • Add the recorded video of the digital human to the /data/video/ directory.
    • In /src/thg.py, add the avatar name and bbox_shift to the avatar_list of the Muse_Talk class.
    • In app.py, add the avatar name to avatar_name in the Gradio interface, then restart the service and wait for initialization to complete.
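The customization step boils down to pairing an avatar name with a bbox_shift value and making sure a matching video exists. The sketch below illustrates that bookkeeping only. The tuple layout, the `.mp4` suffix, the example values, and both helper names are assumptions made for illustration; check src/thg.py for the structure that avatar_list really uses.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical layout: (avatar name, bbox_shift). Not confirmed by the project.
AvatarEntry = Tuple[str, int]

def add_avatar(avatar_list: List[AvatarEntry], name: str,
               bbox_shift: int) -> List[AvatarEntry]:
    """Return a new list with (name, bbox_shift) appended; reject duplicate names."""
    if any(existing == name for existing, _ in avatar_list):
        raise ValueError(f"avatar {name!r} is already registered")
    return avatar_list + [(name, bbox_shift)]

def missing_videos(avatar_list: List[AvatarEntry], video_dir: Path,
                   suffix: str = ".mp4") -> List[str]:
    """List avatar names that have no <name><suffix> file in video_dir."""
    return [name for name, _ in avatar_list
            if not (video_dir / f"{name}{suffix}").is_file()]

# Illustrative values only.
avatars = add_avatar([("Avatar1", 6)], "MyAvatar", 5)
```

A check such as `missing_videos(avatars, Path("data/video"))` before restarting would catch a name that was added to the list without a corresponding recording.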

Detailed Operation Procedure

  • Customize appearance and timbre: add a recorded video of the digital human to the /data/video/ directory, then add the avatar name and the bbox_shift parameter to the avatar_list of the Muse_Talk class in src/thg.py.
  • Voice cloning: configure the CosyVoice API in app.py, or use Edge_TTS for local inference.
  • End-to-end voice scheme: use the GLM-4-Voice model for speech recognition and generation.
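The choice between the hosted APIs and local inference can be pictured as a single switch on whether an API key is available. This is a sketch under stated assumptions, not how app.py is written: the environment variable name `VIDEOCHAT_API_KEY`, the function name, and the returned labels are all hypothetical.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable name; the article only says the key is configured in app.py.
API_KEY_VAR = "VIDEOCHAT_API_KEY"

def choose_backends(env: Mapping[str, str]) -> Dict[str, str]:
    """Pick the hosted APIs when a key is present, otherwise local inference."""
    has_key = bool(env.get(API_KEY_VAR, "").strip())
    if has_key:
        return {"llm": "qwen_api", "tts": "cosyvoice_api"}
    return {"llm": "local", "tts": "edge_tts"}

if __name__ == "__main__":
    print(choose_backends(os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the decision easy to test without touching real credentials.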

 

  1. Visit the address of the locally deployed service and go to the Gradio interface.
  2. Select or upload a customized digital persona video.
  3. Configure voice cloning by uploading a sample of the user's voice.
  4. Start real-time voice interaction and try out the low-latency conversation.