
HeyGem: Silicon Intelligence's Open-Source Alternative to HeyGen for Digital Humans

General Introduction

HeyGem is a fully offline video synthesis tool for Windows, developed by the GuijiAI team and open-sourced on GitHub. It uses AI models to clone a user's appearance and voice, creating realistic avatars that can be driven by text or voice to produce personalized videos. The tool needs no Internet connection: all processing happens locally, which protects user privacy. HeyGem supports scripts in eight languages (including English, Japanese, Korean and Chinese), has a simple, intuitive interface that users with no technical background can pick up quickly, and exposes an open API so that developers can extend its functionality. A few months ago, Silicon Intelligence also open-sourced DUIX, a mobile digital human with real-time interaction and one-click deployment across multiple platforms.


HeyGem official download address: https://heygem.ai/


 

Function List

  • Precise appearance and voice cloning: AI technology captures facial features and vocal details to generate high-fidelity avatars and voices with support for parameter adjustment.
  • Text-driven avatar: After text is entered, the tool automatically generates natural speech and drives the avatar's lip synchronization and facial expressions.
  • Voice-driven video production: Generate dynamic videos by controlling the tone and rhythm of the avatar through user voice input.
  • Fully offline operation: No network connection is required and all data is processed locally for privacy and security.
  • Multi-language support: Eight language scripts are supported: English, Japanese, Korean, Chinese, French, German, Arabic and Spanish.
  • Efficient video compositing: Intelligent optimization of audio and video synchronization ensures a natural match between lip shape and voice.
  • Open API: Provides APIs for model training and video synthesis, which developers can use to customize features.

 

Usage Guide

Installation process

The following installation process strictly follows the official instructions, retaining the original text and image addresses:

Prerequisites

  1. D drive (required): mainly stores digital avatar and project data.
    • Free space requirement: more than 30GB
  2. C drive: stores the service image files.
    • Free space requirement: more than 100GB
    • If C has less than 100GB free, after installing Docker you can select a folder on a disk with more than 100GB of free space in the location shown below:
      [Screenshot: Docker disk image location setting]
  3. System requirements:
    • Currently supports Windows 10 build 19042.1526 or later
  4. Recommended configuration:
    • CPU: 13th Gen Intel Core i5-13400F
    • Memory: 32GB
    • Graphics card: RTX-4070
  5. Ensure that you have an NVIDIA graphics card and that the drivers are installed correctly.
    • NVIDIA driver download link: https://www.nvidia.cn/drivers/lookup/
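
The disk-space prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of HeyGem: it assumes the default C: and D: drive letters, treats the stated GB figures as GiB, and the function names are hypothetical.

```python
import shutil

GIB = 1024 ** 3

# Thresholds taken from the prerequisites above; drive letters assume the default layout.
REQUIREMENTS_GIB = {"C:\\": 100, "D:\\": 30}


def has_enough_space(free_bytes: int, required_gib: int) -> bool:
    """Return True when free_bytes is strictly greater than required_gib GiB."""
    return free_bytes > required_gib * GIB


def check_drives(requirements=REQUIREMENTS_GIB):
    """Map each drive to (free GiB, ok flag); a missing drive is reported as (0.0, False)."""
    report = {}
    for drive, required in requirements.items():
        try:
            free = shutil.disk_usage(drive).free
        except OSError:
            # The drive or folder does not exist, so no space is available there.
            free = 0
        report[drive] = (round(free / GIB, 1), has_enough_space(free, required))
    return report


if __name__ == "__main__":
    for drive, (free_gib, ok) in check_drives().items():
        print(f"{drive} free: {free_gib} GiB -> {'OK' if ok else 'insufficient'}")
```

If the Docker disk image has been moved to another folder, pass that folder instead of C:\ in the requirements mapping.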

Installing Windows Docker

  1. Run wsl --list --verbose to check whether WSL is already installed. The screenshot below shows an existing installation, in which case WSL does not need to be reinstalled:
    [Screenshot: wsl --list --verbose output]

    • WSL installation command: wsl --install
    • Installation may fail because of network problems; retry a few times if it does.
    • Set a new username and password during installation, and remember them.
  2. Run wsl --update to update WSL:
    [Screenshot: wsl --update output]
  3. Download Docker for Windows, choosing the installer that matches your CPU architecture.
  4. This screen indicates a successful installation:
    [Screenshot: Docker installation succeeded]
  5. Run Docker:
    [Screenshot: Docker running]
  6. On first run, accept the agreement and skip login:
    [Screenshots: agreement and login prompts]

Installing the Server

Install the server with Docker and docker-compose:

  1. The docker-compose.yml file is located in the /deploy directory.
  2. In the /deploy directory, run docker-compose up -d.
  3. Wait patiently (about half an hour, depending on network speed). The download uses about 70GB of data, so make sure you are on WiFi.
  4. Installation has succeeded when three services appear in Docker:
    [Screenshot: three services running in Docker]
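
The final check can also be scripted instead of read off the Docker window. This is a rough sketch under assumptions: it matches running containers against the three image repositories listed in the Dependencies section of this article, assumes each image reference has the form repository:tag, and the helper names are hypothetical.

```python
import subprocess

# Image repositories listed under "Dependencies" in this article.
EXPECTED_IMAGES = (
    "guiji2025/fun-asr",
    "guiji2025/fish-speech-ziming",
    "guiji2025/heygem.ai",
)


def missing_images(docker_ps_output: str, expected=EXPECTED_IMAGES):
    """Return the expected image repositories that no running container uses."""
    running = {
        line.strip().split(":")[0]  # drop the tag, assuming "repository:tag"
        for line in docker_ps_output.splitlines()
        if line.strip()
    }
    return [image for image in expected if image not in running]


def running_images() -> str:
    """Ask the Docker CLI for the image of every running container, one per line."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Image}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    missing = missing_images(running_images())
    print("All three services are running." if not missing else f"Not running: {missing}")
```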

Client

  1. Run the build script npm run build:win. When it finishes, HeyGem-1.0.0-setup.exe is generated in the dist directory.
  2. Double-click HeyGem-1.0.0-setup.exe to install.

Dependencies

  1. Node.js 18
  2. Docker images:
    • docker pull guiji2025/fun-asr:1.0.1
    • docker pull guiji2025/fish-speech-ziming:1.0.39
    • docker pull guiji2025/heygem.ai:0.0.7_sdk_slim

Main workflows

1. Appearance and voice cloning

  • Prepare material
    • Record a clear voice sample (10-30 seconds, WAV format) and place it in D:\heygem_data\voice\data.
    • Take a high-resolution frontal photo and place it in D:\heygem_data\face2face (these paths can be changed in docker-compose.yml).
  • Run the clone function
    • Launch the client, open the interface and select "Model Training".
    • Call the API http://127.0.0.1:18180/v1/preprocess_and_tran with input parameters such as:
      {
      "format": ".wav",
      "reference_audio": "D:/heygem_data/voice/data/sample.wav",
      "lang": "zh"
      }
      
    • Get the returned results (e.g. audio path and text) and save them for later use.
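
The training call above can be sketched with the Python standard library. The endpoint and the three body fields come from the example; the use of POST, the JSON reply, the output file name and the helper names are assumptions made for illustration, not confirmed HeyGem behavior.

```python
import json
import urllib.request

TRAIN_URL = "http://127.0.0.1:18180/v1/preprocess_and_tran"


def build_training_payload(reference_audio: str, lang: str = "zh", fmt: str = ".wav") -> dict:
    """Assemble the request body using the three fields shown in the example above."""
    return {"format": fmt, "reference_audio": reference_audio, "lang": lang}


def post_json(url: str, payload: dict, timeout: float = 600.0) -> dict:
    """POST payload as JSON and decode a JSON reply (both are assumptions)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_training_payload("D:/heygem_data/voice/data/sample.wav")
    result = post_json(TRAIN_URL, payload)
    # Keep the whole reply: the audio path and text it contains are needed for synthesis.
    with open("training_result.json", "w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)
```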

2. Text-driven avatar

  • Enter text
    • Select "Audio Synthesis" in the client interface and call the API http://127.0.0.1:18180/v1/invoke with input parameters such as the following, plus a max_new_tokens value:
      {
      "speaker": "unique-uuid",
      "text": "Welcome to the HeyGem.ai experience",
      "format": "wav",
      "topP": 0.7,
      "temperature": 0.7,
      "need_asr": false,
      "streaming": false,
      "is_fixed_seed": 0,
      "is_norm": 0,
      "reference_audio": "Path to returned audio",
      "reference_text": "Returned text"
      }
      
  • Generate Video
    • Call the synthesis interface http://127.0.0.1:8383/easy/submit with input parameters such as:
      {
      "audio_url": "Path to the generated audio",
      "video_url": "D:/heygem_data/face2face/sample.mp4",
      "code": "unique-uuid",
      "chaofen": 0,
      "watermark_switch": 0,
      "pn": 1
      }
      
    • Query progress: http://127.0.0.1:8383/easy/query?code=unique-uuid
  • Save results
    • When finished, the video file is saved locally in the specified path.
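
The text-to-video flow above can be outlined as a small script. This is a sketch under stated assumptions: the URLs and field names come from the examples, while the use of POST for submission, the helper names and the generated UUIDs are illustrative choices. The response formats are not described in this article, so the script only prints the raw replies.

```python
import json
import urllib.parse
import urllib.request
import uuid

TTS_URL = "http://127.0.0.1:18180/v1/invoke"
SUBMIT_URL = "http://127.0.0.1:8383/easy/submit"
QUERY_URL = "http://127.0.0.1:8383/easy/query"


def build_tts_payload(text, reference_audio, reference_text, speaker=None):
    """Speech-synthesis body with the fields recoverable from the example above."""
    return {
        "speaker": speaker or str(uuid.uuid4()),
        "text": text,
        "format": "wav",
        "topP": 0.7,
        "temperature": 0.7,
        "need_asr": False,
        "streaming": False,
        "is_fixed_seed": 0,
        "is_norm": 0,
        "reference_audio": reference_audio,
        "reference_text": reference_text,
    }


def build_video_payload(audio_url, video_url, code=None):
    """Video-synthesis body; `code` identifies the task when querying progress."""
    return {
        "audio_url": audio_url,
        "video_url": video_url,
        "code": code or str(uuid.uuid4()),
        "chaofen": 0,
        "watermark_switch": 0,
        "pn": 1,
    }


def progress_url(code):
    """Build the progress-query URL for a submitted task."""
    return f"{QUERY_URL}?{urllib.parse.urlencode({'code': code})}"


def post_json(url, payload, timeout=600.0):
    """POST payload as JSON and return the raw response bytes (POST is an assumption)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    job = build_video_payload(
        audio_url="Path to the generated audio",  # placeholder, as in the example above
        video_url="D:/heygem_data/face2face/sample.mp4",
    )
    print(post_json(SUBMIT_URL, job))
    with urllib.request.urlopen(progress_url(job["code"]), timeout=30) as response:
        print(response.read())  # response format is not documented in this article
```

Generating the code on the client side keeps the submit and query calls tied to the same task without depending on anything in the submit reply.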

3. Voice-driven video production

  • Record voice
    • Record your voice in the client, or place a WAV file directly in D:\heygem_data\voice\data.
  • Generate Video
    • Call the audio and video synthesis APIs described above to generate an animated avatar video.
  • Preview and Adjustment
    • Preview the result in the client. If needed, adjust the parameters and generate the video again.

Tips for use

  • Material requirements: Photos should be evenly lit, and voice recordings should be free of noise.
  • Multi-language support: Set lang in the API parameters to the corresponding language code (e.g. "zh" for Chinese).
  • Developer support: Refer to the code under src/main/service to customize functionality.

Notes

  • The system needs to meet the 100GB C drive and 30GB D drive space requirements.
  • Make sure WSL is enabled before installing Docker.
  • 70GB of traffic is required to download the image. Stable WiFi is recommended.