
WeClone: training digital doppelgangers with WeChat chats and voices

General Introduction

WeClone is an open-source project that lets users create personalized digital doppelgängers by using WeChat chat logs and voice messages, combined with large language models and speech synthesis technology. The project can analyze a user's chat habits to train the model, and can also generate realistic voice clones with a small number of voice samples. Eventually, the digital doppelganger can be bound to a WeChat bot, enabling automatic replies to text and voice. This tool is suitable for people who want to use an AI assistant on WeChat or learn AI technology. The code is completely public and has attracted a lot of tech enthusiasts and developers to participate.



 

Function List

  • Chat log training: Fine-tune a large language model on WeChat chat transcripts so it mimics the user's speaking style.
  • High-quality voice cloning: Generate voices with similarity up to 95% using a 0.5B-parameter model and a 5-second speech sample.
  • WeChat bot binding: Connect the digital doppelgänger to WeChat, with support for automatic text and voice replies.
  • Data preprocessing tools: Scripts convert chat logs into training data and filter sensitive information by default.
  • Model personalization: LoRA fine-tuning makes the model match the user's language style more closely.

 

Using Help

WeClone requires some technical skills, such as familiarity with Python and Git, but the detailed step-by-step guide below walks you through everything from installation to use.

Installation process

  1. Clone the project
    Open a terminal and run:
git clone https://github.com/xming521/WeClone.git

Then enter the project directory:

cd WeClone
  2. Set up the environment
    Python 3.9 is recommended for this project, with uv managing the virtual environment. The commands are as follows:
uv venv .venv --python=3.9
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows
  3. Install dependencies
    After activating the environment, run:
uv pip install --group main -e .

Note: This does not include the xcodec dependency needed for voice cloning; it can be installed separately if required.

  4. Download the model
    ChatGLM3-6B is used by default. It can be downloaded from Hugging Face:
git lfs install
git clone https://huggingface.co/THUDM/chatglm3-6b

If the download is slow, use the ModelScope community instead:

export USE_MODELSCOPE_HUB=1  # on Windows, use set
git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git

The ModelScope model's modeling_chatglm.py must be replaced with the Hugging Face version.

Data preparation

  1. Export chat logs
  • Download PyWxDump and decrypt the WeChat database.
  • Click "Chat Backup" and export your contacts or group chats in CSV format.
  • Place the exported wxdump_tmp/export/csv folder into ./data/csv.
  • Example data is provided in data/example_chat.csv.
  2. Process the data
    Run the script to convert the CSV files to JSON format:
./make_dataset/csv_to_json.py

Sensitive information such as phone numbers and ID numbers is filtered out by default. You can also add banned words in blocked_words.json; any sentence containing a banned word is removed entirely.
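The filtering step described above can be sketched in a few lines of Python. This is an illustrative sketch, not WeClone's actual script: the function names, the regex patterns, and the assumed structure of blocked_words.json (a "blocked_words" list) are all assumptions.

```python
import json
import re

# Rough patterns for the sensitive data the section mentions (assumptions):
PHONE_RE = re.compile(r"\b1\d{10}\b")        # 11-digit mainland-China mobile numbers
ID_RE = re.compile(r"\b\d{17}[\dXx]\b")      # 18-character ID-card numbers

def load_blocked_words(path="blocked_words.json"):
    """Load the banned-word list; assumes a top-level 'blocked_words' key."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["blocked_words"]

def clean_messages(messages, blocked_words):
    """Drop any message containing a banned word or a sensitive pattern."""
    kept = []
    for msg in messages:
        if any(word in msg for word in blocked_words):
            continue                          # remove the whole sentence
        if PHONE_RE.search(msg) or ID_RE.search(msg):
            continue                          # drop phone/ID numbers entirely
        kept.append(msg)
    return kept
```

Dropping the whole message, rather than masking the match, mirrors the behavior described above: a banned word removes the entire sentence.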

Model fine-tuning

  1. Configure parameters
  • Edit settings.json and specify the model path.
  • Adjust per_device_train_batch_size and gradient_accumulation_steps to fit your GPU memory.
  • Tune num_train_epochs and lora_rank to match your data volume and avoid overfitting.
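A minimal sketch of what the relevant fields in settings.json might look like. The key names are taken from the parameters mentioned above, but the exact file structure and values are assumptions; check the project's own template.

```json
{
  "model_name_or_path": "./chatglm3-6b",
  "per_device_train_batch_size": 4,
  "gradient_accumulation_steps": 8,
  "num_train_epochs": 3,
  "lora_rank": 8
}
```

Raising gradient_accumulation_steps while lowering per_device_train_batch_size keeps the effective batch size constant on smaller GPUs.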
  2. Single-GPU training
python src/train_sft.py

The author trained on 20,000 chat entries; the loss dropped to about 3.5, with good results.

  3. Multi-GPU training
    Install DeepSpeed:
uv pip install deepspeed

Run multi-GPU training (replace <number_of_GPUs> with the number of graphics cards to use):

deepspeed --num_gpus=<number_of_GPUs> src/train_sft.py

Voice cloning

  • Prepare a WeChat voice message of at least 5 seconds and put it in the WeClone-audio folder.
  • Run the relevant script (requires installing xcodec); the generated audio is saved in the same folder.
  • The cloned voice retains intonation and emotion, with similarity up to 95%.

Binding WeChat Robot

  1. Deploy AstrBot
  • Download and install AstrBot (it supports WeChat, QQ, and other platforms).
  • Configure the messaging platform (e.g., WeChat).
  2. Start the API service
python src/api_service.py

The default address is http://172.17.0.1:8005/v1.

  3. Configure AstrBot
  • Add a service provider and select OpenAI as the type.
  • Set the API Base URL to the local address and the model name to gpt-3.5-turbo.
  • Disable tool calls: send the command /tool off reminder.
  • Set the system prompt to match the one used during fine-tuning.
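Because the service exposes an OpenAI-compatible interface, any OpenAI-style client can talk to the digital doppelgänger directly. A minimal sketch using only the standard library; the /chat/completions path follows the OpenAI convention, and whether WeClone's service honors every field is an assumption.

```python
import json
from urllib import request

API_BASE = "http://172.17.0.1:8005/v1"  # default address from the step above

def build_chat_payload(user_message, model="gpt-3.5-turbo"):
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def send_chat(user_message):
    """POST the payload to the local API service and return the reply text."""
    payload = json.dumps(build_chat_payload(user_message)).encode("utf-8")
    req = request.Request(
        f"{API_BASE}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The model name gpt-3.5-turbo is just the label AstrBot expects; the local service routes the request to your fine-tuned model regardless.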

Hardware requirements

  • The default setup (ChatGLM3-6B with LoRA fine-tuning) requires 16GB of video memory.
  • Other options:
  • QLoRA (4-bit precision): 6GB (7B model) to 48GB (70B model).
  • Full-parameter fine-tuning (16-bit): 60GB (7B) to 600GB (70B).
  • A GPU is recommended; if video memory is short, lower the precision or use multiple cards.
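The figures above can be sanity-checked with rough weight-only arithmetic. This sketch ignores gradients, optimizer state, and activations, which is exactly why the real requirements listed above are several times larger than the raw weight size.

```python
def weight_memory_gb(params_billions, bits):
    """Rough memory needed just to hold the model weights, in GB."""
    return params_billions * 1e9 * bits / 8 / 1e9

# 7B weights at 4-bit precision -> 3.5 GB
# (QLoRA's 6GB figure adds adapters, activations, and overhead)
# 7B weights at 16-bit precision -> 14 GB
# (full fine-tuning's 60GB figure adds gradients and optimizer state)
```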

Tips for use

  • Prepare at least 2,000 chat entries; the higher the quality, the better.
  • Speech samples should be clear, with no background noise.
  • Test the model with web_demo.py or test_model.py.

Once done, your digital doppelgänger will be able to chat and send voice replies on WeChat automatically, with results very close to your own style.

 

Application Scenarios

  1. Daily Assistant
    The digital doppelgänger can reply to WeChat messages for you, such as automatically answering a friend's greeting when you're busy.
  2. Technology Practice
    Developers can use it to learn large language model and voice-cloning techniques; the code is open source for hands-on experimentation.
  3. Fun Interactive
    Let the digital doppelganger chat with your friends in your voice and tone for added entertainment.

 

QA

  1. What happens if the amount of data is low?
    Fewer than 2,000 entries may make the model inaccurate; prepare more conversations and clearer speech samples.
  2. Can a regular computer run it?
    Fine-tuning needs a GPU with 16GB of video memory, which most regular computers lack; try a lower-precision mode instead.
  3. How similar are the cloned voices?
    Voices generated from 5-second samples reach 95% similarity, with natural intonation and emotion.
May not be reproduced without permission: Chief AI Sharing Circle » WeClone: training digital doppelgangers with WeChat chats and voices