MockingBird: Fast Voice Cloning and Model Training, Text-to-Speech based on xtts v2 Implementation

Latest AI Resources · Updated 8 months ago · AI Sharing Circle


General Introduction

MockingBird is an open-source project for fast voice cloning and text-to-speech. A user only needs to provide a 5-second voice sample to generate arbitrary speech in that voice. The project supports a variety of Chinese datasets and runs on both Windows and Linux. MockingBird is built on the PyTorch framework and ships with easy-to-use tools and detailed installation guides for developers and researchers.


Function List

- Voice cloning: generate arbitrary speech from a 5-second voice sample
- Text-to-speech: enter text to generate the corresponding speech
- Language support: supports Mandarin and multiple Chinese datasets
- Cross-platform operation: compatible with Windows and Linux
- Real-time processing: provides real-time speech generation
- Open source: the code is open, which makes secondary development and research easier

Usage Guide

Installation process

Environment preparation:
- Install Python 3.7 or later.
- Install PyTorch (version 1.9.0 recommended).
- Install ffmpeg.
Download the project:
- Open the MockingBird project page, click the green "Code" button, and select "Download ZIP" to download the project files.
- Or download it with git: git clone https://github.com/babysor/MockingBird.git
Install dependencies:
- Go to the project directory and run pip install -r requirements.txt to install the required Python packages.
- If you prefer, use conda to create a virtual environment and install the dependencies with conda env create -n env_name -f env.yml, then activate the environment with conda activate env_name
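
The environment requirements above can be checked before installing the project. The following is a minimal sketch, not part of MockingBird itself: the function names `python_ok` and `check_environment` are made up for illustration. It only checks that PyTorch is importable and that an `ffmpeg` executable is on the PATH; it does not verify the recommended PyTorch version.

```python
import importlib.util
import shutil
import sys

# Minimum Python version stated in the installation steps.
MIN_PYTHON = (3, 7)


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def check_environment():
    """Collect simple yes/no findings for each prerequisite."""
    return {
        "python": python_ok(),
        # find_spec returns None when the package cannot be located.
        "torch": importlib.util.find_spec("torch") is not None,
        # shutil.which returns None when no executable is found on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


for name, ok in check_environment().items():
    print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

If any entry is reported as missing, install it before running pip install -r requirements.txt.
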

Phonetic transcription model

To keep the main download small, the project does not include the sound-to-sound model. If you need it, download it separately via the "Download model (3G)" link.

Usage Process

Running the toolbox:
- Run demo_toolbox.py to open the toolbox window.
- Select the voice sample file in the toolbox, enter the text, and click the Generate button to produce the corresponding speech file.
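
Since cloning starts from a voice sample of about 5 seconds, it can help to check a sample's length before loading it into the toolbox. The sketch below is an illustration only and is not taken from MockingBird: it assumes the sample is an uncompressed WAV file, uses Python's standard `wave` module, and the helper names `wav_duration_seconds`, `is_long_enough`, and `write_silent_wav` are hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def is_long_enough(path, minimum_seconds=5.0):
    """Check the sample against the 5-second figure quoted in the article."""
    return wav_duration_seconds(path) >= minimum_seconds


def write_silent_wav(path, seconds, sample_rate=16000):
    """Write a mono 16-bit WAV of silence; handy for trying the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))
```

Calling `is_long_enough("my_sample.wav")` on your own recording (the file name is a placeholder) tells you whether it reaches the 5-second mark; other formats would first need converting to WAV, for example with ffmpeg.
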
Training a model:
- To train your own model, follow the training tutorial included with the project.
- Download and prepare the training dataset, then run train.py to start training.
- A Chinese-language help file for model training is also available.
Remote invocation:
- MockingBird includes a web server, so speech generation can be invoked remotely.
- Configure and start the web server, then call it through its API.
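
A remote call of this kind usually comes down to an HTTP request against the running server. The article does not document the actual routes or parameters, so the sketch below only shows the general shape using Python's standard library. The endpoint path, the JSON field name `text`, and the function name `build_synthesis_request` are all placeholders and assumptions, not MockingBird's real API; check the project's web server code for the real interface.

```python
import json
import urllib.request


def build_synthesis_request(base_url, endpoint, text):
    """Build (but do not send) a JSON POST request.

    `endpoint` and the "text" field are hypothetical placeholders;
    replace them with whatever the real server expects.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the real route is known, the request can be sent with `urllib.request.urlopen(request)` and the response body read from the returned object.
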

Common problems

Installation failure: Make sure your Python version meets the requirements, and check version compatibility when installing PyTorch.
Voice quality: The quality of the voice sample and the richness of the training dataset both affect the generated speech. Use high-quality voice samples and diverse datasets for training.