Retrieval based Voice Conversion WebUI: A Framework for Retrieval-based Voice Conversion | Simulating Real-life Singing Voices

Latest AI Resources · Released 10 months ago · AI Sharing Circle


General Introduction

Retrieval based Voice Conversion WebUI is an easy-to-use voice conversion framework based on VITS. It can convert voices between any speakers, which covers use cases such as song covers and real-time voice changing. Its main strengths are low latency, high conversion quality, and the ability to train on a small amount of data. It supports acceleration on NVIDIA, AMD, and Intel GPUs, and it provides both a web interface and a real-time voice conversion interface. It can also call UVR5 models to quickly separate vocals from the accompaniment, and it uses RMVPE, which the project describes as the most advanced vocal pitch extraction algorithm, to eliminate the muted-sound problem.

Try it online in Colab

The base model is trained on close to 50 hours of the open-source, high-quality VCTK dataset, so there are no copyright concerns.
Look forward to the RVCv3 base model, which promises more parameters, more training data, better results, essentially the same inference speed, and a smaller amount of training data required.


Training and inference interface

Real-time voice changing interface

Feature List

Train your own voice conversion model with as little as 10 minutes of speech data
Use pre-trained voice conversion models, with support for multiple sample rates and tones
Convert voices through the web interface or the real-time voice changing interface, with end-to-end low latency
Use UVR5 models to separate vocals from backing tracks, with support for multiple audio file formats
Use the RMVPE algorithm to extract vocal pitch, with support for PyTorch, ONNX, and DirectML
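
The first item in the list mentions "as little as 10 minutes of speech data". Before training, it can be handy to check how much audio a dataset folder actually holds. The following is a minimal sketch, not part of the project: the function names are hypothetical, it only reads uncompressed PCM WAV files through Python's standard wave module, and the 10-minute figure is taken from the list above as a rough lower bound rather than a limit the tool is known to enforce.

```python
import wave
from pathlib import Path

# "As little as 10 minutes" from the feature list, treated as a rough lower bound.
MIN_SECONDS = 10 * 60


def wav_seconds(path):
    """Return the duration of one uncompressed PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def dataset_seconds(folder):
    """Sum the durations of all .wav files directly inside folder."""
    return sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


def has_enough_audio(folder, minimum=MIN_SECONDS):
    """Report whether the folder holds at least `minimum` seconds of audio."""
    return dataset_seconds(folder) >= minimum
```

Calling has_enough_audio("my_dataset") on a hypothetical folder returns True or False. Files in other formats (MP3, FLAC and so on) are ignored by this sketch and would need converting or a different reader.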

How to Use

Download or clone this repository, then install the required dependencies and pre-trained models
Run go-web.bat or go-realtime-gui.bat and select the action you want to perform
Following the prompts in the interface, select the input and output audio files or devices, then adjust the parameters and options
Click start or stop and enjoy voice conversion!
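
The second step above, choosing between the two launcher scripts, could be wrapped in a small helper. This is only a sketch under stated assumptions: the mode labels "web" and "realtime" and the repo_dir argument are hypothetical, and because both launchers are .bat files the launch function only makes sense on Windows. Only the two script names come from the steps above.

```python
import subprocess
from pathlib import Path

# Launcher scripts named in the steps above; the mode keys are hypothetical labels.
LAUNCHERS = {
    "web": "go-web.bat",
    "realtime": "go-realtime-gui.bat",
}


def launcher_path(repo_dir, mode):
    """Return the path of the launcher script for the given mode."""
    if mode not in LAUNCHERS:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(LAUNCHERS)}")
    return Path(repo_dir) / LAUNCHERS[mode]


def launch(repo_dir, mode):
    """Start the chosen interface (Windows only, since these are .bat files)."""
    script = launcher_path(repo_dir, mode)
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is repo_dir the cloned repository?")
    # Batch files are run through cmd.exe, from inside the repository folder.
    return subprocess.Popen(["cmd", "/c", script.name], cwd=script.parent)
```

For example, launch("path/to/repository", "web") would start the web interface, where the path is a placeholder for wherever the repository was cloned.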