
Fish Speech: Fast and Highly Accurate Cloning of English and Chinese Speech Using Few Samples

General Introduction

Fish Speech is an open-source text-to-speech (TTS) synthesis tool developed by Fish Audio. Built on AI technologies such as VQ-GAN, Llama, and VITS, it converts text into realistic speech. Fish Speech supports multiple languages and provides an efficient speech synthesis solution for application scenarios such as voice-over, voice assistants, and accessible reading.

The voice cloning project Fish Speech 1.5 has been updated. Like F5-TTS and MaskGCT, which I shared previously, Fish Speech is a voice cloning tool: it needs only 5-10 seconds of voice samples to closely reproduce a person's vocal characteristics, and it supports multiple languages, including Chinese, English, Japanese, and Korean.


An optimized open-source Fish Speech v1.5.0 one-click integration package is also provided.

Fish Speech: an efficient few-shot voice cloning and synthesis tool (interface screenshot)

Try it online at https://fish.audio/zh-CN/

 


A reference audio clip of about 30 seconds is recommended.

 

Feature List

  • Multi-language support: converts text to speech in multiple languages.
  • Efficient synthesis: speech synthesis built on VQ-GAN, Llama, and VITS.
  • Open source: the code is open source and free to download and use.
  • Online demo: an online demo lets users try the speech synthesis directly.
  • Model download: pre-trained models can be downloaded from the Hugging Face platform.

 

Usage Guide

Installation Process

System Requirements

  • GPU memory: 4 GB (for inference), 8 GB (for fine-tuning)
  • Operating systems: Linux, Windows
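If you want to confirm how much GPU memory is available before installing, you can query it with nvidia-smi, the standard NVIDIA driver utility (this check is optional and not part of the Fish Speech setup):

    # List each GPU with its total and free memory (requires the NVIDIA driver)
    nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv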

Windows Configuration

Professional users
  • Consider using WSL2 or Docker to run the codebase.
Non-professional users
  1. Unzip the project archive.
  2. Double-click install_env.bat to install the environment.
    • Edit the USE_MIRROR entry in install_env.bat to choose whether to download from a mirror.
      • USE_MIRROR=false downloads the latest stable torch environment from the original site.
      • USE_MIRROR=true downloads the latest torch environment from a mirror site (default).
    • Edit the INSTALL_TYPE entry in install_env.bat to choose whether to download the compilation environment.
      • INSTALL_TYPE=preview downloads the development version with the compilation environment.
      • INSTALL_TYPE=stable downloads the stable version without the compilation environment.
  3. If you set INSTALL_TYPE=preview in step 2, perform this step (otherwise it can be skipped); it sets up the environment for compiling the model.
    • Download the LLVM compiler:
    • After downloading LLVM-17.0.6-win64.exe, double-click it to install it, choose a suitable installation location, and check Add Path to Current User to add environment variables.
  4. Download and install the Microsoft Visual C++ Redistributable package to resolve potential missing .dll problems.
  5. Download and install Visual Studio Community Edition to get the MSVC++ build tools, which resolve LLVM's header file dependencies.
    • Visual Studio Download
    • After installing the Visual Studio Installer, download Visual Studio Community 2022.
    • Click the Modify button, find the Desktop development with C++ workload, check it, and download.
  6. Download and install CUDA Toolkit 12.
  7. Double-click start.bat to open the training/inference WebUI management interface. If needed, modify API_FLAGS.txt as described below.
    • To start the inference WebUI: edit API_FLAGS.txt in the project root directory and change the first three lines to the following:
      --infer
      # --api
      # --listen ...
      
    • To start the API server: edit API_FLAGS.txt in the project root directory and change the first three lines to the following:
      # --infer
      --api
      --listen ...
      
  8. Double-click run_cmd.bat to enter the project's conda/python command-line environment.
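Once inside the run_cmd.bat command-line environment, a quick optional check (not part of the official steps) is to confirm that the installed PyTorch build can see the GPU:

    # Print the torch version and whether CUDA is usable; expect "True" on a working setup
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If this prints False, revisit steps 2 and 6 (the torch environment and the CUDA Toolkit installation).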

Linux Configuration

  1. Create a Python 3.10 virtual environment (you can also use virtualenv):
    conda create -n fish-speech python=3.10
    conda activate fish-speech
    
  2. Install PyTorch:
    pip3 install torch torchvision torchaudio
    
  3. Install fish-speech:
    pip3 install -e .[stable]
    
  4. (Ubuntu / Debian users) Install sox:
    apt install libsox-dev
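After completing the steps above, you can optionally verify the GPU setup and pre-download the pretrained weights. The huggingface-cli command below is the same one used in the Docker section, and checkpoints/fish-speech-1.4 is the directory layout that section uses:

    # Optional: confirm the installed PyTorch build can see the GPU
    python -c "import torch; print(torch.cuda.is_available())"
    # Download the pretrained vqgan and llama weights (same command as in the Docker section)
    huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4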
    

Docker Configuration

  1. Install the NVIDIA Container Toolkit:
    • For Ubuntu users:
      curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
      sudo apt-get update
      sudo apt-get install -y nvidia-container-toolkit
      sudo systemctl restart docker
      
    • For other Linux distributions, please refer to the NVIDIA Container Toolkit install guide for installation instructions.
  2. Pull and run the fish-speech image:
    docker pull lengyue233/fish-speech
    docker run -it \
    --name fish-speech \
    --gpus all \
    -p 7860:7860 \
    lengyue233/fish-speech \
    zsh
    
    • If you need to use a different port, change the -p parameter to YourPort:7860 (see the example after this list).
  3. Download model dependencies:
    • In a terminal inside the docker container, download the required vqgan and llama models from the Hugging Face repository:
      huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
      
    • Users in mainland China can download through the mirror site:
      HF_ENDPOINT=https://hf-mirror.com huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
      
  4. Configure environment variables and access the WebUI:
    • In a terminal inside the docker container, type:
      export GRADIO_SERVER_NAME="0.0.0.0"
      
    • Next, in the terminal inside the docker container, type:
      python tools/webui.py
      
    • On WSL or macOS, open http://localhost:7860 to access the WebUI.
    • If deployed on a server, replace localhost with your server's IP address.
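For example, to expose the WebUI on host port 8080 instead of 7860 (8080 is just an illustrative choice), the docker run command from step 2 becomes:

    docker run -it \
    --name fish-speech \
    --gpus all \
    -p 8080:7860 \
    lengyue233/fish-speech \
    zsh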

 

