
Fish Speech: Fast and Highly Accurate Cloning of English and Chinese Speech Using Few Samples

General Introduction

Fish Speech is an open source text-to-speech (TTS) synthesis tool developed by Fish Audio. It builds on AI technologies such as VQ-GAN, Llama, and VITS to convert text into realistic speech. Fish Speech supports multiple languages and provides an efficient speech synthesis solution for application scenarios such as voice-over, voice assistants, and accessible reading.

The voice cloning project Fish Speech has been updated to version 1.5. Like previously shared projects such as F5-TTS and MaskGCT, Fish Speech is a voice cloning program. It needs only 5-10 seconds of voice samples to closely reproduce a person's voice characteristics, and it supports multiple languages, including Chinese, English, Japanese, and Korean.
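To make the sample-length requirement concrete, here is a minimal sketch that measures a WAV clip and checks it against a minimum duration. The function names and the 5-second default (the lower end of the 5-10 second range mentioned above) are assumptions for illustration; they are not part of Fish Speech. It only reads uncompressed WAV files, using Python's standard wave module.

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str, min_seconds: float = 5.0) -> bool:
    """True if the clip is at least `min_seconds` long.

    The threshold is a hypothetical default, not a Fish Speech setting.
    """
    return clip_duration_seconds(path) >= min_seconds
```

For example, is_usable_reference("my_voice.wav") returns False for a 3-second clip; the file name here is a placeholder.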


An open source, optimized one-click integration package for Fish Speech v1.5.0 has been provided.

 


Experience it online at https://fish.audio/zh-CN/

 


A reference audio clip of about 30 seconds is recommended.

 

 

Function List

  • Multi-language support: Supports text-to-speech conversion in multiple languages.
  • Efficient synthesis: Efficient speech synthesis based on VQ-GAN, Llama and VITS.
  • Open source project: The code is open source and users can download and use it freely.
  • Online demo: An online demo lets users try the speech synthesis directly.
  • Model Download: Support for downloading pre-trained models from the Hugging Face platform.

 

Usage Guide

Installation process

System requirements

  • GPU memory: 4GB (for inference), 8GB (for fine-tuning)
  • Operating systems: Linux, Windows
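The memory requirements above can be checked before installing anything else. The following is a minimal sketch, not part of Fish Speech: the function names are hypothetical, the thresholds are the 4GB and 8GB figures listed above, and GPU detection assumes PyTorch is installed (it returns None otherwise).

```python
from typing import List, Optional


def supported_tasks(vram_gb: float) -> List[str]:
    """Map available GPU memory (in GB) to the tasks listed above."""
    tasks = []
    if vram_gb >= 4:
        tasks.append("inference")
    if vram_gb >= 8:
        tasks.append("fine-tuning")
    return tasks


def detect_vram_gb() -> Optional[float]:
    """Return total memory of GPU 0 in GiB, or None if it cannot be read."""
    try:
        import torch  # optional dependency; assumed to be installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # total_memory is reported in bytes.
    return torch.cuda.get_device_properties(0).total_memory / 1024**3
```

Note that the total reported by the driver is often slightly below a card's nominal size, so treat the thresholds as approximate.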

Windows Configuration

Advanced users
  • Consider using WSL2 or Docker to run the codebase.
Non-advanced users
  1. Unzip the project archive.
  2. Double-click install_env.bat to install the environment.
    • You can decide whether or not to use the mirror download by editing the USE_MIRROR entry in install_env.bat.
      • USE_MIRROR=false Use the original site to download the latest stable version of the torch environment.
      • USE_MIRROR=true Use the mirror site to download the latest torch environment (default).
    • You can decide whether to enable the compilable environment download by editing the INSTALL_TYPE entry of install_env.bat.
      • INSTALL_TYPE=preview Download the development version of the compilation environment.
      • INSTALL_TYPE=stable Download the stable version without the compilation environment.
  3. If you set INSTALL_TYPE=preview in step 2, perform this step (it can be skipped; it activates the model compilation environment).
    • Download the LLVM compiler:
    • After downloading LLVM-17.0.6-win64.exe, double-click it to install it, choose a suitable installation location, and check Add Path to Current User to add environment variables.
  4. Download and install the Microsoft Visual C++ Redistributable Package to resolve potential missing .dll problems.
  5. Download and install Visual Studio Community Edition to get the MSVC++ build tools and resolve LLVM header file dependencies.
    • Visual Studio Download
    • After installing the Visual Studio Installer, download Visual Studio Community 2022.
    • Click the Modify button, find the Desktop development with C++ item, check it, and download.
  6. Download and install CUDA Toolkit 12.
  7. Double-click start.bat to open the training and inference WebUI management interface. If necessary, modify API_FLAGS as indicated below.
    • Want to start the inference WebUI interface? Edit API_FLAGS.txt in the project root directory and change the first three lines to the following format:
      --infer
      # --api
      # --listen ...
      
    • Want to start the API server? Edit API_FLAGS.txt in the root directory of your project and change the first three lines to the following format:
      # --infer
      --api
      --listen ...
      
  8. Double-click run_cmd.bat to enter the conda/python command line environment for this project.
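Switching between the WebUI and the API server in step 7 comes down to commenting and uncommenting the first three lines of API_FLAGS.txt. The sketch below shows one way to do that on the file's text. The function name and the mode names "webui" and "api" are hypothetical, and the code assumes those three lines hold --infer, --api, and --listen as shown above. Whatever follows --listen is kept unchanged.

```python
MANAGED_FLAGS = ("--infer", "--api", "--listen")


def set_launch_mode(text: str, mode: str) -> str:
    """Return API_FLAGS.txt content with the first three lines toggled.

    mode="webui" enables --infer; mode="api" enables --api and --listen.
    """
    if mode == "webui":
        enabled = {"--infer"}
    elif mode == "api":
        enabled = {"--api", "--listen"}
    else:
        raise ValueError("mode must be 'webui' or 'api'")

    lines = text.splitlines()
    for i, line in enumerate(lines[:3]):
        # Drop any existing comment marker, keep the flag and its argument.
        bare = line.lstrip("# ").rstrip()
        flag = bare.split()[0] if bare else ""
        if flag in MANAGED_FLAGS:
            lines[i] = bare if flag in enabled else "# " + bare
    return "\n".join(lines) + "\n"
```

To apply it, read API_FLAGS.txt, pass its content through set_launch_mode, and write the result back before double-clicking start.bat.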

Linux Configuration

  1. Create a Python 3.10 virtual environment (you can also use virtualenv):
    conda create -n fish-speech python=3.10
    conda activate fish-speech
    
  2. Install PyTorch:
    pip3 install torch torchvision torchaudio
    
  3. Install fish-speech:
    pip3 install -e .[stable]
    
  4. (Ubuntu / Debian users) Install sox:
    apt install libsox-dev
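After the Linux steps, a quick sanity check can confirm that the interpreter and the packages installed in step 2 are in place. This is a minimal sketch with a hypothetical function name; it checks only the Python version and whether the PyTorch packages can be found, not whether they work with your GPU.

```python
import importlib.util
import sys


def check_environment(required=(3, 10),
                      packages=("torch", "torchvision", "torchaudio")):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    found = sys.version_info[:2]
    if found != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, "
            f"found {found[0]}.{found[1]}"
        )
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems
```

Run it inside the activated fish-speech environment and print the returned list to see what is missing.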
    

Docker Configuration

  1. Install the NVIDIA Container Toolkit:
    • For Ubuntu users:
      curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
          && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
              sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
              sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
      sudo apt-get update
      sudo apt-get install -y nvidia-container-toolkit
      sudo systemctl restart docker
      
    • For users with other Linux distributions, please refer to: NVIDIA Container Toolkit Install-guide for installation instructions.
  2. Pull and run the fish-speech image:
    docker pull lengyue233/fish-speech
    docker run -it \
        --name fish-speech \
        --gpus all \
        -p 7860:7860 \
        lengyue233/fish-speech \
        zsh
    
    • To use a different port, change the -p parameter to YourPort:7860.
  3. Download the model dependencies:
    • Make sure you are in a terminal inside the docker container, then download the required vqgan and llama models from the Hugging Face repository:
      huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
      
    • Users in mainland China can download the models through the mirror site:
      HF_ENDPOINT=https://hf-mirror.com huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
      
  4. Configure the environment variables and access the WebUI:
    • In a terminal inside the docker container, type:
      export GRADIO_SERVER_NAME="0.0.0.0"
      
    • Next, in the terminal inside the docker container, type:
      python tools/webui.py
      
    • On WSL or macOS, open http://localhost:7860 to access the WebUI interface.
    • If deployed on a server, replace localhost with your server IP.
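The Docker steps above can be sketched as three small helpers: one builds the docker run command for a chosen host port, one prepares the model download with or without the mirror, and one checks whether the WebUI port answers. The helper names are hypothetical; the image name, container port, repository, and mirror URL are the ones used in the steps above. Nothing is executed here, the helpers only return values you can inspect or pass to subprocess.run.

```python
import os
import shlex
import socket

IMAGE = "lengyue233/fish-speech"
CONTAINER_PORT = 7860  # port the WebUI listens on inside the container


def docker_run_command(host_port: int = 7860) -> str:
    """Build the docker run command, mapping host_port to the WebUI port."""
    args = ["docker", "run", "-it", "--name", "fish-speech",
            "--gpus", "all", "-p", f"{host_port}:{CONTAINER_PORT}",
            IMAGE, "zsh"]
    return shlex.join(args)


def model_download(use_mirror: bool = False):
    """Return (argv, env) for the huggingface-cli download shown above."""
    argv = ["huggingface-cli", "download", "fishaudio/fish-speech-1.4",
            "--local-dir", "checkpoints/fish-speech-1.4"]
    env = dict(os.environ)
    if use_mirror:
        # Same effect as prefixing the command with HF_ENDPOINT=...
        env["HF_ENDPOINT"] = "https://hf-mirror.com"
    return argv, env


def webui_reachable(host: str = "localhost", port: int = 7860,
                    timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For instance, print(docker_run_command(8080)) shows the command with -p 8080:7860, and webui_reachable("localhost", 8080) can then be polled once python tools/webui.py is running in the container.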

 


May not be reproduced without permission: Chief AI Sharing Circle » Fish Speech: Fast and Highly Accurate Cloning of English and Chinese Speech Using Few Samples
