Kokoro: Efficient Speech Synthesis Models to Generate Natural and Smooth Speech



General Introduction

Kokoro-82M is an efficient speech synthesis model hosted on Hugging Face, designed to generate high-quality speech with fewer parameters and less training data. The model has 82 million parameters, is released under the Apache 2.0 license, and supports multiple voice packs for generating speech in different styles and languages. Kokoro-82M performs well in the TTS (Text-to-Speech) domain, notably in Elo rankings, and achieves high-quality speech synthesis with modest computational resources.

Kokoro wrapped API: Kokoro TTS API, a Dockerized FastAPI wrapper for fast text-to-speech with the Kokoro-82M model

Try it online: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

Feature List

Speech synthesis: Generates natural, smooth speech output.
Multiple voice packs: A variety of voice packs are available, so users can choose among different voice styles.
Efficient model: Produces high-quality speech with fewer parameters and less data.
Open source license: Released under the Apache 2.0 license, which allows free use and modification.
Community support: A Discord server is available where users can discuss the model and give feedback.

Usage Guide

Installation process

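Install dependencies:

    # Fetch the model repository (the weights are stored with Git LFS)
    git lfs install
    git clone https://huggingface.co/hexgrad/Kokoro-82M
    cd Kokoro-82M
    # espeak-ng is a system package; output is silenced, so errors will not be shown
    apt-get -qq -y install espeak-ng > /dev/null 2>&1
    pip install -q phonemizer torch transformers scipy munch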
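
Because the apt-get command above discards its output, a failed install can go unnoticed. The following is a minimal sketch of a sanity check, using only the Python standard library, that reports whether the packages named in the pip command are importable and whether an espeak-ng executable is on PATH. The helper name check_environment is hypothetical and not part of Kokoro.

```python
import importlib.util
import shutil

# Python packages named in the pip command above.
REQUIRED_MODULES = ["phonemizer", "torch", "transformers", "scipy", "munch"]


def check_environment(modules=REQUIRED_MODULES, binary="espeak-ng"):
    """Return a dict mapping each requirement to True (found) or False (missing).

    Hypothetical helper for illustration; it only checks presence, not versions.
    """
    # find_spec returns None when a top-level module cannot be located.
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # shutil.which returns None when the executable is not on PATH.
    status[binary] = shutil.which(binary) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any entry reported as MISSING points to the install step that should be rerun, this time without redirecting its output.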

Build the model and load the default voice pack:

    # Run from inside the cloned Kokoro-82M directory so that models.py is importable.
    import torch
    from models import build_model

    # Use the GPU when available, otherwise fall back to the CPU.
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    MODEL = build_model('kokoro-v0_19.pth', device)

    VOICE_NAME = 'af'  # default voice pack
    VOICEPACK = torch.load(f'voices/{VOICE_NAME}.pt', weights_only=True).to(device)
    print(f'Loaded voice: {VOICE_NAME}')

Generate speech:

    from IPython.display import display, Audio
    from kokoro import generate

    text = "How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born."
    # The first letter of the voice name is passed as the language code.
    audio, out_ps = generate(MODEL, text, VOICEPACK, lang=VOICE_NAME[0])
    # Play the 24 kHz audio inline (requires an IPython/Jupyter environment).
    display(Audio(data=audio, rate=24000, autoplay=True))
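
The example above only plays the audio inside a notebook. To keep the result, it can be written to a WAV file. The sketch below uses only the Python standard library and assumes that audio is a one-dimensional sequence of float samples in the range [-1.0, 1.0] at 24 kHz. The source states the 24 kHz rate but not the sample format, so treat the scaling as an assumption. The helper name save_wav is hypothetical.

```python
import array
import sys
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples to a 16-bit PCM WAV file.

    Assumption: samples are floats in [-1.0, 1.0]; values outside are clipped.
    Returns the number of samples written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to avoid integer overflow
        pcm.append(int(s * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)
```

With the variables from the generation example, a call such as save_wav(audio, "output.wav") would write the synthesized speech to disk. Since scipy is already among the installed dependencies, scipy.io.wavfile.write(path, 24000, audio) is an alternative.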

Instructions for use

Select a voice pack: Kokoro-82M offers a variety of voice packs, so users can choose a voice style as needed. The default voice pack is af; other voice packs can be found in the voices folder.
Generate speech: Use the generate function to turn input text into speech. The generated audio has a 24 kHz sample rate and can be played via IPython display.
Adjust parameters: Users can adjust model parameters and voice packs as needed to get the best synthesis results.
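
To see which voice packs are available, the voices folder can be scanned for .pt files. This is a minimal sketch that assumes the layout used in the loading example, with one voices/NAME.pt file per voice pack. The helper names list_voicepacks and voicepack_path are hypothetical and not part of Kokoro.

```python
from pathlib import Path


def list_voicepacks(voices_dir="voices"):
    """Return the sorted names of all .pt voice packs found in voices_dir."""
    return sorted(p.stem for p in Path(voices_dir).glob("*.pt"))


def voicepack_path(name, voices_dir="voices"):
    """Return the path to a voice pack, or raise if it does not exist."""
    path = Path(voices_dir) / f"{name}.pt"
    if not path.is_file():
        available = ", ".join(list_voicepacks(voices_dir)) or "none"
        raise FileNotFoundError(
            f"No voice pack named {name!r} in {voices_dir!r} (available: {available})"
        )
    return path
```

The returned path can be passed to torch.load in place of the hard-coded f-string, and the chosen name then plays the role of VOICE_NAME in the generation step.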