General Introduction
Kokoro WebGPU is the WebGPU version of the Kokoro text-to-speech (TTS) model, published by the WebML Community on Hugging Face. It uses WebGPU to run text-to-speech conversion efficiently and locally in the user's browser. WebGPU is a modern graphics and compute API that brings high-performance, GPU-accelerated computation to web pages. Kokoro WebGPU aims to provide fast, reliable text-to-speech for a range of applications, such as audiobooks, podcasts, and educational videos.
Kokoro is an open-source TTS model with 82 million parameters. Despite its lightweight architecture, it delivers speech quality comparable to that of larger models while being faster and cheaper to run. Kokoro is released under the Apache 2.0 license, so it can be freely deployed in both production environments and personal projects.
Function List
- Efficient Text-to-Speech: High-performance text-to-speech conversion using WebGPU technology.
- Runs in the browser: No additional software to install; everything runs directly in your browser.
- Multi-language support: Supports text-to-speech conversion in multiple languages.
- Real-time response: Fast response times suitable for real-time applications.
- Open-source community support: Maintained by the WebML Community; users can join community discussions and contribute to development.
How to Use
Running in the browser (JavaScript)
To use Kokoro in your browser, first install the kokoro-js library:
npm install kokoro-js
You can then generate speech with the following code:
import { KokoroTTS } from "kokoro-js";
const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  device: "wasm", // Options: "wasm", "webgpu" (web) or "cpu" (node). If using "webgpu", dtype "fp32" is recommended.
});
const text = "Life is like a box of chocolates, you never know what you're going to get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_heart",
});
audio.save("audio.wav");
The code above runs in the browser. As written it uses the WebAssembly backend (device: "wasm"); to run inference on the GPU through WebGPU, set device to "webgpu", ideally together with dtype "fp32".
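Voice IDs such as af_heart appear to encode the speaker's language/accent and gender in a two-letter prefix ("a" for American English, "f" for female), the same convention the Python kokoro library uses for voice names and lang_code. The Python sketch below decodes that prefix. The helper name parse_voice_id and the prefix table are assumptions based on Kokoro-82M's published voice names, not part of the kokoro-js API; check tts.list_voices() for what your version actually provides.

```python
# Hypothetical helper: decode a Kokoro-style voice ID such as "af_heart".
# ASSUMPTION: the first letter is a language/accent code and the second
# letter is the speaker's gender, following Kokoro-82M's voice naming.
LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}
GENDERS = {"f": "female", "m": "male"}


def parse_voice_id(voice_id: str) -> dict:
    """Split a voice ID like 'af_heart' into language, gender and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    lang, gender = prefix[0], prefix[1]
    if lang not in LANG_CODES or gender not in GENDERS:
        raise ValueError(f"unknown prefix in voice ID: {voice_id!r}")
    return {
        "lang_code": lang,
        "language": LANG_CODES[lang],
        "gender": GENDERS[gender],
        "name": name,
    }


print(parse_voice_id("af_heart"))
# {'lang_code': 'a', 'language': 'American English', 'gender': 'female', 'name': 'heart'}
```

Comparing the decoded lang_code with the language you intend to synthesize may help catch accidental mismatches, such as picking a British voice for American English text.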
Python Code
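In a Python environment, you can use the kokoro library for speech synthesis. The install commands below use notebook syntax (the leading "!" runs a shell command in Jupyter or Google Colab):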
# Install kokoro and soundfile (quote the version specifier so the shell does not treat ">" as a redirect)
!pip install "kokoro>=0.7.11" soundfile
# Install espeak-ng, used for English OOD fallback and some non-English languages
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
# Initialize pipeline
pipeline = KPipeline(lang_code='a') # 'a' for American English
text = '''
The sky is the color of a television set tuned to a channel with no signal.
"It's not like I'm using," Keith heard someone say as he squeezed through the crush of people at Chat's entrance. "It's like my body has developed a massive drug deficiency."
It was a big city of sounds and jokes. The Chatsubo was a bar for professional expats; you could drink there all week and not hear two words of Japanese.
'''
# Generate and save the audio
generator = pipeline(
    text, voice='af_heart',  # change the voice here
    speed=1, split_pattern=r'\n+'  # split the input into segments at newlines
)
for i, (gs, ps, audio) in enumerate(generator):
    print(i)   # segment index
    print(gs)  # graphemes (the text of this segment)
    print(ps)  # phonemes
    display(Audio(data=audio, rate=24000, autoplay=i == 0))
    sf.write(f'{i}.wav', audio, 24000)  # save each segment as its own 24 kHz WAV file
The code above uses the kokoro library to convert text to speech in Python: each segment is played inline in the notebook and saved as a separate WAV file (0.wav, 1.wav, and so on).
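The loop above writes one file per segment. If you would rather have a single file, the segments can be joined and written with Python's standard wave module. This is a minimal sketch, assuming each segment is a flat sequence of float samples in [-1.0, 1.0] at 24 kHz; write_combined_wav and its gap_seconds parameter are hypothetical names introduced here, not part of the kokoro library.

```python
# Minimal sketch: join per-segment audio into one 16-bit PCM mono WAV file
# using only the standard library.
# ASSUMPTION: every segment is a flat sequence of floats in [-1.0, 1.0]
# sampled at 24 kHz (convert arrays/tensors first, e.g. with .tolist()).
import struct
import wave
from typing import Iterable, Sequence

SAMPLE_RATE = 24000  # sample rate used in the kokoro example


def write_combined_wav(path: str, segments: Iterable[Sequence[float]],
                       sample_rate: int = SAMPLE_RATE,
                       gap_seconds: float = 0.0) -> int:
    """Write all segments to one mono WAV file; return the frame count."""
    silence = [0.0] * int(gap_seconds * sample_rate)
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)         # mono
        wav.setsampwidth(2)         # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        for i, segment in enumerate(segments):
            samples = list(segment)
            if i > 0:
                samples = silence + samples  # optional pause between segments
            # Clip to [-1, 1], then scale to signed 16-bit integers.
            ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
            wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
            frames += len(ints)
    return frames
```

For example, assuming each yielded audio is a NumPy array or PyTorch tensor, `write_combined_wav("combined.wav", [audio.tolist() for _, _, audio in pipeline(text, voice='af_heart')], gap_seconds=0.3)` would collect every segment from a fresh generator (the one consumed by the loop above is already exhausted). Dividing the returned frame count by 24000 gives the duration in seconds.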
Experience it directly in your browser
You can experience Kokoro TTS directly in your browser without any installation. Please visit the link below:
https://huggingface.co/spaces/webml-community/kokoro-webgpu
Note that the application downloads over 300 MB of data, which must finish loading before you can use it. Because the model then runs efficiently and entirely in your browser, the wait is worthwhile.
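A rough calculation relates the 82-million-parameter count to the download size of over 300 MB mentioned above: raw weight size is approximately parameters times bytes per parameter. The sketch below assumes only that relation; real ONNX files also contain graph metadata, quantization scales and possibly unquantized layers, so actual file sizes will differ, and q4f16 is omitted because it mixes precisions. The fp32 estimate of about 328 MB is in the same range as the demo's download, which is consistent with (but does not confirm) the demo using full-precision weights.

```python
# Back-of-the-envelope estimate of raw weight size for an 82M-parameter
# model at the dtype options listed in the JavaScript example.
# ASSUMPTION: counts bytes per parameter only; ignores file overhead,
# quantization scales and layers left unquantized.
PARAMS = 82_000_000

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "q8": 1.0,
    "q4": 0.5,
}


def estimated_size_mb(dtype: str, params: int = PARAMS) -> float:
    """Approximate weight size in megabytes (1 MB = 10**6 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1_000_000


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>5}: ~{estimated_size_mb(dtype):.0f} MB")
```

This prints roughly 328, 164, 82 and 41 MB, which illustrates why the quantized q8 default is much lighter to download than fp32.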
Main Workflow
Text-to-speech
- Input text: Enter the text to convert to speech in the input box.
- Select language: Choose the language of the text, e.g., English, French, or Japanese.
- Run the model: Click the Run button to perform the text-to-speech conversion.
- View results: The generated speech appears on the page, where you can play it and download the audio file.
Key Features
- Real-time conversion: Kokoro WebGPU uses WebGPU to achieve real-time text-to-speech conversion, which suits applications that need fast responses.
- Multi-language support: Supports text-to-speech in multiple languages, so users can choose the language they need.
- Community support: Maintained by the WebML Community, where users can join discussions and get technical support and updates.
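One common way to check a "real-time" claim on your own hardware is the real-time factor (RTF): the time spent generating audio divided by the duration of the audio produced, where an RTF below 1 means speech is generated faster than it plays back. The sketch below is a generic illustration under stated assumptions: synthesize is a placeholder for whatever TTS call you use, it is assumed to return a sequence of samples at 24 kHz, and fake_tts is a hypothetical stand-in so the example runs without any model.

```python
# Minimal sketch: measure the real-time factor (RTF) of a synthesis call.
# RTF = generation time / audio duration; RTF < 1 is faster than real time.
# ASSUMPTION: `synthesize` is a placeholder for your own TTS call that
# returns a sequence of samples at `sample_rate` Hz.
import time
from typing import Callable, Sequence


def real_time_factor(synthesize: Callable[[str], Sequence[float]],
                     text: str, sample_rate: int = 24000) -> float:
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    duration = len(samples) / sample_rate  # seconds of audio produced
    if duration == 0:
        raise ValueError("synthesis returned no audio")
    return elapsed / duration


def fake_tts(text: str) -> list:
    """Hypothetical stand-in synthesizer: one second of silence per word."""
    return [0.0] * (24000 * len(text.split()))


rtf = real_time_factor(fake_tts, "hello real time world")
print(f"RTF = {rtf:.4f}")
```

Measuring with a real model rather than the stand-in, and averaging over several runs after a warm-up call, gives a more representative figure, since the first call often includes one-time initialization.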