
Kokoro WebGPU: A Text-to-Speech Service for Offline Operation in Browsers

General Introduction

Kokoro WebGPU is the WebGPU version of the Kokoro text-to-speech (TTS) model, provided by the WebML Community on the Hugging Face platform. The project uses WebGPU to let users run text-to-speech conversion locally in their browsers. WebGPU is a modern graphics and compute API that enables high-performance computation in the browser. Kokoro WebGPU is designed to provide fast, reliable text-to-speech for application scenarios such as audiobooks, podcasts, and educational videos.

Kokoro is an open source TTS model with 82 million parameters. Despite its lightweight architecture, Kokoro is comparable in speech synthesis quality to larger models while being faster and cheaper to run. Kokoro is licensed under the Apache 2.0 license, which allows it to be freely deployed in both production environments and personal projects.


Function List

  • Efficient text-to-speech: High-performance text-to-speech conversion using WebGPU.
  • Runs in the browser: No additional software to install; it runs directly in your browser.
  • Multi-language support: Supports text-to-speech conversion in multiple languages.
  • Real-time response: Provides fast response times for real-time applications.
  • Open source community support: Supported by the WebML Community; users can take part in community discussions and development.

 

Usage Guide

Sample code to run in a browser

To use Kokoro in your browser, first install the kokoro-js package:

npm install kokoro-js

Speech can then be generated using the following code:

import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  device: "wasm", // Options: "wasm", "webgpu" (web) or "cpu" (node). If using "webgpu", dtype "fp32" is recommended.
});

const text = "Life is like a box of chocolates, you never know what you're going to get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_heart",
});
audio.save("audio.wav");

As written, the above code uses the "wasm" backend. To use WebGPU acceleration in the browser, set device to "webgpu" (with dtype "fp32", as recommended in the comment).

Python Code

In a Python environment, you can use the kokoro library for speech synthesis.

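# Install kokoro and soundfile (the leading "!" runs a shell command from a notebook)
# The version specifier is quoted so the shell does not treat ">" as a redirect
!pip install "kokoro>=0.7.11" soundfile
# Install espeak-ng, used for English OOD fallback and some non-English languages
!apt-get -qq -y install espeak-ng > /dev/null 2>&1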

from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf

# Initialize pipeline
pipeline = KPipeline(lang_code='a') # 'a' for American English

text = '''
The sky is the color of a television set tuned to a channel with no signal.
"It's not like I'm using," Keith heard someone say as he squeezed through the crush of people at Chat's entrance. "It's like my body has developed a massive drug deficiency."
It was a big city of sounds and jokes. Chatsubo was a bar for professional expats; you could drink there all week and not hear two words of Japanese.
'''

# Generate and save the audio
generator = pipeline(
    text, voice='af_heart', # Change voice
    speed=1, split_pattern=r'\n+'
)
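for i, (gs, ps, audio) in enumerate(generator):
    print(i)  # segment index
    print(gs)  # graphemes (the text of this segment)
    print(ps)  # phonemes
    display(Audio(data=audio, rate=24000, autoplay=i==0))  # play inline; autoplay only the first segment
    sf.write(f'{i}.wav', audio, 24000)  # save each segment as its own WAV file at 24 kHz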

The above code runs in a Python environment and uses the kokoro library for text-to-speech conversion. It writes one WAV file per text segment.
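
The loop above saves each segment to a separate file. If a single output file is preferable, the segments can be written back to back. The following is a minimal sketch using only the Python standard library; the helper name save_combined_wav and the tone stand-in data are hypothetical, and it assumes each audio chunk can be iterated as floats in the range [-1.0, 1.0] at the 24000 Hz rate used above.

```python
import math
import struct
import wave

SAMPLE_RATE = 24000  # same rate as the sf.write call in the example above


def save_combined_wav(path, chunks, sample_rate=SAMPLE_RATE):
    """Write all chunks, in order, to one 16-bit mono WAV file.

    `chunks` is assumed to be an iterable of float sequences in [-1.0, 1.0].
    Returns the total duration in seconds.
    """
    total_samples = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Clip to [-1, 1], then scale to the signed 16-bit range.
            pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in chunk]
            wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
            total_samples += len(pcm)
    return total_samples / sample_rate


def tone(seconds, freq=440.0):
    """Stand-in for model output: a quiet sine tone as a list of floats."""
    n = int(seconds * SAMPLE_RATE)
    return [0.3 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


# Two half-second tones play the role of two generated segments.
duration = save_combined_wav("combined.wav", [tone(0.5), tone(0.5)])
print(f"wrote combined.wav, {duration:.2f} s")
```

With the pipeline from the example, the same helper could presumably be fed the third element of each generator tuple, for instance save_combined_wav("out.wav", (audio for _, _, audio in generator)), provided the audio objects behave as assumed. The per-sample conversion is slow for long texts; an array library would be the usual choice there.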

Experience it directly in your browser

You can experience Kokoro TTS directly in your browser without any installation. Please visit the link below:

https://huggingface.co/spaces/webml-community/kokoro-webgpu

Please note that the application needs to download more than 300 MB and must be fully loaded before you can use it. However, for an efficient TTS model that runs entirely in the browser, the wait is worth it.
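
The download size is roughly what the parameter count predicts. As a back-of-the-envelope check, 82 million parameters stored as 32-bit floats take about 328 MB, which is in line with the 300+ MB figure if the demo loads fp32 weights (an assumption; the demo's precision is not stated here). The sketch below repeats the arithmetic for the dtype options listed in the browser example. The bytes-per-parameter values are nominal: real model files also carry metadata and may keep some layers at higher precision. "q4f16" is left out because a mixed layout does not reduce to one per-parameter figure.

```python
# Rough weight-size estimate: parameters x bytes per parameter.
# Nominal values only; actual file sizes will differ somewhat.
PARAMS = 82_000_000  # parameter count stated for Kokoro

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "q8": 1.0,
    "q4": 0.5,
}


def estimate_mb(params: int, dtype: str) -> float:
    """Return the approximate weight size in megabytes (10**6 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1_000_000


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{estimate_mb(PARAMS, dtype):.0f} MB")
```

On these nominal figures, a quantized dtype such as "q8" would cut the download to roughly a quarter of the fp32 size, at some possible cost in quality.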

Main Workflow

Text-to-Speech

  1. Enter text: Type the text to be converted to speech in the input box.
  2. Select a language: Choose the language for the conversion, e.g., English, French, or Japanese.
  3. Run the model: Click the Run button and the model will perform the text-to-speech conversion.
  4. View results: The generated speech is displayed on the page, where you can play and download the audio file.

Featured Functions

  • Real-time conversion: Kokoro WebGPU utilizes WebGPU technology to achieve real-time text-to-speech conversion for application scenarios that require fast response.
  • Multi-language support: Supports text-to-speech conversion in multiple languages; users can choose a language according to their needs.
  • Community Support: Powered by WebML Community, users can participate in community discussions and get technical support and updates.