IMS Toucan: Fast and Controllable Multilingual (7000+ languages supported) Text-to-Speech Tool

General Introduction

IMS Toucan is a text-to-speech (TTS) toolkit developed by the Institute for Natural Language Processing (IMS) at the University of Stuttgart, Germany. It supports more than 7,000 languages and is fast, controllable, and light on computational resources. IMS Toucan is designed to provide efficient speech synthesis for research, teaching, and real-world applications. The toolkit can be used for training, using, and teaching state-of-the-art speech synthesis models, and it provides a rich set of modules and a flexible control interface for generating high-quality speech on demand.

Demo: https://huggingface.co/spaces/Flux9665/MassivelyMultilingualTTS

Feature List

Multi-language support: Text-to-speech synthesis in over 7,000 languages.
Fast synthesis: Speech is generated quickly enough for real-time applications.
Controllable: Users have precise control over the pitch, rhythm, and timbre of the generated voice.
Low computing requirements: Runs without significant computing resources and suits a wide range of hardware.
Interactive demo: An online demo lets users try the speech synthesis directly.
Open source: The complete code base is open source, which makes it easy to extend and customize.
Pre-trained models: Pre-trained speech synthesis models are provided that users can use directly or fine-tune further.

Usage Guide

Installation process

Basic requirements: Python 3.10 is recommended. Make sure the following system dependencies are installed: libsndfile1, espeak-ng, ffmpeg, libasound-dev, libportaudio2, libsqlite3-dev.
Clone the repository: Clone the IMS Toucan repository to your local machine. A CUDA-enabled GPU is recommended for model training; no GPU is required for inference.

   git clone https://github.com/DigitalPhonetics/IMS-Toucan.git
   cd IMS-Toucan
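
Before setting up the Python environment, it can help to confirm that the system dependencies from the basic requirements are present. The sketch below assumes a Debian/Ubuntu system, since the package names listed above follow that naming; other distributions name these packages differently. The `missing_tools` helper is hypothetical and not part of IMS Toucan. It only checks for the command-line binaries (`espeak-ng`, `ffmpeg`), not the library packages.

```shell
# Debian/Ubuntu package names, as listed in the basic requirements.
# Uncomment to install (requires root privileges):
# sudo apt-get update
# sudo apt-get install -y libsndfile1 espeak-ng ffmpeg libasound-dev libportaudio2 libsqlite3-dev

# Hypothetical helper (not part of IMS Toucan): print each named
# command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# espeak-ng and ffmpeg ship command-line binaries, so they can be checked
# this way; the lib* packages are libraries and are not covered here.
missing=$(missing_tools espeak-ng ffmpeg)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing"
else
    echo "espeak-ng and ffmpeg found"
fi
```

If the check reports a missing tool, install the corresponding package before continuing with the steps below.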

Create a virtual environment: Create and activate a virtual environment, then install the Python dependencies into it.

   python -m venv <path_to_env>
   source <path_to_env>/bin/activate
   pip install --no-cache-dir -r requirements.txt
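
If the environment is not active when `pip install` runs, the packages end up in the system Python instead. A minimal guard, assuming the standard `venv` activate script (which exports the `VIRTUAL_ENV` variable), could look like the sketch below. The `in_venv` helper is hypothetical and not part of IMS Toucan.

```shell
# Hypothetical helper (not part of IMS Toucan): succeed only if a venv
# activate script has set VIRTUAL_ENV in the current shell.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "Virtual environment active: $VIRTUAL_ENV"
else
    echo "No virtual environment active; run the activate script first"
fi
```

Running this check just before the `pip install` line makes it obvious when the activate step was skipped, for example in a freshly opened terminal.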

Run the demo script: Once installation is complete, run the following script to launch the demo.

   python run_advanced_GUI_demo.py

Functional operation flow

Text-to-speech: Enter text in the interactive interface, select the language and voice parameters, and click the Generate button to synthesize speech.
Voice control: Drag the pitch and duration sliders to adjust the pitch and rhythm of the generated speech.
Voice replacement: Switch to a different speech model while keeping the speech parameters the same.
Model training: Users can train new speech models on their own datasets. Refer to the training scripts and documentation in the repository for instructions.

Key Features

Multi-language support: IMS Toucan supports more than 7,000 languages, so users can select whichever language they need for speech synthesis.
Efficient synthesis: IMS Toucan can generate high-quality speech quickly, even in environments with limited computing resources.
Flexible control: Users can adjust the voice parameters through the interactive interface to produce speech output that meets their requirements.