
Parler-TTS: Generating speech in a specific speaker style from input text

General Introduction

Parler-TTS is an open-source text-to-speech (TTS) library developed by Hugging Face for generating high-quality, natural-sounding speech. Given input text and a natural-language description of the voice, the model generates speech in the specified speaker style (e.g. gender, pitch, speaking style). Parler-TTS is based on the paper "Natural language guidance of high-fidelity text-to-speech with synthetic annotations" and is fully open source: all datasets, preprocessing code, training code, and weights are publicly available, so the community can build on and improve them.



 

Function List

  • High-quality speech generation: Generate natural and smooth speech with support for multiple speaker styles.
  • Open source: All code and model weights are publicly available for community development and improvement.
  • Lightweight dependencies: Simple to install and use, with few dependencies.
  • Multiple model versions: Versions of the model with different parameter counts are available, e.g. Parler-TTS Mini and Parler-TTS Large.
  • Fast generation: Generation can be sped up with SDPA and Flash Attention 2 support.
  • Datasets and weights: Datasets and pre-trained model weights are provided for training and fine-tuning.

 

Usage Guide

Installation process

  1. Ensure that the Python environment is installed.
  2. Use the following command to install the Parler-TTS library:
   pip install git+https://github.com/huggingface/parler-tts.git
  3. Apple Silicon users should also run the following command to install a nightly PyTorch build, which adds bfloat16 support:
   pip3 install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
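
After installing, it can help to confirm that the packages imported in the Usage section are visible to the interpreter. The following is a minimal sketch using only the standard library. The helper name missing_packages is made up for illustration, and the package list simply mirrors the imports used later in this guide.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    # find_spec() locates a module without importing it, so this stays fast
    # even for heavy packages such as torch.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["torch", "transformers", "parler_tts", "soundfile"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerunning the install commands above in the same environment is the first thing to try.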

Usage

Generate speech with a random voice

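  1. Import the necessary libraries:
   import torch
   from parler_tts import ParlerTTSForConditionalGeneration
   from transformers import AutoTokenizer
   import soundfile as sf
  2. Load the model and tokenizer:
   device = "cuda:0" if torch.cuda.is_available() else "cpu"
   model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
   tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")
  3. Enter the text and a voice description, then generate speech. The description and the prompt are tokenized separately and passed to generate() as input_ids and prompt_input_ids:
   prompt = "Hey, how are you doing today?"
   description = "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch."
   # The description conditions the voice; the prompt is the text to be spoken.
   input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
   prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
   generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
   # Drop the batch dimension and save at the model's own sampling rate.
   audio_arr = generation.cpu().numpy().squeeze()
   sf.write("output.wav", audio_arr, model.config.sampling_rate)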
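
The last step above saves the waveform with soundfile. Where that package is not available, a 16-bit PCM WAV file can also be written with Python's standard library. The sketch below is an illustrative alternative, not part of Parler-TTS. The function name write_wav_mono is hypothetical, and it assumes a one-dimensional sequence of float samples in the range [-1.0, 1.0].

```python
import struct
import wave


def write_wav_mono(path, samples, sample_rate):
    """Write float samples in [-1.0, 1.0] to `path` as 16-bit PCM mono WAV.

    Returns the duration of the written audio in seconds.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return (len(frames) // 2) / sample_rate
```

In practice the samples argument would be the one-dimensional waveform produced by the model (a NumPy array also works, since it is iterated element by element), and sample_rate would be the sampling rate the model was trained with.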

Generate speech in a specific speaker style

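  1. Change the description to specify the speaker's style, reusing the model, tokenizer, and prompt from the previous section:
   description = "A male speaker with a deep voice and slow pace."
   input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
   prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
   generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
   sf.write("output_specific.wav", generation.cpu().numpy().squeeze(), model.config.sampling_rate)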
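
Because the speaker style is controlled entirely by a free-text description, it can be convenient to assemble that description from a few named attributes when generating many variants. The following is a minimal sketch. The function build_description, its parameters, and the sentence pattern are assumptions made for illustration; the model accepts any free-form description and does not require this template.

```python
def build_description(gender="female", pitch="moderate", pace="moderate",
                      style=None, quality=None):
    """Compose a plain-English voice description from a few attributes."""
    sentence = f"A {gender} speaker with a {pitch} pitch speaks at a {pace} pace"
    if style:
        sentence += f" in a {style} tone"
    sentence += "."
    if quality:
        # Optional second sentence describing the recording conditions.
        sentence += f" The recording is {quality}."
    return sentence


if __name__ == "__main__":
    print(build_description("male", "low", "slow"))
    print(build_description("female", "high", "fast", style="cheerful",
                            quality="very clear"))
```

The returned string can be used directly as the description in the steps above.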

Train the model

  1. Download and prepare the dataset.
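  2. Use the training code provided in the repository's training directory. The command below is schematic; take the actual script name and arguments from the training guide in the repository: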
   python train.py --dataset_path /path/to/dataset --output_dir /path/to/output

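Optimized inference

  1. Select the attention implementation when loading the model. SDPA is selected with attn_implementation="sdpa"; Flash Attention 2 additionally requires the flash-attn package, a supported GPU, and half precision:
   model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1", attn_implementation="flash_attention_2", torch_dtype=torch.float16).to(device)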
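
Since Flash Attention 2 depends on an extra package and a GPU, a script that runs on different machines may want to choose the attention implementation at runtime. The sketch below shows one way to do that. The helper pick_attn_implementation is hypothetical, and it assumes that the Transformers-style attn_implementation values "flash_attention_2" and "sdpa" are what the loader expects.

```python
import importlib.util
from typing import Optional


def pick_attn_implementation(cuda_available: bool,
                             flash_attn_installed: Optional[bool] = None) -> str:
    """Return "flash_attention_2" when it is usable, otherwise fall back to "sdpa"."""
    if flash_attn_installed is None:
        # Detect the flash-attn package without importing it.
        flash_attn_installed = importlib.util.find_spec("flash_attn") is not None
    if cuda_available and flash_attn_installed:
        return "flash_attention_2"
    return "sdpa"


if __name__ == "__main__":
    # On a real machine the first argument would be torch.cuda.is_available().
    print(pick_attn_implementation(cuda_available=False))
```

The returned string can then be passed as the attn_implementation argument of from_pretrained when loading the model.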
May not be reproduced without permission:Chief AI Sharing Circle " Parler-TTS: Generating speaker-specific text-to-speech models from input text
