
Hibiki: A Real-Time Speech Translation Model for Streaming Translation That Preserves the Speaker's Voice

General Introduction

Hibiki is a high-fidelity, real-time speech translation model developed by Kyutai Labs. Unlike traditional offline translators, which wait for the complete utterance before translating, Hibiki generates natural speech in the target language, together with a text translation, while the user is still speaking. The model uses a multi-stream architecture that processes the incoming source speech and generates the target speech at the same time, which keeps the translation coherent and accurate. Hibiki is trained with supervision on aligned source and target speech and text, and it uses synthetic data generation to reach high translation quality despite the limited amount of real-world data.

Hibiki relies on supervised training on aligned source and target speech and text from the same speaker. Because such data is scarce, the training data is generated synthetically. Word-level matching between the source and target transcripts is performed with a weakly supervised contextual alignment method that uses the off-the-shelf MADLAD machine translation system. The resulting alignment rule (a word should appear in the target only once it can be predicted from the source) is then applied either by inserting silences or by synthesizing the target speech with a voice-controlled, alignment-aware TTS.
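The silence-insertion case of the alignment rule can be illustrated with a small sketch. This is not Kyutai's data pipeline: the `Word` type, the `delay_to_alignment` function, and the millisecond timings are hypothetical, and the per-word earliest start times are simply assumed to come from an alignment step like the one described above.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_ms: int  # start time in the synthesized target audio
    end_ms: int    # end time in the synthesized target audio


def delay_to_alignment(words, earliest_start_ms):
    """Insert silence so that no target word starts before the time at which
    it becomes predictable from the source (hypothetical alignment output)."""
    shifted = []
    offset = 0  # total silence inserted so far, in milliseconds
    for word, earliest in zip(words, earliest_start_ms):
        start = word.start_ms + offset
        if start < earliest:
            # The word would be spoken too early: pad with silence before it.
            offset += earliest - start
            start = earliest
        duration = word.end_ms - word.start_ms
        shifted.append(Word(word.text, start, start + duration))
    return shifted


# Toy target sentence with made-up timings and alignment constraints.
target = [Word("I", 0, 200), Word("like", 200, 500), Word("crepes", 500, 1000)]
aligned = delay_to_alignment(target, earliest_start_ms=[500, 600, 2000])
for w in aligned:
    print(w.text, w.start_ms, w.end_ms)
```

Each word keeps its original duration; only the gaps between words grow, which is what lets the target speech trail the source without running ahead of it.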


Function List

  • Real-time speech translation: Generates natural speech in the target language in real time while the user is speaking.
  • Text translation: Provides a text translation synchronized with the speech.
  • Multi-stream architecture: Processes the input speech stream and generates the target speech simultaneously to keep the translation coherent and accurate.
  • High fidelity: Achieves high translation quality through supervised training and synthetic data generation.
  • Voice transfer: Optional voice transfer for a more natural-sounding translated voice.

 

Usage Guide

Installation process

PyTorch

  1. Install the moshi package:
    pip install -U moshi
    
  2. Download the example file:
    wget https://github.com/kyutai-labs/moshi/raw/refs/heads/main/data/sample_fr_hibiki_crepes.mp3
    
  3. Run the translation:
    python -m moshi.run_inference sample_fr_hibiki_crepes.mp3 out_en.wav --hf-repo kyutai/hibiki-1b-pytorch-bf16
    
    • Optional parameter --cfg-coef: the default value is 1. The higher the value, the closer the generated speech is to the original speaker's voice; the recommended value is 3.
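The flag name suggests classifier-free guidance (CFG). Assuming that reading is correct, the generic CFG recipe blends a conditioned and an unconditioned prediction, and a coefficient of 1 leaves the conditioned prediction unchanged, which would match the default above. The sketch below shows only that generic formula; `apply_cfg` and the logit values are illustrative and are not taken from the moshi code base.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_coef=1.0):
    """Generic classifier-free guidance: move from the unconditioned
    prediction towards (and past) the conditioned one.

    cfg_coef == 1 returns the conditioned logits unchanged;
    larger values amplify the effect of the conditioning.
    """
    return [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# Made-up logits for two candidate tokens.
cond = [2.0, 0.5]     # prediction with the conditioning signal
uncond = [1.0, 0.25]  # prediction without it

print(apply_cfg(cond, uncond, cfg_coef=1.0))
print(apply_cfg(cond, uncond, cfg_coef=3.0))
```

With a larger coefficient the gap between the two candidates widens, so the conditioning has more influence on what is generated.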

MLX

  1. Install the moshi_mlx package (version 0.2.1 or later is required):
    pip install -U moshi_mlx
    
  2. Download the example file:
    wget https://github.com/kyutai-labs/moshi/raw/refs/heads/main/data/sample_fr_hibiki_crepes.mp3
    
  3. Run the translation:
    python -m moshi_mlx.run_inference sample_fr_hibiki_crepes.mp3 out_en.wav --hf-repo kyutai/hibiki-1b-mlx-bf16
    
    • Optional parameter --cfg-coef: the default value is 1. The higher the value, the closer the generated speech is to the original speaker's voice; the recommended value is 3.
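The PyTorch and MLX commands differ only in the module name and the model repository, so a thin wrapper can drive either one from a script. The following is a minimal sketch that uses only the commands and flags shown above; the `BACKENDS` table and the `build_command` and `translate` helpers are hypothetical names, not part of moshi.

```python
import subprocess
import sys

# Module and default 1B model repository for each backend, as listed above.
BACKENDS = {
    "pytorch": ("moshi.run_inference", "kyutai/hibiki-1b-pytorch-bf16"),
    "mlx": ("moshi_mlx.run_inference", "kyutai/hibiki-1b-mlx-bf16"),
}


def build_command(backend, src, dst, cfg_coef=None):
    """Return the argument list for one inference run."""
    module, repo = BACKENDS[backend]
    cmd = [sys.executable, "-m", module, src, dst, "--hf-repo", repo]
    if cfg_coef is not None:
        cmd += ["--cfg-coef", str(cfg_coef)]
    return cmd


def translate(backend, src, dst, cfg_coef=None):
    """Run the translation; raises CalledProcessError if the command fails."""
    subprocess.run(build_command(backend, src, dst, cfg_coef), check=True)


# Show the command that would be run, without executing it.
print(" ".join(build_command("pytorch", "sample_fr_hibiki_crepes.mp3",
                             "out_en.wav", cfg_coef=3)))
```

Calling `translate("mlx", "sample_fr_hibiki_crepes.mp3", "out_en.wav", cfg_coef=3)` would then run the MLX variant, provided the moshi_mlx package is installed.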

MLX-Swift

  • The kyutai-labs/moshi-swift repository contains an MLX-Swift implementation that runs on the iPhone and has been tested on an iPhone 16 Pro. Note that this code is still experimental.

Rust

  1. Change into the hibiki-rs directory:
    cd hibiki-rs
    
  2. Download the example file:
    wget https://github.com/kyutai-labs/moshi/raw/refs/heads/main/data/sample_fr_hibiki_crepes.mp3
    
  3. Run the translation:
    cargo run --features metal -r -- gen sample_fr_hibiki_crepes.mp3 out_en.wav
    
    • Use --features cuda to run on an NVIDIA GPU, or --features metal to run on a Mac.
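When the Rust build is launched from a script, the feature flag can be chosen from the host platform. The sketch below assumes that any non-macOS host has an NVIDIA GPU, which will not always hold; `cargo_command` is a hypothetical helper that only assembles the command shown above.

```python
import platform


def cargo_command(src, dst, system=None):
    """Build the hibiki-rs command, using Metal on macOS and CUDA elsewhere.

    Assumption: a non-macOS host has an NVIDIA GPU with CUDA available.
    """
    system = system or platform.system()
    feature = "metal" if system == "Darwin" else "cuda"
    return ["cargo", "run", "--features", feature, "-r", "--", "gen", src, dst]


print(" ".join(cargo_command("sample_fr_hibiki_crepes.mp3", "out_en.wav")))
```

The resulting list can be passed to `subprocess.run(..., check=True, cwd="hibiki-rs")` to run it from the right directory.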

Models

Kyutai has released two models for French-to-English translation:

  • Hibiki 2B: for PyTorch and MLX, with 16 RVQ streams.
  • Hibiki 1B: for PyTorch and MLX, with 8 RVQ streams; well suited to on-device inference.

Model List:

  • Hibiki 2B for PyTorch (bf16): kyutai/hibiki-2b-pytorch-bf16
  • Hibiki 1B for PyTorch (bf16): kyutai/hibiki-1b-pytorch-bf16
  • Hibiki 2B for MLX (bf16): kyutai/hibiki-2b-mlx-bf16
  • Hibiki 1B for MLX (bf16): kyutai/hibiki-1b-mlx-bf16
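The four repository names follow one pattern, so the value for --hf-repo can be derived from the model size and backend. This is a small convenience sketch; `hibiki_repo` is a hypothetical helper, and it only covers the four models listed above.

```python
SIZES = ("1b", "2b")
BACKENDS = ("pytorch", "mlx")


def hibiki_repo(size, backend):
    """Return the repository name for one of the four released models."""
    size, backend = size.lower(), backend.lower()
    if size not in SIZES or backend not in BACKENDS:
        raise ValueError(f"no released Hibiki model for {size!r} / {backend!r}")
    return f"kyutai/hibiki-{size}-{backend}-bf16"


for size in SIZES:
    for backend in BACKENDS:
        print(hibiki_repo(size, backend))
```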

All models are released under a CC-BY 4.0 license.

Usage Process

  1. Start the model: Follow the installation process to start the model.
  2. Input speech: Speak in the source language through the microphone.
  3. Real-time translation: Hibiki generates a spoken translation in the target language in real time and displays the text translation alongside it.
  4. Adjust settings: Adjust settings such as voice transfer as needed for a more natural translation.

Main Functions

  • Real-time speech translation: After launching the model, speak directly into the microphone and Hibiki will translate automatically.
  • Text translation: Hibiki generates a text translation that is displayed in the interface at the same time as the spoken translation.
  • Voice transfer: Enable voice transfer in the settings so that the translated speech stays closer to the original speaker's voice.

Detailed Operation Procedure

  1. Start the model: Start the model by following the installation process, and make sure all dependencies are installed correctly.
  2. Input speech: Speak in the source language through the microphone; Hibiki starts translating automatically.
  3. View the translation results: The generated speech and text translations in the target language appear in the interface in real time.
  4. Adjust settings: Adjust features such as voice transfer in the settings as needed for the best translation results.

May not be reproduced without permission: Chief AI Sharing Circle » Hibiki: a real-time speech translation model providing streaming translation that preserves native features
