
PengChengStarling: A Multilingual Speech-to-Text Tool Smaller and Faster than Whisper-Large v3

Post updated on 2025-01-30 23:28. Some of the content may be time-sensitive.

General Introduction

PengChengStarling, developed by PengCheng Laboratory, is a multilingual automatic speech recognition (ASR) toolkit that converts speech in different languages into text. Built on the icefall project, it covers the complete ASR workflow: data processing, model training, inference, fine-tuning, and deployment. PengChengStarling supports streaming speech recognition in eight languages: Chinese, English, Russian, Vietnamese, Japanese, Thai, Indonesian, and Arabic. Typical applications include voice assistants, translation tools, subtitle generation, and voice search. The model is 20% the size of Whisper-Large v3, and its inference is seven times faster.

Its distinguishing feature is handling multilingual speech input within a unified framework. It supports real-time streaming recognition, transcribing speech as it is spoken, which makes it suitable for transcribing international conferences, automatically generating subtitles for multilingual videos, and building cross-language customer service systems.


Feature List

  • Data processing: preprocesses multiple datasets into the required input format.
  • Model training: provides flexible training configurations for multilingual speech recognition tasks.
  • Inference: fast inference with support for streaming speech recognition.
  • Fine-tuning: supports fine-tuning models to fit specific task requirements.
  • Deployment: exports models in PyTorch and ONNX formats for easy deployment.

Usage Guide

Installation Process

  1. Clone the project repository:
   git clone https://github.com/yangb05/PengChengStarling
   cd PengChengStarling
  2. Install the dependencies and add the repository to PYTHONPATH (adjust /tmp/PengChengStarling to the directory you actually cloned into):
   pip install -r requirements.txt
   export PYTHONPATH=/tmp/PengChengStarling:$PYTHONPATH
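
To verify the installation, a quick import check like the one below can help. This is a minimal sketch: it assumes the repository's zipformer directory becomes discoverable once PYTHONPATH is set, which may differ depending on the project layout.

    # Sanity check: confirm the cloned repository is visible on PYTHONPATH.
    # "zipformer" is assumed importable because zipformer/prepare.py is
    # referenced below; adjust the name if the project layout differs.
    import importlib.util

    spec = importlib.util.find_spec("zipformer")
    print("zipformer found" if spec else "not found: check PYTHONPATH")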

Data Preparation

Before starting training, the raw data must be preprocessed into the required input format. Typically, this means adapting the make_*_list methods in zipformer/prepare.py to generate a data.list file. Once that is done, the script generates the corresponding cuts and fbank features for each dataset, which serve as the input data for PengChengStarling.
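
The exact schema of data.list is defined by the make_*_list methods in zipformer/prepare.py; the sketch below only illustrates the general shape of such a method, one JSON object per utterance, using hypothetical field names rather than the project's confirmed format.

    # Hypothetical make_*_list-style helper: walk a dataset directory and
    # write one JSON entry per utterance to data.list. The field names
    # ("key", "wav", "txt") are assumptions, not the project's actual schema.
    import json
    from pathlib import Path

    def make_example_list(dataset_dir: str, output_path: str = "data.list") -> None:
        dataset = Path(dataset_dir)
        with open(output_path, "w", encoding="utf-8") as out:
            for wav in sorted(dataset.glob("**/*.wav")):
                transcript = wav.with_suffix(".txt")
                if not transcript.exists():
                    continue  # skip utterances with no paired transcript
                entry = {
                    "key": wav.stem,
                    "wav": str(wav),
                    "txt": transcript.read_text(encoding="utf-8").strip(),
                }
                out.write(json.dumps(entry, ensure_ascii=False) + "\n")

    make_example_list("datasets/my_corpus")  # hypothetical dataset path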

Model Training

  1. Configure the training parameters: edit the configuration files in the config_train directory.
  2. Start training:
   ./train.sh

Inference

  1. Prepare the inference data: preprocess the data into the required format.
  2. Start inference:
   ./eval.sh

Fine-tuning

  1. Prepare the fine-tuning data: preprocess the data into the required format.
  2. Start fine-tuning:
   ./train.sh --finetune

Deployment

PengChengStarling provides models in two formats: PyTorch state dictionaries and ONNX. Choose whichever format suits your deployment environment.
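
As a rough illustration of running the ONNX export for streaming recognition, the sketch below uses sherpa-onnx, a runtime commonly paired with icefall-style streaming transducer exports. Whether PengChengStarling's exported models match this exact layout is an assumption here, and all file names below are placeholders.

    # Streaming decoding sketch with sherpa-onnx (pip install sherpa-onnx).
    # Assumes an icefall-style streaming transducer export (encoder/decoder/
    # joiner ONNX files plus a token table); all paths are placeholders.
    import wave

    import numpy as np
    import sherpa_onnx

    recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
        tokens="tokens.txt",
        encoder="encoder.onnx",
        decoder="decoder.onnx",
        joiner="joiner.onnx",
        num_threads=2,
        sample_rate=16000,
        feature_dim=80,
        decoding_method="greedy_search",
    )

    # Read a 16 kHz, 16-bit mono WAV file and scale samples to [-1, 1].
    with wave.open("test.wav") as f:
        sample_rate = f.getframerate()
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
        samples = pcm.astype(np.float32) / 32768.0

    stream = recognizer.create_stream()

    # Feed audio in 200 ms chunks to mimic real-time streaming input.
    chunk = int(0.2 * sample_rate)
    for start in range(0, len(samples), chunk):
        stream.accept_waveform(sample_rate, samples[start:start + chunk])
        while recognizer.is_ready(stream):
            recognizer.decode_stream(stream)

    # Flush: pad with silence, mark the input finished, decode what remains.
    stream.accept_waveform(sample_rate, np.zeros(int(0.5 * sample_rate), np.float32))
    stream.input_finished()
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)

    print(recognizer.get_result(stream))

For a PyTorch state-dictionary deployment, the checkpoint would instead be loaded with torch.load and applied to the matching model definition from the repository.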

