
CrisperWhisper: Accurate Verbatim Speech Transcription Tool

General Introduction

CrisperWhisper is an advanced speech recognition tool based on OpenAI Whisper that focuses on fast, accurate, verbatim (word-for-word) speech transcription. It delivers accurate word-level timestamps, even around filler words and pauses. CrisperWhisper improves timestamp accuracy by adjusting the tokenizer and using a custom attention loss, and it reduces transcription hallucinations so that every spoken word is accurately recorded.

 

Paper Summary

CrisperWhisper is an improved version of the Whisper speech recognition model. By adjusting the tokenizer and applying the Dynamic Time Warping (DTW) algorithm, it produces more accurate word-level timestamps and more fine-grained transcriptions, improves the detection of pauses and filler events in speech, and reduces hallucinations.

Summary

CrisperWhisper is an enhancement of the Whisper model designed to provide more accurate word-level timestamps and more granular speech transcription. The model improves timestamp accuracy by adjusting Whisper's tokenizer so that the DTW algorithm can align audio segments with words more precisely. This is particularly useful for verbatim transcription that captures everything the speaker utters, which matters for clinical assessment of speech, for analyzing the language planning process, and for identifying indicators of cognitive load. CrisperWhisper is also trained to be more robust to background noise and to audio with more than one speaker. It has been tested on multiple benchmark datasets covering speech recognition, word segmentation, filler event detection, and hallucination reduction. In addition, the code for the model and a synthetic dataset with accurate word-level timestamps have been made available.
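
To make the alignment step more concrete, the following is a minimal pure-Python sketch of dynamic time warping over a small token-by-frame cost matrix. It illustrates the general DTW technique only and is not CrisperWhisper's implementation: the function names `dtw_path` and `token_frame_spans` and the toy cost values are assumptions made for this example. In a Whisper-style model the costs would typically be derived from attention scores between text tokens and audio frames, with low cost where a token attends strongly to a frame.

```python
def dtw_path(cost):
    """Return the minimum-cost monotonic path through a cost matrix.

    cost[i][j] is the (hypothetical) cost of aligning token i with audio frame j.
    The path starts at (0, 0) and ends at (last token, last frame).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] holds the cheapest cumulative cost of any path ending at (i, j).
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best_prev

    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


def token_frame_spans(path, n_tokens):
    """Collapse a DTW path into one (first_frame, last_frame) span per token."""
    spans = [None] * n_tokens
    for token, frame in path:
        if spans[token] is None:
            spans[token] = (frame, frame)
        else:
            spans[token] = (spans[token][0], frame)
    return spans


# Toy example: 3 tokens, 6 audio frames, low cost where token and frame match.
cost = [
    [0.1, 0.1, 0.9, 0.9, 0.9, 0.9],
    [0.9, 0.9, 0.1, 0.1, 0.9, 0.9],
    [0.9, 0.9, 0.9, 0.9, 0.1, 0.1],
]
path = dtw_path(cost)
spans = token_frame_spans(path, n_tokens=3)
print(spans)  # [(0, 1), (2, 3), (4, 5)]
```

Multiplying each frame index by the duration of one audio frame turns these spans into start and end times. Because every token must own at least one frame on the path, how the text is split into tokens directly affects where the word boundaries fall, which is why tokenizer changes can influence timestamp precision.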

Key Points

  • Improved tokenizer: CrisperWhisper improves timestamp accuracy by removing redundant spaces in the tokenizer and re-tokenizing specific words such as "uh" and "um", so that the DTW algorithm can align audio segments with words more accurately.
  • Noise robustness: The model adapts better to noise because noisy and multi-speaker audio is included during training, and it produces less hallucinated content because blank (speech-free) training samples are introduced.
  • Strong performance: CrisperWhisper has been tested on several benchmark datasets, including the AMI Meeting Corpus, TED-LIUM, and LibriSpeech, and has demonstrated strong word-level timestamping and speech recognition performance on them.
  • Open-source code and dataset: The code for the model and a synthesized speech dataset have been made open source, which helps researchers and developers study and improve speech recognition techniques.
  • Reduction of hallucinated content: CrisperWhisper reduces the generation of hallucinated content through precise timestamping and dedicated handling of such content, which is particularly important for the reliability of speech recognition systems.

 


 

Function List

  • Accurate word-level timestamps: Provides accurate timestamps even around filler words and pauses.
  • Verbatim transcription: Records each spoken word verbatim, including fillers such as "um" and "ah".
  • Filler word detection: Detects and accurately transcribes filler words.
  • Reduced hallucinations: Reduces transcription hallucinations and improves accuracy.
  • Open source: The code is publicly available for easy viewing and use.
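
As a rough illustration of what word-level timestamps make possible, the sketch below picks out filler words and measures silent gaps between consecutive words. The input format (a list of dicts with `text`, `start`, and `end` in seconds), the `FILLERS` set, and the 0.5 second pause threshold are all assumptions for this example and do not describe CrisperWhisper's actual output schema.

```python
# Assumed filler vocabulary; adjust it to whatever the transcription actually emits.
FILLERS = {"uh", "um"}


def normalize(token):
    """Lowercase a token and strip surrounding punctuation or brackets."""
    return token.strip(".,!?[]").lower()


def find_fillers(words):
    """Return the word entries whose normalized text is a known filler."""
    return [w for w in words if normalize(w["text"]) in FILLERS]


def find_pauses(words, min_gap=0.5):
    """Return (gap_start, gap_end) pairs where the silence between two words is at least min_gap seconds."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        if nxt["start"] - prev["end"] >= min_gap:
            pauses.append((prev["end"], nxt["start"]))
    return pauses


# Hypothetical word-level output for a short utterance.
words = [
    {"text": "Um,", "start": 1.0, "end": 1.4},
    {"text": "I", "start": 2.2, "end": 2.3},
    {"text": "like", "start": 2.3, "end": 2.7},
    {"text": "this", "start": 2.8, "end": 3.0},
    {"text": "tool.", "start": 3.1, "end": 3.8},
]

print([w["text"] for w in find_fillers(words)])  # ['Um,']
print(find_pauses(words))  # [(1.4, 2.2)]
```

Both helpers depend on the timestamps being accurate around fillers and pauses, which is the case this tool is designed for.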

 

 

Usage Guide

Installation

  1. Prepare the environment:
    • Ensure that Python 3.7 or later is installed.
  2. Download the code:
    • Clone the GitHub repository: git clone https://github.com/nyrahealth/CrisperWhisper.git
  3. Install dependencies and run the application:
    • Go to the project directory: cd CrisperWhisper
    • Install the required dependencies: pip install -r requirements.txt
    • Run the application: python app.py

Guidelines for use

  1. Basic use:
    • After opening the app, upload the audio file that needs to be transcribed.
    • Select the transcription mode (verbatim or standard).
    • Click the "Start Transcription" button and wait for the transcription to complete.
  2. Advanced features:
    • Timestamp adjustment: The precision of the timestamps can be adjusted in the settings.
    • Filler word detection: Enable or disable filler word detection.
    • Export results: Once the transcription is complete, the results can be exported to a text file or another format.
  3. Common problems:
    • Inaccurate transcription: Use good-quality audio and avoid background noise.
    • Imprecise timestamps: Try adjusting the timestamp settings, or use a higher-quality audio file.
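
The export step can also be scripted once word-level results are available. The sketch below converts a list of timed words into SubRip (SRT) subtitle text. SRT is chosen here purely for illustration, since the text above does not say which export formats the app supports, and the input format (dicts with `text`, `start`, and `end` in seconds) and the function names are assumptions.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words):
    """Render one numbered SRT cue per word, separated by blank lines."""
    blocks = []
    for index, w in enumerate(words, start=1):
        blocks.append(
            f"{index}\n{srt_time(w['start'])} --> {srt_time(w['end'])}\n{w['text']}\n"
        )
    return "\n".join(blocks)


# Hypothetical word-level results.
words = [
    {"text": "Um", "start": 1.0, "end": 1.4},
    {"text": "I", "start": 2.2, "end": 2.3},
]
srt_text = words_to_srt(words)
print(srt_text)
```

The resulting string can be written to disk with `open("transcript.srt", "w", encoding="utf-8")`; the file name is only an example.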

Typical Examples

  1. Example of verbatim transcription:
    Original audio: Um, I find this project very interesting.
    Transcription result: Um, I find this project very interesting.
    Timestamps: [0:00:01] Um, [0:00:02] I, [0:00:03] find, [0:00:04] this, [0:00:05] project, [0:00:06] very, [0:00:07] interesting.
    
  2. Example of filler word detection:
    Original audio: Um, I find this project very interesting.
    Transcription result: Um, I find this project very interesting.
    Filler word: [0:00:01] Um
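
The inline timestamp layout used in the examples above can be reproduced from word start times with a few lines of Python. This is only a formatting sketch: the input format (dicts with `text` and `start` in seconds) and the function name `render_inline` are assumptions, and start times are truncated to whole seconds to match the `[H:MM:SS]` style shown.

```python
from datetime import timedelta


def render_inline(words):
    """Render words as '[H:MM:SS] word' items separated by commas."""
    parts = []
    for w in words:
        # str(timedelta) yields H:MM:SS for whole-second values.
        stamp = timedelta(seconds=int(w["start"]))
        parts.append(f"[{stamp}] {w['text']}")
    return ", ".join(parts)


# Hypothetical word start times, in seconds.
words = [
    {"text": "I", "start": 2.0},
    {"text": "find", "start": 3.0},
    {"text": "this", "start": 4.0},
]
line = render_inline(words)
print(line)  # [0:00:02] I, [0:00:03] find, [0:00:04] this
```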
    