General Introduction
CrisperWhisper is an advanced speech recognition tool based on OpenAI Whisper that focuses on fast, accurate, verbatim speech transcription. It delivers accurate word-level timestamps, even in the presence of filler words and pauses. CrisperWhisper improves timestamp accuracy by adjusting the tokenizer and customizing the attention loss, and reduces transcription hallucinations so that every spoken word is accurately recorded.
Paper Summary
CrisperWhisper is an improved version of the Whisper speech recognition model. By adjusting the tokenizer and using the Dynamic Time Warping (DTW) algorithm, it produces more accurate word-level timestamps, delivers more fine-grained verbatim transcription, improves the detection of pause and filler events in speech, and reduces the generation of hallucinations.
Summary
CrisperWhisper is an enhancement of the Whisper model designed to provide more accurate word-level timestamps and more granular speech transcription. The model improves timestamp accuracy by adjusting Whisper's tokenizer so that the DTW algorithm can align audio segments with words more precisely. This makes it possible to capture every spoken word in the transcription, which matters for clinical speech assessment, analysis of the language planning process, and identification of indicators of cognitive load. CrisperWhisper also improves robustness by including noisy and multi-speaker audio during training and by training specifically against hallucinations, and it has been evaluated on multiple benchmark datasets, demonstrating strong performance in speech recognition, segmentation, filler event detection, and hallucination reduction. In addition, the code for the model and a synthetic dataset with accurate word-level timestamps have been made available.
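For readers who want to try this programmatically, here is a minimal sketch of loading the released checkpoint through the Hugging Face transformers ASR pipeline with word-level timestamps; the model id nyrahealth/CrisperWhisper and the example file name audio.wav are assumptions, not details stated in this summary.

```python
# Minimal sketch: word-level timestamps with a Whisper-style checkpoint via the
# Hugging Face `transformers` ASR pipeline. The model id below is an assumption;
# substitute the official CrisperWhisper checkpoint name if it differs.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="nyrahealth/CrisperWhisper",  # assumed Hub id
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# return_timestamps="word" asks the pipeline for per-word start/end times.
result = asr("audio.wav", return_timestamps="word")

print(result["text"])
for chunk in result["chunks"]:
    start, end = chunk["timestamp"]
    print(f"[{start:.2f}s - {end:.2f}s] {chunk['text'].strip()}")
```

The chunks field of the pipeline output carries one entry per word with its start and end time, which is the information the examples later on this page format in the [0:00:01] word style.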
Key Points
- Improved tokenizer: CrisperWhisper improves timestamp accuracy by removing redundant spaces in the tokenizer and re-tokenizing specific words such as "uh" and "um", so that the DTW algorithm can align audio segments with words more precisely (see the sketch after this list).
- Noise robustness: The model improves its tolerance to noise by including data from noisy and multi-speaker sources during training, and reduces the generation of hallucinated content by adding training samples that contain no speech.
- Strong performance: CrisperWhisper has been evaluated on several benchmark datasets, including the AMI Meeting Corpus, TED-LIUM, and LibriSpeech, and demonstrates excellent word-level timestamping and speech recognition performance on them.
- Open source code and datasets: The code for the model and a synthetic speech dataset have been released as open source, which helps researchers and developers further study and improve speech recognition techniques.
- Reduced hallucinations: CrisperWhisper effectively reduces the generation of hallucinated content through precise timestamping and dedicated handling of hallucinations, which is particularly important for improving the reliability of speech recognition systems.
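The tokenizer point above can be made concrete with a small, purely illustrative sketch (the retokenize helper and the token strings are hypothetical, not code from the CrisperWhisper repository): Whisper's BPE tokens glue the leading space onto the following word, so silence before a word tends to be attributed to the word itself; giving the space its own slot lets DTW assign the pause to the space instead.

```python
# Illustrative only: show how moving the leading-space marker out of the word
# tokens changes which token "owns" the silence before a word. The helper name
# and token strings are hypothetical, not CrisperWhisper's actual code.

whisper_style_tokens = ["Hello", " world", " um", " yes"]  # space glued to the word

def retokenize(tokens):
    """Split each leading space into its own pseudo-token so DTW can assign
    inter-word pauses to the space rather than to the following word."""
    out = []
    for tok in tokens:
        if tok.startswith(" "):
            out.append(" ")        # the pause/space gets its own alignment slot
            out.append(tok[1:])    # the word itself starts at speech onset
        else:
            out.append(tok)
    return out

print(retokenize(whisper_style_tokens))
# ['Hello', ' ', 'world', ' ', 'um', ' ', 'yes']
```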
Feature List
- Accurate word-level timestamps: Provides precise timestamps even in the presence of filler words and pauses.
- Verbatim transcription: Records every spoken word verbatim, including fillers such as "um" and "ah".
- Filler word detection: Detects and accurately transcribes filler words.
- Reduced hallucinations: Reduces transcription hallucinations and improves accuracy.
- Open source: The code is publicly available for inspection and use.
Usage Guide
Installation
- Environment setup:
- Ensure that Python 3.7 or later is installed.
- Install the necessary dependency libraries:
pip install -r requirements.txt
- Download the code:
- Clone the GitHub repository:
git clone https://github.com/nyrahealth/CrisperWhisper.git
- Run the application:
- Go to the project directory:
cd CrisperWhisper
- Run the application:
python app.py
Usage Instructions
- Basic usage:
- After opening the app, upload the audio file to be transcribed.
- Select the transcription mode (verbatim or standard transcription).
- Click the "Start Transcription" button and wait for the transcription to complete.
- Advanced features:
- Timestamp adjustment: The precision of the timestamps can be adjusted in the settings.
- Filler word detection: Enable or disable filler word detection.
- Export results: Once the transcription is complete, the results can be exported to a text file or another format (see the sketch after this list).
- Common problems:
- Inaccurate transcription: Ensure good audio quality and avoid background noise.
- Imprecise timestamps: Try adjusting the timestamp settings or using a higher-quality audio file.
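For the "Export results" feature, a minimal sketch of writing word-level output to a plain text file is shown below; the result layout mirrors the pipeline output used earlier, and the sample values, file name, and tab-separated format are illustrative choices rather than a format the app guarantees.

```python
# Minimal sketch: dump word-level timestamps to a tab-separated text file.
# The `result` layout mirrors the transformers ASR pipeline output used above;
# the sample values here are made up for illustration.

result = {
    "text": "Well, I find this project very interesting.",
    "chunks": [
        {"text": " Well,", "timestamp": (1.0, 1.4)},
        {"text": " I", "timestamp": (2.0, 2.1)},
        {"text": " find", "timestamp": (3.0, 3.3)},
    ],
}

def export_transcript(result, path="transcript.txt"):
    """Write the full text followed by one start/end/word line per chunk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(result["text"].strip() + "\n\n")
        for chunk in result["chunks"]:
            start, end = chunk["timestamp"]
            f.write(f"{start:.2f}\t{end:.2f}\t{chunk['text'].strip()}\n")

export_transcript(result)
```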
Examples
- Verbatim transcription example:
Original audio: Well, I find this project very interesting.
Transcription result: Well, I find this project very interesting.
Timestamps: [0:00:01] Well, [0:00:02] I, [0:00:03] find, [0:00:04] this, [0:00:05] project, [0:00:06] very, [0:00:07] interesting.
- Filler word detection example:
Original audio: Well, I find this project very interesting.
Transcription result: Well, I find this project very interesting.
Filler word: [0:00:01] hmm
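A hedged sketch of how word-level pipeline output could be formatted into the [0:00:01] word style used in these examples, with common fillers listed separately; the filler vocabulary and the helper names are illustrative assumptions, not behaviour documented by CrisperWhisper.

```python
# Illustrative formatting of word-level timestamps into the "[H:MM:SS] word"
# style used in the examples above, with fillers split out separately.
FILLERS = {"um", "uh", "hmm", "ah"}  # assumed filler vocabulary

def fmt(seconds):
    """Render a time in seconds as H:MM:SS."""
    s = int(seconds)
    return f"{s // 3600}:{(s % 3600) // 60:02d}:{s % 60:02d}"

def format_words(chunks):
    """Split pipeline chunks into timestamped regular words and fillers."""
    words, fillers = [], []
    for chunk in chunks:
        word = chunk["text"].strip().strip(",.!?").lower()
        entry = f"[{fmt(chunk['timestamp'][0])}] {chunk['text'].strip()}"
        (fillers if word in FILLERS else words).append(entry)
    return words, fillers

# Sample chunks in the shape returned by the transformers ASR pipeline.
chunks = [
    {"text": " Well,", "timestamp": (1.0, 1.4)},
    {"text": " um", "timestamp": (2.0, 2.2)},
    {"text": " interesting.", "timestamp": (7.0, 7.6)},
]
words, fillers = format_words(chunks)
print("Words:  ", ", ".join(words))
print("Fillers:", ", ".join(fillers))
```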