RealtimeSTT: Real-time Speech-to-Text Tool for Low-Latency Streaming Speech Recognition Based on Whisper

General Introduction

RealtimeSTT is a low-latency, real-time speech-to-text library with built-in voice activity detection and wake word activation. It was developed by Kolja Beigel for applications that need fast and accurate transcription, such as voice assistants, and is designed to be easy to integrate.

Feature List

Real-time speech to text: transcribes speech to text as it is spoken, for a variety of application scenarios.
Voice activity detection: automatically detects when a user starts and stops speaking, which improves transcription accuracy.
Wake word activation: users can activate the system by saying a specific word.
Low latency: keeps the delay between speech and text short to improve the user experience.
Multi-platform support: compatible with multiple operating systems and platforms for easy integration.
Open source: the complete source code is available for developers to extend and customize.

Usage Guide

Installation process

Clone the project repository:

   git clone https://github.com/KoljaB/RealtimeSTT.git

Change to the project directory:

   cd RealtimeSTT

Install the dependencies:

   pip install -r requirements.txt

(Optional) Install GPU support:

   pip install -r requirements-gpu.txt
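
After installing, it can help to confirm that the packages are visible to the active Python environment before starting the server. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the module names it checks are assumptions based on the imports used in the examples in this article.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# RealtimeSTT is the import name used in this article's examples;
# pyautogui is only needed for the typing example.
missing = missing_packages(["RealtimeSTT", "pyautogui"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```

If a name is reported as not installed, rerun the matching `pip install` step in the same environment.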

Usage

Start the server

Start the speech-to-text server:

   stt-server

After the server starts, wait for the prompt "speak now".

Client Usage

Start the client and connect to the server:

   stt

Once the client is launched, start talking and the system will transcribe the speech to text in real time.

Main feature workflows

Real-time speech to text

Import the AudioToTextRecorder class:

   from RealtimeSTT import AudioToTextRecorder

Define a function that processes the text:

   def process_text(text):
       # Receives the transcription of each utterance.
       print(text)

Start recording and process the text:

   if __name__ == '__main__':
       print("Wait until it says 'speak now'")
       recorder = AudioToTextRecorder()
       while True:
           # Waits for one utterance, then passes its transcription to process_text.
           recorder.text(process_text)
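
The loop above runs until the process is interrupted. As a sketch of one way to end it cleanly, the helper below keeps reading transcriptions until a stop phrase is heard. It assumes that calling `recorder.text()` with no callback returns the transcribed string, which is how the project README uses it at the time of writing; check this against your installed version. The name `transcribe_until`, the stop phrase, and the `max_utterances` limit are hypothetical choices for illustration.

```python
def transcribe_until(recorder, stop_phrase="stop recording", max_utterances=100):
    """Collect transcriptions until stop_phrase is spoken.

    `recorder` is any object with a text() method that returns a string
    (assumed here to be an AudioToTextRecorder).
    """
    collected = []
    for _ in range(max_utterances):
        text = recorder.text().strip()
        # Compare case-insensitively and ignore trailing punctuation,
        # since a transcript may come back as "Stop recording."
        if text.lower().rstrip(".!?") == stop_phrase:
            break
        if text:
            collected.append(text)
    return collected


def main():
    # Imported lazily so the helper above can be tested without a microphone.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder()
    print("Say 'stop recording' to finish.")
    for line in transcribe_until(recorder):
        print(line)
```

Call `main()` from an `if __name__ == '__main__':` guard, as in the example above.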

Voice Activity Detection

The system automatically detects when the user starts and stops talking, with no additional configuration required.
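
To make the idea concrete, here is a deliberately simplified sketch of start/stop detection. It is not RealtimeSTT's detector: it applies a plain energy threshold to fixed-size frames, and every name and value in it (`detect_speech_segments`, `threshold`, `hangover_frames`) is hypothetical. It only illustrates the general pattern of opening a segment when speech is detected and closing it after a run of quiet frames.

```python
def detect_speech_segments(frame_energies, threshold=0.1, hangover_frames=3):
    """Return (start, end) frame index pairs for detected speech.

    A segment starts at the first frame whose energy exceeds `threshold`
    and ends once `hangover_frames` consecutive quiet frames follow.
    `end` is exclusive and points just past the last loud frame.
    """
    segments = []
    start = None     # index where the current segment began
    quiet_run = 0    # consecutive quiet frames seen inside a segment

    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0

    # Close a segment that is still open when the input ends.
    if start is not None:
        segments.append((start, len(frame_energies) - quiet_run))
    return segments
```

For the energies `[0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7]` this returns `[(2, 6), (9, 10)]`: the single quiet frame at index 4 is shorter than the hangover, so it does not split the first segment.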

Wake word activation

To let users activate the system by saying a specific word, configure the wake word feature. See the project documentation for the configuration details.
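
As a hedged illustration, the project README (at the time of writing) enables wake words through a `wake_words` argument to `AudioToTextRecorder`. The sketch below assumes that this argument and the related `wake_words_sensitivity` option exist in your installed version. The wake word "jarvis", the default sensitivity, and the helper names are illustrative choices, so confirm the supported values in the project documentation.

```python
def wake_word_options(word="jarvis", sensitivity=0.6):
    """Build keyword arguments for a wake-word-activated recorder.

    The parameter names are assumptions taken from the project README.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return {"wake_words": word, "wake_words_sensitivity": sensitivity}


def main():
    # Imported lazily so the options helper can be used without audio hardware.
    from RealtimeSTT import AudioToTextRecorder

    options = wake_word_options()
    recorder = AudioToTextRecorder(**options)
    print(f"Say '{options['wake_words']}' and then speak.")
    print(recorder.text())
```

Keeping the options in a small function makes it easy to validate them before the recorder, which loads models on startup, is created.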

Detailed examples

Typing everything that is said

Import AudioToTextRecorder and pyautogui:

   from RealtimeSTT import AudioToTextRecorder
   import pyautogui

Define a function that processes the text:

   def process_text(text):
       # Type the transcription into the focused window, followed by a space.
       pyautogui.typewrite(text + " ")

Start recording and process the text:

   if __name__ == '__main__':
       print("Wait until it says 'speak now'")
       recorder = AudioToTextRecorder()
       while True:
           # Waits for one utterance, then passes its transcription to process_text.
           recorder.text(process_text)
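
Transcriptions can contain leading spaces, line breaks, or characters that a simulated keyboard may not be able to type. The sketch below normalizes the text before it is passed to `pyautogui.typewrite`. The helper name `prepare_for_typing` is hypothetical, and restricting the output to printable ASCII is an assumption about what is safe to type, not a documented requirement of either library.

```python
import string

# Characters assumed safe to send as simulated key presses.
_TYPEABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def prepare_for_typing(text):
    """Drop untypeable characters, collapse whitespace, append one space."""
    kept = "".join(ch for ch in text if ch in _TYPEABLE or ch.isspace())
    collapsed = " ".join(kept.split())
    return collapsed + " " if collapsed else ""


def process_text(text):
    # Imported lazily so prepare_for_typing can be tested without a display.
    import pyautogui

    typed = prepare_for_typing(text)
    if typed:
        pyautogui.typewrite(typed)
```

This `process_text` can be passed to `recorder.text` in place of the simpler version above.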
