General Introduction
RealtimeSTT is an efficient, low-latency, real-time speech-to-text library with advanced voice activity detection and wake word activation. It was developed by Kolja Beigel for applications that need fast and accurate speech-to-text transcription, such as voice assistants, and it aims to combine good performance with ease of use.
Feature List
- Real-time speech-to-text: transcribes speech to text as it is spoken, for a variety of applications.
- Voice activity detection: automatically detects when a user starts and stops speaking, which improves transcription accuracy.
- Wake word activation: users can activate the system by saying a specific word.
- Low latency: keeps the delay between speech and text low for a more responsive user experience.
- Multi-platform support: works on multiple operating systems and platforms, which makes integration easy.
- Open source: the complete source code is available, so developers can extend and customize it.
User Guide
Installation process
- Clone the project repository:
git clone https://github.com/KoljaB/RealtimeSTT.git
- Change into the project directory:
cd RealtimeSTT
- Install the dependencies:
pip install -r requirements.txt
- (Optional) Install GPU support:
pip install -r requirements-gpu.txt
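After installing, a small check like the one below can confirm that the packages are importable and whether a CUDA GPU is visible. It is a sketch that assumes the import names `RealtimeSTT` and `torch` and uses PyTorch's `torch.cuda.is_available()` to report GPU visibility; `check_install` is a hypothetical helper, not part of RealtimeSTT.

```python
import importlib.util
from typing import Dict


def check_install() -> Dict[str, bool]:
    """Hypothetical helper: report which packages are importable and whether CUDA is visible."""
    status = {
        # find_spec locates a top-level package without importing it
        "RealtimeSTT": importlib.util.find_spec("RealtimeSTT") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        "cuda": False,
    }
    if status["torch"]:
        import torch  # imported only when it is installed

        status["cuda"] = bool(torch.cuda.is_available())
    return status


if __name__ == "__main__":
    for name, ok in check_install().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If `cuda` reports `no` after installing the GPU requirements, the PyTorch build or the CUDA drivers are the likely place to look.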
Usage
Start the server
- Start the speech-to-text server:
stt-server
- After the server starts, wait for the prompt "speak now".
Client Usage
- Start the client and connect to the server:
stt
- Once the client is launched, start talking and the system will transcribe the speech to text in real time.
Main workflows
Real-Time Speech-to-Text
- Import the AudioToTextRecorder class:
from RealtimeSTT import AudioToTextRecorder
- Define a function that processes the transcribed text:
def process_text(text):
    print(text)
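- Start the recorder and process each finished utterance:
if __name__ == '__main__':  # guard needed because RealtimeSTT uses multiprocessing
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()
    while True:
        # waits for the next complete utterance, then passes its transcription to process_text
        recorder.text(process_text)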
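The `while True` loop above never exits. One way to give it an exit is to stop on a spoken phrase. This sketch assumes, as the project README describes, that calling `text()` without a callback blocks and returns the finished transcription. `dictate_until`, `SupportsText`, `ScriptedRecorder`, and the phrase "stop listening" are hypothetical names chosen for this example; the scripted recorder lets the loop run without a microphone.

```python
from typing import Callable, Iterable, Iterator, List, Protocol


class SupportsText(Protocol):
    """Anything with a blocking text() method that returns one finished utterance."""

    def text(self) -> str: ...


def dictate_until(
    recorder: SupportsText,
    handle: Callable[[str], None],
    stop_phrase: str = "stop listening",
) -> List[str]:
    """Pass each utterance to handle() until one contains stop_phrase (case-insensitive)."""
    history: List[str] = []
    stop = stop_phrase.lower()
    while True:
        utterance = recorder.text()  # without a callback, text() returns the transcription
        if stop in utterance.lower():
            return history  # the stop utterance itself is not handled
        handle(utterance)
        history.append(utterance)


class ScriptedRecorder:
    """Hypothetical test double that replays fixed utterances instead of using a microphone."""

    def __init__(self, utterances: Iterable[str]) -> None:
        self._utterances: Iterator[str] = iter(utterances)

    def text(self) -> str:
        return next(self._utterances)
```

With the real library, the call would presumably be `dictate_until(AudioToTextRecorder(), print)` inside the `if __name__ == '__main__':` guard. Matching with `in` and lowercasing tolerates the punctuation and capitalization the transcriber may add.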
Voice Activity Detection
- The system automatically detects when the user starts and stops talking, with no additional configuration required.
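According to the project README, RealtimeSTT combines WebRTC VAD for initial detection with Silero VAD for verification. The sketch below swaps in a much simpler technique, an RMS energy threshold with a silence hangover, only to show how start and stop points can be derived from audio frames; it is not the library's implementation. `detect_segments`, the threshold of 500, and the three-frame hangover are illustrative assumptions. The hangover plays a role similar to RealtimeSTT's `post_speech_silence_duration` setting: a short pause does not end the segment.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of integer PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_segments(
    frames: Sequence[Sequence[int]],
    threshold: float = 500.0,
    hangover_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech; end is exclusive.

    A segment closes only after `hangover_frames` consecutive quiet frames,
    so brief pauses inside a sentence do not split it.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    quiet_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i  # speech begins
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                # speech ended at the first frame of this quiet run
                segments.append((start, i - quiet_run + 1))
                start, quiet_run = None, 0
    if start is not None:  # audio ended while still in speech
        segments.append((start, len(frames) - quiet_run))
    return segments
```

A fixed energy threshold is easily fooled by background noise, which is one reason model-based detectors such as Silero are used for verification.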
Wake Word Activation
- Wake words let users activate the system by saying a specific word. See the project documentation for how to configure them.
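As a starting point for that configuration, the project README documents a `wake_words` parameter (a comma-separated string, used with the Porcupine backend) and a `wake_words_sensitivity` parameter on `AudioToTextRecorder`; check the current documentation before relying on these names. The two helpers below are hypothetical: one builds and validates the keyword arguments (assuming a sensitivity range of 0 to 1), the other lazily creates a recorder from them. The word "jarvis" is used because it appears in the README's own wake-word example.

```python
from typing import Dict, Sequence, Union


def wake_word_kwargs(
    words: Sequence[str], sensitivity: float = 0.6
) -> Dict[str, Union[str, float]]:
    """Hypothetical helper: build AudioToTextRecorder keyword arguments for wake words."""
    cleaned = [w.strip().lower() for w in words if w.strip()]
    if not cleaned:
        raise ValueError("at least one wake word is required")
    # Assumed valid range: 0.0 (least sensitive) to 1.0 (most sensitive)
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0.0 and 1.0")
    # Assumed format: several wake words joined into one comma-separated string
    return {"wake_words": ",".join(cleaned), "wake_words_sensitivity": sensitivity}


def make_wake_word_recorder(words: Sequence[str], sensitivity: float = 0.6):
    """Create a recorder that waits for a wake word (needs RealtimeSTT and a microphone)."""
    from RealtimeSTT import AudioToTextRecorder  # lazy import so the helper above runs anywhere

    return AudioToTextRecorder(**wake_word_kwargs(words, sensitivity))
```

Inside the `if __name__ == '__main__':` guard, `recorder = make_wake_word_recorder(["jarvis"])` followed by the usual `recorder.text(process_text)` loop should then transcribe only after the wake word is heard.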
Detailed examples
Typing everything that is said
- Import AudioToTextRecorder and pyautogui:
from RealtimeSTT import AudioToTextRecorder
import pyautogui
- Define a function that types the transcribed text:
def process_text(text):
    # type the transcription into the active window, followed by a space
    pyautogui.typewrite(text + " ")
- Start the recorder and type each finished utterance:
if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)
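A variant of the typing example that is easier to test: the function that actually types is passed in, so `pyautogui` is only imported when no substitute is given, and empty transcriptions are skipped. `make_text_handler` and the 0.01-second `interval` between keystrokes are illustrative choices, not part of RealtimeSTT; `typewrite` and its `interval` argument come from PyAutoGUI itself.

```python
from functools import partial
from typing import Callable, Optional


def make_text_handler(
    type_fn: Optional[Callable[[str], None]] = None,
) -> Callable[[str], None]:
    """Hypothetical helper: build a process_text callback around a typing function."""
    if type_fn is None:
        import pyautogui  # only needed when typing into the real active window

        # short pause between keystrokes; 0.01 s is an arbitrary choice
        type_fn = partial(pyautogui.typewrite, interval=0.01)

    def process_text(text: str) -> None:
        cleaned = text.strip()
        if cleaned:  # skip empty transcriptions so no stray spaces are typed
            type_fn(cleaned + " ")

    return process_text
```

In the example above, `recorder.text(make_text_handler())` would take the place of `recorder.text(process_text)`. Because PyAutoGUI simulates key presses, characters without a matching key on the keyboard (for example, many non-ASCII letters) may not be typed.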