General Introduction
Whisper Input is an open-source speech transcription tool. Users hold down the Option key to record speech and release it to stop. The tool calls the Groq Whisper Large V3 Turbo model for transcription and typically returns the text within 1-2 seconds. Whisper Input also supports transcription with the SiliconFlow-hosted FunAudioLLM/SenseVoiceSmall model, which provides faster recognition and higher accuracy. The program is particularly suitable for users who need efficient voice input, including visually impaired users.
Feature List
- Voice recording and transcription: Hold down the Option key to start recording and release it to stop; the model is then called automatically to transcribe the audio.
- Multi-language support: Supports speech transcription in multiple languages.
- Rapid feedback: Most transcriptions are returned within 1-2 seconds.
- Free of charge: Works with the free usage tiers offered by Groq and SiliconFlow.
- Punctuation support: Punctuation is added automatically to improve the readability of the transcribed text.
- Accessibility support: A simple macOS client is being developed to make the tool easier to use for visually impaired users.
Usage Guide
Installation process
- Prerequisites: Ensure that you have a local Python environment, version 3.10 or later.
- Clone the project::
git clone https://github.com/ErlichLiu/Whisper-Input.git
- Create a virtual environment::
python -m venv venv
- Activate the virtual environment::
- macOS/Linux:
source venv/bin/activate
- Windows:
.\venv\Scripts\activate
- Install the dependencies::
pip install pip-tools
pip-compile requirements.in
pip install -r requirements.txt
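The Python version prerequisite listed above can be verified before any dependencies are installed. The following is a minimal sketch, not part of the project itself; the helper name `python_is_supported` is hypothetical and only encodes the 3.10 minimum stated in the prerequisites.

```python
import sys

# Minimum interpreter version stated in the prerequisites.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter meets MIN_PYTHON.

    Hypothetical helper for illustration; the project may not perform
    this check itself.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required"
        )
    print("Python version OK")
```

Running the file with the interpreter inside the virtual environment confirms that the environment was created from a suitable Python version.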
Model configuration
Groq Whisper Large V3 model
- Sign up for a Groq account: Groq Registration Page
- Get an API key: Groq API KEY
- Configure the environment variables::
cp .env.example .env
Paste the API key into the .env file:
SERVICE_PLATFORM=groq
GROQ_API_KEY=your_API_KEY
SiliconFlow FunAudioLLM/SenseVoiceSmall model
- Register for a SiliconFlow account: SiliconFlow Registration Page
- Get an API key: SiliconFlow API KEY
- Configure the environment variables::
cp .env.example .env
Paste the API key into the .env file:
SERVICE_PLATFORM=siliconflow
SILICONFLOW_API_KEY=your_API_KEY
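A quick way to catch a misconfigured .env file before starting the program is to check that the key matching SERVICE_PLATFORM is actually set. The sketch below is an illustration under assumptions: it uses only the three variable names shown above, while `parse_env` and `validate_config` are hypothetical helpers. How the project itself loads the file is not shown here and may differ.

```python
# Which API key variable each platform needs (names taken from the
# .env examples above).
REQUIRED_KEY = {
    "groq": "GROQ_API_KEY",
    "siliconflow": "SILICONFLOW_API_KEY",
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Hypothetical minimal parser; it does not handle multi-line values
    or variable expansion.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def validate_config(values):
    """Return (platform, api_key) or raise ValueError describing the problem."""
    platform = values.get("SERVICE_PLATFORM", "").lower()
    if platform not in REQUIRED_KEY:
        raise ValueError(
            f"SERVICE_PLATFORM must be one of {sorted(REQUIRED_KEY)}"
        )
    key_name = REQUIRED_KEY[platform]
    api_key = values.get(key_name, "")
    if not api_key:
        raise ValueError(f"{key_name} is missing or empty")
    return platform, api_key


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        platform, _ = validate_config(parse_env(handle.read()))
    print(f"Configuration looks valid for platform: {platform}")
```

Because only one platform is active at a time, the check ignores the key for the platform that is not selected.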
Running the program
- Start the program::
python main.py
- Usage: Hold down the Option key to start recording and release it to stop; the program then transcribes the speech automatically and returns the result.
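The hold-to-record behavior described above can be pictured as a small state machine: audio is buffered only while the key is down, and the buffer is handed to the transcription step on release. This is a conceptual sketch rather than the project's implementation; the class `PushToTalk`, its method names, and the `transcribe` callable are all hypothetical, and real key and microphone events would come from whatever libraries the project uses.

```python
class PushToTalk:
    """Collect audio chunks only while the trigger key is held down.

    Hypothetical illustration: `transcribe` is any callable that takes
    the recorded bytes and returns text (for example, a model call).
    """

    def __init__(self, transcribe):
        self._transcribe = transcribe
        self._recording = False
        self._chunks = []

    def on_key_down(self):
        # Ignore repeated key-down events while already recording,
        # so key auto-repeat does not reset the buffer.
        if not self._recording:
            self._recording = True
            self._chunks = []

    def on_audio(self, chunk):
        # Audio arriving while the key is up is discarded.
        if self._recording:
            self._chunks.append(chunk)

    def on_key_up(self):
        # Return the transcription, or None if nothing was recorded.
        if not self._recording:
            return None
        self._recording = False
        audio = b"".join(self._chunks)
        if not audio:
            return None
        return self._transcribe(audio)


if __name__ == "__main__":
    ptt = PushToTalk(lambda audio: f"<{len(audio)} bytes of audio>")
    ptt.on_key_down()
    ptt.on_audio(b"\x00\x01")
    ptt.on_audio(b"\x02\x03")
    print(ptt.on_key_up())
```

Keeping the key handling separate from the transcription call means the same flow would work with either the Groq or the SiliconFlow backend.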
Caveats
- Background operation: The program must keep running in the background, so it is best started in a terminal window or tab that you rarely close.
- Accessibility support: In the future, a macOS client will be made available for visually impaired users.
One-sentence description (brief)
Whisper Input is an efficient speech-to-text tool that supports voice input in multiple languages and converts speech to text quickly and accurately.