Whisper Input: a free and high-speed voice-to-text transcription service using Groq

Latest AI Resources, released 6 months ago by AI Sharing Circle


General Introduction

Whisper Input is an open-source speech transcription tool: hold the Option key to start recording and release it to stop. The tool calls the Groq Whisper Large V3 Turbo model for transcription and typically returns text within 1-2 seconds. Whisper Input also supports the FunAudioLLM/SenseVoiceSmall model hosted by SiliconFlow, which is described as offering faster recognition and higher accuracy. The program is particularly suited to users who need efficient voice input, including visually impaired users.

Feature List

- Voice recording and transcription: hold the Option key to record; releasing it ends the recording and automatically sends the audio to the model for transcription.
- Multi-language support: transcribes speech in multiple languages.
- Rapid feedback: most voice inputs are returned within 1-2 seconds.
- Free of charge: runs on the free usage offered by Groq and SiliconFlow.
- Punctuation support: automatically adds punctuation to make the transcribed text more readable.
- Accessibility support: a simple macOS client is in development to make the tool easier to use for visually impaired users.
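
The hold-to-record behaviour described above can be pictured as a small state machine. The following is a minimal sketch, not the project's actual code: the class name `PushToTalk`, its methods, and the `transcribe` callback (a stand-in for the model call) are all hypothetical, and wiring the handlers to real keyboard and microphone events is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PushToTalk:
    """Hold-to-record state machine (hypothetical helper for illustration)."""

    transcribe: Callable[[bytes], str]  # stand-in for the speech model call
    recording: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def on_press(self) -> None:
        # Key-repeat events can arrive while the key is held; ignore them.
        if not self.recording:
            self.recording = True
            self._chunks.clear()

    def on_audio(self, chunk: bytes) -> None:
        # Audio is buffered only while the key is held down.
        if self.recording:
            self._chunks.append(chunk)

    def on_release(self) -> Optional[str]:
        # Releasing the key ends the recording and triggers transcription.
        if not self.recording:
            return None
        self.recording = False
        return self.transcribe(b"".join(self._chunks))
```

A keyboard listener would call `on_press` and `on_release` for the Option key, and an audio callback would feed `on_audio`. Keeping the state machine free of I/O makes it easy to test without a microphone.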

Usage Guide

Installation process

Prerequisites: a local Python environment, version 3.10 or later.

Clone the project::

   git clone https://github.com/ErlichLiu/Whisper-Input.git

Create a virtual environment::

   python -m venv venv

Activate the virtual environment:

- macOS/Linux: source venv/bin/activate
- Windows: .\venv\Scripts\activate

Install the dependencies::

   pip install pip-tools
   pip-compile requirements.in
   pip install -r requirements.txt
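
Because the prerequisites call for Python 3.10 or later, it can help to confirm the interpreter version before installing anything. The sketch below is an illustrative check only; the function name `python_is_supported` is hypothetical and not part of the project.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True when the given (or running) interpreter meets the minimum."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuple comparison handles the ordering.
    return tuple(version[:2]) >= tuple(minimum)
```

Calling `python_is_supported()` with no arguments checks the interpreter that is currently running, which should be the one inside the activated virtual environment.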

Model configuration

Groq Whisper Large V3 model

Sign up for a Groq account on the Groq registration page.
Get an API key from the Groq API key page.

Configure the environment variables::

   cp .env.example .env

Paste the API key into the .env file::

   SERVICE_PLATFORM=groq
   GROQ_API_KEY=your_API_KEY
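
To illustrate what the configured key is used for, here is a rough sketch of how a transcription request to Groq could be assembled with only the standard library. It is not taken from the project: the endpoint URL and model id are assumptions based on Groq's OpenAI-compatible API, and `build_transcription_request` and the default file name are hypothetical. The function only builds the request; it does not send it.

```python
import urllib.request
import uuid

# Assumed values from Groq's OpenAI-compatible API, not from the project.
GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"
MODEL = "whisper-large-v3-turbo"


def build_transcription_request(audio: bytes, api_key: str,
                                filename: str = "audio.wav"):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        GROQ_URL,
        data=model_part + file_header + audio + closing,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send the request. Under the same assumptions, the JSON response is expected to carry the transcript in a `text` field.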

SiliconFlow FunAudioLLM/SenseVoiceSmall model

Register for a SiliconFlow account on the SiliconFlow registration page.
Get an API key from the SiliconFlow API key page.

Configure the environment variables::

   cp .env.example .env

Paste the API key into the .env file::

   SERVICE_PLATFORM=siliconflow
   SILICONFLOW_API_KEY=your_API_KEY
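
Both configurations follow the same pattern: `SERVICE_PLATFORM` names the backend and a matching variable holds its key. As a minimal sketch of how such a file could be read and validated, assuming plain `KEY=VALUE` lines, the helpers below use only the standard library. `parse_env` and `api_key_for` are hypothetical names, and the project itself may load the file differently.

```python
# Which key variable belongs to which platform, per the .env examples above.
KEY_FOR_PLATFORM = {
    "groq": "GROQ_API_KEY",
    "siliconflow": "SILICONFLOW_API_KEY",
}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def api_key_for(env: dict) -> str:
    """Return the API key that matches SERVICE_PLATFORM, or raise ValueError."""
    platform = env.get("SERVICE_PLATFORM", "").lower()
    if platform not in KEY_FOR_PLATFORM:
        raise ValueError(f"unsupported SERVICE_PLATFORM: {platform!r}")
    key_name = KEY_FOR_PLATFORM[platform]
    key = env.get(key_name, "")
    if not key:
        raise ValueError(f"{key_name} is not set")
    return key
```

Failing early with a clear message when the platform or key is missing avoids a less obvious authentication error on the first recording.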

Running the program

Start the program::

   python main.py

Usage: hold the Option key to start recording and release it to stop. The program then transcribes the recording automatically and returns the result.

Notes

Background operation: the program must keep running in the background, so run it in a terminal window or tab that you rarely close.

Accessibility support: a macOS client for visually impaired users is planned.

One-sentence description

Whisper Input is a speech-to-text tool that supports multi-language voice input and quickly and accurately converts speech to text for users who need efficient voice input.