
Whisper Input: a free and high-speed voice-to-text transcription service using Groq

General Introduction

Whisper Input is an open-source speech transcription tool: hold down the Option key to start recording and release it to stop. The tool sends the recording to Groq's Whisper Large V3 Turbo model for transcription, and results typically come back within 1-2 seconds. Whisper Input also supports the FunAudioLLM/SenseVoiceSmall model hosted by SiliconFlow, which offers faster recognition and higher accuracy. The tool is particularly suited to users who need efficient voice input, including visually impaired users.
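
To make the two transcription backends concrete, here is a minimal sketch of how a request to either service might be described. It is not taken from the project's code. The endpoint URLs and the Groq model ID are assumptions based on each provider's OpenAI-compatible speech-to-text API and should be checked against their documentation; `PLATFORMS`, `TranscriptionRequest`, and `build_request` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed endpoints and model IDs; confirm against each provider's API docs.
PLATFORMS = {
    "groq": (
        "https://api.groq.com/openai/v1/audio/transcriptions",
        "whisper-large-v3-turbo",
    ),
    "siliconflow": (
        "https://api.siliconflow.cn/v1/audio/transcriptions",
        "FunAudioLLM/SenseVoiceSmall",
    ),
}


@dataclass(frozen=True)
class TranscriptionRequest:
    """Everything needed for the HTTP POST except the audio file itself."""

    url: str
    headers: dict
    form_fields: dict


def build_request(platform: str, api_key: str) -> TranscriptionRequest:
    """Describe the transcription request for the chosen platform."""
    if platform not in PLATFORMS:
        raise ValueError(f"Unsupported platform: {platform!r}")
    url, model = PLATFORMS[platform]
    return TranscriptionRequest(
        url=url,
        headers={"Authorization": f"Bearer {api_key}"},
        form_fields={"model": model},
    )
```

The recorded audio would then be attached as a multipart file field and sent with any HTTP client. Keeping the platform table in one place means switching services changes only the lookup key.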


Features

  • Voice recording and transcription: Hold down the Option key to start recording and release it to stop; the model is then called automatically to transcribe the audio.
  • Multi-language support: Transcribes speech in multiple languages.
  • Rapid feedback: Most voice input is transcribed within 1-2 seconds.
  • Free of charge: Runs on the free usage provided by Groq and SiliconFlow.
  • Punctuation support: Automatically adds punctuation to make the transcribed text more readable.
  • Accessibility support: A simple macOS client is being developed to make the tool easier to use for visually impaired users.

 

Usage Guide

Installation process

  1. Prerequisites: Make sure you have a local Python environment, version 3.10 or later.
  2. Clone the project:
     git clone https://github.com/ErlichLiu/Whisper-Input.git
  3. Create a virtual environment:
     python -m venv venv
  4. Activate the virtual environment:
    • macOS/Linux:
      source venv/bin/activate
    • Windows:
      .\venv\Scripts\activate
  5. Install the dependencies:
     pip install pip-tools
     pip-compile requirements.in
     pip install -r requirements.txt
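
Before creating the virtual environment, it can help to confirm that the interpreter meets the 3.10 minimum from step 1. The following is a small sketch for that check; `python_is_supported` is a hypothetical helper, not part of the project.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_is_supported(version_info=None) -> bool:
    """Return True if the given (or the running) interpreter is new enough."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK")
    else:
        print("Python 3.10 or newer is required")
```

Run it with the same `python` command you plan to use for `python -m venv venv`, since several interpreters may be installed side by side.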

Model configuration

Groq Whisper Large V3 model

  1. Sign up for a Groq account: Groq Registration Page
  2. Get an API key: Groq API KEY
  3. Configure the environment variables:
     cp .env.example .env

Paste the API key into the .env file:

     SERVICE_PLATFORM=groq
     GROQ_API_KEY=your_API_KEY

SiliconFlow FunAudioLLM/SenseVoiceSmall model

  1. Register for a SiliconFlow account: SiliconFlow Registration Page
  2. Get an API key: SiliconFlow API KEY
  3. Configure the environment variables:
     cp .env.example .env

Paste the API key into the .env file:

     SERVICE_PLATFORM=siliconflow
     SILICONFLOW_API_KEY=your_API_KEY
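
The two .env variants differ only in the platform name and in which key variable must be present. As an illustration of how such a file could be validated, here is a minimal sketch. The variable names come from the examples in this section, while `parse_env_text`, `select_api_key`, and the lookup table are hypothetical and not the project's actual loader.

```python
# Variable names follow the .env examples in this section; the lookup table
# itself is a hypothetical helper, not code taken from the project.
KEY_VARIABLE_BY_PLATFORM = {
    "groq": "GROQ_API_KEY",
    "siliconflow": "SILICONFLOW_API_KEY",
}


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def select_api_key(env: dict) -> tuple:
    """Return (platform, api_key), or raise ValueError if misconfigured."""
    platform = env.get("SERVICE_PLATFORM", "").strip().lower()
    if platform not in KEY_VARIABLE_BY_PLATFORM:
        raise ValueError(f"Unsupported SERVICE_PLATFORM: {platform!r}")
    variable = KEY_VARIABLE_BY_PLATFORM[platform]
    api_key = env.get(variable, "")
    if not api_key:
        raise ValueError(f"{variable} is not set")
    return platform, api_key
```

For example, `select_api_key(parse_env_text(open(".env").read()))` would fail early with a clear message if the platform and key do not match, rather than failing later at the first transcription request.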

Running the program

  1. Start the program:
     python main.py
  2. Usage: Hold down the Option key to start recording and release it to stop; the program then transcribes the speech automatically and returns the result.
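
The hold-to-record behaviour can be modelled as a small state machine. The sketch below is an illustration under stated assumptions, not the project's implementation: `PushToTalk`, its method names, and the `"option"` key label are hypothetical, and a real program would connect `key_down` and `key_up` to a keyboard listener and `feed` to an audio stream.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PushToTalk:
    """Tracks one trigger key: pressing starts a recording, releasing ends it."""

    trigger: str = "option"  # hypothetical key label
    on_finish: Optional[Callable[[List[bytes]], None]] = None
    recording: bool = False
    chunks: List[bytes] = field(default_factory=list)

    def key_down(self, key: str) -> None:
        # Ignore other keys and auto-repeat events while already recording.
        if key == self.trigger and not self.recording:
            self.recording = True
            self.chunks = []

    def feed(self, chunk: bytes) -> None:
        # Audio is kept only while the trigger key is held.
        if self.recording:
            self.chunks.append(chunk)

    def key_up(self, key: str) -> None:
        # Releasing the trigger ends the recording and hands off the audio.
        if key == self.trigger and self.recording:
            self.recording = False
            if self.on_finish is not None:
                self.on_finish(self.chunks)
```

Passing a transcription function as `on_finish` gives the flow described in step 2: audio is collected between press and release, then handed off for transcription in one piece.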

Notes

  • Background operation: The program must keep running in the background, so run it in a terminal window or tab that you do not close often.
  • Accessibility support: A macOS client for visually impaired users is planned.

One-sentence description

Whisper Input is an efficient speech-to-text tool that supports voice input in multiple languages and converts speech to text quickly and accurately for users who need efficient voice input.

May not be reproduced without permission: Chief AI Sharing Circle » Whisper Input: a free and high-speed voice-to-text transcription service using Groq