General Introduction
Weebo is an open-source, real-time voice chatbot that uses Whisper Small for speech recognition, Llama 3.2 for natural language generation, and Kokoro-82M for speech synthesis. Developed by Amanvir Parhar, the project aims to provide an efficient voice dialog solution that runs on local devices. Weebo supports multiple voices and generates responses in real time, which suits a wide range of applications that require voice interaction.
Function List
- Real-time speech recognition: Converts speech to text efficiently using the Whisper Small model.
- Natural language generation: Generates natural language responses with the Llama 3.2 model.
- Speech synthesis: Converts text to speech using the Kokoro-82M model.
- Multi-voice support: Provides multiple voice options to enhance the user experience.
- Runs locally: All processing is done on the local device, with no reliance on cloud services.
- Open source: The code is open, so users can freely modify it and extend its functionality.
Usage Guide
Installation process
- Download the required models:
  - Download the Kokoro-82M model file kokoro-v0_19.onnx and place it in the project folder.
  - Use the Ollama tool to pull the Llama 3.2 model.
- Clone Weebo project code:
git clone https://github.com/amanvirparhar/weebo.git
cd weebo
- Install the dependencies:
pip install -r requirements.txt
- Run the chatbot:
python main.py
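Before launching, it can help to confirm that the Kokoro model file is where the installation steps put it. The following is a minimal sketch, assuming the file name kokoro-v0_19.onnx from the download step and that the check is run from the project folder. The helper name missing_model_files is hypothetical and is not part of Weebo.

```python
from pathlib import Path

# File name taken from the download step; extend the tuple if more files are needed.
REQUIRED_FILES = ("kokoro-v0_19.onnx",)


def missing_model_files(project_dir, required=REQUIRED_FILES):
    """Return the names of required files that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

An empty list means the file is in place. This check does not cover the Llama 3.2 model, which is stored by Ollama rather than in the project folder.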
Instructions for use
- After starting the program, Weebo will start listening for voice input.
- Users can speak naturally and Weebo will generate a voice response after a short pause.
- Press Ctrl+C to stop the program.
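The "short pause" that triggers a response can be pictured as end-of-utterance detection. The sketch below is one common approach, not necessarily what Weebo does: it treats audio as a sequence of frames and reports the frame at which a run of quiet frames follows speech. The function names, the threshold value, and the frame count are all assumptions chosen for illustration.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of audio samples (floats)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def utterance_end(frames, threshold=0.01, quiet_frames=3):
    """Return the index of the frame that completes the first pause after speech.

    A pause is `quiet_frames` consecutive frames whose RMS is below
    `threshold`, counted only once some speech has been heard.
    Returns None if no such pause occurs.
    """
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            quiet_run = 0  # speech resumed, so restart the pause count
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= quiet_frames:
                return i
    return None
```

Requiring speech before counting silence keeps the detector from firing while the user has not yet said anything.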
Main function operation flow
- Speech recognition: Weebo uses the Whisper Small model to convert the user's speech into text.
- Natural language generation: Using the Llama 3.2 model, Weebo interprets the transcribed input and generates a natural language response.
- Speech synthesis: Using the Kokoro-82M model, Weebo converts the generated text response into speech and plays it back through the speakers.
- Multi-voice support: Users can select different voices in the configuration file to meet different application requirements.
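The flow above can be sketched as three stages chained together. This is an illustrative outline only: the class VoicePipeline, the stand-in stage functions, and the voice name "voice_a" are hypothetical, and the stand-ins replace the real Whisper, Llama, and Kokoro calls so that the example runs without any models installed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three stages; each stage is any callable with the given shape."""
    transcribe: Callable[[bytes], str]       # speech -> text (Whisper Small in Weebo)
    generate: Callable[[str], str]           # text -> reply (Llama 3.2 in Weebo)
    synthesize: Callable[[str, str], bytes]  # reply, voice -> audio (Kokoro-82M in Weebo)
    voice: str = "default"                   # hypothetical voice name

    def respond(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)
        reply = self.generate(text)
        return self.synthesize(reply, self.voice)


# Stand-in stages so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(reply: str, voice: str) -> bytes:
    return f"[{voice}] {reply}".encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_generate, fake_synthesize, voice="voice_a")
print(pipeline.respond(b"hello"))  # prints b'[voice_a] You said: hello'
```

Passing the stages in as callables keeps each model swappable, and the voice setting travels with the pipeline rather than with any single stage.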
Detailed steps
- Launch Weebo: Run python main.py, and the program starts listening for the user's voice input.
- Voice input: Speak directly into the microphone, and Weebo automatically recognizes and processes the speech.
- Response generation: After recognizing the speech, Weebo generates a natural language response using the Llama 3.2 model and converts it to speech using the Kokoro-82M model.
- Response playback: The generated voice response is played through the speakers so the user can hear Weebo's answer.
- Stop the program: Press Ctrl+C to stop Weebo at any time.
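Stopping with Ctrl+C works because Python raises KeyboardInterrupt in the main thread when that key combination is pressed. The sketch below shows one way a conversation loop might exit cleanly on that signal. The function run_until_interrupted and its parameters are hypothetical and are not taken from Weebo's code.

```python
def run_until_interrupted(handle_turn, turns):
    """Call handle_turn for each item until the iterable ends or Ctrl+C arrives.

    Returns the number of turns completed.
    """
    completed = 0
    try:
        for turn in turns:
            handle_turn(turn)
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread; exit quietly.
        print("Stopping.")
    return completed
```

In a real loop, turns would be a source of recorded utterances and handle_turn would run the recognition, generation, and synthesis steps for each one.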
With the above steps, users can easily start using Weebo to have real-time voice conversations and experience natural and smooth voice interaction.