General Introduction
Bailing is an open-source voice conversation assistant designed to hold natural spoken conversations with users. The project combines automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech synthesis (TTS) to implement a GPT-4o-like voice conversation bot. With end-to-end latency as low as 800 ms, Bailing is suitable for a variety of edge devices and low-resource environments. Its efficient open-source models and modular design allow it to run without a GPU while still providing a high-quality voice conversation experience. With memory, tool invocation, and task management features, Bailing can remember user preferences and past conversations to provide a personalized interactive experience.
Function List
- Speech Input: Accurate speech recognition through FunASR.
- Speech Activity Detection: Filtering invalid audio using silero-vad to improve recognition efficiency.
- Intelligent dialog generation: Generates natural text responses using the language understanding provided by deepseek.
- Speech Output: Converts text to speech via edge-tts to provide users with realistic auditory feedback.
- Interruption support: Flexible configuration of interruption policies, capable of recognizing keywords and voice interruptions, ensuring immediate feedback and control for users in the conversation.
- Memory support: Continuously learns and remembers user preferences and conversation history to provide a personalized interactive experience.
- Support for tool invocation: Flexible integration of external tools allows users to request information or perform actions directly through voice.
- Support for task management: Efficiently manage user tasks with the ability to track progress, set reminders, and provide dynamic updates.
Usage Guide
Installation and running
Prerequisites
Make sure you have the following tools and libraries installed in your development environment:
- Python 3.8 or higher
- pip package manager
- Required dependencies for FunASR, silero-vad, deepseek, edge-tts
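Before installing, it can help to confirm that the interpreter and libraries are actually visible to Python. The following is a minimal sketch, not part of the project: the module names in ASSUMED_MODULES are assumptions about how the listed packages are imported (a pip package name and its import name can differ), and the helper names are hypothetical.

```python
import importlib.util
import sys

# Assumed import names for the libraries listed above; adjust as needed.
ASSUMED_MODULES = ["funasr", "silero_vad", "edge_tts"]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3.8+:", python_ok())
    print("Missing modules:", missing_modules(ASSUMED_MODULES))
```

An empty "Missing modules" list only shows that the modules can be located; it does not verify their versions or that models have been downloaded.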
Installation steps
- Clone the project repository:
git clone https://github.com/wwbin2017/bailing.git
cd bailing
- Install the required dependencies:
pip install -r requirements.txt
- Configure environment variables: Open
config/config.yaml
and configure ASR, LLM, and the other related settings. Download SenseVoiceSmall to the models/SenseVoiceSmall directory.
For example, to use deepseek, obtain a deepseek API key and add it to the configuration. You can also configure other models such as openai, qwen, gemini, 01yi, and so on.
- Run the project:
cd server
python server.py # Starts the backend service (optional; this step can be skipped)
python main.py
Instructions for use
After launching the app, the system waits for voice input. Processing then proceeds as follows:
- Convert user speech to text with FunASR.
- Use silero-vad for voice activity detection to ensure that only valid speech is processed.
- deepseek processes the text input and generates an intelligent response.
- edge-tts, ChatTTS, or the macOS say command converts the generated text to speech and plays it back to the user.
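The steps above can be pictured as four pluggable stages chained together. The sketch below is an illustration under stated assumptions, not Bailing's actual code: the VoicePipeline class, its field names, and the stand-in stage functions are all hypothetical, and it runs VAD before recognition so that only valid speech is transcribed. The stand-ins replace the real models so the example runs without a microphone, models, or network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Chains four pluggable stages: VAD -> ASR -> LLM -> TTS (hypothetical design)."""
    is_speech: Callable[[bytes], bool]   # VAD stage (silero-vad in Bailing)
    transcribe: Callable[[bytes], str]   # ASR stage (FunASR in Bailing)
    generate: Callable[[str], str]       # LLM stage (deepseek in Bailing)
    synthesize: Callable[[str], bytes]   # TTS stage (edge-tts and others in Bailing)

    def handle_chunk(self, audio: bytes) -> Optional[bytes]:
        """Process one audio chunk; return reply audio, or None if it was skipped."""
        if not self.is_speech(audio):
            return None  # invalid audio is dropped before recognition
        text = self.transcribe(audio).strip()
        if not text:
            return None  # nothing recognizable was said
        reply = self.generate(text)
        return self.synthesize(reply)


# Stand-in stages so the sketch is runnable on its own.
def fake_vad(audio: bytes) -> bool:
    return any(b != 0 for b in audio)  # treat all-zero bytes as silence


def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the bytes are the spoken words


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the bytes are synthesized audio


pipeline = VoicePipeline(fake_vad, fake_asr, fake_llm, fake_tts)
```

Because each stage is just a callable, swapping one TTS or LLM backend for another only means passing a different function, which is one way to read the modular design described earlier.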
Functional operation flow
- Voice input: The user speaks into the microphone, and the system automatically performs speech recognition.
- Voice activity detection: The system automatically filters out invalid audio to keep recognition efficient.
- Intelligent dialog generation: The system generates natural text responses based on the user's input.
- Voice output: The system converts text responses to speech and plays them back to the user.
- Interruption support: The user can interrupt the current response by voice, and the system reacts immediately.
- Memory: The system remembers the user's preferences and conversation history to provide a personalized interactive experience.
- Tool invocation: Users can request information or perform actions by voice, and the system calls the integrated external tools.
- Task management: Users can set task reminders, and the system tracks task progress and provides updates.
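As an illustration of the keyword side of interruption handling, the sketch below checks a recognized transcript against a list of interruption phrases. It is a hypothetical example, not the project's interruption policy: the keyword list, the function name, and the assistant_speaking flag are all assumptions made for this sketch.

```python
import re

# Hypothetical keyword list; a real deployment would make this configurable.
INTERRUPT_KEYWORDS = ("stop", "wait", "hold on")


def should_interrupt(transcript, keywords=INTERRUPT_KEYWORDS, assistant_speaking=True):
    """Return True if the assistant is speaking and the transcript contains
    one of the keywords as a whole word or phrase."""
    if not assistant_speaking:
        return False
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    normalized = re.sub(r"[^\w\s]", " ", transcript.lower())
    padded = " " + " ".join(normalized.split()) + " "
    # Padding with spaces avoids matching inside longer words.
    return any(f" {keyword} " in padded for keyword in keywords)
```

Matching whole words keeps a phrase such as "unstoppable" from triggering the "stop" keyword; voice-based interruption (detecting that the user started talking at all) would instead rely on the VAD stage.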
Sample operations
- Get weather information: The user says, "What's the weather like in Hangzhou?" The system will return the weather conditions in Hangzhou.
- Create a timed task: The user says, "Remind me to drink water every morning at 8:00." The system will set a recurring reminder.
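One common way to support requests like the two above is to let the language model emit a structured tool call that the application then dispatches. The sketch below assumes that approach; the registry, the tool names get_weather and create_reminder, and the call format are hypothetical and are not taken from Bailing's code. The tool bodies are placeholders that return a description instead of contacting a weather service or scheduler.

```python
from typing import Callable, Dict

# Hypothetical registry mapping tool names to Python functions.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name):
    """Decorator that registers a function under the given tool name."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


@tool("get_weather")
def get_weather(city: str) -> str:
    # Placeholder: a real tool would query a weather service here.
    return f"(weather for {city} would be looked up here)"


@tool("create_reminder")
def create_reminder(time: str, message: str, repeat: str = "once") -> str:
    # Placeholder: a real tool would store the task in a scheduler.
    return f"Reminder set: '{message}' at {time} ({repeat})"


def dispatch(call: dict) -> str:
    """Run a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    func = TOOLS.get(call["name"])
    if func is None:
        return f"Unknown tool: {call['name']}"
    return func(**call.get("arguments", {}))
```

Under these assumptions, "What's the weather like in Hangzhou?" would become a call such as dispatch({"name": "get_weather", "arguments": {"city": "Hangzhou"}}), and the returned text would be passed back to the model or spoken to the user.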
With the usage guide above, users can get started with Bailing quickly and enjoy an efficient voice conversation experience.