Bailing: a low-latency open source voice dialog assistant that easily realizes natural conversational communication

🚀 Invitation to Experience: China's First AI IDE Intelligent Programming Software Trae Chinese version downloadThe DeepSeek-R1 and Doubao-pro are available for unlimited use!

General Introduction

Bailing (Bailing) is an open-source voice conversation assistant designed to engage in natural conversations with users through speech. The project combines speech recognition (ASR), voice activity detection (VAD), large language modeling (LLM), and speech synthesis (TTS) technologies to implement a GPT-4o-like voice conversation bot. With an end-to-end latency as low as 800ms, BaiLing is suitable for a variety of edge devices and low-resource environments. Its efficient open-source model and modular design allows it to run without a GPU, providing a high-quality voice conversation experience. With features such as memory function, tool invocation and task management, Biolabs is able to memorize user preferences and historical conversations to provide a personalized interactive experience.

BaiLing: a low-latency open-source voice conversation assistant that easily realizes natural conversational communication-1

Function List

Speech Input: Accurate speech recognition through FunASR.
Speech Activity Detection: Filtering invalid audio using silero-vad to improve recognition efficiency.
Intelligent dialog generation: relying on deepseek The powerful language understanding provided generates natural text responses.
Speech Output: Converts text to speech via edge-tts to provide users with realistic auditory feedback.
Interruption support: Flexible configuration of interruption policies, capable of recognizing keywords and voice interruptions, ensuring immediate feedback and control for users in the conversation.
Memory support: continuous learning capability to remember user preferences and history of conversations to provide a personalized interactive experience.
Support for tool invocation: Flexible integration of external tools allows users to request information or perform actions directly through voice.
Support for task management: Efficiently manage user tasks with the ability to track progress, set reminders, and provide dynamic updates.

Using Help

Installation and operation

Dependent environment

Make sure you have the following tools and libraries installed in your development environment:

Python 3.8 or higher
pip package manager
Required dependencies for FunASR, silero-vad, deepseek, edge-tts

Installation steps

Cloning Project Warehouse:

   git clone https://github.com/wwbin2017/bailing.git
cd bailing

Install the required dependencies:

   pip install -r requirements.txt

Configure environment variables: Open config/config.yaml Configure ASR, LLM and other related configurations. Download SenseVoiceSmall to your catalog. models/SenseVoiceSmallThe following is an example of a deepseek API key. Get the API key of deepseek and configure it, of course, you can also configure other models such as openai, qwen, gemini, 01yi and so on.
Run the project:

   cd server
python server.py  # 启动后端服务，也可不执行这一步
python main.py

Instructions for use

After launching the app, the system will wait for voice input. Here is the detailed operation procedure:

Convert user speech to text with FunASR.
Use silero-vad for voice activity detection to ensure that only valid speech is processed.
deepseek processes text input and generates smart responses.
edge-tts, ChatTTS, macOS say converts the generated text to speech and plays it back to the user.

Functional operation flow

voice input: The user inputs voice through the microphone and the system automatically performs voice recognition.
Voice Activity Detection: The system automatically filters invalid audio to ensure recognition efficiency.
Intelligent dialog generation: The system generates natural text responses based on user input.
voice output: The system converts text responses to speech and plays them back to the user.
Support for interruptions: The user can interrupt the current conversation by voice and the system will respond instantly.
memory function: The system remembers the user's preferences and history of conversations to provide a personalized interactive experience.
Tool Call: Users can request information or perform actions by voice, and the system flexibly integrates external tools.
task management: Users can set up task reminders and the system will efficiently manage task progress and provide dynamic updates.

sample operation (computing)

Get weather information: The user says, "What's the weather like in Hangzhou?" The system will return the weather conditions in Hangzhou.
Creating Timed Tasks: User says, "Remind me to drink water every morning at 8:00 am." The system will set a timed reminder.

With the above detailed usage help, users can easily get started with BaiLing and enjoy an efficient voice conversation experience.

Bailing: a low-latency open source voice dialog assistant that easily realizes natural conversational exchanges

General Introduction

Function List

Using Help

Installation and operation

Dependent environment

Installation steps

Instructions for use

Functional operation flow

sample operation (computing)

Related articles

Recommended

Can't find AI tools? Try here!

FLUX.1 image generator (supports Chinese input)

Recent AI Hotspots

AI Tools Recommendations

AI Tools Classification