xiaozhi-esp32-server: Xiaozhi AI chatbot open source back-end services

Latest AI Resources | 5 months ago | Published by AI Sharing Circle

General Introduction

xiaozhi-esp32-server is a tool that provides back-end services for the Xiaozhi AI chatbot (xiaozhi-esp32). It is written in Python and built on the WebSocket protocol, and it helps users quickly set up a server to control ESP32 devices. The project suits people who have already bought ESP32 hardware and want to run their own back end. Its features include voice dialog, multi-language recognition, and smart home control. The project is open source on GitHub, was last updated in March 2025, and has detailed official documentation and an active community. Note that it is still under development and is not recommended for direct use in a production environment.

Function List

WebSocket communication: Exchanges data with the hardware in real time using the xiaozhi-esp32 protocol.
Voice dialog: Supports wake-word dialog, manual dialog, and real-time interruption; the device goes to sleep automatically after a long period of inactivity.
Intent recognition: Recognizes user intent with a large language model and can call functions to execute instructions.
Multi-language support: Recognizes Chinese, Cantonese, English, Japanese, and Korean; FunASR is used by default.
Language model switching: ChatGLM is used by default; Alibaba Bailian, DeepSeek, and others are also supported.
Speech synthesis: Supports EdgeTTS and Volcano Engine TTS to generate natural-sounding speech.
Memory mode: Three options are available: extra-long memory, local summarization, and no memory.
Smart home control: Can connect to HomeAssistant to switch appliances on and off.

Using Help

Installation process

To use xiaozhi-esp32-server, you have to prepare your hardware and environment. Here are the steps:

1. Preparatory work

Hardware: An ESP32 device that supports the xiaozhi-esp32 firmware; see the official list.
Computer: A 4-core CPU and 8GB of RAM are recommended. If speech recognition runs through an API, 2 cores and 2GB will also work.
Software: Install Python 3.10 and Conda.
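
The checklist above can be verified with a short script before going further. This is only a sketch: the function name check_prerequisites is made up for illustration, and RAM is not checked because the Python standard library has no portable way to read it.

```python
import os
import shutil
import sys


def check_prerequisites(min_cores=4, python_version=(3, 10)):
    """Compare this machine against the recommendations in the guide."""
    cores = os.cpu_count() or 0  # os.cpu_count() can return None
    return {
        # The guide asks for Python 3.10 specifically.
        "python_ok": sys.version_info[:2] == python_version,
        "cores": cores,
        "cores_ok": cores >= min_cores,
        # True when a "conda" executable is on PATH.
        "conda_found": shutil.which("conda") is not None,
    }


for name, value in check_prerequisites().items():
    print(f"{name}: {value}")
```

If speech recognition runs through an API, calling check_prerequisites(min_cores=2) matches the lower requirement mentioned above.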

2. Downloading the project

Open https://github.com/xinnan-tech/xiaozhi-esp32-server.
Click "Code" and select "Download ZIP" to download.
After unzipping, rename the folder to "xiaozhi-esp32-server".
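
The same download, unzip, and rename steps can be scripted. The sketch below assumes the repository's default branch is called "main" (which fixes the ZIP URL) and that the archive contains a single top-level folder, as GitHub archives normally do. The function names are illustrative.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Assumption: the default branch is "main"; change the URL if it is not.
ZIP_URL = (
    "https://github.com/xinnan-tech/xiaozhi-esp32-server"
    "/archive/refs/heads/main.zip"
)


def unpack_and_rename(zip_path, dest_dir, final_name="xiaozhi-esp32-server"):
    """Extract zip_path into dest_dir and rename its top-level folder."""
    dest_dir = Path(dest_dir)
    target = dest_dir / final_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # Extract into a scratch folder first so a failure leaves no half-made target.
    with tempfile.TemporaryDirectory(dir=dest_dir) as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        roots = [p for p in Path(tmp).iterdir() if p.is_dir()]
        if len(roots) != 1:
            raise ValueError("expected exactly one top-level folder in the archive")
        shutil.move(str(roots[0]), str(target))
    return target


def download_and_unpack(dest_dir="."):
    """Download the ZIP from GitHub (needs network access) and unpack it."""
    zip_file, _ = urllib.request.urlretrieve(ZIP_URL)
    return unpack_and_rename(zip_file, dest_dir)
```

Calling download_and_unpack() in an empty working directory should leave a folder named xiaozhi-esp32-server, ready for the next step.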

3. Configuring the Conda environment

Open the command line and run:

conda create -n xiaozhi-esp32-server python=3.10 -y
conda activate xiaozhi-esp32-server
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda install libopus -y
conda install ffmpeg -y

Check the output of each step to make sure no errors are reported.
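
As a rough follow-up check, the script below asks whether ffmpeg and the Opus library are visible from Python. Run it inside the activated environment. It is a best-effort sketch: ctypes.util.find_library does not always search Conda's library folder, so a False for libopus is a hint to look closer, not proof that the install failed.

```python
import ctypes.util
import shutil


def check_audio_tools():
    """Report whether ffmpeg and the Opus library are visible to this process."""
    return {
        # True when an "ffmpeg" executable is on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_library returns None when no matching shared library is found.
        "libopus": ctypes.util.find_library("opus") is not None,
    }


print(check_audio_tools())
```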

4. Installation of dependencies

Go to the project folder:

cd xiaozhi-esp32-server/main/xiaozhi-server

Set a PyPI mirror hosted in mainland China (useful if you are in China) and install the dependencies:

pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip install -r requirements.txt

5. Downloading the model

By default, FunASR speech recognition is used. Go to the official guidelines (see README) and download the models into the "models" folder.
Check the "SenseVoiceSmall" folder for "model.pt" and download it if it is not there.
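
The file check can be automated. This minimal sketch assumes it is run from the folder that contains "models"; the function name is illustrative.

```python
from pathlib import Path


def find_missing_model(project_dir="."):
    """Return the expected model path if it is missing, otherwise None."""
    model = Path(project_dir) / "models" / "SenseVoiceSmall" / "model.pt"
    return None if model.is_file() else model


missing = find_missing_model()
if missing is not None:
    print(f"Missing model file: {missing}")
```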

6. Modification of the configuration

Open the "config.yaml" file and adjust the settings:
Language model: ChatGLM by default; can be changed to DeepSeek or HomeAssistant.
Speech synthesis: EdgeTTS by default; can be replaced with Volcano Engine TTS.
Silence duration: Set min_silence_duration_ms to 1000 (for people who speak slowly).
Save the file.
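
For repeated setups, the silence setting can be changed with a script instead of by hand. The sketch below edits the text directly with a regular expression, so comments and layout in "config.yaml" stay intact. It assumes the key appears as "min_silence_duration_ms: <number>" on its own line; where the key sits in the file is not assumed. The function names are illustrative.

```python
import re
from pathlib import Path

# Matches the key and its integer value; group 1 keeps indentation and the colon.
_PATTERN = re.compile(r"^(\s*min_silence_duration_ms\s*:\s*)\d+", re.MULTILINE)


def set_min_silence(text, milliseconds=1000):
    """Return (new_text, count) with every min_silence_duration_ms value replaced."""
    return _PATTERN.subn(lambda m: f"{m.group(1)}{milliseconds}", text)


def update_config(path="config.yaml", milliseconds=1000):
    """Rewrite the file in place; return how many values were changed."""
    path = Path(path)
    new_text, count = set_min_silence(path.read_text(encoding="utf-8"), milliseconds)
    if count:
        path.write_text(new_text, encoding="utf-8")
    return count
```

A return value of 0 from update_config means the key was not found, in which case the file should be edited manually.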

7. Starting the server

Run in the "xiaozhi-server" folder:

python main.py

If you see "Server running on port 8000", you've succeeded.
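
Besides reading the log line, a quick TCP probe from another terminal can confirm that something is listening. This sketch assumes the server runs on the same machine and on port 8000 as stated above; it only checks that the port accepts connections, not that the WebSocket protocol works.

```python
import socket


def is_port_open(host="127.0.0.1", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


print("server reachable:", is_port_open())
```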

8. Updating firmware

Write the server address (e.g. http://your-IP:8000) into the xiaozhi-esp32 firmware.
Refer to the Firmware Compilation Guidelines, then recompile and flash the firmware.
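
If you are unsure which IP to put in the address, the sketch below makes a best-effort guess at this machine's LAN address and prints it in the form used above. The helper names are illustrative, and machines with several network interfaces may get a different answer than expected, so treat the result as a starting point.

```python
import socket


def guess_lan_ip():
    """Best-effort guess of the address this machine uses on the local network."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a route.
        # 192.0.2.1 is a reserved documentation address and is never contacted.
        probe.connect(("192.0.2.1", 80))
        return probe.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route, fall back to loopback
    finally:
        probe.close()


def server_address(port=8000, ip=None):
    """Format the address the firmware should point to."""
    return f"http://{ip or guess_lan_ip()}:{port}"


print(server_address())
```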

Other deployment options

Docker rapid deployment: Suitable for novices. Run docker pull xinnan-tech/xiaozhi-esp32-server, then follow the guidelines to start it.
Running the source code in Docker: For people who modify the code. Install Docker first, then run the project by following the official documentation.
For details, see the Deployment Documentation.

Function Operation

Voice dialog

Wake-up: Say the wake word (set in the firmware) and the server will respond.
Manual trigger: Press the device button to start a dialog.
Interruption: If you interrupt while the device is talking, the interruption is handled immediately.
To try it: Say "hello" to the device and listen to the reply.

Multilingual recognition

Five languages are supported. Their priority can be adjusted in the configuration file.
To try it: Say "Hello" or "Konnichiwa" and the device will understand.

Speech synthesis

Once TTS is configured, text is converted to speech.
To try it: Enter "today's weather" with a test script and the device will read it out.
Switching: Change the TTS interface in the configuration file.

Smart home

Connect HomeAssistant: Fill in the token.
To try it: Say "turn on the light" and the light comes on.
Test: Run python performance_tester.py to check the speed.
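
To check that a token works before involving the voice pipeline, you can call HomeAssistant directly. The sketch below builds a request for HomeAssistant's public REST API (POST /api/services/light/turn_on with a Bearer token). It is not taken from the xiaozhi-esp32-server code, and the base URL and entity ID in the usage note are placeholders.

```python
import json
import urllib.request


def build_turn_on_request(base_url, token, entity_id):
    """Build (but do not send) a request that turns on a light entity."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/light/turn_on",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_turn_on_request("http://homeassistant.local:8123", "YOUR_TOKEN", "light.living_room")), replacing all three values with your own.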

Caveats

Network: Keep the connection stable; WebSocket relies on a real-time connection.
Security: For deployments on a public network, turn on authentication (auth: enabled: true).
Debugging: When something goes wrong, read the command line logs to find and fix the problem.

Application scenarios

Smart home

Connect to HomeAssistant, say "turn off the air conditioning", and the air conditioning stops.

Voice assistant

Put the device on the table, ask about "tomorrow's weather", and it will tell you.

Language practice

Converse with the device in English to practice your pronunciation.

QA

What if speech recognition returns garbled text?

Check for "model.pt" in "models/SenseVoiceSmall". If it is missing, download it by following the guidelines.

What if TTS reports that a file does not exist?

Confirm that libopus and ffmpeg are installed. If they are not, run conda install conda-forge::libopus and conda install conda-forge::ffmpeg.

What if the response is too slow?

Switch to a faster combination such as AliLLM + DoubaoTTS. Run python performance_tester.py to measure the speed.

What if you speak slowly and the device cuts you off?

In "config.yaml", change min_silence_duration_ms to 1000.

How do you control the appliances?

In the configuration, select HomeAssistant, fill in the token, and say the command.

The article is copyrighted and should not be reproduced without permission.
