
xiaozhi-esp32-server: Xiaozhi AI chatbot open source back-end services

General Introduction

xiaozhi-esp32-server provides back-end services for the Xiaozhi AI chatbot (xiaozhi-esp32). It is written in Python and built on the WebSocket protocol, and it helps users quickly set up a server to control ESP32 devices. The project suits people who have already purchased ESP32 hardware and want to run their own back end. Its features include voice dialog, multi-language recognition, and smart home control. The project is open source on GitHub, was last updated in March 2025, and has detailed official documentation and an active community. Note that it is still under development and is not recommended for direct use in a production environment.



 

Function List

  • WebSocket communication: Exchanges data with the hardware in real time using the xiaozhi-esp32 protocol.
  • Voice dialog: Supports wake-word dialog, manual dialog, and real-time interruption; the device hibernates automatically after a long idle period.
  • Intent recognition: Uses a large language model to recognize user intent and can call functions to execute instructions.
  • Multi-language support: Understands Chinese, Cantonese, English, Japanese, and Korean; FunASR is the default recognizer.
  • Language model switching: ChatGLM is the default; Alibaba Bailian, DeepSeek, and others are also supported.
  • Speech synthesis: Supports EdgeTTS and Volcano Engine TTS to generate natural-sounding speech.
  • Memory mode: Three options are available: extra-long memory, local summarization, and no memory.
  • Smart home control: Connects to HomeAssistant to switch appliances on and off.

 

Using Help

Installation process

To use xiaozhi-esp32-server, you have to prepare your hardware and environment. Here are the steps:

1. Preparatory work

  • Hardware: An ESP32 device that supports the xiaozhi-esp32 firmware; see the official list.
  • Computer: A 4-core CPU and 8GB of RAM are recommended. If speech recognition runs through an API, 2 cores and 2GB will also work.
  • Software: Install Python 3.10 and Conda.

2. Downloading the project

  • Open https://github.com/xinnan-tech/xiaozhi-esp32-server.
  • Click "Code" and select "Download ZIP" to download.
  • After unzipping, rename the folder to "xiaozhi-esp32-server".

3. Configuring the Conda environment

  • Open the command line and run:
conda create -n xiaozhi-esp32-server python=3.10 -y
conda activate xiaozhi-esp32-server
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda install libopus -y
conda install ffmpeg -y
  • Check the output of each step to make sure no errors are reported.
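
As a complement to reading the command output by eye, the environment can also be checked from Python. The following is a minimal sketch, not part of the project: the function name check_environment and its parameters are hypothetical, and it only confirms that the interpreter version matches the guide and that ffmpeg and libopus can be located.

```python
import ctypes.util
import shutil
import sys

EXPECTED_PYTHON = (3, 10)    # version named in this guide
DEFAULT_TOOLS = ("ffmpeg",)  # executables expected on PATH
DEFAULT_LIBS = ("opus",)     # shared library name for libopus


def check_environment(tools=DEFAULT_TOOLS, libraries=DEFAULT_LIBS,
                      version_info=None):
    """Return a list of problem descriptions; an empty list means none found."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    if tuple(version_info[:2]) != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python {found} found, guide expects 3.10")
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"executable '{tool}' not found on PATH")
    for lib in libraries:
        # Note: find_library uses the system search rules and may miss a
        # library that exists only inside a Conda prefix.
        if ctypes.util.find_library(lib) is None:
            problems.append(f"shared library '{lib}' not found")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```

Run it inside the activated Conda environment. A reported problem is a hint to re-run the matching conda install command, not proof that the installation failed.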

4. Installation of dependencies

  • Go to the project folder:
cd xiaozhi-esp32-server/main/xiaozhi-server
  • Set a PyPI mirror (useful in mainland China) and install the dependencies:
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip install -r requirements.txt

5. Downloading the model

  • By default, FunASR speech recognition is used. Go to the official guidelines (see README) and download the models into the "models" folder.
  • Check the "SenseVoiceSmall" folder for "model.pt" and download it if it is not there.
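
The model check above can be scripted. This is a small sketch under the assumption that it is run from the folder that contains "models"; the helper name find_missing_model is hypothetical and not part of the project.

```python
from pathlib import Path


def find_missing_model(models_dir="models", model_name="SenseVoiceSmall",
                       filename="model.pt"):
    """Return the expected model path if the file is missing, else None."""
    path = Path(models_dir) / model_name / filename
    return None if path.is_file() else path


if __name__ == "__main__":
    missing = find_missing_model()
    if missing is None:
        print("model.pt found")
    else:
        print(f"Missing model file: {missing}")
```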

6. Modification of the configuration

  • Open the "config.yaml" file and adjust the settings:
  • Language model: ChatGLM by default; can be changed to DeepSeek or HomeAssistant.
  • Speech synthesis: EdgeTTS by default; can be replaced with Volcano Engine TTS.
  • Silence duration: Set min_silence_duration_ms to 1000 (helpful for slow speakers).
  • Save the file.
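
If you prefer to script the min_silence_duration_ms change rather than edit by hand, the sketch below rewrites the value with a regular expression so that comments and indentation in the file are left alone. The function name set_min_silence is hypothetical, and the surrounding YAML layout in the usage example is an assumption; only the key name comes from this guide.

```python
import re

# Matches "min_silence_duration_ms: <digits>" at the start of a line,
# keeping the key and its indentation in group 1.
_PATTERN = re.compile(r"^(\s*min_silence_duration_ms\s*:\s*)\d+", re.MULTILINE)


def set_min_silence(config_text, value_ms=1000):
    """Return config_text with every min_silence_duration_ms set to value_ms."""
    new_text, count = _PATTERN.subn(
        lambda match: f"{match.group(1)}{value_ms}", config_text
    )
    if count == 0:
        raise KeyError("min_silence_duration_ms not found in config text")
    return new_text


if __name__ == "__main__":
    # Assumed layout, for illustration only; the real config.yaml may differ.
    sample = "VAD:\n  min_silence_duration_ms: 700  # pause before reply\n"
    print(set_min_silence(sample, 1000))
```

To apply it to the real file, read "config.yaml" as text, pass it through the function, and write the result back after keeping a backup copy.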

7. Starting the server

  • Run in the "xiaozhi-server" folder:

python main.py

  • If you see "Server running on port 8000", you've succeeded.
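
Besides reading the log line, you can confirm from a second terminal that something is accepting connections on the port. This is a minimal sketch using only the standard library; the helper name is_port_open is hypothetical, and port 8000 is taken from the message above. It checks only that a TCP connection succeeds, not that the WebSocket service itself is healthy.

```python
import socket


def is_port_open(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "open" if is_port_open() else "closed or unreachable"
    print(f"Port 8000 on localhost is {state}")
```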

8. Updating firmware

  • Write the server address (e.g. http://your-IP:8000) into the xiaozhi-esp32 firmware.
  • Refer to the firmware compilation guide, then recompile and flash the firmware.
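
To find the value to put in place of the IP in the server address, the sketch below guesses the machine's LAN address and formats it in the form shown above. Both helper names are hypothetical, the http scheme and port 8000 are copied from this guide, and the guessed address should be checked against your router or OS network settings before it is written into the firmware.

```python
import ipaddress
import socket


def build_server_url(ip, port=8000, scheme="http"):
    """Format scheme://ip:port after validating the IP and port."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    host = f"[{ip}]" if ":" in ip else ip  # bracket IPv6 literals
    return f"{scheme}://{host}:{port}"


def guess_lan_ip():
    """Best-effort guess of this machine's outbound IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # Connecting a UDP socket sends no packet; it only selects the
            # local interface that would be used for this destination.
            sock.connect(("192.0.2.1", 80))
            return sock.getsockname()[0]
        except OSError:
            return "127.0.0.1"


if __name__ == "__main__":
    print(build_server_url(guess_lan_ip()))
```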

Other deployment options

  • Docker quick deployment: Suitable for beginners. Run docker pull xinnan-tech/xiaozhi-esp32-server, then follow the guide to start the container.
  • Running from source with Docker: For people who modify the code. Install Docker first, then run the project by following the official documentation.
  • For details, see the deployment documentation.

Function Operation

Voice dialog

  • Wake-up: Say the wake word (set in the firmware) and the server responds.
  • Manual trigger: Press the device button to start a dialog.
  • Interruption: If you interrupt while the device is talking, it handles the interruption immediately.
  • Operation: Say "hello" to the device and listen to the reply.

Multilingual recognition

  • Five languages are supported. Priority can be adjusted in the configuration file.
  • Operation: Say "Hello" or "Konnichiwa" and the device understands.

Speech synthesis

  • Once TTS is configured, text is converted to speech.
  • Operation: Use a test script to enter "today's weather" and the device reads it aloud.
  • Switching: Change the TTS interface in the configuration file.

Smart home

  • Connect HomeAssistant: Fill in the token.
  • Operation: Say "turn on the light" and the light comes on.
  • Test: Run python performance_tester.py to check the speed.

Notes

  • Network: Keep the connection stable; WebSocket relies on a real-time connection.
  • Security: Enable authentication for public network deployments (auth: enabled: true).
  • Debugging: Check the command-line logs to diagnose problems.

 

Application scenarios

  1. Smart home
  • Connect to HomeAssistant, say "turn off the air conditioning" and the air conditioning stops.
  2. Voice assistant
  • Put it on the table, ask about "tomorrow's weather" and it tells you.
  3. Language practice
  • Converse in English and the device helps you practice your pronunciation.

 

QA

  1. What if speech recognition outputs garbled text?
  • Check for "model.pt" in "models/SenseVoiceSmall". If it is missing, download it by following the official guide.
  2. What if TTS reports an error saying a file does not exist?
  • Confirm that libopus and ffmpeg are installed. If they are not, run conda install conda-forge::libopus and conda install conda-forge::ffmpeg.
  3. What if the response is too slow?
  • Switch to a faster combination such as AliLLM + DoubaoTTS. Run python performance_tester.py to measure the speed.
  4. What if you speak slowly and the device cuts you off?
  • In "config.yaml", change min_silence_duration_ms to 1000.
  5. How do you control appliances?
  • In the configuration, select HomeAssistant, fill in the token, and say the command.
May not be reproduced without permission: Chief AI Sharing Circle » xiaozhi-esp32-server: Xiaozhi AI chatbot open source back-end services
