Ichigo (llama3-s): local real-time voice AI assistant, open source version of Siri

AI Sharing Circle, updated 7 months ago

General Introduction

Ichigo is an open-source, real-time speech AI project that aims to extend text-based language models with native "listening" capabilities. The project uses early-fusion techniques inspired by Meta's Chameleon paper. Ichigo aims to be an open-data, open-weight, on-device voice assistant, similar to Siri. The project invites partners to help crowdsource speech datasets.
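
To make the early-fusion idea more concrete, here is a toy sketch of how discrete audio codes and text tokens can share one vocabulary and one input sequence. The codebook size, the sound token names and the tiny text vocabulary are illustrative assumptions, not Ichigo's actual tokenizer.

```python
# Toy illustration of early fusion: audio is first quantized into discrete
# codes, and those codes become ordinary tokens in the language model's
# vocabulary. All names and sizes here are hypothetical.

TEXT_VOCAB = ["<pad>", "hello", "what", "is", "the", "weather"]
NUM_SOUND_CODES = 16  # assumed codebook size, kept tiny for readability


def build_vocab(text_vocab, num_sound_codes):
    """Return a token -> id map with sound tokens appended after text tokens."""
    tokens = list(text_vocab) + ["<|sound_start|>", "<|sound_end|>"]
    tokens += [f"<|sound_{i:04d}|>" for i in range(num_sound_codes)]
    return {tok: idx for idx, tok in enumerate(tokens)}


def fuse(audio_codes, text_tokens, vocab):
    """Build one id sequence: wrapped audio codes followed by text tokens."""
    sound = ["<|sound_start|>"]
    sound += [f"<|sound_{c:04d}|>" for c in audio_codes]
    sound.append("<|sound_end|>")
    return [vocab[tok] for tok in sound + list(text_tokens)]


vocab = build_vocab(TEXT_VOCAB, NUM_SOUND_CODES)
ids = fuse(audio_codes=[3, 7, 7, 1], text_tokens=["what", "is"], vocab=vocab)
print(ids)
```

The point of the sketch is that the model sees a single stream of ids, so no separate speech-to-text stage sits in front of the language model.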

Function List

Real-time speech recognition: Processes and understands user voice input in real time.
Multi-turn dialogue: Supports multiple rounds of dialogue and maintains context across a conversation.
Noise management: The model is trained to refuse non-speech audio input, which improves the user experience.
Open source and extensible: The project code and model weights are completely open source, and users are free to download and extend them.
Local deployment: Supports deployment on local devices to protect user privacy.

Usage Guide

Installation process

Environment preparation ::
- Ensure that Python 3.8 or above is installed.
- Install the necessary dependency libraries: pip install -r requirements.txt
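
The version requirement can be checked before installing anything. This is a minimal sketch using only the standard library; the helper name `python_is_supported` is made up for illustration.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the step above


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter's (major, minor) meets the minimum."""
    return tuple(version_info[:2]) >= minimum


found = ".".join(str(part) for part in sys.version_info[:3])
if python_is_supported():
    print(f"Python {found} is new enough")
else:
    print(f"Python 3.8 or above is required, found {found}")
```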

Download Ichigo ::

Use the following commands to clone the Ichigo repository and install it in editable mode:

git clone https://github.com/homebrewltd/ichigo.git
cd ichigo
pip install -e .

Configuring the dataset ::
- Download the required dataset from HuggingFace and set the dataset path in the configuration file.
Launch the demo ::
- Start the local Gradio demo with the following command:
```
python demo.py --use-4bit --use-8bit
```
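
The flag names in the command above suggest weight quantization. As a rough back-of-the-envelope sketch, the snippet below estimates how much memory the weights alone need at different precisions. The 8-billion parameter count is an assumption (suggested by the llama3-s name, not stated in this article), and activations, KV cache and quantization overhead are ignored.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1024**3


ASSUMED_PARAMS = 8_000_000_000  # assumption: an 8B-parameter model

for bits in (16, 8, 4):
    size = weight_memory_gib(ASSUMED_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size:.1f} GiB")
```

Under these assumptions, each halving of the bit width halves the weight footprint, which is why lower-precision loading matters on local devices with limited memory.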

Usage Process

Start the service ::
- After running the above command, visit the locally provided URL to access Ichigo's web UI.
Voice input ::
- In the web UI, click the microphone icon to start recording; the system processes the speech and displays the recognition results in real time.
Multi-turn dialogue ::
- The system supports multiple rounds of dialogue: the user can keep speaking, and the system maintains the context in order to understand and respond.
Noise management ::
- The system is trained to recognize non-speech audio input and refuse to process it, which keeps the recognition results accurate.
Custom extensions ::
- Users can modify the code and model as needed to add new features or improve existing ones.
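
The context handling described in the multi-turn dialogue step can be pictured with a small sketch. This is a generic illustration, not Ichigo's implementation: the `Conversation` class and the `max_turns` limit are hypothetical, and plain text strings stand in for what would be audio input in the real system.

```python
from collections import deque


class Conversation:
    """Keeps the most recent turns so each reply can see earlier context."""

    def __init__(self, max_turns=6):
        # deque drops the oldest turn automatically once max_turns is reached
        self.turns = deque(maxlen=max_turns)

    def add(self, role, content):
        self.turns.append({"role": role, "content": content})

    def as_prompt(self):
        """Flatten the kept turns into one text block for the model."""
        return "\n".join(f"{t['role']}: {t['content']}" for t in self.turns)


chat = Conversation(max_turns=3)
chat.add("user", "What is the capital of France?")
chat.add("assistant", "Paris.")
chat.add("user", "How many people live there?")
print(chat.as_prompt())
```

Because the earlier turns are still in the prompt, a follow-up such as "there" can be resolved against the previous answer. Capping the history is one simple way to keep the prompt within a model's context length.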

Detailed Operation Procedure

Download and Installation ::
- Visit Ichigo's GitHub page and follow the installation process to download and install the necessary dependencies and models.
Configuration and startup ::
- In the configuration file provided by the project, set the dataset path and model parameters, then start the local service.
Using the web UI ::
- Experience Ichigo's real-time speech recognition and multi-turn dialogue features by speaking to it and interacting through the web UI.
Extension and customization ::
- Study the project documentation and code comments to understand the system's architecture and workings before building custom extensions.
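
For the configuration step, a minimal sketch of setting the dataset path programmatically might look like this. It assumes a JSON configuration file and a key named `dataset_path`; both are hypothetical, so the file format and key names used by the actual project should be checked in its repository.

```python
import json
from pathlib import Path


def set_dataset_path(config_file, dataset_dir):
    """Write dataset_dir into a JSON config, keeping any existing keys."""
    config_file = Path(config_file)
    config = {}
    if config_file.exists():
        config = json.loads(config_file.read_text(encoding="utf-8"))
    # "dataset_path" is a hypothetical key name chosen for this sketch
    config["dataset_path"] = str(Path(dataset_dir))
    config_file.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

A call such as `set_dataset_path("config.json", "data/speech")` would create or update the file in the current directory and leave any other settings untouched.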