General Introduction
Ichigo is an open-source, real-time speech AI project that aims to extend text-based language models with native "listening" capabilities. The project uses early-fusion techniques inspired by Meta's Chameleon paper. Ichigo aims to be an open-data, open-weight, on-device voice assistant similar to Siri. The project invites partners to help crowdsource speech datasets.
Function List
- Real-time speech recognition: The ability to process and understand user voice input in real time.
- Multi-turn dialogue: Supports multi-turn conversations and maintains context across turns.
- Noise handling: The model is trained to refuse non-speech audio input, which improves the user experience.
- Open source and scalable: The project code and model weights are completely open source and users are free to download and extend them.
- Local deployment: Supports deployment on local devices to protect user privacy.
Usage Guide
Installation process
- Environment setup ::
- Ensure that Python 3.8 or above is installed.
- Install the necessary dependency libraries:
pip install -r requirements.txt
- Download model ::
- Use the following commands to clone the Ichigo repository and install it:
git clone https://github.com/homebrewltd/ichigo.git
cd ichigo
pip install -e .
- Configuring the dataset ::
- Download the required dataset from Hugging Face and set the dataset path in the configuration file.
- Launch Demo ::
- Start the local Gradio Demo with the following command:
python demo.py --use-4bit --use-8bit
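Before running the installation steps above, it can help to confirm that the environment meets the stated requirement of Python 3.8 or above and that requirements.txt is present. The following is a minimal sketch, not part of the Ichigo project: the function names `python_is_supported` and `preflight` are hypothetical, and the check assumes it is pointed at the directory that holds requirements.txt.

```python
import sys
from pathlib import Path

# Minimum interpreter version stated in the installation steps.
MIN_PYTHON = (3, 8)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum.

    `version` defaults to the running interpreter's version.
    """
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def preflight(repo_dir="."):
    """Return a list of human-readable problems; empty means ready to install.

    Hypothetical helper: it only checks the two prerequisites named in the
    installation steps (Python version and requirements.txt).
    """
    problems = []
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        needed = ".".join(str(part) for part in MIN_PYTHON)
        problems.append(f"Python {needed}+ is required, found {found}")
    if not (Path(repo_dir) / "requirements.txt").is_file():
        problems.append(f"requirements.txt not found in {repo_dir}")
    return problems
```

Calling `preflight("ichigo")` after cloning returns an empty list when both prerequisites are met; otherwise each entry describes what is missing.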
Usage Process
- Starting services ::
- After running the command above, open the local URL it provides to access Ichigo's Web UI.
- Voice input ::
- In the Web UI, click the microphone icon to start recording. The system processes the audio and displays the speech recognition results in real time.
- Multi-turn dialogue ::
- The system supports multi-turn dialogue: the user can keep speaking, and the system maintains the conversation context to understand and respond.
- Noise handling ::
- The system is trained to recognize and refuse non-speech audio input, which helps keep the recognition results accurate.
- Custom extensions ::
- Users can modify the code and model as needed to add new features or improve existing ones.
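The multi-turn behaviour described above depends on keeping earlier turns available when a new utterance arrives. How Ichigo stores context internally is not described here, so the following is only an illustrative sketch: the `Conversation` class, its role names, and the `max_turns` limit are all assumptions made for this example.

```python
from collections import deque


class Conversation:
    """Keep the most recent turns of a dialogue (hypothetical helper)."""

    ROLES = ("user", "assistant")

    def __init__(self, max_turns=8):
        # A bounded deque drops the oldest turn once the limit is reached,
        # so the context passed to the model never grows without bound.
        self.turns = deque(maxlen=max_turns)

    def add(self, role, text):
        """Append one turn; `role` must be 'user' or 'assistant'."""
        if role not in self.ROLES:
            raise ValueError(f"unknown role: {role!r}")
        self.turns.append({"role": role, "content": text})

    def as_messages(self):
        """Return the retained turns, oldest first, as a plain list."""
        return list(self.turns)
```

A custom extension could call `add("user", transcript)` after each recognized utterance and pass `as_messages()` to the model, so that each response is generated with the earlier turns in view.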
Detailed Operation Procedure
- Download and Installation ::
- Visit Ichigo's GitHub page and follow the installation process to download and install the necessary dependencies and models.
- Configuration and startup ::
- Set the dataset path and model parameters in the configuration file provided by the project, then start the local service.
- Using the Web UI ::
- Use voice input in the Web UI to try Ichigo's real-time speech recognition and multi-turn dialogue features.
- Extension and customization ::
- Read the project documentation and code comments to understand the system's architecture and workings before writing custom extensions.
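The configuration step above asks for a dataset path to be set before the service starts, and a wrong path is an easy mistake to catch early. The sketch below assumes a JSON configuration file with a `dataset_path` key; both the file format and the key name are hypothetical, since the project's actual configuration layout is not described here.

```python
import json
from pathlib import Path


def load_config(config_path):
    """Load a JSON config and verify that its dataset directory exists.

    Assumes a hypothetical layout such as {"dataset_path": "/data/speech"}.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    if "dataset_path" not in config:
        raise KeyError("config is missing 'dataset_path'")
    dataset_dir = Path(config["dataset_path"]).expanduser()
    if not dataset_dir.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {dataset_dir}")
    # Store the expanded path so later code does not need to repeat this step.
    config["dataset_path"] = str(dataset_dir)
    return config
```

Failing at load time with a clear message is usually easier to diagnose than an error raised later, once the model has already been loaded.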