
Ichigo (llama3-s): local real-time voice AI assistant, open source version of Siri

General Introduction

Ichigo is an open-source, real-time speech AI project that aims to extend text-based language models with native "listening" capabilities. The project uses early fusion techniques inspired by Meta's Chameleon paper. Ichigo aims to be an on-device voice assistant with open data and open weights, similar to Siri. The project invites partners to help crowdsource speech datasets.
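
Early fusion, in the sense used by the Chameleon paper, means that non-text input is converted to discrete tokens and placed in the same token sequence as text, so a single model reads both. The following is a minimal sketch of that idea only, assuming the audio has already been quantized to integer codebook indices. The token names, the codebook size, and the helper functions are hypothetical and are not taken from the Ichigo code.

```python
from typing import List

# Hypothetical special tokens; the real tokenizer's names may differ.
SOUND_START = "<|sound_start|>"
SOUND_END = "<|sound_end|>"


def sound_token(index: int) -> str:
    """Render one audio codebook index as a vocabulary token string."""
    return f"<|sound_{index:04d}|>"


def extend_vocab(text_vocab: List[str], codebook_size: int) -> List[str]:
    """Append sound tokens to the text vocabulary so one model sees both."""
    extra = [SOUND_START, SOUND_END]
    extra += [sound_token(i) for i in range(codebook_size)]
    return text_vocab + extra


def build_prompt(audio_codes: List[int], text_prefix: str = "") -> str:
    """Place quantized audio inline with text in a single token sequence."""
    sounds = "".join(sound_token(code) for code in audio_codes)
    return f"{text_prefix}{SOUND_START}{sounds}{SOUND_END}"


# Toy values for illustration: a 2-word text vocabulary and 4 audio codes.
vocab = extend_vocab(["hello", "world"], codebook_size=4)
prompt = build_prompt([3, 0, 2], text_prefix="Transcribe: ")
print(len(vocab))
print(prompt)
```

The point of the design is that the language model needs no separate audio branch: once sound tokens are part of the vocabulary, speech and text share one input sequence.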


Features

  • Real-time speech recognition: Processes and understands user voice input in real time.
  • Multi-turn dialogue: Supports multiple rounds of dialogue and maintains context across a conversation.
  • Noise management: The model is trained to refuse non-speech audio input, which improves the user experience.
  • Open source and extensible: The project code and model weights are fully open source, and users are free to download and extend them.
  • Local deployment: Supports deployment on local devices to protect user privacy.

 

Usage Guide

Installation process

  1. Environment preparation:
    • Ensure that Python 3.8 or above is installed.
    • Install the necessary dependency libraries: pip install -r requirements.txt
  2. Download the model:
    • Use the following commands to clone the Ichigo repository and install it in editable mode:
      git clone https://github.com/homebrewltd/ichigo.git
      cd ichigo
      pip install -e .
      
  3. Configure the dataset:
    • Download the required dataset from HuggingFace and set the dataset path in the configuration file.
  4. Launch the demo:
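    • Start the local Gradio demo with the following command. The --use-4bit and --use-8bit flags appear to select alternative quantization modes, so pass only one of them:
      python demo.py --use-4bit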
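
Before launching, it can help to check the interpreter version and to assemble the launch command in one place. The sketch below assumes that --use-4bit and --use-8bit are alternative quantization modes and that demo.py is in the current directory, as in the command above. The function names python_ok and build_demo_command are hypothetical helpers and are not part of Ichigo.

```python
import sys
from typing import List, Optional, Sequence

MIN_PYTHON = (3, 8)  # minimum version stated in the installation steps


def python_ok(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the given (or current) interpreter is new enough."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= MIN_PYTHON


def build_demo_command(quantization: Optional[str] = None) -> List[str]:
    """Build the demo launch command with at most one quantization flag."""
    # Assumption: the two flags are mutually exclusive alternatives.
    flags = {"4bit": "--use-4bit", "8bit": "--use-8bit"}
    if quantization is not None and quantization not in flags:
        raise ValueError(f"unknown quantization mode: {quantization!r}")
    command = [sys.executable, "demo.py"]
    if quantization is not None:
        command.append(flags[quantization])
    return command


# Print rather than run, so the command can be inspected first.
print("Python version OK:", python_ok())
print("Launch command:", " ".join(build_demo_command("4bit")))
```

To actually start the demo from Python, the returned list can be passed to subprocess.run from inside the cloned repository.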
      

Usage Process

  1. Start the service:
    • After running the above command, open the local URL it prints to access Ichigo's web UI.
  2. Voice input:
    • In the web UI, click the microphone icon to start recording. The system processes the audio and displays the speech recognition results in real time.
  3. Multi-turn dialogue:
    • The system supports multiple rounds of dialogue: the user can keep speaking, and the system maintains the context in order to understand and respond.
  4. Noise management:
    • The system is trained to recognize and refuse non-speech audio input, which helps keep the recognition results accurate.
  5. Custom extensions:
    • Users can modify the code and model as needed to add new features or improve existing ones.
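
Maintaining context across turns usually comes down to keeping a bounded history of earlier exchanges and passing it along with each new input. The sketch below illustrates that general pattern only and is not Ichigo's implementation. The ConversationContext class, the role names, and the max_turns limit are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, Dict, List


class ConversationContext:
    """Keep the most recent turns so each reply can see earlier ones."""

    def __init__(self, max_turns: int = 8) -> None:
        # A deque with maxlen drops the oldest turn automatically.
        self._turns: Deque[Dict[str, str]] = deque(maxlen=max_turns)

    def add(self, role: str, content: str) -> None:
        """Record one turn; role is either 'user' or 'assistant'."""
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        self._turns.append({"role": role, "content": content})

    def messages(self) -> List[Dict[str, str]]:
        """Return the retained history, oldest turn first."""
        return list(self._turns)


context = ConversationContext(max_turns=3)
context.add("user", "What is the capital of France?")
context.add("assistant", "Paris.")
context.add("user", "How many people live there?")
context.add("assistant", "About two million in the city proper.")
print([m["role"] for m in context.messages()])
```

With max_turns set to 3, the first user question has already been dropped by the time the fourth turn is added, which shows the trade-off: a smaller window keeps the input short but loses older context.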

Detailed Operation Procedure

  1. Download and installation:
    • Visit Ichigo's GitHub page and follow the installation process to download and install the necessary dependencies and models.
  2. Configuration and startup:
    • In the configuration file provided by the project, set the dataset path and model parameters, then start the local service.
  3. Using the web UI:
    • Try Ichigo's real-time speech recognition and multi-turn dialogue features by speaking to it through the web UI.
  4. Extension and customization:
    • Read the project documentation and code comments to understand the system's architecture and how it works before writing custom extensions.