General Introduction
Infini-Megrez is an edge intelligence solution developed by Infinigence AI that aims to deliver efficient multimodal understanding and analysis through hardware-software co-design. At the heart of the project is the Megrez-3B model, which supports integrated image, text, and audio understanding with high accuracy and fast inference. The model performs well on several mainstream benchmarks and is suited to tasks such as scene understanding and optical character recognition (OCR). The project provides complete deployment code so that developers can run it on a variety of platforms.
Function List
- Image understanding: Builds image tokens using SigLip-400M and performs well on benchmarks such as MME, MMVet, and OCRBench.
- Language understanding: Maintains strong text comprehension and performs well on benchmarks such as C-EVAL and MMLU.
- Speech understanding: Supports Chinese and English voice input, multi-turn dialogue, and responses to voice commands.
- Fast inference: Achieves up to a 300% inference speedup through hardware-software co-design.
- Easy to use: Adopts the classic LLaMA architecture, making it easy for developers to deploy on a variety of platforms.
- Rich applications: Provides a full-stack WebSearch solution that automatically determines when to invoke search in order to produce better summaries.
Usage Guide
Installation process
- Clone the repository: Clone the Infini-Megrez repository by running the following command in a terminal:
git clone https://github.com/infinigence/Infini-Megrez.git
- Install dependencies: Go to the project directory and install the required dependencies:
cd Infini-Megrez
pip install -r requirements.txt
- Download the model: Download the required model files by following the instructions in the README file and place them in the specified directory.
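After installing the dependencies, it can be useful to confirm that the packages listed in requirements.txt are actually present in the active environment. The following is a minimal sketch and not part of the Infini-Megrez repository: the helper names `parse_requirement_names` and `missing_packages` are hypothetical, and the parser only handles simple lines such as `name` or `name==version`.

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(text):
    """Extract bare package names from requirements.txt content (simplified)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        # Skip blank lines, pip options such as "-r other.txt", and URL requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        missing = missing_packages(parse_requirement_names(req.read_text()))
        print("Missing packages:", missing or "none")
    else:
        print("requirements.txt not found; run this from the project directory")
```

Run from the project directory, this prints any listed package that pip did not install. It does not check version constraints.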
Guidelines for use
- Image understanding:
- Place the image files in the specified directory.
- Run the image understanding script:
python image_understanding.py --input_dir ./images
- View the output, which contains the image markers and analysis results.
- Language understanding:
- Place the text files in the specified directory.
- Run the language understanding script:
python text_understanding.py --input_dir ./texts
- View the output, which contains the text analysis and comprehension results.
- Speech understanding:
- Place the audio files in the specified directory.
- Run the speech understanding script:
python speech_understanding.py --input_dir ./audios
- View the output, which contains the speech-to-text transcript and analysis results.
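Each of the three workflows above reads its files from an input directory, so a quick check of what a directory actually contains can save a failed run. The sketch below is illustrative only: the `collect_inputs` helper and the file extensions in `EXTENSIONS` are assumptions, not values taken from the project documentation.

```python
from pathlib import Path

# Assumed extensions per modality; adjust to whatever the scripts accept.
EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png"},
    "text": {".txt"},
    "audio": {".wav", ".mp3"},
}


def collect_inputs(input_dir, modality):
    """Return the sorted files in input_dir whose extension matches the modality."""
    root = Path(input_dir)
    if not root.is_dir():
        return []  # a missing directory yields no inputs rather than an error
    allowed = EXTENSIONS[modality]
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in allowed
    )
```

For example, `collect_inputs("./images", "image")` returns the image files that would be picked up from `./images`, and an empty list signals that the directory is missing or holds no matching files.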
Featured Function Workflows
- Multimodal understanding:
- Place the image, text, and audio files in their corresponding directories.
- Run the multimodal understanding script:
python multimodal_understanding.py --image_dir ./images --text_dir ./texts --audio_dir ./audios
- View the combined analysis, which covers joint understanding of the images, text, and speech.
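When the multimodal script is launched from another Python program, passing the arguments as a list avoids shell quoting problems with directory paths. This is a minimal sketch that assumes the script name and flags shown in the command above; `build_multimodal_command` and `run_multimodal` are hypothetical helper names.

```python
import subprocess
import sys


def build_multimodal_command(image_dir, text_dir, audio_dir,
                             script="multimodal_understanding.py"):
    """Build the argument list for the multimodal script without using a shell."""
    return [
        sys.executable, script,          # use the current Python interpreter
        "--image_dir", str(image_dir),
        "--text_dir", str(text_dir),
        "--audio_dir", str(audio_dir),
    ]


def run_multimodal(image_dir, text_dir, audio_dir):
    """Run the script and raise CalledProcessError if it exits with an error."""
    cmd = build_multimodal_command(image_dir, text_dir, audio_dir)
    return subprocess.run(cmd, check=True)
```

Calling `run_multimodal("./images", "./texts", "./audios")` from the project directory is then equivalent to the command line above, with each path passed as a single argument.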
- WebSearch solution:
- Configure the WebSearch module and make sure the network connection is working.
- Run the WebSearch script:
python websearch.py --query "Enter the query"
- View search results and summaries. The system automatically determines whether the search function needs to be invoked and provides optimized summary results.
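The idea of deciding per query whether to call search can be illustrated with a small routing function. This is a toy sketch and not the project's implementation: the keyword heuristic in `needs_search` is a hypothetical stand-in for the model's own decision, and `search` and `summarize` are placeholder callables supplied by the caller.

```python
# Hypothetical trigger words; a real system would let the model make this decision.
TIME_SENSITIVE_WORDS = ("today", "latest", "current", "news", "price")


def needs_search(query):
    """Toy heuristic: search only when the query looks time-sensitive."""
    lowered = query.lower()
    return any(word in lowered for word in TIME_SENSITIVE_WORDS)


def answer(query, search, summarize, decide=needs_search):
    """Call search only when decide(query) is true, then summarize."""
    results = search(query) if decide(query) else []
    return summarize(query, results)
```

Keeping the decision in a separate `decide` callable means the heuristic can be replaced by a model call without changing the routing logic.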
By following the steps above, users can explore the full range of Infini-Megrez features for efficient multimodal understanding and analysis.