MiniRAG: A Simplified Retrieval-Augmented Generation Framework That Recalls Relevant Text Blocks via an Entity Graph Index

General Introduction

MiniRAG is an extremely simple Retrieval-Augmented Generation (RAG) framework designed to deliver good RAG performance even with small models, using heterogeneous graph indexing and lightweight topology-enhanced retrieval. Developed by the Data Science Laboratory of the University of Hong Kong (HKUDS), the project addresses the performance degradation that Small Language Models (SLMs) suffer in existing RAG frameworks. MiniRAG reduces the reliance on complex semantic understanding by combining text blocks and named entities in a unified structure, and uses the graph structure for efficient knowledge discovery. The framework achieves performance comparable to Large Language Model (LLM) based approaches while requiring only 25% of their storage space.


Feature List

  • Heterogeneous graph indexing mechanism: combining text blocks and named entities to reduce reliance on complex semantic understanding.
  • Lightweight topology-enhanced retrieval: efficient knowledge discovery using graph structures.
  • Compatible with small language models: provides efficient RAG performance in resource-constrained scenarios.
  • Comprehensive benchmark dataset: the LiHua-World dataset is provided to evaluate the performance of lightweight RAG systems under complex queries.
  • Easy installation: supports installation from source code and PyPI.

 

Usage Guide

Installation process

Installation from source (recommended)

  1. Clone the MiniRAG repository:
   git clone https://github.com/HKUDS/MiniRAG.git
   cd MiniRAG
  2. Install the dependencies:
   pip install -e .

Installation from PyPI

MiniRAG's code is based on LightRAG, so it can be installed directly from PyPI:

pip install lightrag-hku
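
To confirm that either installation route worked, a quick check like the following may help. It is a minimal sketch that assumes the source install exposes a top-level module named `minirag` (as the import later in this article suggests) and that the `lightrag-hku` package exposes `lightrag`; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Module names are assumptions; adjust them if your environment differs.
for name in ("minirag", "lightrag"):
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

If neither module is reported as found, re-run the install command inside the same Python environment you use to run the scripts.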

Quick Start

  1. Download the desired dataset and place it in the ./dataset directory. For example, the LiHua-World dataset is already located in the ./dataset/LiHua-World/data/ directory.
  2. Use the following command to index the dataset:
   python ./reproduce/Step_0_index.py
  3. Run the Q&A module:
   python ./reproduce/Step_1_QA.py
  4. Alternatively, use the code in ./main.py to initialize MiniRAG.
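
The quick-start steps can also be chained from a small driver script. The sketch below only assumes the two script paths and the dataset location named in this article; the function names are hypothetical, and it passes no arguments to the scripts, so any options they require would need to be added.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Script paths as given in this article, relative to the repository root.
STEPS = ("reproduce/Step_0_index.py", "reproduce/Step_1_QA.py")


def dataset_ready(data_dir: Path) -> bool:
    """True if the dataset directory exists and contains at least one entry."""
    return data_dir.is_dir() and any(data_dir.iterdir())


def build_commands(repo_root: Path) -> list[list[str]]:
    """Build one command per step, using the current Python interpreter."""
    root = repo_root.resolve()
    return [[sys.executable, str(root / step)] for step in STEPS]


def run_quick_start(repo_root: Path, data_dir: Path) -> None:
    """Check the dataset, then run indexing followed by Q&A."""
    if not dataset_ready(data_dir):
        raise FileNotFoundError(f"No dataset files found in {data_dir}")
    for cmd in build_commands(repo_root):
        # check=True stops the sequence if a step exits with a non-zero status.
        subprocess.run(cmd, cwd=repo_root.resolve(), check=True)


# Example call from the repository root (commented out so the sketch has no side effects):
# run_quick_start(Path("."), Path("./dataset/LiHua-World/data"))
```

Checking the dataset directory first avoids starting a long indexing run against an empty folder.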

Main function operation flow

Heterogeneous graph indexing mechanism

MiniRAG creates heterogeneous graph indexes by combining text blocks and named entities in a unified structure. Users can achieve this by following the steps below:

  1. Prepare the dataset and ensure that the dataset is formatted as required.
  2. Run the indexing script:
   python ./reproduce/Step_0_index.py
  3. After indexing is complete, the data is stored in the specified directory for subsequent retrieval.
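
To make the idea of a heterogeneous graph index more concrete, here is a toy sketch that puts text chunks and named entities into one graph. It is not MiniRAG's implementation: the capitalized-word regex is a stand-in for real entity extraction, the co-occurrence edges between entities are an assumption of this sketch, and all function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def extract_entities(chunk: str) -> set[str]:
    """Placeholder extractor: treats capitalized words as named entities."""
    return set(re.findall(r"\b[A-Z][a-zA-Z]+\b", chunk))


def build_index(chunks: list[str]) -> dict[tuple, set[tuple]]:
    """Build an undirected graph with two node types: ("chunk", id) and ("entity", name)."""
    graph: dict[tuple, set[tuple]] = defaultdict(set)
    for i, chunk in enumerate(chunks):
        chunk_node = ("chunk", i)
        entities = extract_entities(chunk)
        for name in entities:
            entity_node = ("entity", name)
            # Entity-chunk edge: the entity is mentioned in this chunk.
            graph[chunk_node].add(entity_node)
            graph[entity_node].add(chunk_node)
            # Entity-entity edges: entities co-occurring in the same chunk (assumption).
            for other in entities - {name}:
                graph[entity_node].add(("entity", other))
    return graph


graph = build_index(["Alice met Bob in Paris", "Bob likes tea"])
print(sorted(graph[("entity", "Bob")]))
```

Because entity nodes link directly to the chunks that mention them, a lookup by entity name can reach relevant text without relying on deep semantic matching.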

Lightweight topology-enhanced retrieval

MiniRAG uses the graph structure for efficient knowledge discovery. Retrieval follows the steps below; the complete initialization code is in ./main.py:

  1. Initialize MiniRAG:
   from minirag import MiniRAG
   model = MiniRAG()
  2. Run retrieval for a query against the indexed dataset:
   results = model.retrieve("your query")
  3. Process the retrieved results and generate a response:
   response = model.generate(results)
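
The intuition behind retrieving through graph topology can be illustrated with a toy example. The sketch below is not MiniRAG's algorithm: the hard-coded graph, the string-matching of query words to entities, and the scoring rule (1.0 for a direct entity hit, 0.5 for a one-hop neighbor) are all assumptions chosen for illustration, and `toy_retrieve` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter

# Toy index: which chunks mention each entity, and which entities are linked.
ENTITY_TO_CHUNKS = {
    "Alice": {0},
    "Bob": {0, 1},
    "Paris": {0},
    "Tea": {1, 2},
}
ENTITY_NEIGHBORS = {
    "Alice": {"Bob", "Paris"},
    "Bob": {"Alice", "Paris", "Tea"},
    "Paris": {"Alice", "Bob"},
    "Tea": {"Bob"},
}


def toy_retrieve(query: str, top_k: int = 2, hop_weight: float = 0.5) -> list[int]:
    """Rank chunk ids by entity hits in the query plus one-hop graph neighbors."""
    tokens = {t.strip(".,?!").lower() for t in query.split()}
    seeds = {e for e in ENTITY_TO_CHUNKS if e.lower() in tokens}
    scores: Counter = Counter()
    for entity in seeds:
        # Chunks that mention a query entity directly get full weight.
        for chunk_id in ENTITY_TO_CHUNKS[entity]:
            scores[chunk_id] += 1.0
        # Chunks reached through a neighboring entity get a reduced weight.
        for neighbor in ENTITY_NEIGHBORS.get(entity, set()) - seeds:
            for chunk_id in ENTITY_TO_CHUNKS[neighbor]:
                scores[chunk_id] += hop_weight
    # Sort by descending score, breaking ties by chunk id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [chunk_id for chunk_id, _ in ranked[:top_k]]


print(toy_retrieve("What does Alice drink?"))  # → [0, 1]
```

Here chunk 1 is returned even though it never mentions Alice, because it is reachable through the neighboring entity Bob; this is the kind of signal a graph structure adds over plain keyword lookup.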

With the above steps, users can take full advantage of MiniRAG's features for efficient retrieval-augmented generation.

May not be reproduced without permission:Chief AI Sharing Circle " MiniRAG: Simplified Retrieval Enhanced Generation Framework, Entity Graph Index Recall Relevant Text Blocks
