General Introduction
SimGRAG (SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation) is a knowledge-graph-driven Retrieval-Augmented Generation (RAG) approach. The project aims to improve performance on knowledge graph tasks such as question answering and fact verification by retrieving similar subgraphs. SimGRAG supports plug-and-play usage, combining a large language model, an embedding model, and a vector database to provide efficient similarity search and generation. The project relies on open-source components (Ollama, the Nomic embedding model, and the Milvus vector database), and users can replace these components as needed.
Function List
- Large language model generation: Runs generation tasks with the Llama 3 70B model.
- Node and relation embedding: Embeds nodes and relations with the Nomic embedding model.
- Vector database: Stores node and relation embeddings in Milvus for efficient similarity search.
- Data preparation: Supports downloading and preparing the MetaQA and FactKG datasets.
- Configuration files: Provides modifiable configuration files to suit different needs.
- Pipeline execution: Provides scripts to run the MetaQA and FactKG indexing and query pipelines.
Usage Guide
Installation process
- Installing Ollama:
- Visit the official Ollama website and follow the instructions to install Ollama.
- After the installation is complete, run the following command to start the Llama 3 70B model:
ollama run llama3:70b
- Start the services required by SimGRAG:
bash ollama_server.sh
- Installing the Nomic embedding model:
- Clone the Nomic embedding model:
mkdir -p data/raw
cd data/raw
git clone https://huggingface.co/nomic-ai/nomic-embed-text-v1
- Installing Milvus:
- Visit the Milvus website and follow the documentation to install Milvus.
- After the installation is complete, start the Milvus service.
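Before moving on, it can help to confirm that both services are reachable. The following is a minimal sketch that only tests whether a TCP port accepts connections. It assumes a default local install, where Ollama listens on port 11434 and Milvus on port 19530. The names SERVICES, is_port_open and check_services are illustrative and not part of SimGRAG.

```python
import socket

# Assumption: default local install. Ollama listens on 11434 and Milvus on
# 19530 by default; adjust the hosts and ports if your setup differs.
SERVICES = {
    "Ollama": ("localhost", 11434),
    "Milvus": ("localhost", 19530),
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution errors.
        return False


def check_services(services=SERVICES):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}
```

Calling check_services() returns a dictionary such as one entry per service with a boolean value. An open port only shows that something is listening; it does not verify that the Llama 3 70B model has been loaded.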
Data preparation
- MetaQA dataset:
- Download the MetaQA dataset and place it in the data/raw folder.
- FactKG dataset:
- Download the FactKG dataset and place it in the data/raw folder.
Running the Pipelines
- MetaQA:
- Run the following commands for indexing and querying:
cd pipeline
python metaQA_index.py
python metaQA_query1hop.py
python metaQA_query2hop.py
python metaQA_query3hop.py
- FactKG:
- Run the following commands for indexing and querying:
cd pipeline
python factKG_index.py
python factKG_query.py
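The indexing and query scripts can also be run in sequence from a small driver, so that a failed indexing step stops the run before any query starts. This is a sketch only: the script names come from the commands listed for each dataset, while the PIPELINES table, build_commands, run_pipeline and the default "pipeline" working directory are illustrative assumptions rather than part of SimGRAG.

```python
import subprocess
import sys

# Script names are taken from the commands in this guide. Their grouping
# into a table is an assumption made for illustration.
PIPELINES = {
    "MetaQA": [
        "metaQA_index.py",
        "metaQA_query1hop.py",
        "metaQA_query2hop.py",
        "metaQA_query3hop.py",
    ],
    "FactKG": ["factKG_index.py", "factKG_query.py"],
}


def build_commands(dataset):
    """Return the commands for one dataset, indexing first, then queries."""
    return [[sys.executable, script] for script in PIPELINES[dataset]]


def run_pipeline(dataset, pipeline_dir="pipeline"):
    """Run each script in order inside pipeline_dir, stopping on the first failure."""
    for cmd in build_commands(dataset):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(cmd, cwd=pipeline_dir, check=True)
```

For example, run_pipeline("FactKG") called from the repository root runs the two FactKG scripts in order, using the same Python interpreter that runs the driver.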
Configuration File
- The configuration files are located in the configs folder. Users can modify them as needed to accommodate different tasks and datasets.
Viewing Results
- Query results are saved to the output file specified in the configuration file, for example results/FactKG_query.txt. Each line of the file is a dictionary whose key "correct" indicates whether the final answer is correct.
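Because every result line carries a correctness flag, overall accuracy can be computed by counting the lines marked correct. The sketch below assumes each line is either a JSON object or a Python dict literal, and that the value under "correct" is boolean-like. The exact serialization is not stated here, so both forms are tried. The names parse_line and accuracy are illustrative.

```python
import ast
import json


def parse_line(line):
    """Parse one result line as JSON, falling back to a Python dict literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Assumption: the line may have been written with str(dict).
        return ast.literal_eval(line)


def accuracy(lines):
    """Return the fraction of non-empty lines whose "correct" value is truthy."""
    total = 0
    correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = parse_line(line)
        total += 1
        correct += bool(record.get("correct"))
    return correct / total if total else 0.0
```

To use it on a real run, open the output file named in the configuration (for example results/FactKG_query.txt) and pass the file object to accuracy, which iterates over it line by line.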
With the above steps, users can quickly get started with knowledge-graph-driven retrieval-augmented generation tasks using SimGRAG.