KG Gen: An Open Source Tool for Automatic Knowledge Graph Generation from Plain Text


General Introduction

KGGen is an open source tool developed by the Stanford Trustworthy AI Research Lab (STAIR Lab) and hosted on GitHub. It automatically generates knowledge graphs from arbitrary text, using language models and clustering algorithms to transform unstructured text into structured networks of entities and relationships for researchers, developers, and data analysts. Since its release, the project has drawn attention for its improvements in knowledge extraction accuracy and graph connectivity. Its core strengths are simple operation and reliable results, and it has been used in both academic research and AI application development. It was last updated on February 20, 2025.


Function List

Text-to-Knowledge Graph Conversion: Extract entities and relationships from arbitrary text input to generate a structured knowledge graph.
Support for multiple language models: Integrates mainstream language models to improve text comprehension and structuring.
Clustering optimization: Improves the connectivity and coherence of the knowledge graph through clustering.
Open Source Customizable: Full code is provided, and users can modify and extend the functionality according to their needs.
Data export: The generated knowledge graph supports export in multiple formats for subsequent analysis and application.

User Guide

Installation process

KGGen is a Python-based tool that requires some programming environment configuration for deployment. The following are the detailed installation steps:

1. Environment preparation

Operating system: Windows, macOS, and Linux are supported.
Python version: Python 3.8 or above is recommended.
Git: Make sure Git is installed so that you can clone the code base.
Dependency management tools: pip or conda is recommended.

2. Cloning the code base

Clone the KGGen project locally by entering the following command in a terminal or command line:

git clone https://github.com/stair-lab/kg-gen.git
cd kg-gen

3. Installation of dependencies

The project provides a requirements.txt file that lists the required dependency libraries. Run the following command to install them:

pip install -r requirements.txt

If you use conda, you can create a virtual environment first:

conda create -n kggen python=3.8
conda activate kggen
pip install -r requirements.txt

4. Verification of installation

Once the installation is complete, open a Python interpreter and enter the following code to check that it succeeded:

import kg_gen
print(kg_gen.__version__)

If a version number is printed (e.g. 1.0.0), the installation succeeded.
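Not every package defines a __version__ attribute, so the check above can raise AttributeError even when the installation worked. A more defensive sketch, using only the standard library, is shown here. The distribution name kg-gen is an assumption based on the repository name, and installed_version is a hypothetical helper, not part of KGGen.

```python
import importlib.util
from importlib import metadata


def installed_version(module_name="kg_gen", dist_name="kg-gen"):
    """Return the version string, "unknown" if the module is importable
    but has no package metadata, or None if the module is not installed."""
    # find_spec returns None when the top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        # dist_name is an assumption; adjust it if the package is published
        # under a different name.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "unknown"


print(installed_version())
```

A result of None means Python cannot find the module at all, which usually points to the wrong virtual environment being active.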

Usage

The main function of KGGen is to generate knowledge graphs from text. The procedure is as follows:

1. Preparation of input text

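Create a text file (e.g. input.txt) and write the text to be processed into it. The sample below is in Chinese and reads: "Artificial intelligence is changing the world. Machine learning is a core technology of artificial intelligence. A research team at Stanford University has developed many innovative tools."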

人工智能正在改变世界。机器学习是人工智能的核心技术。斯坦福大学的研究团队开发了许多创新工具。

Save the file in the kg-gen directory.

2. Running KGGen

Go to the project directory in the terminal and execute the following command:

python -m kg_gen --input input.txt --output graph.json

--input: Specifies the input text file path.
--output: Specifies the path of the generated knowledge graph file (JSON format is supported).

3. Viewing the results

After the run is complete, open graph.json and you will see something like the following:

{
  "entities": ["人工智能", "机器学习", "斯坦福大学"],
  "relations": [
    {"source": "人工智能", "target": "机器学习", "relation": "包含"},
    {"source": "斯坦福大学", "target": "创新工具", "relation": "开发"}
  ]
}

This shows that KGGen has extracted entities from the text (artificial intelligence, machine learning, Stanford University) and the relations between them ("contains", "develops").
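Before using the output, it can help to check that every relation points at a declared entity. In the sample above, "创新工具" (innovative tools) appears as a relation target but not in the entities list. The sketch below flags such cases; it assumes the JSON layout shown in this article (an "entities" list and "relations" objects with "source", "target", and "relation" keys), and undeclared_endpoints is a hypothetical helper name.

```python
import json


def undeclared_endpoints(graph):
    """Return relation endpoints that are missing from the "entities" list,
    in first-seen order and without duplicates."""
    declared = set(graph.get("entities", []))
    missing = []
    for rel in graph.get("relations", []):
        for key in ("source", "target"):
            name = rel[key]
            if name not in declared and name not in missing:
                missing.append(name)
    return missing


# The sample output from this article, inlined so the sketch is self-contained.
sample = json.loads("""
{
  "entities": ["人工智能", "机器学习", "斯坦福大学"],
  "relations": [
    {"source": "人工智能", "target": "机器学习", "relation": "包含"},
    {"source": "斯坦福大学", "target": "创新工具", "relation": "开发"}
  ]
}
""")

print(undeclared_endpoints(sample))
```

To check a real run, replace the inlined sample with json.load on graph.json, opened with encoding="utf-8".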

4. Customized configuration (optional)

KGGen supports parameter tuning to optimize results. If the project contains a config.py file, you can edit it to change:

Language model: Replace with another pre-trained model (e.g. BERT).
Clustering parameter: Adjust the clustering threshold to change the graph density.

Save the file and re-run the command above after making changes.
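To build intuition for how a clustering threshold affects graph density, here is a toy illustration. It is not KGGen's algorithm: it greedily groups entity names by string similarity using difflib from the standard library, and cluster_entities and the threshold values are assumptions made for this example only. A lower threshold merges more near-duplicate names into one node, so the resulting graph has fewer nodes that are more densely connected.

```python
from difflib import SequenceMatcher


def cluster_entities(entities, threshold):
    """Greedy clustering: put each name into the first cluster whose
    representative (first member) is at least `threshold` similar."""
    clusters = []
    for name in entities:
        for cluster in clusters:
            similarity = SequenceMatcher(
                None, name.lower(), cluster[0].lower()
            ).ratio()
            if similarity >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([name])
    return clusters


names = [
    "machine learning",
    "Machine Learning",
    "machine-learning",
    "Stanford University",
]

# A looser threshold merges spelling variants into a single node.
print(cluster_entities(names, threshold=0.9))
# A strict threshold only merges names that match after lowercasing.
print(cluster_entities(names, threshold=1.0))
```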

Featured Functions

Batch processing of multiple files

If you need to process multiple text files, you can use a script loop call:

for file in *.txt; do python -m kg_gen --input "$file" --output "${file%.txt}.json"; done

This generates a corresponding .json graph file for each .txt file.
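On systems without a POSIX shell, such as a plain Windows command prompt, the same loop can be sketched in Python. The sketch reuses the python -m kg_gen command and the --input and --output flags exactly as given in this article and assumes they behave as described here; build_command and run_batch are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def build_command(text_file):
    """Build the command from this article for one input file.
    The output path is the input path with a .json suffix."""
    text_file = Path(text_file)
    output_file = text_file.with_suffix(".json")
    return [
        sys.executable, "-m", "kg_gen",
        "--input", str(text_file),
        "--output", str(output_file),
    ]


def run_batch(directory="."):
    """Run the command for every .txt file in `directory` and
    return the names of the files whose run exited with an error."""
    failed = []
    for text_file in sorted(Path(directory).glob("*.txt")):
        result = subprocess.run(build_command(text_file))
        if result.returncode != 0:
            failed.append(text_file.name)
    return failed
```

Calling run_batch() from the project directory processes the files one at a time and reports failures at the end instead of stopping at the first error.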

Visualizing the knowledge graph

KGGen does not have a built-in visualization tool, but you can use third-party libraries (such as networkx and matplotlib) to draw the graph:

Install the dependencies:

pip install networkx matplotlib

Write the following Python script (visualize.py):

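import json

import matplotlib.pyplot as plt
import networkx as nx

# Load the graph produced by KGGen (UTF-8 so non-ASCII entity names survive).
with open('graph.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Build a directed graph with one labeled edge per relation.
G = nx.DiGraph()
for rel in data['relations']:
    G.add_edge(rel['source'], rel['target'], label=rel['relation'])

# Lay out and draw the nodes, then add the relation names on the edges.
# Note: Chinese labels need a CJK-capable font configured in matplotlib.
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color='lightblue', font_size=10)
edge_labels = nx.get_edge_attributes(G, 'label')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.show()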

Run the script:

python visualize.py

A window opens showing the generated knowledge graph.
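For graphs too large to read as a plot, or for tools that expect tabular input, the relations can also be written out as a CSV edge list. This is a minimal sketch that assumes the JSON layout shown in this article; relations_to_csv is a hypothetical helper and not a KGGen export function.

```python
import csv
import io


def relations_to_csv(graph):
    """Serialize the relations as CSV text with a header row and
    one source,relation,target row per relation."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "relation", "target"])
    for rel in graph["relations"]:
        writer.writerow([rel["source"], rel["relation"], rel["target"]])
    return buffer.getvalue()


example = {
    "relations": [
        {"source": "A", "target": "B", "relation": "contains"},
    ]
}
print(relations_to_csv(example))
```

Write the returned text to a file opened with encoding="utf-8" and newline="" so that non-ASCII names and line endings are preserved.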

Debugging and Logging

If the generated results are not as expected, debug mode can be enabled:

python -m kg_gen --input input.txt --output graph.json --verbose

This will output a detailed log to help locate the problem.

Caveats

Text quality: The clearer the input text, the more accurate the generated graphs.
Computing resources: Processing long text may require a lot of memory; at least 8GB of RAM is recommended.
Updates and maintenance: Check the GitHub repository regularly to make sure you are using the latest version.

With these steps, you can easily get started with KGGen, extract structured knowledge from text and apply it to real projects.
