DeepSeek Hands-on: Knowledge Graph Construction in Three Steps - Single Extraction, Multiple Fusion, Topic Generation

AI hands-on tutorials6mos agorelease AI Sharing Circle

3.4K 00

Question: Knowledge graphs are important, DeepSeek language model is hot, can it be used to build knowledge graphs quickly? I want to try it. DeepSeek of the real thing to see how well it does at extracting information, integrating knowledge, and creating graphs out of thin air.

Methods: I did three experiments to test DeepSeek's knowledge graph building capabilities:

Single article build mapping: Give DeepSeek an article and see if it can accurately grab the information and build a graph.
Multiple article fusion mapping: Give DeepSeek multiple articles to see if it can add new knowledge to an existing graph.
Theme Generation Mapping: Just tell DeepSeek the topic of the graph, without giving the article, and see if it can come up with a graph on its own.

Results: Experiments prove that DeepSeek builds and fuses knowledge graphs well, but there is still room for improvement.

Here's the exact procedure and results of the experiment, which I'll make clear in simpler terms.

I. Extracting Knowledge from a Single Article to Build a Graph

Purpose: Test DeepSeek's ability to extract knowledge from an article and build a knowledge graph.

Process: I found an article on NebulaGraph graph databases and had DeepSeek read the article, extract the key information, and then generate NebulaGraph database code (nGQL) to create a knowledge graph.

Instructions: "Can you help me generate a knowledge graph? I'll give you the article, you extract the key information and build the knowledge graph with nGQL code."

Operation: Copy and paste the article content directly to DeepSeek.

DeepSeek generated nGQL code:

# Schema 定义（TAG 类型）
CREATE TAG IF NOT EXISTS product (
name STRING,
description STRING
);
CREATE TAG IF NOT EXISTS company (
name STRING
);
CREATE TAG IF NOT EXISTS feature (
name STRING
);
CREATE TAG IF NOT EXISTS tool (
name STRING
);
CREATE TAG IF NOT EXISTS scenario (
name STRING
);
CREATE TAG IF NOT EXISTS language (
name STRING
);
CREATE TAG IF NOT EXISTS auth_service (
name STRING
);
CREATE TAG IF NOT EXISTS ecosystem (
name STRING
);
# Schema 定义（EDGE 类型）
CREATE EDGE IF NOT EXISTS has_feature ();
CREATE EDGE IF NOT EXISTS used_by ();
CREATE EDGE IF NOT EXISTS provides_tool ();
CREATE EDGE IF NOT EXISTS supports_language ();
CREATE EDGE IF NOT EXISTS applicable_scenario ();
CREATE EDGE IF NOT EXISTS compatible_with ();
CREATE EDGE IF NOT EXISTS supports_auth ();
CREATE EDGE IF NOT EXISTS integrates_with ();
# 数据插入 - 核心产品
INSERT VERTEX product(name, description) VALUES
"NebulaGraph": ("NebulaGraph", "开源的分布式图数据库，支持千亿点万亿边数据集");
# 数据插入 - 产品特性
INSERT VERTEX feature(name) VALUES
"high_performance": ("高性能"),
"open_source": ("开源"),
"scalability": ("易扩展"),
"flexible_modeling": ("灵活数据建模"),
"read_write_balance": ("读写平衡"),
"compatibility": ("查询语言兼容性");
/* ... 此处省略 n多数据 ... */

Results: The DeepSeek code is fine, the syntax is correct, and it synthesizes multiple pieces of data into a single statement, which is efficient. The code is put into NebulaGraph to run, the plot looks like this:

The graph is centered on the NebulaGraph, which displays relevant information.

II. Integration of knowledge across multiple articles

Purpose: Testing DeepSeek's ability to fuse knowledge from multiple articles to extend an existing graph.

Process: First build the graph with one article, then give DeepSeek more articles about NebulaGraph (from Baidu's encyclopedia) to merge the new knowledge into the existing graph.

Instructions: "Next, another article for you. Try to blend the mapping with the current table structure. To change the table structure use the ALTER command."

Operation: The summaries of NebulaGraph and 360 Encyclopedia entries were given to DeepSeek, respectively.

The nGQL code generated by DeepSeek (NebulaGraph lexicon):

# 结构变更 1：为 product 标签增加版本和时间属性
ALTER TAG product ADD (
license STRING,
current_version STRING,
release_date STRING,
last_updated STRING
);
# 结构变更 2：新增开发关系类型
CREATE EDGE IF NOT EXISTS developed_by();
# 数据更新 - 补充产品属性
INSERT VERTEX product(name, description, license, current_version, release_date, last_updated)
VALUES
"NebulaGraph": ("NebulaGraph", "分布式图数据库", "Apache 2.0", "v3.3.0", "2019-05-15", "2022-11-14");
# 新增开发公司（复用 company 标签）
INSERT VERTEX company(name)
VALUES
"Yueshu": ("杭州悦数科技有限公司");
# 新增特性（复用 feature 标签）
INSERT VERTEX feature(name)
VALUES
"acl_mechanism": ("ACL 机制"),
"linear_scaling": ("线性扩容"),
"data_recovery": ("数据恢复"),
"htap": ("HTAP 混合负载"),
"nGQL_language": ("nGQL 查询语言");
# 关系建立（复用已有关系类型）
INSERT EDGE has_feature()
VALUES
"NebulaGraph"->"acl_mechanism":(),
"NebulaGraph"->"linear_scaling":(),
"NebulaGraph"->"data_recovery":(),
"NebulaGraph"->"htap":(),
"NebulaGraph"->"nGQL_language":();
# 新增开发关系
INSERT EDGE developed_by()
VALUES
"NebulaGraph"->"Yueshu":();

DeepSeek generated nGQL code (360 words):

# 结构变更 1：为 company 标签增加创始人、成立时间和收入属性
ALTER TAG company ADD (founder STRING, founded_date STRING, revenue DOUBLE);
# 结构变更 2：新增事件类型 TAG
CREATE TAG IF NOT EXISTS event (
type STRING,
date STRING,
amount DOUBLE,
details STRING
);
# 数据更新：补充 360 公司信息
INSERT VERTEX company(name, founder, founded_date, revenue)
VALUES
"360": ("奇虎360", "周鸿祎", "2005-09", 116.15);
# 数据插入：产品生态（复用 product 标签）
INSERT VERTEX product(name)
VALUES
"360_Security_Guard": ("360安全卫士"),
"360_Browser": ("360安全浏览器"),
"360_Cloud": ("360安全云");
# 数据插入：重大事件（使用新增 event 标签）
INSERT VERTEX event(type, date, amount, details)
VALUES
"acquisition_360.com": ("域名收购", "2015-02-04", 1700.0, "1.1亿人民币收购 360.com"),
"privatization": ("私有化", "2015-12", 9300.0, "93 亿美元私有化交易");
# 关系建立：复用 developed_by 边连接产品

Results: DeepSeek can modify the table structure based on the new article (e.g. by giving the product cap (a poem) company table plus fields) and also added a new relationship type. It does this as required with the ALTER command to change the table structure. The small problem is that the comment uses the --nGQL does not recognize it, change it manually. # On the line.

The code is put into the database for execution, and the fused mapping works:

Summary: DeepSeek is good for building graphs and fusing knowledge. However, the server is sometimes slow, and NebulaGraph changes the schema asynchronously and takes a while to take effect.

III. Theme generation: mapping agriculture from nothing

Purpose: Test DeepSeek's ability to generate knowledge graphs based on topics without articles.

Process: Create a new dialog and let DeepSeek generate the mapping code directly for the topic "Agriculture". No article is given, it's a "free-for-all".

Instructions: "I want to build an agricultural knowledge graph. Use NebulaGraph database and generate nGQL script. Table names are big hump, attribute names are small hump. schema names are wrapped with `` to avoid conflicts. Generate at least 3000 nodes."

DeepSeek generated nGQL code (partial):

# ======================================= #
# 创建图空间
# ======================================= #
CREATE SPACE `Agriculture` (vid_type = FIXED_STRING(32));
USE `Agriculture`;
# ======================================= #
# 定义标签 Schema（大驼峰命名）
# ======================================= #
CREATE TAG IF NOT EXISTS `Crop` (
`cropId` int,
`cropName` string,
`growthPeriod` int,
`suitableTemp` string,
`yieldPerMu` float
);
# ======================================= #
# 此处省略好多脚本
# ======================================= #

Problems and improvements:

Node count limit: Started with 3000 nodes, DeepSeek refused and gave Python to import the CSV code. I didn't want to use Python, so I reduced the number of nodes.
The annotation problem comes back: The code comments are again --I would like to point out the problem again.

Improvement Instructions: "Use # for comments, no Python code, 3000 nodes is too many. Just give me the ngql script for 50 nodes."

Follow-up dialog and instructions: To refine the atlas, I continued to talk to DeepSeek, asking it to supplement the data, strengthen the associations, organize the atlas by taxa (phylum, order, family, genus, and species), and also ask it to generate crop rotation data.

For example, my instructions:

"Supplemental data for stronger data correlation."
"Make an atlas of these classifications [of phylums, orders, families, genera and species]."
"Identify contraindications and gain crops in the rotation of existing crops."
"Combining mapped crop tissue data to give nGQL scripts in the same format as before"

Experimental Interlude: DeepSeek, once. INSERT statement uses Cypher syntax, which is not supported by nGQL, and when pointed out it was changed.

Instructions: "This insert statement is not nGQL syntax. Change it so that DDL comes first and DML comes second."

Final data volume: After a few rounds of dialog, the amount of data is shown:

Mapping effects: Expand a few random nodes and take a look:

Examples of yield-enhancing combinations of rotational species: Yield-enhancing combinatorial effects of adventitious planting:

IV. Summary

Conclusion: DeepSeek excels at knowledge graph construction and fusion, and experiments prove its capabilities:

Extracting information is fast and accurate: DeepSeek quickly extracts key information from text, generates compliant nGQL scripts, and has strong language comprehension to recognize entities, relationships, and events.
Strong ability to integrate knowledge: DeepSeek fuses knowledge from multiple articles well, and can expand and update the graph based on new articles to ensure graph completeness and accuracy.
You can build a map from nothing: No articles can generate charts by topic. There are some syntax hiccups in the generation process, but adjustments produce passable scripts.
Details need to be optimized: Scripts generated by DeepSeek occasionally have syntax issues, such as incorrect comments. When generating a large number of nodes, the server may be slow to respond. You need to pay attention to these problems when you actually use it.