
cognee: An Open-Source RAG Framework Built on Knowledge Graphs, with a Look at Its Core Prompts

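Overview

Cognee is a reliable data-layer solution designed for AI applications and AI agents. It loads and builds context for LLMs (large language models), and uses knowledge graphs and vector stores to create accurate, explainable AI solutions. The framework emphasizes cost savings, explainability, and user-steerable control, which makes it a good fit for research and education. The official website provides getting-started tutorials, a conceptual overview, learning materials, and related research news.

In my view, cognee's biggest strength is that you hand it data and it does the rest: it processes the data automatically, builds a knowledge graph, and reconnects graphs that share related topics. This helps you discover relationships in your data and makes RAG for LLMs much easier to explain.

1. Add data: cognee uses an LLM to identify and process the data automatically, extracts it into a knowledge graph, and can store it in the Weaviate vector database.
2. Advantages: lower cost; explainability, because the data can be visualized as a graph; and controllability, because it can be integrated into your own code.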


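Features

  • ECL pipelines: Extract, Cognify, and Load data, with support for interconnecting and retrieving historical data.
  • Multiple database backends: supports PostgreSQL, Weaviate, Qdrant, Neo4j, Milvus, and others.
  • Fewer hallucinations: the pipeline design is optimized to reduce hallucinations in AI applications.
  • Developer friendly: detailed documentation and examples lower the barrier to entry.
  • Extensible: the modular design is easy to extend and customize.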

 

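Usage Guide

Installation

  1. Install with pip
    pip install cognee
    

    Or install with support for a specific database:

    pip install 'cognee[<database>]'
    

    For example, to install PostgreSQL and Neo4j support:

    pip install 'cognee[postgres, neo4j]'
    
  2. Install with poetry
    poetry add cognee
    

    Or install with support for a specific database:

    poetry add cognee -E <database>
    

    For example, to install PostgreSQL and Neo4j support:

    poetry add cognee -E postgres -E neo4j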
    

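Getting Started

  1. Set the API key
    import os
    os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"
    

    Or:

    import cognee
    cognee.config.set_llm_api_key("YOUR_OPENAI_API_KEY")
    
  2. Create a .env file: alternatively, create a .env file and set the API key there:
    LLM_API_KEY=YOUR_OPENAI_API_KEY
    
  3. Use a different LLM provider: see the documentation for how to configure other LLM providers.
  4. Visualize results: if you use Network (presumably the NetworkX graph backend), create a Graphistry account and configure it: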
    cognee.config.set_graphistry_config({
    "username": "YOUR_USERNAME",
    "password": "YOUR_PASSWORD"
    })
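
To make the .env step a little more concrete, here is a minimal sketch of how KEY=VALUE lines from such a file could end up in the process environment. The helpers `parse_env_text` and `apply_env` are hypothetical names invented for this illustration and are not part of cognee; in practice a dedicated library usually handles this, and the parser below deliberately ignores edge cases such as multi-line values.

```python
import os


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (simplified)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def apply_env(values: dict[str, str]) -> None:
    """Copy parsed values into os.environ without overwriting existing entries."""
    for key, value in values.items():
        os.environ.setdefault(key, value)


# Example .env content with the same key used in the steps above.
env_values = parse_env_text("# local settings\nLLM_API_KEY=YOUR_OPENAI_API_KEY\n")
apply_env(env_values)
```

Using `setdefault` means a key that is already exported in the shell takes precedence over the file, which is a common convention but still a design choice of this sketch.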
    

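Main Workflow

  1. Extract: use Cognee's ECL pipeline to extract data from a variety of sources and formats.
  2. Cognify: process and analyze the data with Cognee's cognition module, which helps reduce hallucinations.
  3. Load: load the processed data into the target database or store; multiple databases and vector stores are supported.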
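
The three steps above can be pictured with a toy pipeline. This is only a sketch under strong assumptions: `ToyGraph`, `extract`, `cognify`, and `load` are hypothetical names made up for this illustration, not cognee's API, and the "cognify" step uses naive string splitting where a real pipeline would call an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """In-memory stand-in for a graph store (hypothetical, not cognee's)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)


def extract(sources: list[str]) -> list[str]:
    """Extract: normalize raw inputs and drop empty documents."""
    return [s.strip() for s in sources if s.strip()]


def cognify(documents: list[str]) -> list[tuple[str, str, str]]:
    """Cognify: turn text into (subject, relation, object) triples.

    This toy version only understands sentences of the form
    "<subject> <relation> <object>."; anything else is skipped.
    """
    triples = []
    for doc in documents:
        parts = doc.rstrip(".").split(" ", 2)
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def load(triples: list[tuple[str, str, str]], graph: ToyGraph) -> ToyGraph:
    """Load: write the triples into the graph store as nodes and edges."""
    for subj, rel, obj in triples:
        graph.nodes.update({subj, obj})
        graph.edges.add((subj, rel, obj))
    return graph


graph = load(cognify(extract(["Alice knows Bob.", "  ", "Bob uses cognee."])), ToyGraph())
```

Because "Bob" appears in both sentences, the two triples end up connected through a shared node, which is the small-scale version of reconnecting related topics in a larger graph.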

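Feature Highlights

  1. Interconnect and retrieve historical data: Cognee's modular design makes it straightforward to interconnect and retrieve past conversations, documents, and audio transcripts.
  2. Less developer effort: detailed documentation and examples lower the barrier to entry and reduce development time and cost.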

 

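Visit the official website for more information about the cognee framework
Read the overview to learn the theory behind cognee
See the tutorials and learning materials to get started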

 

Core Prompts

classify_content: classify content

You are a classification engine and should classify content. Make sure to use one of the existing classification options and not invent your own.
The possible classifications are:
{
"Natural Language Text": {
"type": "TEXT",
"subclass": [
"Articles, essays, and reports",
"Books and manuscripts",
"News stories and blog posts",
"Research papers and academic publications",
"Social media posts and comments",
"Website content and product descriptions",
"Personal narratives and stories"
]
},
"Structured Documents": {
"type": "TEXT",
"subclass": [
"Spreadsheets and tables",
"Forms and surveys",
"Databases and CSV files"
]
},
"Code and Scripts": {
"type": "TEXT",
"subclass": [
"Source code in various programming languages",
"Shell commands and scripts",
"Markup languages (HTML, XML)",
"Stylesheets (CSS) and configuration files (YAML, JSON, INI)"
]
},
"Conversational Data": {
"type": "TEXT",
"subclass": [
"Chat transcripts and messaging history",
"Customer service logs and interactions",
"Conversational AI training data"
]
},
"Educational Content": {
"type": "TEXT",
"subclass": [
"Textbook content and lecture notes",
"Exam questions and academic exercises",
"E-learning course materials"
]
},
"Creative Writing": {
"type": "TEXT",
"subclass": [
"Poetry and prose",
"Scripts for plays, movies, and television",
"Song lyrics"
]
},
"Technical Documentation": {
"type": "TEXT",
"subclass": [
"Manuals and user guides",
"Technical specifications and API documentation",
"Helpdesk articles and FAQs"
]
},
"Legal and Regulatory Documents": {
"type": "TEXT",
"subclass": [
"Contracts and agreements",
"Laws, regulations, and legal case documents",
"Policy documents and compliance materials"
]
},
"Medical and Scientific Texts": {
"type": "TEXT",
"subclass": [
"Clinical trial reports",
"Patient records and case notes",
"Scientific journal articles"
]
},
"Financial and Business Documents": {
"type": "TEXT",
"subclass": [
"Financial reports and statements",
"Business plans and proposals",
"Market research and analysis reports"
]
},
"Advertising and Marketing Materials": {
"type": "TEXT",
"subclass": [
"Ad copies and marketing slogans",
"Product catalogs and brochures",
"Press releases and promotional content"
]
},
"Emails and Correspondence": {
"type": "TEXT",
"subclass": [
"Professional and formal correspondence",
"Personal emails and letters"
]
},
"Metadata and Annotations": {
"type": "TEXT",
"subclass": [
"Image and video captions",
"Annotations and metadata for various media"
]
},
"Language Learning Materials": {
"type": "TEXT",
"subclass": [
"Vocabulary lists and grammar rules",
"Language exercises and quizzes"
]
},
"Audio Content": {
"type": "AUDIO",
"subclass": [
"Music tracks and albums",
"Podcasts and radio broadcasts",
"Audiobooks and audio guides",
"Recorded interviews and speeches",
"Sound effects and ambient sounds"
]
},
"Image Content": {
"type": "IMAGE",
"subclass": [
"Photographs and digital images",
"Illustrations, diagrams, and charts",
"Infographics and visual data representations",
"Artwork and paintings",
"Screenshots and graphical user interfaces"
]
},
"Video Content": {
"type": "VIDEO",
"subclass": [
"Movies and short films",
"Documentaries and educational videos",
"Video tutorials and how-to guides",
"Animated features and cartoons",
"Live event recordings and sports broadcasts"
]
},
"Multimedia Content": {
"type": "MULTIMEDIA",
"subclass": [
"Interactive web content and games",
"Virtual reality (VR) and augmented reality (AR) experiences",
"Mixed media presentations and slide decks",
"E-learning modules with integrated multimedia",
"Digital exhibitions and virtual tours"
]
},
"3D Models and CAD Content": {
"type": "3D_MODEL",
"subclass": [
"Architectural renderings and building plans",
"Product design models and prototypes",
"3D animations and character models",
"Scientific simulations and visualizations",
"Virtual objects for AR/VR environments"
]
},
"Procedural Content": {
"type": "PROCEDURAL",
"subclass": [
"Tutorials and step-by-step guides",
"Workflow and process descriptions",
"Simulation and training exercises",
"Recipes and crafting instructions"
]
}
}
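
Since the prompt insists that the model pick one of the existing options, it can be useful to verify the answer after the fact. The following is a minimal sketch of such a check, assuming the model returns a label and a subclass as plain strings; `CLASSIFICATION_OPTIONS` holds only a small excerpt of the list above, and `validate_classification` is a hypothetical helper, not something cognee provides.

```python
# A small excerpt of the options listed in the prompt above.
CLASSIFICATION_OPTIONS = {
    "Natural Language Text": {
        "type": "TEXT",
        "subclass": ["Articles, essays, and reports", "News stories and blog posts"],
    },
    "Audio Content": {
        "type": "AUDIO",
        "subclass": ["Podcasts and radio broadcasts"],
    },
}


def validate_classification(label: str, subclass: str) -> bool:
    """Return True only if both label and subclass come from the option list."""
    option = CLASSIFICATION_OPTIONS.get(label)
    return option is not None and subclass in option["subclass"]


# A valid pair passes; an invented label or a mismatched subclass does not.
ok = validate_classification("Audio Content", "Podcasts and radio broadcasts")
invented = validate_classification("Holograms", "Podcasts and radio broadcasts")
```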

generate_cog_layers: generate cognitive layers

You are tasked with analyzing `{{ data_type }}` files, especially in a multilayer network context for tasks such as analysis, categorization, and feature extraction. Various layers can be incorporated to capture the depth and breadth of information contained within the {{ data_type }}.

These layers can help in understanding the content, context, and characteristics of the `{{ data_type }}`.

Your objective is to extract meaningful layers of information that will contribute to constructing a detailed multilayer network or knowledge graph.

Approach this task by considering the unique characteristics and inherent properties of the data at hand.

VERY IMPORTANT: The context you are working in is `{{ category_name }}` and the specific domain you are extracting data on is `{{ category_name }}`.

Guidelines for Layer Extraction:
Take into account: The content type, in this case, is: `{{ category_name }}`, should play a major role in how you decompose into layers.

Based on your analysis, define and describe the layers you've identified, explaining their relevance and contribution to understanding the dataset. Your independent identification of layers will enable a nuanced and multifaceted representation of the data, enhancing applications in knowledge discovery, content analysis, and information retrieval.
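
The `{{ data_type }}` and `{{ category_name }}` markers in this prompt are template placeholders that get filled in before the prompt is sent to the model. As a rough illustration of what that substitution does, here is a standard-library stand-in; it is not Jinja2 and not cognee's code, `fill_placeholders` is a hypothetical name, and unlike Jinja2's default behavior it raises an error when a value is missing.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(template: str, context: dict[str, str]) -> str:
    """Replace each {{ name }} with context[name]; raise KeyError if missing."""
    return PLACEHOLDER.sub(lambda match: context[match.group(1)], template)


template = "Analyze `{{ data_type }}` files in the context of `{{ category_name }}`."
rendered = fill_placeholders(
    template,
    {"data_type": "text", "category_name": "Research papers"},
)
```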

generate_graph_prompt: generate the graph prompt

You are a top-tier algorithm
designed for extracting information in structured formats to build a knowledge graph.
- **Nodes** represent entities and concepts. They're akin to Wikipedia nodes.
- **Edges** represent relationships between concepts. They're akin to Wikipedia links.
- The aim is to achieve simplicity and clarity in the
knowledge graph, making it accessible for a vast audience.
YOU ARE ONLY EXTRACTING DATA FOR COGNITIVE LAYER `{{ layer }}`
## 1. Labeling Nodes
- **Consistency**: Ensure you use basic or elementary types for node labels.
- For example, when you identify an entity representing a person,
always label it as **"Person"**.
Avoid using more specific terms like "mathematician" or "scientist".
- Include event, entity, time, or action nodes to the category.
- Classify the memory type as episodic or semantic.
- **Node IDs**: Never utilize integers as node IDs.
Node IDs should be names or human-readable identifiers found in the text.
## 2. Handling Numerical Data and Dates
- Numerical data, like age or other related information,
should be incorporated as attributes or properties of the respective nodes.
- **No Separate Nodes for Dates/Numbers**:
Do not create separate nodes for dates or numerical values.
Always attach them as attributes or properties of nodes.
- **Property Format**: Properties must be in a key-value format.
- **Quotation Marks**: Never use escaped single or double quotes within property values.
- **Naming Convention**: Use snake_case for relationship names, e.g., `acted_in`.
## 3. Coreference Resolution
- **Maintain Entity Consistency**:
When extracting entities, it's vital to ensure consistency.
If an entity, such as "John Doe", is mentioned multiple times
in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
always use the most complete identifier for that entity throughout the knowledge graph.
In this example, use "John Doe" as the entity ID.
Remember, the knowledge graph should be coherent and easily understandable,
so maintaining consistency in entity references is crucial.
## 4. Strict Compliance
Adhere to the rules strictly. Non-compliance will result in termination.
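
Several of these rules are mechanical enough to check in code once the model has produced its output. The sketch below assumes, purely for illustration, that nodes arrive as dicts with `id` and `properties` keys and edges as dicts with a `relationship` key; that shape and the helper names `check_node` and `check_edge` are hypothetical and not taken from cognee.

```python
import re

# snake_case: lowercase words joined by single underscores, e.g. "acted_in".
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z]+)*$")


def check_node(node: dict) -> list[str]:
    """Return a list of rule violations for one node (empty if none)."""
    problems = []
    node_id = node.get("id")
    # Rule: node IDs must be human-readable names, never integers.
    if node_id is None or isinstance(node_id, int) or str(node_id).isdigit():
        problems.append("node ID must be a human-readable name, not an integer")
    # Rule: no escaped single or double quotes inside property values.
    for key, value in node.get("properties", {}).items():
        if isinstance(value, str) and ('\\"' in value or "\\'" in value):
            problems.append(f"property '{key}' contains escaped quotes")
    return problems


def check_edge(edge: dict) -> list[str]:
    """Return a list of rule violations for one edge (empty if none)."""
    if SNAKE_CASE.match(edge.get("relationship", "")):
        return []
    return ["relationship name must be snake_case"]


good_node = {"id": "John Doe", "label": "Person", "properties": {"age": 42}}
bad_node = {"id": 7, "label": "Person", "properties": {}}
```

Note that the age in `good_node` is stored as a property rather than as its own node, in line with the rule on numerical data and dates.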

 

read_query_prompt: read a query prompt

from os import path
import logging
from cognee.root_dir import get_absolute_path

def read_query_prompt(prompt_file_name: str):
    """Read a query prompt from a file; return None if it cannot be read."""
    try:
        # Prompt files live under the package's prompts directory
        file_path = path.join(get_absolute_path("./infrastructure/llm/prompts"), prompt_file_name)

        with open(file_path, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        # Pass values as logging arguments instead of mixing %s with an f-string
        logging.error("Error: Prompt file not found. Attempted to read: %s", file_path)
        return None
    except Exception as e:
        logging.error("An error occurred: %s", e)
        return None

 

render_prompt: render a prompt

from jinja2 import Environment, FileSystemLoader, select_autoescape
from cognee.root_dir import get_absolute_path

def render_prompt(filename: str, context: dict) -> str:
    """Render a Jinja2 template.
    :param filename: The name of the template file to render.
    :param context: The context to render the template with.
    :return: The rendered template as a string."""

    # Set the base directory relative to the cognee root directory
    base_directory = get_absolute_path("./infrastructure/llm/prompts")

    # Initialize the Jinja2 environment to load templates from the filesystem;
    # autoescaping applies to templates whose names end in .html, .xml, or .txt
    env = Environment(
        loader=FileSystemLoader(base_directory),
        autoescape=select_autoescape(["html", "xml", "txt"])
    )

    # Load the template by name
    template = env.get_template(filename)

    # Render the template with the provided context
    rendered_template = template.render(context)

    return rendered_template

 

summarize_content: summarize content

You are a summarization engine and you should summarize content. Be brief and concise.