To keep the distinction clear, this document refers to knowledge bases that live outside the Dify platform collectively as "external knowledge bases".
Function Introduction
Dify's built-in knowledge base functionality and text retrieval mechanism may not meet the needs of some advanced developers who may require more precise control over text recall results.
Some teams choose to develop their own RAG algorithms and maintain a text recall system independently, or use a knowledge base service provided by a cloud provider (e.g., AWS Bedrock).
Dify, as an open LLM application development platform, wants to give developers more options.
Dify can connect to external knowledge bases through the "Connect to External Knowledge Bases" feature. This gives AI applications access to more sources of information.
Specifically, this brings the following advantages:
- Dify can directly access text hosted in the cloud provider's knowledge base, eliminating the need for developers to copy content into Dify's knowledge base.
- Dify can directly access the algorithmically processed text in the self-built knowledge base, and developers only need to focus on optimizing the information retrieval mechanism to improve recall accuracy.
- Compared to using cloud vendors' knowledge base services directly, Dify provides more flexible application layer integration capabilities, making it easy for developers to build diverse AI applications.
The following figure illustrates the principle of connecting to an external knowledge base:
Connection steps
1. Build a compliant external knowledge base API
Be sure to read the External Knowledge Base API specification written by Dify carefully before setting up your API service.
2. Associate external knowledge base APIs
Please note that Dify currently only supports retrieving external knowledge bases, not modifying them. Developers need to maintain external knowledge bases on their own.
Go to the "Knowledge Base" page, click "External Knowledge Base API" in the upper right corner, and then click "Add External Knowledge Base API".
Follow the page prompts to fill out the form:
- Knowledge base name: Can be customized to differentiate between different external knowledge base APIs.
- API interface address: The connection address of the external knowledge base, e.g. api-endpoint/retrieval. Refer to the External Knowledge Base API for detailed instructions.
- API Key: The connection key for the external knowledge base. See the External Knowledge Base API for details.
3. Connecting to external knowledge bases
On the "Knowledge Base" page, click "Connect to external knowledge base" under "Add knowledge base" to enter the parameter configuration page.
Fill in the following parameters:
- Knowledge Base Name and Description
- External Knowledge Base API: Select the external knowledge base API associated in step 2. Dify retrieves the text content of the external knowledge base through this API connection.
- External Knowledge Base ID: Specify the ID of the external knowledge base to be associated with, refer to External Knowledge Base API for details.
- Adjust recall settings:
  - Top K: The larger the value, the more text fragments are recalled. Start with a small value and increase it gradually until you find the right balance.
  - Score Threshold: The higher the value, the more relevant the recalled text fragments are to the question, but the fewer of them are returned. Start with a high value and lower it gradually until enough relevant text is recalled.
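The interaction between the two recall settings can be sketched in a few lines of Python. This is only an illustration: it assumes the threshold is applied before the Top K cut, the function name apply_recall_settings and the sample data are hypothetical, and the actual order of operations inside Dify or your external knowledge base may differ.

```python
def apply_recall_settings(fragments, top_k, score_threshold):
    """Keep fragments scoring at least score_threshold, then return the top_k best.

    fragments is a list of (text, score) tuples. Hypothetical helper for illustration.
    """
    kept = [f for f in fragments if f[1] >= score_threshold]
    kept.sort(key=lambda f: f[1], reverse=True)
    return kept[:top_k]


# Made-up sample scores, not real retrieval output.
sample = [("a", 0.91), ("b", 0.42), ("c", 0.77), ("d", 0.55)]
print(apply_recall_settings(sample, top_k=3, score_threshold=0.5))
# prints [('a', 0.91), ('c', 0.77), ('d', 0.55)]
```

Raising the threshold shrinks the candidate pool first, so a large Top K has no effect once fewer fragments than Top K pass the threshold.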
4. Test the connection and recall
After the connection is established, you can enter likely question keywords in "Recall Test" and preview the text fragments recalled from the external knowledge base. If you are not satisfied with the results, try modifying the recall parameters or adjusting the retrieval settings of the external knowledge base.
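Before running "Recall Test" in Dify, it can help to call the retrieval endpoint directly and confirm that it answers in the expected shape. Below is a sketch using only Python's standard library. The helper names build_retrieval_request and fetch_records are hypothetical, and the base URL, API key, and knowledge base ID you pass in are your own values.

```python
import json
import urllib.request


def build_retrieval_request(base_url, api_key, knowledge_id, query,
                            top_k=3, score_threshold=0.5):
    """Build (but do not send) a POST request for the /retrieval endpoint."""
    payload = {
        "knowledge_id": knowledge_id,
        "query": query,
        "retrieval_setting": {"top_k": top_k, "score_threshold": score_threshold},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/retrieval",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_records(request, timeout=10):
    """Send the request and return the "records" list from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp).get("records", [])
```

Calling fetch_records(build_retrieval_request(...)) with your own endpoint and key should return a list of records if the service follows the External Knowledge Base API specification.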
5. Integrate within applications
- Chatbot / Agent applications: In the "Context" section of the orchestration page, select the external knowledge base marked with the EXTERNAL tag.
- Chatflow / Workflow applications: Add a Knowledge Retrieval node and select the external knowledge base marked with the EXTERNAL tag.
6. Managing the external knowledge base
On the "Knowledge Base" page, each external knowledge base card shows an EXTERNAL tag in its upper right corner. Open the knowledge base you want to modify and click "Settings" to change the following:
- Knowledge base name and description
- Visible scope ("Only me", "All team members" and "Some team members"). Members without permissions cannot access the knowledge base.
- Recall settings (Top K and Score Threshold)
Note: The associated External Knowledge Base API and External Knowledge Base ID cannot be modified. To change them, associate a new External Knowledge Base API and connect again.
Connection example: How do I connect to the AWS Bedrock Knowledge Base?
This section outlines how to connect the Dify platform to the AWS Bedrock Knowledge Base through the external knowledge base API, so that AI applications in Dify can directly access content stored in the AWS Bedrock Knowledge Base.
Prerequisites
- AWS Bedrock Knowledge Base
- Dify SaaS Services / Dify Community Edition
- Backend API Development Basics
1. Register and create an AWS Bedrock Knowledge Base
Visit AWS Bedrock to create a Knowledge Base service.
2. Build back-end API services
The Dify platform cannot yet connect to the AWS Bedrock Knowledge Base directly. Your development team needs to build a back-end API service, following Dify's API definition for external knowledge base connections, that bridges Dify and AWS Bedrock. Please refer to the architecture diagram for details:
You can refer to the following two code files to build the back-end API service.
knowledge.py
from flask import request
from flask_restful import Resource, reqparse

from bedrock.knowledge_service import ExternalDatasetService


class BedrockRetrievalApi(Resource):
    # url: /retrieval
    def post(self):
        parser = reqparse.RequestParser()
        parser.add_argument("retrieval_setting", nullable=False, required=True, type=dict, location="json")
        parser.add_argument("query", nullable=False, required=True, type=str)
        parser.add_argument("knowledge_id", nullable=False, required=True, type=str)
        args = parser.parse_args()

        # Authorization check: reject a missing header or one without a scheme/token pair
        auth_header = request.headers.get("Authorization")
        if not auth_header or " " not in auth_header:
            return {
                "error_code": 1001,
                "error_msg": "Invalid Authorization header format. Expected 'Bearer <api-key>' format."
            }, 403

        auth_scheme, auth_token = auth_header.split(None, 1)
        auth_scheme = auth_scheme.lower()

        if auth_scheme != "bearer":
            return {
                "error_code": 1001,
                "error_msg": "Invalid Authorization header format. Expected 'Bearer <api-key>' format."
            }, 403

        if auth_token:
            # process your authorization logic here
            pass

        # Call the knowledge retrieval service
        result = ExternalDatasetService.knowledge_retrieval(
            args["retrieval_setting"], args["query"], args["knowledge_id"]
        )
        return result, 200
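The header check in post() can also be factored into a small pure function, which is easier to unit-test than the request handler itself. This is a sketch, not part of Dify's reference code: extract_bearer_token is a hypothetical name, and it returns None in the cases where the handler would respond with error code 1001.

```python
from typing import Optional


def extract_bearer_token(auth_header: Optional[str]) -> Optional[str]:
    """Return the token from a 'Bearer <token>' header value, or None if malformed."""
    if not auth_header:
        return None
    parts = auth_header.split(None, 1)
    # Expect exactly two parts: the scheme and the token
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    token = parts[1].strip()
    return token or None
```

Comparing the returned token against your stored key is still up to you; a failed comparison is what error code 1002 is meant to report.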
knowledge_service.py
import boto3


class ExternalDatasetService:
    @staticmethod
    def knowledge_retrieval(retrieval_setting: dict, query: str, knowledge_id: str):
        # get bedrock client (replace the placeholder strings with your own credentials)
        client = boto3.client(
            "bedrock-agent-runtime",
            aws_secret_access_key="AWS_SECRET_ACCESS_KEY",
            aws_access_key_id="AWS_ACCESS_KEY_ID",
            # example: us-east-1
            region_name="AWS_REGION_NAME",
        )
        # fetch external knowledge retrieval
        response = client.retrieve(
            knowledgeBaseId=knowledge_id,
            retrievalConfiguration={
                "vectorSearchConfiguration": {
                    "numberOfResults": retrieval_setting.get("top_k"),
                    "overrideSearchType": "HYBRID",
                }
            },
            retrievalQuery={"text": query},
        )
        # parse response
        results = []
        if response.get("retrievalResults"):
            retrieval_results = response.get("retrievalResults")
            for retrieval_result in retrieval_results:
                # filter out results with score less than threshold
                if retrieval_result.get("score") < retrieval_setting.get("score_threshold", .0):
                    continue
                result = {
                    "metadata": retrieval_result.get("metadata"),
                    "score": retrieval_result.get("score"),
                    "title": retrieval_result.get("metadata").get("x-amz-bedrock-kb-source-uri"),
                    "content": retrieval_result.get("content").get("text"),
                }
                results.append(result)
        return {
            "records": results
        }
During this process, you define the API interface address and the API Key used for authentication. Both are needed in the connection steps that follow.
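The response-parsing part of knowledge_retrieval can be isolated so that it can be tested without AWS credentials. The sketch below assumes the response dictionary has the shape used in the code above (retrievalResults entries with score, metadata, and content.text). The helper to_records and the sample response are hypothetical, and unlike the code above it tolerates entries with missing metadata or score.

```python
def to_records(response: dict, score_threshold: float = 0.0) -> dict:
    """Convert a Bedrock-style retrieve response into the {"records": [...]} shape."""
    records = []
    for item in response.get("retrievalResults") or []:
        score = item.get("score") or 0.0
        if score < score_threshold:
            continue  # drop results below the requested threshold
        metadata = item.get("metadata") or {}
        records.append({
            "metadata": metadata,
            "score": score,
            "title": metadata.get("x-amz-bedrock-kb-source-uri"),
            "content": (item.get("content") or {}).get("text"),
        })
    return {"records": records}


# Made-up response for illustration only.
sample_response = {
    "retrievalResults": [
        {"score": 0.8, "content": {"text": "hello"},
         "metadata": {"x-amz-bedrock-kb-source-uri": "s3://bucket/a.txt"}},
        {"score": 0.2, "content": {"text": "low"}, "metadata": {}},
    ]
}
print(to_records(sample_response, score_threshold=0.5))
```

With a threshold of 0.5, only the first sample entry survives; the second is filtered out before it reaches Dify.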
3. Obtain the AWS Bedrock Knowledge Base ID
Log in to the AWS Bedrock Knowledge Base console and get the ID of the knowledge base you created. This ID is used in the following steps to connect to the Dify platform.
4. Associate the external knowledge base API
Go to the "Knowledge Base" page of the Dify platform, click "External Knowledge Base API" in the upper right corner, and then click "Add External Knowledge Base API".
Follow the page prompts and fill out the following in order:
- Knowledge base name: A custom name used to tell apart the different external knowledge base APIs connected within the Dify platform.
- API interface address: The connection address of the external knowledge base, as defined in step 2, e.g. api-endpoint/retrieval.
- API Key: The connection key for the external knowledge base, as defined in step 2.
5. Connect to the external knowledge base
Go to the "Knowledge Base" page and click "Connect to external knowledge base" below the "Add knowledge base" card to open the parameter configuration page.
Fill in the following parameters:
- Knowledge Base Name and Description
- External Knowledge Base API: Select the external knowledge base API associated in step 4.
- External Knowledge Base ID: Fill in the AWS Bedrock knowledge base ID obtained in step 3.
- Adjust recall settings:
  - Top K: When a user asks a question, Dify requests highly relevant content segments from the external knowledge API. This parameter limits how many of the text segments most similar to the user's question are returned. The default value is 3. The higher the value, the more text segments are recalled.
  - Score Threshold: The similarity threshold for filtering text segments. Only segments whose score exceeds the threshold are recalled. The default value is 0.5. The higher the value, the more similar the recalled text must be to the question, so fewer segments are recalled and the results tend to be more accurate.
Once the settings are complete, you can establish a connection to the external Knowledge Base API.
6. Test the external knowledge base connection and recall
After establishing the connection to the external knowledge base, developers can enter likely question keywords in "Recall Test" and preview the text segments recalled from the AWS Bedrock Knowledge Base.
If you are not satisfied with the results of the recall, you can try modifying the recall parameters or adjusting the AWS Bedrock Knowledge Base search settings yourself.
FAQ
What if I get an error connecting to the external Knowledge Base API?
Below are the error codes and corresponding solutions:
Error code | Error | Solution |
---|---|---|
1001 | Invalid Authorization header format | Check the format of the request's Authorization header |
1002 | Authorization failed | Check if the API Key is correct |
2001 | Knowledge base does not exist | Check the external knowledge base |
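When scripting against the API, the table above can be turned into a small lookup. This is an illustrative sketch: explain_error is a hypothetical helper, and the fallback message for unknown codes is an assumption rather than part of the specification.

```python
ERROR_HINTS = {
    1001: "Check the format of the request's Authorization header",
    1002: "Check if the API Key is correct",
    2001: "Check the external knowledge base",
}


def explain_error(body: dict) -> str:
    """Map an error response body to the troubleshooting hint from the table."""
    code = body.get("error_code")
    # Fallback text is an assumption for codes the table does not list
    hint = ERROR_HINTS.get(code, "Unknown error code; inspect error_msg")
    return f"{code}: {hint}"
```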
External Knowledge Base API Specification
Endpoint
POST /retrieval
Request header
This API is used to connect to knowledge bases that a team maintains independently. For more guidance, see Connecting to External Knowledge Bases.
You can use the API-Key in the Authorization field of the HTTP request header to authenticate permissions. The authentication logic is defined by you in the Retrieval API, as follows:
Authorization: Bearer {API_KEY}
Request body
The request accepts data in the following JSON format:
Property | Required | Type | Description | Example value |
---|---|---|---|---|
knowledge_id | Yes | string | Unique ID of the knowledge base | AAA-BBB-CCC |
query | Yes | string | User's query | What's Dify? |
retrieval_setting | Yes | object | Knowledge retrieval parameters | See below |
The retrieval_setting property contains the following keys:
Property | Required | Type | Description | Example value |
---|---|---|---|---|
top_k | Yes | integer | Maximum number of retrieved results | 5 |
score_threshold | Yes | float | Minimum relevance score of a result to the query, range: 0~1 | 0.5 |
Request example
POST /retrieval HTTP/1.1
Content-Type: application/json
Authorization: Bearer your-api-key

{
    "knowledge_id": "your-knowledge-id",
    "query": "your-question",
    "retrieval_setting": {
        "top_k": 2,
        "score_threshold": 0.5
    }
}
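On the server side, the field requirements in the tables above translate into a few type checks. The following is a minimal sketch of such validation in plain Python. The name validate_retrieval_request is hypothetical, and returning a list of problems rather than raising an exception is a design choice made for this example.

```python
def validate_retrieval_request(payload: dict) -> list:
    """Return a list of problems found in a /retrieval request body (empty if valid)."""
    problems = []
    for key in ("knowledge_id", "query"):
        if not isinstance(payload.get(key), str):
            problems.append(f"{key} must be a string")
    setting = payload.get("retrieval_setting")
    if not isinstance(setting, dict):
        problems.append("retrieval_setting must be an object")
        return problems
    top_k = setting.get("top_k")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(top_k, int) or isinstance(top_k, bool):
        problems.append("top_k must be an integer")
    threshold = setting.get("score_threshold")
    if (isinstance(threshold, bool)
            or not isinstance(threshold, (int, float))
            or not 0 <= threshold <= 1):
        problems.append("score_threshold must be a number between 0 and 1")
    return problems
```

An empty list means the body matches the documented shape; otherwise the messages can be joined into an error response.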
Response body
If the operation is successful, the service returns an HTTP 200 response with the following data in JSON format:
Property | Required | Type | Description | Example value |
---|---|---|---|---|
records | Yes | list of objects | List of records queried from the knowledge base | See below |
The records property is a list of objects containing the following keys:
Property | Required | Type | Description | Example value |
---|---|---|---|---|
content | Yes | string | Text chunk from the knowledge base | Dify: GenAI Application Development Platform |
score | Yes | float | Relevance score of the result to the query, range: 0~1 | 0.98 |
title | Yes | string | Document title | About Dify |
metadata | No | JSON | Metadata attributes and their values for the document in the data source | See example |
Response Example
HTTP/1.1 200
Content-Type: application/json

{
    "records": [
        {
            "metadata": {
                "path": "s3://dify/knowledge.txt",
                "description": "dify knowledge document"
            },
            "score": 0.98,
            "title": "knowledge.txt",
            "content": "This is the external knowledge document."
        },
        {
            "metadata": {
                "path": "s3://dify/introduce.txt",
                "description": "Introducing dify."
            },
            "title": "introduce.txt",
            "content": "Innovation engine for GenAI applications"
        }
    ]
}
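A client consuming this kind of response can load it into a small typed structure. This is a sketch that assumes every record carries the required content, score, and title keys listed in the table above. The Record dataclass and the parse_records function are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Record:
    content: str
    score: float
    title: str
    metadata: dict = field(default_factory=dict)  # optional per the specification


def parse_records(body: str) -> list:
    """Parse a JSON response body into Record objects, best score first."""
    records = [
        Record(
            content=item["content"],
            score=float(item["score"]),
            title=item["title"],
            metadata=item.get("metadata") or {},
        )
        for item in json.loads(body)["records"]
    ]
    return sorted(records, key=lambda r: r.score, reverse=True)
```

A record missing one of the required keys raises KeyError here, which surfaces a non-compliant service early instead of passing partial data downstream.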
Errors
If the operation fails, the service returns the following error information (in JSON format):
Property | Required | Type | Description | Example value |
---|---|---|---|---|
error_code | Yes | integer | Error code | 1001 |
error_msg | Yes | string | Description of the API exception | Invalid Authorization header format. |
The error_code property takes the following values:
Code | Description |
---|---|
1001 | Invalid Authorization header format |
1002 | Authorization failed |
2001 | Knowledge base does not exist |
HTTP Status Code
- AccessDeniedException: Lack of access rights. (HTTP status code: 403)
- InternalServerException: Internal server error. (HTTP status code: 500)