
Dify Connecting to External Knowledge Bases Tutorial

To keep the distinction clear, this article refers to all knowledge bases hosted outside the Dify platform as "external knowledge bases".

Function Introduction

Dify's built-in knowledge base functionality and text retrieval mechanism may not meet the needs of some advanced developers who may require more precise control over text recall results.

Some teams develop their own RAG algorithms and maintain a text recall system independently, or use a knowledge base service provided by a cloud provider (e.g., AWS Bedrock).


Dify, as an open LLM application development platform, wants to give developers more options.

Dify can connect to external knowledge bases through the "Connect to External Knowledge Bases" feature. This gives AI applications access to more sources of information.

Specifically, there are the following advantages:

  • Dify can directly access text hosted in the cloud provider's knowledge base, eliminating the need for developers to copy content into Dify's knowledge base.
  • Dify can directly access the algorithmically processed text in the self-built knowledge base, and developers only need to focus on optimizing the information retrieval mechanism to improve recall accuracy.
  • Compared to using cloud vendors' knowledge base services directly, Dify provides more flexible application layer integration capabilities, making it easy for developers to build diverse AI applications.

The following figure illustrates the principle of connecting to an external knowledge base:

[Figure: principle of connecting Dify to an external knowledge base]

 

Connection steps

1. Build a compliant external knowledge base API

Be sure to read the External Knowledge Base API specification written by Dify carefully before setting up your API service.

2. Associate external knowledge base APIs

Please note that Dify currently only supports retrieving external knowledge bases, not modifying them. Developers need to maintain external knowledge bases on their own.

Go to the "Knowledge Base" page, click "External Knowledge Base API" in the upper right corner, and then click "Add External Knowledge Base API".

Follow the page prompts to fill out the form:

  • Knowledge base name: Can be customized to differentiate between different external knowledge base APIs.
  • API endpoint: The address used to connect to the external knowledge base, e.g. api-endpoint/retrieval. Refer to the External Knowledge Base API specification for detailed instructions.
  • API Key: The connection key for the external knowledge base, see External Knowledge Base API for details.


3. Connecting to external knowledge bases

On the "Knowledge Base" page, click "Connect to external knowledge base" under "Add knowledge base" to enter the parameter configuration page.


Fill in the following parameters:

  • Knowledge Base Name and Description
  • External Knowledge Base API: Select the external knowledge base API associated in step 2. Dify retrieves the text content of the external knowledge base through this API connection.
  • External Knowledge Base ID: Specify the ID of the external knowledge base to be associated with, refer to External Knowledge Base API for details.
  • Adjust recall settings:
    • Top K: The larger the value, the more text fragments are recalled. It is recommended to start experimenting with smaller values and gradually increase them until the optimal balance is found.
    • Score Threshold: The higher the value, the more relevant the recalled text segments are to the question, but the number decreases. It is recommended to start with a higher value and gradually decrease it to get a sufficient amount of relevant text.
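
To make the interaction between the two settings concrete, here is a minimal sketch of how a retrieval service might apply them. The function name `apply_recall_settings` and the sample records are hypothetical, and the actual behavior depends on how your external knowledge base API implements filtering.

```python
from typing import Dict, List


def apply_recall_settings(records: List[Dict], top_k: int, score_threshold: float) -> List[Dict]:
    """Keep records scoring at or above the threshold, best first, capped at top_k."""
    kept = [r for r in records if r["score"] >= score_threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:top_k]


# Hypothetical candidates, as an external knowledge base might score them.
candidates = [
    {"content": "A", "score": 0.91},
    {"content": "B", "score": 0.42},
    {"content": "C", "score": 0.77},
    {"content": "D", "score": 0.63},
]

strict = apply_recall_settings(candidates, top_k=3, score_threshold=0.7)
loose = apply_recall_settings(candidates, top_k=3, score_threshold=0.4)
print([r["content"] for r in strict])  # ['A', 'C']
print([r["content"] for r in loose])   # ['A', 'C', 'D']
```

With the stricter threshold only two segments survive even though Top K allows three; lowering the threshold lets Top K become the limiting factor.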


4. Test connections and recalls

After the connection is established, you can enter likely question keywords in "Recall Test" and preview the text fragments recalled from the external knowledge base. If you are not satisfied with the results, try modifying the recall parameters or adjusting the retrieval settings of the external knowledge base.


5. Integration within applications

  • Chatbot / Agent type application: In the Context section of the Orchestrate page, select the external knowledge base marked with the EXTERNAL tag.


  • Chatflow / Workflow type application: Add a Knowledge Retrieval node and select the external knowledge base marked with the EXTERNAL tag.


6. Managing the external knowledge base

On the Knowledge Base page, external knowledge base cards carry an EXTERNAL tag in the upper right corner. Open the knowledge base you want to change and click "Settings" to modify the following:

  • Knowledge base name and description
  • Visible scope ("Only me", "All team members" and "Some team members"). Members without permissions cannot access the knowledge base.
  • Recall settings (Top K and Score thresholds)

Note: The associated External Knowledge Base API and External Knowledge Base ID cannot be modified. To change them, associate a new External Knowledge Base API and connect again.

 

Connection example: How do I connect to the AWS Bedrock Knowledge Base?

This section outlines how to connect the Dify platform to an AWS Bedrock Knowledge Base through the external knowledge base API, so that AI applications in Dify can directly access content stored in AWS Bedrock Knowledge Base and draw on additional sources of information.

Prerequisites

  • AWS Bedrock Knowledge Base
  • Dify SaaS Services / Dify Community Edition
  • Backend API Development Basics

1. Register and create an AWS Bedrock Knowledge Base

Visit AWS Bedrock to create a Knowledge Base service.


Create an AWS Bedrock Knowledge Base

2. Build back-end API services

The Dify platform cannot yet connect to AWS Bedrock Knowledge Base directly. The development team needs to follow Dify's API definition for external knowledge base connections and build a back-end API service that bridges Dify and AWS Bedrock. Refer to the architecture diagram for details:


Building Backend API Services

You can refer to the following 2 code files to build the backend service API.

knowledge.py

from flask import request
from flask_restful import Resource, reqparse

from bedrock.knowledge_service import ExternalDatasetService


class BedrockRetrievalApi(Resource):
    # url: /retrieval
    def post(self):
        # Parse and validate the JSON request body
        parser = reqparse.RequestParser()
        parser.add_argument("retrieval_setting", nullable=False, required=True, type=dict, location="json")
        parser.add_argument("query", nullable=False, required=True, type=str)
        parser.add_argument("knowledge_id", nullable=False, required=True, type=str)
        args = parser.parse_args()

        # Authorization check: a missing header is treated like a malformed one
        auth_header = request.headers.get("Authorization")
        if not auth_header or " " not in auth_header:
            return {
                "error_code": 1001,
                "error_msg": "Invalid Authorization header format. Expected 'Bearer <api-key>' format."
            }, 403
        auth_scheme, auth_token = auth_header.split(None, 1)
        auth_scheme = auth_scheme.lower()
        if auth_scheme != "bearer":
            return {
                "error_code": 1001,
                "error_msg": "Invalid Authorization header format. Expected 'Bearer <api-key>' format."
            }, 403
        if auth_token:
            # process your authorization logic here
            pass

        # Call the knowledge retrieval service
        result = ExternalDatasetService.knowledge_retrieval(
            args["retrieval_setting"], args["query"], args["knowledge_id"]
        )
        return result, 200

knowledge_service.py

import boto3


class ExternalDatasetService:
    @staticmethod
    def knowledge_retrieval(retrieval_setting: dict, query: str, knowledge_id: str):
        # get bedrock client (replace the placeholder strings with your own credentials and region)
        client = boto3.client(
            "bedrock-agent-runtime",
            aws_secret_access_key="AWS_SECRET_ACCESS_KEY",
            aws_access_key_id="AWS_ACCESS_KEY_ID",
            # example: us-east-1
            region_name="AWS_REGION_NAME",
        )
        # fetch external knowledge retrieval
        response = client.retrieve(
            knowledgeBaseId=knowledge_id,
            retrievalConfiguration={
                "vectorSearchConfiguration": {
                    "numberOfResults": retrieval_setting.get("top_k"),
                    "overrideSearchType": "HYBRID",
                }
            },
            retrievalQuery={"text": query},
        )
        # parse response
        results = []
        if response.get("retrievalResults"):
            retrieval_results = response.get("retrievalResults")
            for retrieval_result in retrieval_results:
                # filter out results with score less than threshold
                if retrieval_result.get("score") < retrieval_setting.get("score_threshold", 0.0):
                    continue
                result = {
                    "metadata": retrieval_result.get("metadata"),
                    "score": retrieval_result.get("score"),
                    "title": retrieval_result.get("metadata").get("x-amz-bedrock-kb-source-uri"),
                    "content": retrieval_result.get("content").get("text"),
                }
                results.append(result)
        return {
            "records": results
        }
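
The response-parsing step can be exercised without AWS credentials by isolating it in a pure function. The sketch below is one possible way to do that, not part of the original files: `to_dify_records` and the sample response are hypothetical, and the response shape is assumed to contain only the fields the service code reads (`retrievalResults`, `content.text`, `score`, `metadata`). Unlike the listing, it also tolerates results that lack a score or metadata.

```python
from typing import Any, Dict, List


def to_dify_records(response: Dict[str, Any], score_threshold: float = 0.0) -> Dict[str, List[Dict[str, Any]]]:
    """Map a Bedrock-style retrieve response onto the {"records": [...]} shape."""
    records = []
    for item in response.get("retrievalResults") or []:
        score = item.get("score") or 0.0  # treat a missing score as 0.0
        if score < score_threshold:
            continue  # drop results below the threshold
        metadata = item.get("metadata") or {}
        records.append({
            "metadata": metadata,
            "score": score,
            "title": metadata.get("x-amz-bedrock-kb-source-uri"),
            "content": (item.get("content") or {}).get("text"),
        })
    return {"records": records}


# Hypothetical response used only to exercise the mapping.
sample_response = {
    "retrievalResults": [
        {"content": {"text": "Dify is an LLM app platform."}, "score": 0.82,
         "metadata": {"x-amz-bedrock-kb-source-uri": "s3://dify/knowledge.txt"}},
        {"content": {"text": "Unrelated text."}, "score": 0.21, "metadata": {}},
    ]
}

parsed = to_dify_records(sample_response, score_threshold=0.5)
print(parsed["records"][0]["title"])  # s3://dify/knowledge.txt
```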

While building this service, you define the API endpoint address and the API key used for authentication. Both are needed for the connection steps that follow.

3. Obtaining an AWS Bedrock Knowledge Base ID

Log in to the AWS Bedrock Knowledge Base console and copy the ID of the knowledge base you created. This parameter is used in a later step to connect to the Dify platform.


Get an AWS Bedrock Knowledge Base ID

4. Associate the external knowledge API

Go to the "Knowledge Base" page on the Dify platform, click "External Knowledge Base API" in the upper right corner, and then click "Add External Knowledge Base API".

Follow the page prompts and fill out the following in order:

  • Knowledge base name: A custom name used to distinguish between the different external knowledge APIs connected within the Dify platform.
  • API endpoint: The connection address of the external knowledge base, as defined in step 2, e.g. api-endpoint/retrieval.
  • API Key: The connection key for the external knowledge base, as defined in step 2.

5. Connecting to external knowledge bases

Go to the "Knowledge Base" page and click "Connect to external knowledge base" below the Add Knowledge Base card to open the parameter configuration page.


Fill in the following parameters:

  • Knowledge Base Name and Description
  • External Knowledge Base API: Select the external knowledge base API associated in step 4.
  • External Knowledge Base ID: Fill in the AWS Bedrock knowledge base ID obtained in step 3.
  • Adjust recall settings:
    • Top K: When a user asks a question, Dify requests highly relevant content segments from the external knowledge API. This parameter filters for the text segments most similar to the user's question. The default value is 3. The higher the value, the more text segments are recalled.
    • Score Threshold: The similarity threshold for filtering text segments. Only segments whose score exceeds the threshold are recalled. The default value is 0.5. The higher the value, the more similar the text must be to the question, so fewer segments are recalled and the results tend to be more accurate.


Once the settings are complete, you can establish a connection to the external Knowledge Base API.

6. Testing external knowledge base connections and recalls

After establishing a connection to the external knowledge base, you can enter likely question keywords in "Recall Test" and preview the text segments recalled from AWS Bedrock Knowledge Base.


Test the connection and recall of external knowledge bases

If you are not satisfied with the results of the recall, you can try modifying the recall parameters or adjusting the AWS Bedrock Knowledge Base search settings yourself.


Adjusting AWS Bedrock Knowledge Base Text Processing Parameters

 

Frequently asked questions

What if I get an error connecting to the external Knowledge Base API?

Below are the error codes and corresponding solutions:

Error code | Error | Solution
1001 | Invalid Authorization header format | Check the format of the request's Authorization header
1002 | Authorization failed | Check whether the API Key is correct
2001 | Knowledge base does not exist | Check the external knowledge base
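
When debugging a failing connection, it can help to translate the error body returned by your service into the matching hint from the table above. The following is a small illustrative sketch; the helper name `explain_error` and the fallback message are hypothetical, and the hints simply restate the table.

```python
from typing import Dict

# Hints restated from the error-code table above.
ERROR_HINTS: Dict[int, str] = {
    1001: "Check the format of the request's Authorization header.",
    1002: "Check whether the API key is correct.",
    2001: "Check that the external knowledge base exists.",
}


def explain_error(body: Dict) -> str:
    """Return a one-line troubleshooting hint for an error response body."""
    code = body.get("error_code")
    hint = ERROR_HINTS.get(code, "Unknown error code; inspect error_msg and the service logs.")
    return f"[{code}] {body.get('error_msg', '')} -> {hint}"


print(explain_error({"error_code": 1002, "error_msg": "Authorization failed"}))
```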

 

External Knowledge Base API Specification

Endpoint

POST /retrieval

Request header

This API is used to connect to knowledge bases that a team maintains independently. For more guidance, see Connecting to External Knowledge Bases.

Pass the API key in the Authorization field of the HTTP request header to verify permissions. The authentication logic is defined by you in the retrieval API. The header has the following form:

Authorization: Bearer {API_KEY}
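
Since the authentication logic is left to the service author, here is one possible sketch of a header check. It assumes a single shared key (`EXPECTED_API_KEY` is a hypothetical placeholder) and reuses the error codes 1001 and 1002 from the error section of this specification; the error messages are illustrative.

```python
import hmac
from typing import Optional, Tuple

# Hypothetical key for illustration; load the real one from configuration.
EXPECTED_API_KEY = "your-api-key"


def check_authorization(header: Optional[str]) -> Tuple[bool, Optional[dict]]:
    """Validate an 'Authorization: Bearer <key>' header value.

    Returns (True, None) on success, or (False, error_body) on failure.
    """
    parts = (header or "").split(None, 1)
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return False, {"error_code": 1001, "error_msg": "Invalid Authorization header format."}
    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(parts[1].encode(), EXPECTED_API_KEY.encode()):
        return False, {"error_code": 1002, "error_msg": "Authorization failed."}
    return True, None


print(check_authorization("Bearer your-api-key"))  # (True, None)
```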

Request body

The request accepts data in the following JSON format:

Property | Required | Type | Description | Example value
knowledge_id | Yes | string | Unique ID of the knowledge base | AAA-BBB-CCC
query | Yes | string | User's query | What's Dify?
retrieval_setting | Yes | object | Knowledge retrieval parameters | See below

The retrieval_setting property contains the following keys:

Property | Required | Type | Description | Example value
top_k | Yes | integer | Maximum number of search results | 5
score_threshold | Yes | float | Score limit for the relevance of results to the query, range: 0~1 | 0.5
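
A service implementing this endpoint may want to check incoming bodies against the fields listed above before running a retrieval. The sketch below shows one way to do this; `validate_retrieval_request` is a hypothetical helper, and the rule that top_k must be at least 1 is an assumption rather than part of the specification.

```python
from typing import Any, Dict, List


def validate_retrieval_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a /retrieval request body (empty if valid)."""
    problems = []
    for key in ("knowledge_id", "query"):
        if not isinstance(body.get(key), str):
            problems.append(f"{key} must be a string")
    setting = body.get("retrieval_setting")
    if not isinstance(setting, dict):
        problems.append("retrieval_setting must be an object")
        return problems
    top_k = setting.get("top_k")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(top_k, int) or isinstance(top_k, bool) or top_k < 1:
        problems.append("top_k must be a positive integer")  # ">= 1" is an assumption
    threshold = setting.get("score_threshold")
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
        problems.append("score_threshold must be a number between 0 and 1")
    return problems


valid_body = {
    "knowledge_id": "AAA-BBB-CCC",
    "query": "What's Dify?",
    "retrieval_setting": {"top_k": 5, "score_threshold": 0.5},
}
print(validate_retrieval_request(valid_body))  # []
```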

Example of a request

POST /retrieval HTTP/1.1
Content-Type: application/json
Authorization: Bearer your-api-key

{
    "knowledge_id": "your-knowledge-id",
    "query": "your-question",
    "retrieval_setting": {
        "top_k": 2,
        "score_threshold": 0.5
    }
}
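
For a quick manual check of your service, the same request can be assembled with Python's standard library. This is a minimal sketch: the helper name `build_retrieval_request` and the `http://localhost:8000` address are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_retrieval_request(endpoint: str, api_key: str, knowledge_id: str,
                            query: str, top_k: int = 2,
                            score_threshold: float = 0.5) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /retrieval endpoint."""
    payload = {
        "knowledge_id": knowledge_id,
        "query": query,
        "retrieval_setting": {"top_k": top_k, "score_threshold": score_threshold},
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/retrieval",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


# Hypothetical endpoint; replace it with the address of your own service.
req = build_retrieval_request("http://localhost:8000", "your-api-key",
                              "your-knowledge-id", "your-question")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```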

Response body

If the operation is successful, the service returns an HTTP 200 response with the following data in JSON format:

Property | Required | Type | Description | Example value
records | Yes | List[Object] | List of records queried from the knowledge base | See below

The records property is a list of objects containing the following keys:

Property | Required | Type | Description | Example value
content | Yes | string | Text chunk from the knowledge base | Dify: GenAI Application Development Platform
score | Yes | float | Relevance score between the result and the query, range: 0~1 | 0.98
title | Yes | string | Document title | About Dify
metadata | No | JSON | Metadata attributes and their values for the document in the data source | See example

Response example

HTTP/1.1 200
Content-Type: application/json

{
    "records": [
        {
            "metadata": {
                "path": "s3://dify/knowledge.txt",
                "description": "dify knowledge document"
            },
            "score": 0.98,
            "title": "knowledge.txt",
            "content": "This is the external knowledge document."
        },
        {
            "metadata": {
                "path": "s3://dify/introduce.txt",
                "description": "Introducing dify."
            },
            "title": "introduce.txt",
            "content": "Innovation engine for GenAI applications"
        }
    ]
}
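
On the consuming side, a response body of this shape can be turned into typed objects for easier handling. The sketch below is illustrative only: the `Record` dataclass and `parse_records` helper are hypothetical names, and the sample string contains just one record with all required keys.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Record:
    content: str
    score: float
    title: str
    metadata: Dict[str, Any] = field(default_factory=dict)  # metadata is optional


def parse_records(response_text: str) -> List[Record]:
    """Parse a /retrieval response body into Record objects."""
    body = json.loads(response_text)
    return [
        Record(
            content=item["content"],
            score=float(item["score"]),
            title=item["title"],
            metadata=item.get("metadata") or {},
        )
        for item in body["records"]
    ]


sample = (
    '{"records": [{"metadata": {"path": "s3://dify/knowledge.txt"}, '
    '"score": 0.98, "title": "knowledge.txt", '
    '"content": "This is the external knowledge document."}]}'
)
records = parse_records(sample)
print(records[0].title, records[0].score)  # knowledge.txt 0.98
```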

Errors

If the operation fails, the service returns the following error message (in JSON format):

Property | Required | Type | Description | Example value
error_code | Yes | integer | Error code | 1001
error_msg | Yes | string | Description of the API exception | Invalid Authorization header format.

error_code values:

Code | Description
1001 | Invalid Authorization header format
1002 | Authorization failed
2001 | Knowledge base does not exist

HTTP Status Code

  • AccessDeniedException: Lack of access rights. (HTTP status code: 403)
  • InternalServerException: Internal server error. (HTTP status code: 500)

May not be reproduced without permission: Chief AI Sharing Circle » Dify Connecting to External Knowledge Bases Tutorial
