Today, we are pleased to announce Dify v0.15.0, which introduces a new "Parent-Child Retrieval" feature. This is an advanced technique for Retrieval-Augmented Generation (RAG) systems that improves information retrieval and contextual understanding. With this capability, Dify can supply more comprehensive, better-contextualized information for AI generation, which improves the quality and accuracy of LLM application responses.
Dilemma of Context and Precision
When using a knowledge base retrieval system, users are often faced with an awkward dilemma: search results are either too fragmented, causing the LLM to lack sufficient context to understand the information, or too broad, resulting in information overload and sacrificing precision. This makes it difficult for LLMs to efficiently find and use the information they need.
In this context, choosing the right chunk size is critical for AI applications to generate accurate and comprehensive responses. To address this, Dify introduces a parent-child retrieval feature that balances precision and context, improving the overall performance and reliability of the knowledge retrieval process.
Parent-child retrieval: balancing precision and context
Parent-child retrieval organizes the data behind the RAG system into a two-tier structure. This gives the system a more flexible and effective way of retrieving, one that allows both precise matching and comprehensive contextual information. Its basic mechanisms are:
1. Child chunks match the query
   - Documents are split into smaller, focused units of information (e.g., a sentence), which match user queries more accurately.
   - Child chunks quickly surface the preliminary results most relevant to the user's needs.
2. Parent chunks provide context
   - The larger portion of the document (e.g., a paragraph, a section, or even the entire document) that contains the matched child chunks is treated as the parent chunk and passed to the large language model (LLM).
   - The parent chunk gives the LLM complete contextual information and avoids leaving out important details.
This hierarchical approach keeps retrieval results precise while preserving their context. In customer support, for example, parent-child retrieval can draw on exhaustive product documentation to give more detailed answers with fuller context, improving both the accuracy and the richness of the language model's output.
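The two-tier mechanism can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Dify's implementation: paragraphs stand in for parent chunks, sentences for child chunks, and a simple word-overlap (Jaccard) score replaces the embedding similarity a real system would use. All names (`ChildChunk`, `build_index`, `retrieve`) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ChildChunk:
    text: str
    parent_id: int  # index of the parent chunk this sentence came from


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(document: str) -> tuple[list[str], list[ChildChunk]]:
    """Parents are paragraphs; children are the sentences inside them."""
    parents = [p.strip() for p in document.split("\n\n") if p.strip()]
    children = []
    for parent_id, parent in enumerate(parents):
        for sentence in re.split(r"(?<=[.!?])\s+", parent):
            if sentence:
                children.append(ChildChunk(sentence, parent_id))
    return parents, children


def retrieve(query: str, parents: list[str], children: list[ChildChunk], top_k: int = 1) -> list[str]:
    """Match the query against child chunks, then return their parent chunks."""
    q = tokenize(query)

    def score(child: ChildChunk) -> float:
        tokens = tokenize(child.text)
        union = q | tokens
        return len(q & tokens) / len(union) if union else 0.0  # Jaccard overlap

    seen, results = set(), []
    for child in sorted(children, key=score, reverse=True):
        if child.parent_id not in seen:  # each parent is returned at most once
            seen.add(child.parent_id)
            results.append(parents[child.parent_id])
        if len(results) == top_k:
            break
    return results


DOC = (
    "The router ships with a two-year warranty. Keep the receipt as proof of purchase.\n\n"
    "To reset the router, hold the reset button for ten seconds. "
    "The status light blinks twice when the reset completes."
)

parents, children = build_index(DOC)
context = retrieve("how do I reset the router", parents, children)
```

Here the query matches the single sentence about the reset button most closely, but `context` holds the whole second paragraph, so the model also sees the follow-up detail about the status light.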
General Retrieval vs. Parent-Child Retrieval
As shown in the figure below, for the same document, parent-child retrieval provides more comprehensive contextual information while maintaining a high level of precision, a clear improvement over traditional single-layer general retrieval.
How to use parent-child retrieval
- Data source: Select a data source and import documents for knowledge retrieval.
- Chunking:
  - Select the general or parent-child chunking strategy, set parameters such as chunk size, and then preview the chunking results.
  - If parent-child chunking is selected, two modes are available:
    - Paragraph mode: Splits the text into paragraphs based on separators and a maximum chunk length, and treats these paragraphs as parent chunks. Ideal for documents with clear and relatively independent paragraphs.
    - Whole document mode: Treats the whole document as a single parent chunk. Suitable for scenarios that require retrieval with complete context.
Regardless of the mode, child chunks are produced by further subdividing each parent chunk. After completing the indexing method and retrieval settings, the user can edit either parent chunks or child chunks. When editing a parent chunk, the user can choose whether or not to regenerate its child chunks. Editing a child chunk does not affect the content of the parent chunk, but the edited child can serve as a customized tag that helps retrieve the corresponding parent chunk. For more details, please check the 📖 help documentation.
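As a rough sketch of how the two modes differ, the following Python splits a text into parent chunks and then subdivides each parent into child chunks. The function names, the mode strings, the separators, and the character-based length limits are all illustrative assumptions, not Dify's actual parameters or defaults.

```python
def split_by(text: str, separator: str, max_length: int) -> list[str]:
    """Split on a separator, then hard-wrap any piece longer than max_length characters."""
    chunks = []
    for piece in text.split(separator):
        piece = piece.strip()
        # range() yields nothing for an empty piece, so blank pieces are dropped
        for start in range(0, len(piece), max_length):
            chunks.append(piece[start:start + max_length])
    return chunks


def chunk_document(text: str, mode: str = "paragraph") -> list[dict]:
    """Return parent chunks, each with the child chunks subdivided from it.

    mode="paragraph": parents come from separators plus a maximum length.
    mode="whole_document": the entire text is the single parent.
    (Both mode names and all limits below are hypothetical.)
    """
    if mode == "whole_document":
        parents = [text.strip()]
    else:
        parents = split_by(text, separator="\n\n", max_length=500)
    # In either mode, children are always carved out of a parent
    return [
        {"parent": parent, "children": split_by(parent, separator="\n", max_length=200)}
        for parent in parents
    ]


TEXT = "Intro line one.\nIntro line two.\n\nDetails line one.\nDetails line two."

by_paragraph = chunk_document(TEXT, mode="paragraph")
whole = chunk_document(TEXT, mode="whole_document")
```

With this sample text, paragraph mode yields two parents with two children each, while whole document mode yields one parent that owns all four children.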
Other update highlights: more intuitive display of parent-child chunks
As a low-code platform, Dify strives to make it easy for users without a technical background to understand and use the parent-child retrieval feature. In this update, we've made the following improvements to the chunk preview:
- Clearer chunk structure: Each parent chunk is shown as a separate module, with child chunks marked against a gray background and labeled with the chunk number.
- Convenient mouse hover information: When the mouse hovers over a child chunk, the child chunk is highlighted in blue and its word count is displayed.
- Retrieval test preview: The parent chunk is displayed on the left side of the preview window, and all matched child chunks are highlighted in blue with their corresponding scores, so the user can see them at a glance.
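To make the retrieval test preview concrete, here is one way matched child chunks and their scores could be grouped under their parent chunk. This is only a sketch: the `(parent_id, child_text, score)` tuple shape, the function name, and the sample scores are invented for illustration and do not reflect Dify's internal data model.

```python
from collections import defaultdict


def group_hits_by_parent(hits):
    """Group scored child-chunk hits under their parent chunk.

    `hits` is a list of (parent_id, child_text, score) tuples (hypothetical shape).
    Returns [(parent_id, [(child_text, score), ...]), ...], best-scoring parent first.
    """
    grouped = defaultdict(list)
    for parent_id, child_text, score in hits:
        grouped[parent_id].append((child_text, score))
    # Within each parent, list the strongest child match first
    for matches in grouped.values():
        matches.sort(key=lambda match: match[1], reverse=True)
    # Order parents by the score of their best child match
    return sorted(grouped.items(), key=lambda item: item[1][0][1], reverse=True)


# Made-up scores, for illustration only
hits = [
    (0, "The router ships with a two-year warranty.", 0.41),
    (1, "Hold the reset button for ten seconds.", 0.87),
    (1, "The status light blinks twice.", 0.52),
]

preview = group_hits_by_parent(hits)
for parent_id, matches in preview:
    print(f"parent {parent_id}")
    for child_text, score in matches:
        print(f"  {score:.2f}  {child_text}")
```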
With this update, Dify's parent-child retrieval brings more accurate and comprehensive retrieval results to LLM applications. It improves the efficiency and accuracy of information acquisition, helping enterprises and developers achieve more efficient knowledge management and value creation in intelligent workflows.