GraphReader: A Graph-based Agent to Enhance Long Text Processing for Large Language Models
GraphReader is like a tutor who is good at making mind maps: it transforms long text into a clear knowledge network so that the model can find the key points needed for an answer, as if exploring along a map, effectively overcoming the problem of "getting lost" when dealing with long text.
- Published: 2024.06.20
- Paper name: GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
- Paper address: https://arxiv.org/abs/2406.14550
This article describes GraphReader, a graph-structure-based agent system designed to address the challenges that Large Language Models (LLMs) encounter when processing long texts, enabling them to perform well on tasks such as multi-hop question answering. Below is a summary of the article:
I. Motivation of the paper
With the advancement of natural language understanding and generation technologies, a major limitation of LLMs is their restricted context window size and memory usage, which makes it difficult for them to process large amounts of text input efficiently. To address this problem, researchers have explored a variety of approaches, including improving the model structure, introducing retrieval-augmentation mechanisms, and using agents for complex reasoning. However, each of these approaches has limitations, such as increased training cost, loss of detailed information, or inflexible decision-making mechanisms.
II. Innovative points of the paper
- Goal: By constructing a graph structure and employing an autonomous agent exploration strategy, GraphReader is able to capture long-range dependencies within a limited context window, enabling efficient processing of lengthy documents.
- Innovation points:
- Segment long text into discrete chunks and extract key elements and atomic facts;
- Use these components to build a graph structure that reflects the relationships within the text (see the construction sketch after this list);
- An agent navigates the graph and collects the necessary information according to predefined functions and a step-by-step rational plan;
- The whole process involves taking notes and reflecting to ensure the accuracy and completeness of the final answer.
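As a rough illustration of this pipeline (not the authors' implementation), the sketch below assumes a hypothetical `extract_facts_and_keys` helper standing in for the LLM prompt that summarizes a chunk into atomic facts and the key elements each fact mentions. Nodes are key elements, and the linking rule shown here, connecting key elements that co-occur in the same atomic fact, is one plausible reading of "links based on shared key elements".

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    key_element: str                                   # normalized key element, e.g. an entity name
    atomic_facts: list = field(default_factory=list)   # (chunk_id, fact) pairs
    neighbors: set = field(default_factory=set)        # key elements of linked nodes


def extract_facts_and_keys(chunk: str) -> list[tuple[str, list[str]]]:
    """Hypothetical stand-in for the LLM prompt that summarizes a chunk
    into atomic facts and lists the key elements each fact mentions."""
    raise NotImplementedError


def build_graph(document: str, chunk_size: int = 2000) -> dict[str, Node]:
    # 1) Segment the long document into chunks of roughly fixed size.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

    nodes: dict[str, Node] = {}
    keys_per_fact = defaultdict(list)

    # 2) Summarize each chunk into atomic facts and extract key elements (nodes).
    for chunk_id, chunk in enumerate(chunks):
        for fact, key_elements in extract_facts_and_keys(chunk):
            for key in key_elements:
                node = nodes.setdefault(key, Node(key_element=key))
                node.atomic_facts.append((chunk_id, fact))
                keys_per_fact[fact].append(key)

    # 3) Link nodes whose key elements co-occur in the same atomic fact.
    for keys in keys_per_fact.values():
        for a in keys:
            for b in keys:
                if a != b:
                    nodes[a].neighbors.add(b)
    return nodes
```

In practice the chunking would respect the model's token budget rather than raw character counts, and key elements would likely be normalized before being used as node identifiers.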
III. Approach of the paper
The operation of GraphReader is divided into three stages:
- Graph construction: The document is divided into chunks, each of which is summarized into atomic facts from which key elements are extracted to form nodes; links between nodes are established based on shared key elements.
- Graph exploration: The agent selects starting nodes according to a rational plan and traverses the graph by checking neighboring nodes; in the process, the agent records supporting facts in a notebook for subsequent analysis (a sketch of this exploration loop appears at the end of this section).
- Exploring Atomic Facts: Since it is not possible to include all the raw text blocks associated with a node in the context window, the agent adopts a coarse-to-fine strategy, starting with reading the atomic facts and gradually exploring the raw text. All atomic facts can fit into the context window, so the agent first groups all atomic facts associated with each node by their corresponding text chunks and labels them with the corresponding text chunk IDs, which are then fed to the agent. This allows the agent to capture an overview of each text block by reading through all the atomic fact groups. At the same time, the agent uses the questions, rational plans, and notes in the notebook to reflect on the desired clues and determine which text blocks may contain useful information. Subsequently, the agent is given two functions:
- read_chunk: if the agent determines that certain text chunks are worth further reading, it fills the function argument with their chunk IDs, i.e., read_chunk(List[ID]), and appends those IDs to the chunk queue;
- stop_and_read_neighbor: if instead the agent decides that no chunk is worth reading further, it finishes reading the current node and moves on to exploring neighboring nodes.
- Exploring Text Chunks: When the chunk queue is not empty, the agent has identified one or more chunks of interest. GraphReader then traverses the queue and reads each chunk one by one. This step is crucial because atomic facts only summarize key information and provide short clues, while specific details are best obtained directly from the original chunk. As the chunks are read, the agent reconsiders the question and the rational plan, thinking about what can be added to the current notebook. Any supporting facts found are recorded in the notebook. Depending on the updated notebook, the agent chooses one of the following four functions:
- search_more: if the supporting facts are insufficient, the agent continues to explore the chunks in the queue;
- read_previous_chunk and read_subsequent_chunk: because of truncation at chunk boundaries, adjacent chunks may contain relevant and useful information, so the agent can insert their IDs into the queue;
- termination: if enough information has been gathered to answer the question, the agent ends the exploration.
- Exploring Neighboring Nodes: When both the atomic facts and the chunk queue of the current node have been fully processed, the node has been thoroughly explored and the agent needs to visit the next node. Considering the question, the rational plan, and the contents of the notebook, the agent examines all neighboring nodes, i.e., key elements, and executes one of the following two functions:
- read_neighbor_node: the agent selects a neighboring node that may help answer the question and re-enters the process of exploring its atomic facts and text chunks;
- termination: if the agent determines that no neighboring node contains useful information, it completes the exploration.
- Answer reasoning: Compile the notebooks from different agents and use chain-of-thought reasoning to generate the answer to the given question.
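To make the control flow above concrete, here is a hedged sketch of the exploration loop; it is not the authors' code. It reuses the Node structure from the construction sketch, and the Agent methods (`select_chunks`, `decide_chunk_action`, `pick_neighbor`) as well as the `llm_cot` callable are hypothetical placeholders for the LLM prompts that choose among the functions named above (read_chunk, stop_and_read_neighbor, search_more, read_previous_chunk, read_subsequent_chunk, read_neighbor_node, termination).

```python
from collections import deque


class Agent:
    """Placeholder for the LLM-backed policy; every method below stands in
    for a prompt that sees the question, the rational plan, and the notebook."""

    def __init__(self, question: str, rational_plan: str):
        self.question, self.plan = question, rational_plan
        self.notebook: list[str] = []          # supporting facts recorded so far

    def select_chunks(self, grouped_facts: dict) -> list[int]:
        """Atomic-fact stage: return chunk IDs worth reading (read_chunk);
        an empty list plays the role of stop_and_read_neighbor."""
        raise NotImplementedError

    def decide_chunk_action(self, chunk_id: int, chunk: str) -> str:
        """Chunk stage: append supporting facts to the notebook, then return
        'search_more', 'read_previous_chunk', 'read_subsequent_chunk' or 'termination'."""
        raise NotImplementedError

    def pick_neighbor(self, neighbors: set):
        """Neighbor stage: return the next key element to visit (read_neighbor_node)
        or None for termination."""
        raise NotImplementedError


def explore(agent: Agent, nodes: dict, chunks: list, start_key: str) -> list:
    key = start_key
    while key is not None:                               # one node per iteration
        node = nodes[key]

        # Exploring atomic facts: coarse overview, grouped by source chunk ID.
        grouped = {}
        for chunk_id, fact in node.atomic_facts:
            grouped.setdefault(chunk_id, []).append(fact)
        queue = deque(agent.select_chunks(grouped))      # read_chunk(List[ID])

        # Exploring text chunks: read the queued chunks one by one.
        while queue:
            chunk_id = queue.popleft()
            action = agent.decide_chunk_action(chunk_id, chunks[chunk_id])
            if action == "read_previous_chunk" and chunk_id > 0:
                queue.append(chunk_id - 1)
            elif action == "read_subsequent_chunk" and chunk_id + 1 < len(chunks):
                queue.append(chunk_id + 1)
            elif action == "termination":
                return agent.notebook                    # enough support gathered
            # 'search_more' just keeps draining the queue.

        # Exploring neighbors: move to the next key element or finish.
        key = agent.pick_neighbor(node.neighbors)
    return agent.notebook


def answer(question: str, notebooks: list, llm_cot) -> str:
    """Final stage: compile the notebooks of all agents and let the model
    reason step by step (chain of thought) toward the answer."""
    notes = "\n".join(line for nb in notebooks for line in nb)
    return llm_cot(f"Question: {question}\nNotes:\n{notes}\nThink step by step.")
```

As the last bullet notes, GraphReader compiles notes from multiple agents; the `answer` helper above mirrors that final compile-and-reason step under the stated assumptions.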
IV. Performance evaluation
In experiments on several long-context benchmark datasets, GraphReader demonstrates significantly better performance than other methods. For example, on the HotpotQA dataset, GraphReader achieves an EM of 55.0% and an F1 score of 70.0%, outperforming GPT-4-128k and other existing methods. GraphReader also maintains good performance on very long contexts, especially in the LV-Eval benchmark, where it shows a relative performance improvement of 75.00% over GPT-4-128k.
Experimental results show that GraphReader achieves significant performance improvements in long text processing, especially in multi-hop problems and very long texts.
V. Impact and outlook
GraphReader not only represents an important advance in solving the challenges of long context processing in LLMs, but also paves the way for more advanced language models in the future. It demonstrates that long-range dependencies can be efficiently captured and utilized even with a small context window, which has important implications for tasks involving lengthy documents and complex multi-step reasoning. This work may revolutionize several fields such as document analysis and research assistance, opening up new possibilities for AI applications.