RAPTOR: The Future of Retrieval-Augmented Language Models

A New Approach

RAPTOR is a method proposed by a research team at Stanford University that rethinks how LLMs work with long, complex documents. Its core idea is to recursively summarize text chunks, building a hierarchical tree over the document.
 
So, what exactly is recursive summarization? Imagine ‘recursively summarizing’ a very thick book. First, you summarize each chapter, then combine those summaries to create an overall summary of the book. If you want to know detailed information, you look at the chapter summaries; if you want the overall context, you refer to the final summary. Recursive summarization compresses information step-by-step, forming a tree-like structure. This structure allows the model to retrieve both detailed information and thematic overviews based on the query.
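The book analogy above can be sketched in a few lines of code. This is a minimal illustration of recursive summarization, assuming a hypothetical `summarize()` function (a real system would call an LLM here) and simple fixed-size grouping of neighboring chunks:

```python
def summarize(chunks):
    # Placeholder: a real system would call an LLM to condense these chunks.
    return " / ".join(c[:20] for c in chunks)

def build_tree(chunks, group_size=2):
    """Recursively summarize chunks until a single root summary remains.

    Returns a list of layers: layer 0 holds the original chunks (leaves),
    and the last layer holds the single root-level summary.
    """
    layers = [chunks]
    while len(layers[-1]) > 1:
        current = layers[-1]
        # Group neighboring nodes and summarize each group into one parent.
        parents = [summarize(current[i:i + group_size])
                   for i in range(0, len(current), group_size)]
        layers.append(parents)
    return layers

chapters = ["Chapter one text ...", "Chapter two text ...",
            "Chapter three text ...", "Chapter four text ..."]
tree = build_tree(chapters)
print(len(tree))      # number of layers in the tree → 3
print(len(tree[-1]))  # the root layer holds one summary → 1
```

Chapter summaries live in the middle layers; the final summary sits at the root, exactly as in the book analogy.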
 
As a result, RAPTOR excels at tasks that require multi-step reasoning over long documents. On the QuALITY benchmark, a test designed for exactly this kind of reasoning, RAPTOR achieved a 20% absolute improvement in accuracy, far outperforming other models on complex reasoning tasks. Ready to dive deeper?

The Two Key Retrieval Methods of RAPTOR

RAPTOR’s core retrieval mechanism consists of two main methods: Tree Traversal Retrieval and Collapsed Tree Retrieval. These two approaches allow RAPTOR to efficiently locate information within documents, each offering distinct advantages.

RAPTOR's Two Mechanisms

1. Tree Traversal Retrieval

In Tree Traversal Retrieval, RAPTOR navigates the hierarchical tree it has built, starting from the root and working down through lower levels. It first computes the similarity between the query and the nodes at the root layer, assessing higher-level concepts, then explores the most promising child nodes layer by layer to produce the final search results.

  • How it works: Starting at the root, RAPTOR selects the most similar node at each layer and gradually moves down to the lower levels, refining the information. The selected nodes at each layer contain the information most relevant to the query.
  • Advantages: This method is ideal for grasping the big picture of a document before drilling down into finer details. It gradually explores from broader concepts to specific information.
  • Disadvantages: Since the traversal is sequential, it may struggle to simultaneously consider both detailed information and higher-level concepts that a query might require.
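The traversal described above can be sketched as follows. The node layout, `vec` embeddings, and `cosine()` helper here are illustrative stand-ins, not RAPTOR's actual implementation:

```python
import math

def cosine(a, b):
    # Cosine similarity between two plain-list vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def tree_traversal(root, query_vec, top_k=1):
    """Walk down from the root, keeping the top-k most similar nodes per layer."""
    selected = [root]
    path = []
    while selected:
        path.extend(selected)
        # Gather all children of the nodes chosen at this layer.
        children = [c for node in selected for c in node.get("children", [])]
        if not children:
            break
        children.sort(key=lambda n: cosine(n["vec"], query_vec), reverse=True)
        selected = children[:top_k]
    return path

leaf_a = {"text": "detail A", "vec": [1.0, 0.0]}
leaf_b = {"text": "detail B", "vec": [0.0, 1.0]}
root = {"text": "summary", "vec": [0.5, 0.5], "children": [leaf_a, leaf_b]}
result = tree_traversal(root, query_vec=[0.9, 0.1])
print([n["text"] for n in result])  # → ['summary', 'detail A']
```

Note how the result always contains the high-level summary first and only then the most relevant detail, which is exactly the top-down behavior described above.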
 
2. Collapsed Tree Retrieval

The Collapsed Tree Retrieval method flattens all layers of the tree, allowing RAPTOR to evaluate all nodes at once. This approach does not rely on a specific layer of the tree, enabling it to extract relevant information from multiple levels simultaneously.

  • How it works: RAPTOR flattens the tree and calculates cosine similarity for all nodes, selecting those most similar to the query. The selected nodes contain a mix of high-level concepts and detailed information, pulling data from various layers at once.
  • Advantages: Since it evaluates multiple layers simultaneously, this method flexibly combines detailed information with higher-level summaries, providing results that are well-suited to the query.
  • Disadvantages: Evaluating every node in the tree can be computationally expensive, but RAPTOR addresses this issue by utilizing fast retrieval libraries like FAISS to optimize the process.
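A minimal sketch of the collapsed-tree idea, under the same illustrative assumptions as before (toy vectors, a hand-rolled `cosine()`): all nodes from every layer are pooled into one list and ranked against the query. A real system with many nodes would hand this nearest-neighbor search to a library such as FAISS:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def collapsed_tree_retrieval(nodes, query_vec, top_k=2):
    """Rank every node (leaves and summaries alike) against the query."""
    ranked = sorted(nodes, key=lambda n: cosine(n["vec"], query_vec),
                    reverse=True)
    return ranked[:top_k]

nodes = [
    {"text": "root summary",  "vec": [0.5, 0.5]},  # high-level summary node
    {"text": "leaf detail A", "vec": [1.0, 0.0]},  # fine-grained chunk
    {"text": "leaf detail B", "vec": [0.0, 1.0]},
]
hits = collapsed_tree_retrieval(nodes, query_vec=[0.8, 0.6])
print([n["text"] for n in hits])  # → ['root summary', 'leaf detail A']
```

Here the top results mix a summary node with a leaf node, which is the key advantage over strictly top-down traversal.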

How is it different from traditional methods?

Comparison of various top-k tree searches and Collapsed Tree Retrieval on the QASPER dataset.

Innovations in Tree-Based Retrieval Systems

RAPTOR introduces a tree-based retrieval method that sets it apart from traditional search systems. While conventional methods break text into small fragments and search them individually, RAPTOR goes beyond this. First, it vectorizes the text using SBERT embeddings, then clusters similar texts through a Gaussian Mixture Model (GMM).
 
The key innovation of RAPTOR lies in its recursive summarization of these clusters, forming a hierarchical tree. The root of the tree contains high-level summaries of the entire document, while the leaf nodes hold more detailed information. This multi-layered summary structure allows flexible navigation between abstract and detailed information, depending on the user’s query. This approach enables RAPTOR to capture both granular details and broader topics within a document, significantly enhancing the efficiency of information integration.
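The embed-cluster-summarize loop can be sketched as below. This is a toy version of the control flow only: the paper uses SBERT embeddings and Gaussian Mixture Model clustering, whereas `embed()`, `cluster()`, and `summarize()` here are trivial stand-ins:

```python
def embed(text):
    # Stand-in embedding: (length, vowel count). SBERT would go here.
    return (len(text), sum(text.count(v) for v in "aeiou"))

def cluster(items, n_groups):
    # Stand-in for GMM: sort by embedding and split into contiguous buckets,
    # so texts with similar (toy) embeddings land in the same group.
    ordered = sorted(items, key=embed)
    size = max(1, -(-len(ordered) // n_groups))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def summarize(texts):
    # Stand-in for an LLM-generated summary of one cluster.
    return " | ".join(t[:15] for t in texts)

def build_raptor_tree(chunks):
    """Embed, cluster, and summarize repeatedly until one root node remains."""
    layers = [chunks]
    while len(layers[-1]) > 1:
        groups = cluster(layers[-1], n_groups=max(1, len(layers[-1]) // 2))
        layers.append([summarize(g) for g in groups])
    return layers

chunks = ["alpha section text", "beta section text",
          "gamma section text", "delta section text"]
layers = build_raptor_tree(chunks)
print(len(layers), len(layers[-1]))  # → 3 1
```

The leaves (layer 0) keep the detailed chunks, while each pass upward produces the more abstract summaries the article describes.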

RAPTOR's tree construction process.

Thanks to its tree structure, RAPTOR excels at handling complex documents. On the QASPER dataset, RAPTOR achieved an F1 score (a metric that balances precision and recall) of 55.7%, surpassing the previous best of 53.9% set by CoLT5. These results show that RAPTOR can effectively integrate fine-grained details and overarching themes in complex documents, such as scientific papers, retrieving accurate information with minimal error.
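For readers unfamiliar with the metric: F1 is the harmonic mean of precision and recall. This toy token-overlap version illustrates the idea (official QASPER scoring additionally normalizes the text before comparing tokens):

```python
def f1_score(prediction_tokens, gold_tokens):
    """Harmonic mean of precision and recall over overlapping tokens."""
    remaining = list(gold_tokens)
    common = 0
    for tok in prediction_tokens:
        if tok in remaining:
            remaining.remove(tok)  # count each gold token at most once
            common += 1
    if common == 0:
        return 0.0
    precision = common / len(prediction_tokens)  # overlap vs. prediction size
    recall = common / len(gold_tokens)           # overlap vs. gold size
    return 2 * precision * recall / (precision + recall)

score = f1_score(["the", "answer", "is", "42"], ["answer", "42"])
print(round(score, 3))  # precision 0.5, recall 1.0 → F1 ≈ 0.667
```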

Why is this research important?

Language models are playing an increasingly central role in tasks like answering questions, summarizing documents, and generating reports. They must quickly and accurately retrieve relevant information from vast amounts of data. However, traditional retrieval methods only provide small text fragments, limiting the model’s capability. For example, answering a question like “How did Cinderella reach her happy ending?” requires an understanding of the entire context of the novel, not just a few sentences.
 
RAPTOR addresses this issue by retrieving information summarized at different levels of abstraction, flexibly serving both detailed snippets and high-level overviews as needed. And RAPTOR boosts performance regardless of which retrieval method it is paired with. Shall we take a look at the table below?
Performance comparison across the QuALITY and QASPER datasets.

In the NarrativeQA dataset, SBERT, BM25, and DPR, when combined with RAPTOR, all demonstrated higher ROUGE-L scores compared to when they were used independently. Notably, the combination of BM25 and RAPTOR improved performance from 23.52% to 27.93%, while DPR combined with RAPTOR achieved the highest score at 30.94%. This clearly demonstrates that RAPTOR consistently enhances retrieval accuracy when paired with various search methods.

The introduction of RAPTOR marks a significant leap forward for retrieval-augmented language models. RAPTOR’s recursive summarization and tree-based retrieval system enable more effective integration of complex documents, allowing LLMs to perform tasks such as question answering, document summarization, and analysis with unprecedented precision and efficiency.
 
By setting new performance benchmarks across multiple datasets, RAPTOR strikes a balance between efficiency and accuracy, paving the way for retrieval-augmented models to evolve from simple information gatherers to comprehensive knowledge integrators.
