
PRAG: A Parametric Retrieval-Augmented Generation Tool for Improving Q&A System Performance

General Introduction

PRAG (Parametric Retrieval-Augmented Generation) is a retrieval-augmented generation tool that embeds external knowledge directly into the parameter space of a Large Language Model (LLM) instead of appending retrieved documents to the input context. This avoids the long-context overhead of traditional in-context retrieval-augmented generation and is intended to improve the model's reasoning and synthesis by integrating external knowledge more deeply. PRAG provides an end-to-end implementation, including a data augmentation module, a parameter training module, and an inference module, for evaluating performance on a range of question-answering datasets.


Features

  • Data augmentation module: Converts documents into augmented datasets.
  • Parameter training module: Trains additional LoRA parameters to produce a parametric representation of each document.
  • Inference module: Merges the parametric representations of the relevant documents and inserts them into the LLM for inference.
  • Environment installation: Provides detailed setup steps and dependencies.
  • Custom data augmentation: Supports using the pre-augmented data files directly or running data augmentation yourself.
  • Retrieval preparation: Downloads and prepares the Wikipedia dataset for retrieval.

 

Usage Guide

Environment Installation

  1. Create and activate a virtual environment:
   conda create -n prag python=3.10.4
   conda activate prag
  2. Install the necessary dependencies:
   pip install torch==2.1.0
   pip install -r requirements.txt
  3. Edit src/root_dir_path.py and set the ROOT_DIR variable to the path of the folder where PRAG is stored.
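
As a rough illustration of the ROOT_DIR step, the file might look something like the sketch below. Only the file name and the ROOT_DIR variable come from the instructions; the placeholder path and the data_path helper are assumptions made for this example, and the real file in the project may be laid out differently.

```python
# Hypothetical sketch of src/root_dir_path.py; only the ROOT_DIR name comes from the text.
from pathlib import Path

# Placeholder: replace with the folder where PRAG is stored on your machine.
ROOT_DIR = "/path/to/PRAG"


def data_path(*parts):
    """Build a path under ROOT_DIR (helper name is an assumption, not part of PRAG)."""
    return str(Path(ROOT_DIR).joinpath(*parts))


if __name__ == "__main__":
    # The download step stores the Wikipedia passages under data/dpr.
    print(data_path("data", "dpr", "psgs_w100.tsv.gz"))
```

Keeping the root folder in one variable means the other scripts can locate data and output files without hard-coded absolute paths.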

Data Augmentation

  1. Use the pre-augmented data files:
   tar -xzvf data_aug.tar.gz
  2. Or run data augmentation yourself:
    • Download the Wikipedia dataset:
      mkdir -p data/dpr
      wget -O data/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
    • Set up BM25 retrieval:
      # Please refer to the project documentation for specific steps.
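
To give a feel for what BM25 retrieval over the downloaded passages involves, here is a minimal standard-library sketch. It assumes the passage file is tab-separated with a header row and id, text, title columns; the function names, the whitespace tokenizer, and the k1 and b defaults are choices made for this example. It is not the project's retrieval setup, which should be taken from the project documentation.

```python
import csv
import gzip
import math
from collections import Counter


def read_passages(path, limit=None):
    """Yield (id, text, title) tuples from a gzipped TSV with a header row.

    The column order is an assumption about the passage file.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row
        for i, row in enumerate(reader):
            if limit is not None and i >= limit:
                break
            yield row[0], row[1], row[2]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n or 1.0
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "paris is the capital of france",
        "berlin is the capital of germany",
        "the eiffel tower is in paris",
    ]
    scores = bm25_scores("capital france", docs)
    best = max(range(len(docs)), key=scores.__getitem__)
    print(docs[best])
```

For the full Wikipedia dump, scoring every passage in pure Python like this would be far too slow; a dedicated search index is the usual choice, and the sketch is only meant to show the ranking idea.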

Parameter Training

  1. Generate a parametric representation of each document:
   # Please refer to the project documentation for specific steps.
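
One reason LoRA parameters are a practical way to store a per-document representation is their size. The sketch below compares the parameter count of a single LoRA adapter with that of the full weight matrix it modifies. The layer sizes and rank are illustrative values chosen for this example, not settings taken from PRAG.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_param_count(d_in, d_out):
    """Parameters in the full (d_out x d_in) weight matrix."""
    return d_in * d_out


if __name__ == "__main__":
    # Illustrative sizes only; the actual model dimensions and rank may differ.
    d_in, d_out, rank = 4096, 4096, 2
    lora = lora_param_count(d_in, d_out, rank)
    full = full_param_count(d_in, d_out)
    print(lora, full, full // lora)  # prints: 16384 16777216 1024
```

With these example numbers, the adapter for one layer is about a thousand times smaller than the layer itself, which is what makes storing one adapter per document plausible.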

Inference

  1. Merge the parametric representations of the relevant documents and insert them into the LLM for inference:
   # Please refer to the project documentation for specific steps.
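
The merge step can be pictured as follows. A common way to combine several LoRA adapters is to add their low-rank updates to the base weight, W' = W + alpha * sum(B_i A_i). The sketch below does this with plain Python lists. The function names, the alpha scaling factor, and the summation rule itself are assumptions made for illustration; the project documentation is the authority on how PRAG actually merges adapters.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(base, adapters, alpha=1.0):
    """Return base + alpha * sum(B @ A) over all (B, A) adapter pairs.

    base is (d_out x d_in); each B is (d_out x r) and each A is (r x d_in).
    """
    merged = [[float(v) for v in row] for row in base]
    for B, A in adapters:
        delta = matmul(B, A)  # low-rank update contributed by one document
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += alpha * v
    return merged


if __name__ == "__main__":
    base = [[1, 0], [0, 1]]
    doc1 = ([[1], [0]], [[0, 2]])  # rank-1 adapter for a first document
    doc2 = ([[0], [1]], [[3, 0]])  # rank-1 adapter for a second document
    print(merge_lora(base, [doc1, doc2]))  # prints: [[1.0, 2.0], [3.0, 1.0]]
```

Because the base weights are copied rather than modified in place, the merged update can be discarded after answering a question and a different set of document adapters merged for the next one.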