Search-R1: A Tool for Reinforcement Learning to Train Large Models for Search and Inference

🚀 Invitation to Experience: China's First AI IDE Intelligent Programming Software Trae Chinese version downloadThe DeepSeek-R1 and Doubao-pro are available for unlimited use!

General Introduction

Search-R1 is an open source project developed by PeterGriffinJin on GitHub and built on the veRL framework. It uses reinforcement learning (RL) techniques to train large language models (LLMs), allowing the models to autonomously learn to reason and invoke search engines to solve problems. The project supports basic models such as Qwen2.5-3B and Llama3.2-3B, extending the DeepSeek-R1 and TinyZero methods. Users can use it to train models to handle single- or multi-round tasks, with code, datasets, and experiment logs provided. Officialdiscuss a paper or thesis (old)Released in March 2025, the project model and data are available for download at Hugging Face for researchers and developers.

Search-R1: A Tool for Reinforcement Learning to Train Large Models for Search and Reasoning-1

Function List

Training large models through reinforcement learning to improve reasoning and search.
Support for calling Google, Bing, Brave and other search engines API.
Provides LoRA tuning and supervised fine-tuning capabilities to optimize model performance.
Built-in off-the-shelf reorderer to improve search result accuracy.
Includes detailed lab logs and papers to support reproduction of results.
Provides local search server functionality for easy customization of searches.
Support for user upload of customized datasets and corpora.

Using Help

Search-R1 is aimed at users with a basic knowledge of programming and machine learning. Below is a detailed installation and usage guide to get you started quickly.

Installation process

To use Search-R1, you need to set up the environment first. The steps are as follows:

Creating a Search-R1 Environment
Runs in the terminal:

conda create -n searchr1 python=3.9
conda activate searchr1

This creates a virtual environment for Python 3.9.

Installing PyTorch
Install PyTorch 2.4.0 (supports CUDA 12.1) by entering the following command:

pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121

Installing vLLM
vLLM is a key library for running large models, install version 0.6.3:

pip3 install vllm==0.6.3

Versions 0.5.4, 0.4.2 or 0.3.1 are also available.

Installing veRL
Run it in the project root directory:

pip install -e .

This will install the veRL framework.

Install optional dependencies
To improve performance, install Flash Attention and Wandb:

pip3 install flash-attn --no-build-isolation
pip install wandb

Installation of Retriever Environment (optional)
If a local retrieval server is required, create another environment:

conda create -n retriever python=3.10
conda activate retriever
conda install pytorch==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
pip install uvicorn fastapi

Quick Start

Following are the steps to train a model based on NQ dataset:

Download indexes and corpora
Set the save path and run it:

save_path=/你的保存路径
python scripts/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz

Processing NQ data
Run the script to generate training data:

python scripts/data_process/nq_search.py

Starting the Retrieval Server
Runs in the Retriever environment:

conda activate retriever
bash retrieval_launch.sh

Run RL training
Runs in a Search-R1 environment:

conda activate searchr1
bash train_ppo.sh

This will use the Llama-3.2-3B base model for PPO training.

Using custom datasets

Preparing QA Data
The data needs to be in JSONL format and each row contains the following fields:

{
"data_source": "web",
"prompt": [{"role": "user", "content": "问题"}],
"ability": "fact-reasoning",
"reward_model": {"style": "rule", "ground_truth": "答案"},
"extra_info": {"split": "train", "index": 1}
}

consultation <scripts/data_process/nq_search.py>The

Preparing the corpus
The corpus needs to be in JSONL format, with each line containing id cap (a poem) contents, e.g.:
```
{"id": "0", "contents": "文本内容"}
```
referable <example/corpus.jsonl>The
Indexing corpus (optional)
If using a local search, run:
```
bash search_r1/search/build_index.sh
```

Calling a customized search engine

modifications <search_r1/search/retriever_server.py>The configuration APIs.

Start the server:

python search_r1/search/retriever_server.py

The model will be passed through the http://127.0.0.1:8000/retrieve Call Search.

inference operation

Start the retrieval server:
```
bash retrieval_launch.sh
```
Running Reasoning:
```
python infer.py
```
modifications <infer.py> Line 7 question, enter the question you want to ask.

caveat

Training requires a GPU with at least 24GB of video memory (e.g. NVIDIA A100).
Make sure the API key is valid and the network connection is stable.
Official papers and lab journals (<Full experiment log 1> cap (a poem) <Full experiment log 2>) Provide more details.

With these steps, you can use Search-R1 to train a model that can reason and search to handle a variety of tasks.

application scenario

research experiment
Researchers can use Search-R1 to reproduce the results of the paper and explore the application of reinforcement learning in model training.
Intelligent Assistant Development
Developers can train models to be integrated into chat tools to provide search and reasoning capabilities.
Knowledge Query
Users can use it to quickly answer complex questions and get up-to-date information through search.

QA

What models does Search-R1 support?
Currently, Qwen2.5-3B and Llama3.2-3B base models are supported, other models need to be adapted by themselves.
How long does the training take?
Depending on the dataset and hardware, the NQ dataset takes about a few hours to train on a 24GB graphics GPU.
How do I verify the effectiveness of my training?
ferret out <Preliminary results> in the chart, or check the Wandb log.

Search-R1: A Tool for Reinforcement Learning to Train Large Models for Search and Reasoning

General Introduction

Function List

Using Help

Installation process

Quick Start

Using custom datasets

Calling a customized search engine

inference operation

caveat

application scenario

QA

Related articles

Recommended

Can't find AI tools? Try here!

FLUX.1 image generator (supports Chinese input)

Recent AI Hotspots

AI Tools Recommendations

AI Tools Classification