General Introduction
X-R1 is a reinforcement learning framework open-sourced on GitHub by the dhcode-cpp team, aiming to provide developers with a low-cost, efficient tool for training models based on end-to-end reinforcement learning. The project is supported by DeepSeek-R1 Inspired by X-R1 and open-r1, X-R1 focuses on building a training environment that is easy to get started with and has low resource requirements.The core product of X-R1 is R1-Zero, a 0.5B covariate model that claims to be able to train efficiently on common hardware. It supports multiple base models (0.5B, 1.5B, 3B) and enhances the model's reasoning and format compliance through reinforcement learning. The project is developed in C++ and combines vLLM Inference engines and GRPO Algorithms for mathematical reasoning, Chinese task processing and other scenarios. Whether you are an individual developer or a researcher, X-R1 provides an open source option worth trying.
Function List
- Low-cost model training: Support for training R1-Zero models with 0.5B parameters on common hardware (e.g., 4 3090 GPUs).
- Enhanced Learning Optimization: Enhance the inference ability and output format accuracy of models through end-to-end reinforcement learning (RL).
- Multi-model supportCompatible with 0.5B, 1.5B, 3B and other basic models, flexible adaptation to different task requirements.
- GPU Accelerated Sampling: Integrate vLLM as an online inference engine for fast GRPO data sampling.
- Chinese Mathematical Reasoning: Supports mathematical reasoning tasks in Chinese environments, generating clear answer processes.
- Benchmarking assessment: Provides benchmark.py scripts for evaluating the accuracy and format output capabilities of the model.
- Open Source Configuration: Provide a detailed configuration file (e.g. zero3.yaml) for user-defined training parameters.
Using Help
X-R1 is an open source project based on GitHub, users need some programming foundation and hardware environment to install and use it. The following is a detailed installation and usage guide to help you get started quickly.
Installation process
- environmental preparation
- hardware requirement: At least 1 NVIDIA GPU (e.g., 3090), 4 recommended for optimal performance; CPU supports AVX/AVX2 instruction set; at least 16GB of RAM.
- operating system: Linux (e.g. Ubuntu 20.04+), Windows (requires WSL2 support) or macOS.
- Dependent tools: Install Git, Python 3.8+, CUDA Toolkit (to match the GPU, e.g. 11.8), C++ compiler (e.g. g++).
sudo apt update sudo apt install git python3 python3-pip build-essential
- GPU Driver: Ensure that the NVIDIA driver is installed and matches the CUDA version, which can be done via the
nvidia-smi
Check.
- cloning project
Run the following command in a terminal to download the X-R1 repository locally:git clone https://github.com/dhcode-cpp/X-R1.git cd X-R1
- Installation of dependencies
- Install the Python dependencies:
pip install -r requirements.txt
- Install Accelerate (for distributed training):
pip install accelerate
- Install the Python dependencies:
- Configuration environment
- Edit the profile according to the number of GPUs (e.g.
recipes/zero3.yaml
), setnum_processes
(e.g., 3, indicating that 3 of the 4 GPUs are used for training and 1 for vLLM inference). - Example Configuration:
num_processes: 3 per_device_train_batch_size: 1 num_generations: 3
- Edit the profile according to the number of GPUs (e.g.
- Verify Installation
Run the following command to check that the environment is properly configured:accelerate config
Follow the prompts to select hardware and distribution settings.
Main Functions
1. Model training
- priming training: Train the R1-Zero model using the GRPO algorithm. Run the following command:
ACCELERATE_LOG_LEVEL=info accelerate launch \ --config_file recipes/zero3.yaml \ --num_processes=3 \ src/x_r1/grpo.py \ --config recipes/X_R1_zero_0dot5B_config.yaml \ > ./output/x_r1_0dot5B_sampling.log 2>&1
- Functional Description: This command initiates the training of the 0.5B model, and the logs are saved in the
output
Folder. The training process will use reinforcement learning to optimize the model's reasoning ability. - caveat: Ensure that the global batch size (
num_processes * per_device_train_batch_size
) can benum_generations
Integrate, otherwise an error will be reported.
2. Chinese mathematical reasoning
- Configuring Math Tasks: Use the configuration file designed for Chinese:
ACCELERATE_LOG_LEVEL=info accelerate launch \ --config_file recipes/zero3.yaml \ --num_processes=3 \ src/x_r1/grpo.py \ --config recipes/examples/mathcn_zero_3B_config.yaml \ > ./output/mathcn_3B_sampling.log 2>&1
- workflow::
- Prepare a Chinese math problem dataset (e.g., in JSON format).
- Modify the dataset path in the configuration file.
- Run the above command and the model will generate the inference process and output to the log.
- Featured Functions: Support for generating detailed Chinese answer steps, suitable for educational scenarios.
3. Benchmarking
- Operational assessment: Use
benchmark.py
Test model performance:CUDA_VISIBLE_DEVICES=0,1 python ./src/x_r1/benchmark.py \ --model_name='xiaodongguaAIGC/X-R1-0.5B' \ --dataset_name='HuggingFaceH4/MATH-500' \ --output_name='./output/result_benchmark_math500' \ --max_output_tokens=1024 \ --num_gpus=2
- Interpretation of results: The script outputs the accuracy-metric and format-metric and saves the results as a JSON file.
- Usage Scenarios: Validating model performance in mathematical tasks and optimizing training parameters.
operating skill
- Log View: After training or testing is complete, view the log file (e.g., the
x_r1_0dot5B_sampling.log
) to debug the problem. - Multi-GPU Optimization: If more GPUs are available, adjust the
num_processes
cap (a poem)num_gpus
parameters to improve parallel efficiency. - error detection: If a batch size error is encountered, adjust the
per_device_train_batch_size
cap (a poem)num_generations
Make it match.
Recommendations for use
- When using it for the first time, it is recommended to start with a small-scale model (e.g., 0.5B) and then expand to a 3B model once you are familiar with the process.
- Regularly check your GitHub repository for updates and new features and fixes.
- For support, contact the developer at dhcode95@gmail.com.
With these steps, you can easily set up an X-R1 environment and start training or testing models. This framework provides efficient support for both research and development.