X-R1: Low-cost training of 0.5B models in common devices

🚀 Invitation to Experience: China's First AI IDE Intelligent Programming Software Trae Chinese version downloadThe DeepSeek-R1 and Doubao-pro are available for unlimited use!

General Introduction

X-R1 is a reinforcement learning framework open-sourced on GitHub by the dhcode-cpp team, aiming to provide developers with a low-cost, efficient tool for training models based on end-to-end reinforcement learning. The project is supported by DeepSeek-R1 Inspired by X-R1 and open-r1, X-R1 focuses on building a training environment that is easy to get started with and has low resource requirements.The core product of X-R1 is R1-Zero, a 0.5B covariate model that claims to be able to train efficiently on common hardware. It supports multiple base models (0.5B, 1.5B, 3B) and enhances the model's reasoning and format compliance through reinforcement learning. The project is developed in C++ and combines vLLM Inference engines and GRPO Algorithms for mathematical reasoning, Chinese task processing and other scenarios. Whether you are an individual developer or a researcher, X-R1 provides an open source option worth trying.

X-R1: Low-cost training of 0.5B models in common equipment-1

Function List

Low-cost model training: Support for training R1-Zero models with 0.5B parameters on common hardware (e.g., 4 3090 GPUs).
Enhanced Learning Optimization: Enhance the inference ability and output format accuracy of models through end-to-end reinforcement learning (RL).
Multi-model supportCompatible with 0.5B, 1.5B, 3B and other basic models, flexible adaptation to different task requirements.
GPU Accelerated Sampling: Integrate vLLM as an online inference engine for fast GRPO data sampling.
Chinese Mathematical Reasoning: Supports mathematical reasoning tasks in Chinese environments, generating clear answer processes.
Benchmarking assessment: Provides benchmark.py scripts for evaluating the accuracy and format output capabilities of the model.
Open Source Configuration: Provide a detailed configuration file (e.g. zero3.yaml) for user-defined training parameters.

Using Help

X-R1 is an open source project based on GitHub, users need some programming foundation and hardware environment to install and use it. The following is a detailed installation and usage guide to help you get started quickly.

Installation process

environmental preparation
- hardware requirement: At least 1 NVIDIA GPU (e.g., 3090), 4 recommended for optimal performance; CPU supports AVX/AVX2 instruction set; at least 16GB of RAM.
- operating system: Linux (e.g. Ubuntu 20.04+), Windows (requires WSL2 support) or macOS.
- Dependent tools: Install Git, Python 3.8+, CUDA Toolkit (to match the GPU, e.g. 11.8), C++ compiler (e.g. g++).
```
sudo apt update
sudo apt install git python3 python3-pip build-essential
```
- GPU Driver: Ensure that the NVIDIA driver is installed and matches the CUDA version, which can be done via the nvidia-smi Check.
cloning project
Run the following command in a terminal to download the X-R1 repository locally:
```
git clone https://github.com/dhcode-cpp/X-R1.git
cd X-R1
```
Installation of dependencies
- Install the Python dependencies:
```
pip install -r requirements.txt
```
- Install Accelerate (for distributed training):
```
pip install accelerate
```
Configuration environment
- Edit the profile according to the number of GPUs (e.g. recipes/zero3.yaml), set num_processes(e.g., 3, indicating that 3 of the 4 GPUs are used for training and 1 for vLLM inference).
- Example Configuration:
```
num_processes: 3
per_device_train_batch_size: 1
num_generations: 3
```
Verify Installation
Run the following command to check that the environment is properly configured:
```
accelerate config
```
Follow the prompts to select hardware and distribution settings.

Main Functions

1. Model training

priming training: Train the R1-Zero model using the GRPO algorithm. Run the following command:

ACCELERATE_LOG_LEVEL=info accelerate launch \
--config_file recipes/zero3.yaml \
--num_processes=3 \
src/x_r1/grpo.py \
--config recipes/X_R1_zero_0dot5B_config.yaml \
> ./output/x_r1_0dot5B_sampling.log 2>&1

Functional Description: This command initiates the training of the 0.5B model, and the logs are saved in the output Folder. The training process will use reinforcement learning to optimize the model's reasoning ability.
caveat: Ensure that the global batch size (num_processes * per_device_train_batch_size) can be num_generations Integrate, otherwise an error will be reported.

2. Chinese mathematical reasoning

Configuring Math Tasks: Use the configuration file designed for Chinese:

ACCELERATE_LOG_LEVEL=info accelerate launch \
--config_file recipes/zero3.yaml \
--num_processes=3 \
src/x_r1/grpo.py \
--config recipes/examples/mathcn_zero_3B_config.yaml \
> ./output/mathcn_3B_sampling.log 2>&1

workflow::
1. Prepare a Chinese math problem dataset (e.g., in JSON format).
2. Modify the dataset path in the configuration file.
3. Run the above command and the model will generate the inference process and output to the log.
Featured Functions: Support for generating detailed Chinese answer steps, suitable for educational scenarios.

3. Benchmarking

Operational assessment: Use benchmark.py Test model performance:

CUDA_VISIBLE_DEVICES=0,1 python ./src/x_r1/benchmark.py \
--model_name='xiaodongguaAIGC/X-R1-0.5B' \
--dataset_name='HuggingFaceH4/MATH-500' \
--output_name='./output/result_benchmark_math500' \
--max_output_tokens=1024 \
--num_gpus=2

Interpretation of results: The script outputs the accuracy-metric and format-metric and saves the results as a JSON file.
Usage Scenarios: Validating model performance in mathematical tasks and optimizing training parameters.

operating skill

Log View: After training or testing is complete, view the log file (e.g., the x_r1_0dot5B_sampling.log) to debug the problem.
Multi-GPU Optimization: If more GPUs are available, adjust the num_processes cap (a poem) num_gpus parameters to improve parallel efficiency.
error detection: If a batch size error is encountered, adjust the per_device_train_batch_size cap (a poem) num_generations Make it match.

Recommendations for use

When using it for the first time, it is recommended to start with a small-scale model (e.g., 0.5B) and then expand to a 3B model once you are familiar with the process.
Regularly check your GitHub repository for updates and new features and fixes.
For support, contact the developer at dhcode95@gmail.com.

With these steps, you can easily set up an X-R1 environment and start training or testing models. This framework provides efficient support for both research and development.

X-R1: Low-cost training of 0.5B models in common devices

General Introduction

Function List

Using Help

Installation process

Main Functions

1. Model training

2. Chinese mathematical reasoning

3. Benchmarking

operating skill

Recommendations for use

Related articles

Recommended

Can't find AI tools? Try here!

FLUX.1 image generator (supports Chinese input)

Recent AI Hotspots

AI Tools Recommendations

AI Tools Classification