rStar2-Agent - Microsoft's Open Source Efficient AI Reasoning Model

What is rStar2-Agent

rStar2-Agent is an advanced AI mathematical reasoning model open-sourced by Microsoft. It shows strong math problem-solving ability, reaching 80.6% accuracy on AIME24, and it also handles scientific reasoning, scoring 60.9% on the GPQA-Diamond benchmark. The model is trained with agentic reinforcement learning and calls tools efficiently: depending on what a problem requires, it decides when to invoke a tool such as a code-execution environment, which makes problem solving more efficient. Training uses multi-stage reinforcement learning combined with the GRPO-RoC (Group Relative Policy Optimization with Resample-on-Correct) algorithm, which improves how the model uses tools and substantially reduces training cost.
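
GRPO-style training scores each sampled solution relative to the other solutions drawn for the same prompt, instead of relying on a learned value function. The sketch below illustrates only that group-relative advantage step, under simple assumptions (a binary reward of 1 for a correct final answer and 0 otherwise, population standard deviation); the function name is hypothetical and this is not the rStar2-Agent training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: GRPO-style advantages for one group of rollouts.

    All rollouts in `rewards` answer the same prompt. Each advantage is the
    reward minus the group mean, divided by the group standard deviation
    (population std here; `eps` avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: 4 rollouts for one math problem, assumed binary reward (1 = correct).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # → [1.0, -1.0, -1.0, 1.0]
```

Correct rollouts get a positive advantage and incorrect ones a negative advantage; if every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.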

Key Features of rStar2-Agent

Efficient Mathematical Reasoning: With only 14 billion parameters, rStar2-Agent reaches 80.6% accuracy on AIME24 and can quickly solve complex math problems across domains such as algebra, geometry, and probability.
Scientific Reasoning: It reaches 60.9% accuracy on GPQA-Diamond, showing solid scientific knowledge and reasoning ability.
Smart Tool Calling: It automatically invokes suitable tools, such as a code-execution tool, based on what each problem requires, making problem solving more efficient.
Strong Generalization: Its reasoning ability carries over to a wide variety of other tasks and domains, giving it broad application potential.
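
To make the tool-calling idea concrete, here is a minimal sketch of what a code-execution tool could look like: it runs a snippet of model-written Python in a separate interpreter and returns the output, or the error, as feedback the model can read. The function `run_python_tool` and its timeout are hypothetical illustrations, not part of the rStar repository, and this sketch does no sandboxing, which a real deployment would need.

```python
import subprocess
import sys


def run_python_tool(code: str, timeout: float = 5.0) -> str:
    """Hypothetical code-execution tool: run model-written Python, return its output.

    The returned text (stdout, or an error message) would be appended to the
    model's context so it can check or correct its reasoning.
    NOTE: runs code in a plain subprocess; this is NOT a secure sandbox.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Error: execution exceeded {timeout} seconds"
    if result.returncode != 0:
        # Surface the traceback so the model can self-correct.
        return "Error:\n" + result.stderr.strip()
    return result.stdout.strip()


# Example: the model delegates an exact computation to the tool.
print(run_python_tool("from math import comb; print(comb(10, 3))"))  # → 120
```

Returning the error text instead of raising lets failed tool calls become feedback, in the spirit of the self-correction from environment feedback described in this article.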

Core Benefits of rStar2-Agent

Parameter Efficiency: With a relatively small 14 billion parameters, it achieves performance comparable to much larger models such as the 671B-parameter DeepSeek-R1, making very efficient use of its parameters.
Training Speed: It reaches a high level of reasoning performance after very little training (only 510 reinforcement learning steps), greatly speeding up model training and iteration.
Resource Utilization: Training was completed with limited GPU resources, reducing hardware requirements and making research and applications more feasible.
Low Error Rate: Algorithmic optimizations reduce the errors the model makes during reasoning, improving the accuracy and reliability of its results.
Innovative RL Algorithm: The GRPO-RoC algorithm addresses shortcomings of conventional reinforcement learning and improves the model's reasoning in a code environment.
Environment Adaptation: The model copes with noise in the code-execution environment and uses environment feedback to correct itself and keep learning.
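
The "RoC" in GRPO-RoC stands for Resample-on-Correct. Roughly, the method samples more rollouts than it trains on and then downsamples them, favoring correct trajectories with fewer tool-call errors while sampling incorrect ones uniformly so that varied failure modes remain. The sketch below is a simplified, hypothetical illustration of that filtering step; the `Rollout` fields, the proportional split between correct and incorrect rollouts, and the strict sort by error count are assumptions of this sketch, not details taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Rollout:
    correct: bool      # final answer matched the reference
    tool_errors: int   # hypothetical quality signal: failed tool calls in the trace


def resample_on_correct(rollouts: list[Rollout], group_size: int,
                        rng: random.Random) -> list[Rollout]:
    """Simplified Resample-on-Correct: downsample an oversampled batch to `group_size`.

    Correct rollouts are kept in order of fewest tool errors (quality filtering);
    incorrect rollouts are sampled uniformly so diverse failure modes survive.
    """
    positives = [r for r in rollouts if r.correct]
    negatives = [r for r in rollouts if not r.correct]
    # Keep the original correct/incorrect proportion (an assumption of this sketch).
    n_pos = round(group_size * len(positives) / len(rollouts))
    n_neg = group_size - n_pos
    best_pos = sorted(positives, key=lambda r: r.tool_errors)[:n_pos]
    sampled_neg = rng.sample(negatives, min(n_neg, len(negatives)))
    return best_pos + sampled_neg


# Example: 8 oversampled rollouts downsampled to a group of 4.
batch = [Rollout(True, 3), Rollout(True, 0), Rollout(False, 1), Rollout(False, 0),
         Rollout(True, 1), Rollout(False, 2), Rollout(True, 0), Rollout(False, 5)]
kept = resample_on_correct(batch, group_size=4, rng=random.Random(0))
print(sum(r.correct for r in kept), [r.tool_errors for r in kept if r.correct])  # → 2 [0, 0]
```

The asymmetry is the point of the design: filtering only the correct rollouts discourages the model from learning sloppy tool use that happened to reach the right answer, while uniform sampling of failures keeps the negative signal broad.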

What is rStar2-Agent's official website?

GitHub repository: https://github.com/microsoft/rStar
arXiv technical paper: https://www.arxiv.org/pdf/2508.20722

Who rStar2-Agent Is For

Researchers and Developers: People working in artificial intelligence, machine learning, and natural language processing who want to study model behavior, optimize algorithms, or build new applications.
Educators: Teachers can use it to supplement instruction, especially in math and scientific reasoning, helping students understand complex concepts and problem-solving steps.
Students: Students of math, science, and programming can use it as a learning tool to improve their problem-solving skills.
Data Analysts: Analysts who perform complex data analysis and decision support can use it to process and analyze data and reach more accurate conclusions.
Financial Analysts: Finance professionals can apply it to risk assessment, investment analysis, and other tasks that require advanced mathematical reasoning.