
MathCLUE: DeepSeek R1 Challenges 'National High School Math Contest', Dramatically Outperforms o1

Introducing the MathCLUE "National High School Mathematics Competition" benchmark: an in-depth assessment of the competition-level mathematical reasoning ability of large models. The benchmark covers several representative areas of high school mathematics, including geometry, algebra, and probability and statistics.

🔥 Model evaluated: DeepSeek-R1 (accessed at chat.deepseek.com)

DeepSeek-R1 Evaluation and Analysis
🔍 DeepSeek-R1 Tops MathCLUE's National High School Math Contest List
DeepSeek-R1 topped the National High School Math Contest leaderboard with a score of 87.31 points, nearly 10 points ahead of o1, one of the world's top models. Compared with DeepSeek-R1-Lite-Preview, it improved by 26.12 points. This is a substantial gain in overall score and in mathematical reasoning and problem-solving ability.


 

Meanwhile, the results for Qwen2.5-Max on the "National High School Math Contest" are in. The model fell short of expectations, and the likely reasons are discussed below.

🔥 Model evaluated: Qwen2.5-Max
Accessed through the official API, version name: qwen-max-2025-01-25

Qwen2.5-Max Evaluation and Analysis
🔍 Qwen2.5-Max still has room for improvement on the MathCLUE list
Qwen2.5-Max scored 33.58 points and ranked 9th in the National High School Mathematics Competition evaluation, ahead of well-known overseas models such as Claude 3.5 Sonnet (20241022), which scored 15.67 points. It still trails the leading models at home and abroad by more than 30 points.
To understand this result, we analyzed the questions the model got wrong. On some of the harder problems, the model skips the solution process and directly gives a wrong answer. Because this assessment is scored only on final answers, this behavior may be the main reason for its low score.

 

Evaluation Set
The MathCLUE National High School Math Competition evaluation set covers questions from the 2024 National High School Mathematics Competition and is used to rigorously assess large models.

Methodology
For each question in the assessment, the final answer in the model's response is compared with the reference answer, and the response is marked correct or incorrect. The model's accuracy is computed from these per-question judgments. Because only the final answer is checked, the scoring is fully objective.
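
The final-answer scoring described above can be sketched roughly as follows. This is a minimal illustration, not MathCLUE's actual grader: the function names (`normalize_answer`, `is_correct`, `accuracy_score`), the normalization rules, and the sample answers are all assumptions made for this example. Checking equivalence of mathematical answers in practice (for instance, `1/2` versus `0.5`) would need more than string comparison.

```python
from typing import Sequence


def normalize_answer(text: str) -> str:
    """Hypothetical normalization: trim, lowercase, drop spaces and a trailing period."""
    cleaned = text.strip().lower().replace(" ", "")
    return cleaned.rstrip(".")


def is_correct(final_answer: str, reference: str) -> bool:
    """A response counts as correct only if its final answer matches the reference."""
    return normalize_answer(final_answer) == normalize_answer(reference)


def accuracy_score(final_answers: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of questions answered correctly, rounded to two decimals."""
    if len(final_answers) != len(references):
        raise ValueError("each question needs exactly one final answer and one reference")
    if not references:
        return 0.0
    correct = sum(is_correct(a, r) for a, r in zip(final_answers, references))
    return round(100.0 * correct / len(references), 2)


if __name__ == "__main__":
    # Made-up answers for illustration only; not taken from the evaluation set.
    model_answers = ["42", " 1/2 ", "x=3"]
    reference_answers = ["42", "1/2", "x=5"]
    print(accuracy_score(model_answers, reference_answers))
```

Under a scheme like this, the solution process is never inspected, so a response that skips its reasoning and states a wrong final answer earns nothing for that question.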

May not be reproduced without permission: Chief AI Sharing Circle » MathCLUE: DeepSeek R1 Challenges 'National High School Math Contest', Dramatically Outperforms o1
