
Hunyuan-T1 Officially Released: Mamba-Powered, Redefining Inference Efficiency

Recently, new paradigms of reinforcement learning in the late stage of training (post-training) for large language models have drawn growing attention from industry. Following the release of OpenAI's o-series models and DeepSeek-R1, the strong performance of these models has demonstrated the key role reinforcement learning plays in the optimization process.

Tencent's Hunyuan large-model team has also made significant progress recently. In mid-February this year, the team launched the Hunyuan T1-Preview reasoning model, built on a medium-scale Hunyuan base model, in the Tencent Yuanbao app. Now the deep-thinking model of the Hunyuan series has been upgraded to its official release, Hunyuan-T1.


Experience Address:

https://llm.hunyuan.tencent.com/#/chat/hy-t1

https://huggingface.co/spaces/tencent/Hunyuan-T1

Yuanbao / Yuanqi: Tencent's Hunyuan-powered AI assistant and open agent-building platform


Hunyuan-T1 is built on the TurboS fast-thinking base model released in early March. TurboS is the world's first ultra-large-scale Mixture-of-Experts (MoE) model that combines the Transformer and Mamba architectures. Through large-scale post-training, Hunyuan-T1's reasoning capability has been significantly extended and further aligned with human preferences.

Hunyuan-T1 has unique advantages in deep reasoning. First, TurboS's long-text capture capability effectively addresses the context loss and long-range dependency problems common in long-text reasoning. Second, the Mamba architecture is specifically optimized for long sequences: its efficient computation significantly reduces resource consumption while preserving the ability to capture long-range information. Under the same deployment conditions, decoding speed is doubled.
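To make the decoding-speed claim concrete, the toy sketch below contrasts the per-token decode cost of attention (whose KV cache grows with sequence length) with a Mamba-style state-space layer (which keeps a fixed-size recurrent state). This is an illustrative simplification under assumed shapes (d_model, d_state) and single-head, projection-free attention; it is not Hunyuan-T1's actual implementation, and a real Mamba block additionally makes the SSM parameters input-dependent.

```python
import torch

# Illustrative comparison of per-token decode state: attention vs. SSM.
# All names and shapes are assumptions for this sketch, not Hunyuan-T1 code.
d_model, d_state = 512, 16

# Attention decode: the KV cache grows by one entry per generated token,
# so per-token cost and memory scale with the generated length.
kv_cache = {"k": torch.empty(0, d_model), "v": torch.empty(0, d_model)}

def attention_decode_step(x, cache):
    k, v = x, x                                   # single head, no projections
    cache["k"] = torch.cat([cache["k"], k[None]])
    cache["v"] = torch.cat([cache["v"], v[None]])
    attn = torch.softmax(cache["k"] @ x / d_model**0.5, dim=0)
    return attn @ cache["v"]                      # cost grows with cache length

# SSM decode: a fixed-size recurrent state is updated in place each step,
# so per-token cost and memory stay constant regardless of sequence length.
ssm_state = torch.zeros(d_model, d_state)
A = -torch.rand(d_model, d_state)                 # decay; real Mamba makes A, B, C
B = torch.randn(d_state)                          # input-dependent ("selective")
C = torch.randn(d_state)

def ssm_decode_step(x, state):
    state = state * torch.exp(A) + x[:, None] * B   # h_t = A_bar * h_{t-1} + B_bar * x_t
    return state @ C, state                         # y_t = C * h_t, O(1) per token

x = torch.randn(d_model)
y_attn = attention_decode_step(x, kv_cache)
y_ssm, ssm_state = ssm_decode_step(x, ssm_state)
print(kv_cache["k"].shape, ssm_state.shape)       # KV cache grows; SSM state does not
```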

In the model's post-training phase, 96.7% of the compute was devoted to reinforcement-learning training, focusing on improving pure reasoning ability and optimizing alignment with human preferences.

To achieve this goal, the research team collected world-class science and reasoning problems covering mathematics, logical reasoning, science, and code. These datasets span tasks from basic mathematical reasoning to complex scientific problem solving. Combined with ground-truth feedback, this ensures the model performs well across a wide range of reasoning tasks.
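The ground-truth feedback mentioned above is typically implemented as a verifiable reward: the model's final answer is checked against a known reference. Below is a minimal, hypothetical checker for numeric math answers; the actual Hunyuan-T1 reward functions are not public, and the regex-based extraction here is only a stand-in.

```python
import re

def ground_truth_reward(model_output: str, reference_answer: str) -> float:
    """Toy verifiable reward: 1.0 if the last number in the model's output
    matches the reference answer, else 0.0. Illustrative sketch only."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not nums:
        return 0.0
    return 1.0 if nums[-1] == reference_answer.strip() else 0.0

print(ground_truth_reward("The area is 12 + 30 = 42.", "42"))  # -> 1.0
print(ground_truth_reward("I am not sure.", "42"))             # -> 0.0
```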

Training followed a curriculum learning approach, progressively increasing data difficulty while gradually extending the model's context length, so that the model learns to apply its reasoning capability effectively while using tokens more efficiently during reasoning.
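A curriculum of this kind can be expressed as a staged schedule over data difficulty and context length. The sketch below assumes (hypothetically) that each training sample carries a difficulty score and that the trainer can change the context window per stage; the actual Hunyuan-T1 schedule and thresholds are not public.

```python
from dataclasses import dataclass

@dataclass
class CurriculumStage:
    max_difficulty: float   # only samples at or below this difficulty are used
    context_length: int     # model context window for this stage (illustrative)

# Hypothetical three-stage curriculum: easier/shorter first, harder/longer later.
STAGES = [
    CurriculumStage(max_difficulty=0.3, context_length=4_096),
    CurriculumStage(max_difficulty=0.6, context_length=16_384),
    CurriculumStage(max_difficulty=1.0, context_length=64_000),
]

def select_pool(dataset, stage: CurriculumStage):
    """Restrict the training pool to the current stage's difficulty ceiling."""
    return [ex for ex in dataset if ex["difficulty"] <= stage.max_difficulty]

dataset = [
    {"prompt": "2 + 2 = ?", "difficulty": 0.1},
    {"prompt": "Prove the AM-GM inequality.", "difficulty": 0.7},
]
for stage in STAGES:
    pool = select_pool(dataset, stage)
    print(f"context={stage.context_length}: {len(pool)} samples available")
```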

In terms of training strategies, the team borrowed classical reinforcement-learning techniques such as data replay and periodic policy resetting, improving the long-term stability of model training by more than 50%. In the human-preference alignment phase, a unified reward-system feedback scheme is used, combining self-rewarding (an earlier version, T1-Preview, comprehensively evaluates and scores the model's outputs) with reward-model signals to guide the model toward self-improvement. As a result, the model's responses show richer content detail and higher information density.
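The following self-contained sketch shows how these three ideas (data replay, periodic policy resetting, and a self-reward from an earlier model acting as judge) might fit together in a training loop. The ToyPolicy class and scalar "quality" are placeholders standing in for model weights and a real PPO/GRPO-style update; none of this reflects the actual Hunyuan-T1 training stack.

```python
import copy
import random

class ToyPolicy:
    """Stand-in for a policy model; 'quality' plays the role of its weights."""
    def __init__(self):
        self.quality = 0.5
    def generate(self, prompt):
        return f"answer({prompt})", self.quality + random.uniform(-0.1, 0.1)
    def update(self, batch):
        # A real trainer would run a gradient step here; this just nudges
        # the policy up when the average reward in the batch is high.
        self.quality += 0.01 * (sum(r for *_, r in batch) / len(batch) - 0.5)
    def eval_score(self):
        return self.quality

def self_reward(judge_quality, response_quality):
    # Self-rewarding: an earlier (T1-Preview-style) judge scores the output.
    return 1.0 if response_quality > judge_quality else 0.0

policy, snapshot, replay = ToyPolicy(), None, []
for step in range(400):
    text, q = policy.generate(f"prompt-{step}")
    replay.append((text, q, self_reward(0.5, q)))
    policy.update(replay[-64:])                  # data replay over recent rollouts
    if step % 100 == 99:                         # periodic policy reset for stability
        if snapshot is None or policy.eval_score() >= snapshot.eval_score():
            snapshot = copy.deepcopy(policy)
        else:
            policy = copy.deepcopy(snapshot)
print(round(policy.eval_score(), 3))
```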

In addition to achieving results comparable to or slightly better than DeepSeek-R1 on public benchmarks of Chinese and English knowledge and competition-level mathematics and logical reasoning, such as MMLU-Pro, CEval, AIME, and Zebra Logic, Hunyuan-T1 also performs well on internal human-evaluation datasets, with slight advantages in cultural and creative instruction following, text summarization, and agent capabilities.

In terms of comprehensive evaluation metrics, Hunyuan-T1's overall performance is comparable to that of best-in-class frontier reasoning models. In the comprehensive capability assessment, T1 scored 87.2 on MMLU-PRO, second only to o1. This test set covers questions from 14 fields across the humanities, social sciences, and science and engineering, and focuses on testing the model's memory and understanding of broad knowledge. In addition, on GPQA-diamond, which focuses on specialized domain knowledge and complex scientific reasoning and consists mainly of doctoral-level problems in physics, chemistry, and biology, T1 scored 69.3.

Scenarios requiring strong reasoning skills, such as coding, mathematics, and logical reasoning, were also evaluated. In the LiveCodeBench code evaluation, T1 reached a score of 64.9. T1 likewise excelled in mathematics: on MATH-500 it achieved an outstanding 96.2, just behind DeepSeek-R1, demonstrating its comprehensive ability to solve math problems. In addition, T1 showed strong adaptability across alignment, instruction-following, and tool-use tasks; for example, it scored 91.9 on ArenaHard.

Model performance

[Figure 1: Hunyuan-T1 benchmark comparison]

[Figure 2: Hunyuan-T1 benchmark comparison]
Note: Evaluation metrics for the other models in the table are taken from their official evaluation results. Where official results are not available, the data come from Hunyuan's internal evaluation platform.
