Ouro - A new looped language model open-sourced by the ByteDance Seed team


What's Ouro?

Ouro is a new type of looped language model developed by the ByteDance Seed team. Its core innovation is building reasoning capability directly into the pre-training stage through a parameter-sharing looped computation structure. The model uses a 24-layer stack as its base block and reaches an effective computational depth of 96 layers through 4 loops while keeping the parameter count at 1.4B, which significantly improves the reasoning efficiency of small models. Experiments show that Ouro 1.4B scores 71.02 on the BBH reasoning benchmark, approaching the performance of 4B-parameter models, while the 2.6B version scores 90.85 on MATH500, surpassing 8B models. Its design also includes a dynamic computation mechanism (fewer loops for simple tasks, more loops for complex tasks) and an entropy-regularized training strategy that lets the model adaptively adjust its depth of thinking.
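The looped structure described above can be illustrated with a short, self-contained sketch. This is a hedged illustration only: the hidden size, the use of a generic PyTorch Transformer layer, and the module names are assumptions for readability, not the actual Ouro implementation, and causal masking and other language-model details are omitted.

```python
# Minimal sketch of a weight-shared looped forward pass (illustrative only;
# dimensions and module names are hypothetical, not the actual Ouro code).
import torch
import torch.nn as nn

class LoopedStack(nn.Module):
    def __init__(self, d_model=2048, n_heads=16, n_layers=24, n_loops=4):
        super().__init__()
        # One shared stack of 24 layers; its parameters are reused on every loop.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.n_loops = n_loops

    def forward(self, hidden):
        # 4 loops over 24 shared layers give ~96 layers of effective depth,
        # while the parameter count stays at that of a single 24-layer stack.
        for _ in range(self.n_loops):
            for layer in self.layers:
                hidden = layer(hidden)
        return hidden

x = torch.randn(1, 8, 2048)        # (batch, sequence, hidden) dummy input
print(LoopedStack()(x).shape)      # torch.Size([1, 8, 2048])
```

In practice the number of loops is not fixed: an exit gate can stop the loop early for easy inputs, which is exactly what the two-stage training strategy described in the next section optimizes.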


Features of Ouro

  • Architecture innovation: The Ouro model builds reasoning capability directly into the pre-training phase by iterating computation in latent space, rather than relying only on later fine-tuning. The architecture consists of a stack of N weight-shared layers that is applied repeatedly during the forward pass (multiple "loop steps"). This enables dynamic computation and decouples the model's effective computational depth from its parameter count.
  • Training strategy: The Ouro model employs a new two-stage adaptive-computation training strategy. The first stage uses an entropy-regularized objective with a uniform prior over exit steps, encouraging the model to explore all computational depths without bias; the second stage focuses on adaptive gating, explicitly optimizing the exit gate to trade off computational cost against performance gains (see the sketch after this list).
  • Parameter efficiency: The Ouro model demonstrates excellent parameter efficiency. The 1.4B and 2.6B models consistently match or exceed the performance of much larger SOTA LLMs (up to 4B and 12B parameters, respectively) across benchmarks, a 2-3x gain in parameter efficiency.
  • Reasoning ability: The Ouro model's performance advantage stems not from increased knowledge capacity but from far superior knowledge manipulation, i.e., the ability to reason over and combine facts across multiple steps. The advantage is particularly evident on difficult mathematical reasoning tasks such as GSM8K and MATH500.
  • Safety and faithfulness: Compared with baseline models, Ouro generates harmful content at a lower rate, and that rate decreases as the number of loop steps increases. Its reasoning process is also more causally faithful, with intermediate steps more closely tied to the final answer.
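To make the training-strategy bullet above more concrete, here is a hedged sketch of what a Stage-1 style objective could look like: per-loop-step language-modeling losses weighted by a learned exit distribution, plus a KL term toward a uniform prior so that every computational depth keeps receiving training signal. The function name, the weight `beta`, and the exact form of the objective are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch of a Stage-1 style entropy-regularized exit objective.
# Names and the exact loss form are illustrative assumptions.
import torch
import torch.nn.functional as F

def stage1_loss(step_losses, exit_logits, beta=0.1):
    """step_losses: (n_loops,) LM loss if the model exits after each loop step.
    exit_logits: (n_loops,) logits of the learned exit distribution q."""
    q = F.softmax(exit_logits, dim=-1)                     # exit distribution q(t)
    expected_loss = (q * step_losses).sum()                # E_q[task loss]
    uniform = torch.full_like(q, 1.0 / q.numel())          # uniform prior over exit steps
    kl_to_uniform = (q * (q.log() - uniform.log())).sum()  # KL(q || uniform)
    # Minimizing KL(q || uniform) is, up to a constant, the same as maximizing
    # the entropy of q, so no computational depth is starved of gradient signal.
    return expected_loss + beta * kl_to_uniform

# Toy usage: four loop steps whose LM loss shrinks as depth grows.
losses = torch.tensor([2.3, 1.9, 1.6, 1.5])
logits = torch.zeros(4, requires_grad=True)
print(stage1_loss(losses, logits))   # tensor(1.8250, grad_fn=<AddBackward0>)
```

In the second stage, the exit gating would then be optimized explicitly to trade off computational cost against performance gains, as described in the bullet above.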

Ouro's core strengths

  • Powerful reasoning: Ouro excels at multi-step reasoning and logical deduction, especially on difficult mathematical reasoning tasks, where it performs deductions and calculations accurately.
  • Excellent parameter efficiency: Ouro's looped architecture and training strategy significantly improve parameter efficiency; its smaller models match or exceed much larger models on several benchmarks.
  • Safety and faithfulness: Ouro generates safer text, with a low rate of harmful content. Its reasoning process is more causally faithful, with intermediate steps closely linked to the final answer.
  • Open source and scalability: The Ouro model is open-sourced at 1.4B and 2.6B parameter scales, making it easy for researchers and developers to build on it for further research and applications (see the loading sketch after this list).
  • Effective training strategy: Ouro's two-stage adaptive-computation training strategy efficiently explores different computational depths, optimizes the inference process, and improves model performance.
  • Multi-language support: Ouro supports multiple languages and handles cross-language tasks such as machine translation and multilingual Q&A, giving it a wide range of applications.
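For developers who want to try the open-sourced checkpoints, the standard Hugging Face `transformers` loading pattern should apply. The model ID below is a hypothetical placeholder inferred from the collection linked in the next section; check the Hub for the actual repository names.

```python
# Hedged sketch of loading an Ouro checkpoint with Hugging Face transformers.
# The model ID is a hypothetical placeholder; verify the real repository names
# in the ByteDance collection linked in the next section.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteDance/Ouro-1.4B"  # hypothetical ID, verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Natalia sold clips to 48 friends in April, and half as many in May. How many clips did she sell in total?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```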

What is Ouro's official website?

  • Project website: https://ouro-llm.github.io/
  • HuggingFace model library: https://huggingface.co/collections/ByteDance/ouro
  • arXiv technical paper: https://arxiv.org/pdf/2510.25741

Who Ouro is for

  • NLP researchers: Ouro's innovative architecture and training strategy offer researchers new research directions and an experimental platform, helping advance the field of natural language processing.
  • AI developers: Ouro's open-source nature and flexibility make it well suited for building language-model applications such as intelligent customer service and content generation tools.
  • Educators and students: Ouro's strengths in mathematical reasoning and logical deduction make it a useful foundation for intelligent tutoring systems and automated problem-solving tools that help students learn and understand complex concepts.
  • Content creators: Ouro assists with creative writing, copywriting, and storytelling, helping content creators work faster and find inspiration.
  • Enterprises and organizations: Ouro can be used for internal knowledge management, customer service, and content moderation to improve operational efficiency and user experience.