Mini-o3: ByteDance and HKU's Jointly Open-Sourced Visual Reasoning Model

Published by AI Sharing Circle


What is Mini-o3

Mini-o3 is an open-source model jointly released by ByteDance and the University of Hong Kong (HKU) that targets complex visual search problems. It reasons over multiple interactive rounds, locating targets through deep exploration and trial-and-error. In high-resolution images, Mini-o3 can accurately identify targets even when they are tiny and surrounded by many distractors. The model performs strongly on multiple visual search benchmarks. All of Mini-o3's code, models, and datasets have been open-sourced so that others can reproduce the results and build on them in further visual search research.

Features of Mini-o3

Multi-round interactive reasoning: Mini-o3 performs deep multi-round reasoning, solving complex visual search problems through step-by-step exploration and trial-and-error. The number of interaction rounds can scale to dozens, which lets the model handle complex visual tasks.
Diverse reasoning patterns: The model supports multiple reasoning patterns, including depth-first search, trial-and-error, and goal maintenance, and flexibly adapts its strategy to different problems.
High-resolution image processing: In high-resolution images, the model can accurately locate and identify a target even when the target is small and surrounded by many distracting objects.
Superior performance: Mini-o3 achieves state-of-the-art results on several visual search benchmarks, including VisualProbe, V* Bench, HR-Bench, and MME-RealWorld.
Open source: All of Mini-o3's code, models, and datasets are open source, making it easy for researchers to reproduce results, conduct further research, and advance related techniques.
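The multi-round search described above can be illustrated with a toy sketch. This is not Mini-o3's implementation: the real model is a vision-language model that, on each turn, chooses an image region to inspect and receives the zoomed-in view back. In this hypothetical sketch, an "image" is a grid of characters, `looks_promising` stands in for the model's coarse judgment (which cannot tell the target `T` from the distractor `D`), and a stack-based quadrant search stands in for the depth-first and trial-and-error patterns. The fixed target label loosely mirrors goal maintenance, and `max_turns` plays the role of the interaction-round budget. All names (`Box`, `crop`, `looks_promising`, `search`) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical crop region; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def quadrants(self):
        # Split the region into top-left, top-right, bottom-left, bottom-right.
        mx, my = (self.x0 + self.x1) // 2, (self.y0 + self.y1) // 2
        return [Box(self.x0, self.y0, mx, my), Box(mx, self.y0, self.x1, my),
                Box(self.x0, my, mx, self.y1), Box(mx, my, self.x1, self.y1)]


def crop(image, box):
    # Return the sub-grid covered by the box (the "zoomed-in view").
    return [row[box.x0:box.x1] for row in image[box.y0:box.y1]]


def looks_promising(patch):
    # Coarse view: a target 'T' and a distractor 'D' look alike.
    return any(c in "TD" for row in patch for c in row)


def search(image, max_turns=30, min_size=2):
    """Depth-first search over crops; each inspected crop costs one turn."""
    h, w = len(image), len(image[0])
    stack = [Box(0, 0, w, h)]
    trace = []  # every region inspected, one entry per turn
    while stack and len(trace) < max_turns:
        box = stack.pop()
        patch = crop(image, box)
        trace.append(box)
        if not looks_promising(patch):
            continue  # nothing here: backtrack
        if box.x1 - box.x0 <= min_size or box.y1 - box.y0 <= min_size:
            # Close inspection distinguishes target from distractor.
            if any("T" in row for row in patch):
                return box, trace
            continue  # it was a distractor: backtrack (trial-and-error)
        # Push children reversed so the top-left quadrant is explored first.
        stack.extend(reversed(box.quadrants()))
    return None, trace  # turn budget exhausted or nothing found


image = [list("........") for _ in range(8)]
image[1][1] = "D"  # distractor in the top-left quadrant
image[6][5] = "T"  # tiny target in the bottom-right quadrant
found, trace = search(image)
print(found, len(trace))
```

In this toy run the search first descends into the distractor's quadrant, backtracks after close inspection, and finds the target on the twelfth turn; lowering `max_turns` shows how a tight round budget can stop the search before it succeeds.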

Mini-o3's core strengths

Powerful reasoning: Mini-o3 has deep multi-round reasoning capability. It solves complex visual search problems through step-by-step exploration and trial-and-error, and can accurately locate and identify small targets in high-resolution images with heavy interference.
Flexible reasoning strategies: It supports a variety of reasoning patterns, such as depth-first search, trial-and-error, and goal maintenance, and can adjust its strategy to different scenarios to improve the efficiency and accuracy of problem solving.
Open source and scalability: All of Mini-o3's code, models, and datasets have been open-sourced for easy reproduction and further study by researchers.
Innovative datasets and training methods: By constructing challenging visual search datasets (such as the VisualProbe dataset) and using a training recipe of cold-start supervised fine-tuning (SFT) followed by reinforcement learning (RL), Mini-o3 learns complex reasoning patterns more effectively and generalizes better.

Where to find Mini-o3

Project website: https://mini-o3.github.io/
GitHub repository: https://github.com/Mini-o3/Mini-o3
Hugging Face model library: https://huggingface.co/Mini-o3/models
arXiv technical paper: https://arxiv.org/pdf/2509.07969

Who Mini-o3 is for

Computer vision researchers: Academics working on visual search, object detection, image recognition, and related areas can reproduce, improve, and extend the model to advance these fields.
Software engineers: Developers building applications with visual search features (e.g., e-commerce search, smart home products, or surveillance systems) can integrate Mini-o3 to strengthen the applications' visual reasoning.
Data scientists: Mini-o3 can improve the efficiency and accuracy of processing and analyzing visual data.
E-commerce companies: Integrating Mini-o3 can make product search more accurate and improve the user experience, helping users find the products they want faster.
Smart home companies: In smart home environments, Mini-o3's visual search capability can help users quickly find lost items and make products smarter.