Qwen3-Max-Preview - The Flagship Large Language Model from Tongyi Qianwen
Qwen3-Max-Preview is the latest flagship large language model released by Tongyi Qianwen. It is the largest model in the Qwen3 family, with more than 1 trillion parameters. The model shows significant improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage...
OneCAT - Open Source Multimodal Model from Meituan and Shanghai Jiao Tong University
OneCAT is a new unified multimodal model launched by Meituan in collaboration with Shanghai Jiao Tong University. It adopts a pure decoder architecture and seamlessly integrates multimodal understanding, text-to-image generation, and image editing. The model abandons the traditional multimodal design that relies on external vision encoders and tokenizers. Through modality-specific...
Claudable - Open Source AI Web Application Builder That Generates Code from Natural Language
Claudable is an open source web application builder based on Next.js that combines the advanced AI agent capabilities of Claude Code and Cursor CLI with Lovable's simple and intuitive application building experience...
FineVision - Open Source Vision-Language Dataset from Hugging Face
FineVision is Hugging Face's open source vision-language dataset for training advanced vision-language models. It contains 17.3 million images, 24.3 million samples, 88.9 million dialogue turns, and 9.5 billion answer tokens. The dataset aggregates...
HunyuanWorld-Voyager - Tencent's Open Source Ultra-Long-Range Roaming World Model
HunyuanWorld-Voyager (Hunyuan Voyager for short) is an ultra-long-range roaming world model released by Tencent, described as the industry's first to support native 3D reconstruction. It is a novel video diffusion framework that generates 3D point cloud sequences along user-defined camera paths from a single image, supporting...
Hunyuan-MT-7B - Tencent Hunyuan's Open Source Lightweight Translation Model
Hunyuan-MT-7B is a lightweight translation model introduced by the Tencent Hunyuan team. It has 7 billion parameters and supports mutual translation among 33 languages as well as 5 Chinese ethnic minority languages/dialects, including Cantonese, Uyghur, and Tibetan. In the Association for Computational Linguistics (ACL) WMT2025 competition...
Step-Audio 2 mini - StepFun's Open Source Large Speech Model
Step-Audio 2 mini is an open source end-to-end large speech model from StepFun. It departs from the traditional speech model structure and adopts a true end-to-end multimodal architecture that converts raw audio input directly into a spoken response with lower latency, and it can understand paralinguistic information and non-vocal signals.
MobileCLIP2 - Apple's Open Source Efficient On-Device Multimodal Model
MobileCLIP2 is an upgraded version of MobileCLIP, an efficient on-device multimodal model introduced by Apple researchers. It improves multimodal reinforced training by using an ensemble of better-performing CLIP teacher models trained on the DFN dataset and improved image-text...
InternVL3.5 - Shanghai AI Lab's Open Source Multimodal Large Model
InternVL3.5 (Shusheng-Wanxiang 3.5) is an open source multimodal large model from the Shanghai Artificial Intelligence Laboratory. The model is comprehensively upgraded in general capability, reasoning ability, and deployment efficiency, and is available in nine sizes from 1 billion to 241 billion parameters to cover different resource requirements, including dense...
FastVLM - Vision Language Model from Apple
FastVLM (Fast Vision Language Model) is an efficient vision language model introduced by Apple. Built around the FastViTHD hybrid vision encoder, it combines convolutional and Transformer architectures to significantly reduce visual...









