FireRedChat - Little Red Book's open source full-duplex voice interaction system
FireRedChat is an open source full-duplex voice interaction system for Xiaohongshu with real-time bidirectional dialog capabilities and support for controlled interruptions. Adopts a modular design , including transcription control module , interaction module and dialogue manager , etc., supports cascade and semi-cascade architecture , can be flexibly deployed .
Logics-Parsing - Ali open source document parsing model
Logics-Parsing is an open source Ali end-to-end document parsing model , based on Qwen2.5-VL-7B. Optimize document layout analysis and reading order inference through reinforcement learning , PDF images can be converted to structured HTML output to support a variety of content ...
Ring-1T-preview - Ant Group's open-source trillion-parameter macromodel
Ring-1T-preview is an open source trillion-parameter big model of Ant Group, based on Ling 2.0 MoE architecture, pre-trained on 20T corpus, and trained in reasoning ability by self-developed reinforcement learning system ASystem. In natural language reasoning ...
RoboBrain-X0 - Wisdom Source Research Institute open source zero-sample cross ontology generalized embodiment model
RoboBrain-X0 is the world's first open source embodied model that supports zero-sample cross-ontology generalization open-sourced by Wisdom Source Research Institute, which is of great industrial significance. It can drive multiple real robots of different configurations to complete basic operation tasks without fine-tuning, and after a small amount of sample fine-tuning, it demonstrates the ability to replicate ...
Lynx - ByteHop's open source high-fidelity video generation model
Lynx is a high-fidelity personalized video generation model open-sourced by ByteDance that can generate identity-consistent videos with only a single portrait photo. Built on the diffusion Transformer (DiT) base model , the introduction of ID-adapter and Ref-adapte...
DeepSeek-V3.2-Exp - DeepSeek's latest open source experimental AI model
DeepSeek-V3.2-Exp is a DeepSeek open source experimental AI model that significantly improves the efficiency of long text processing by introducing the DeepSeek Sparse Attention (DSA) mechanism. The model is based on DeepSeek...
HunyuanImage 3.0 - Tencent open source free multimodal image generation model
HunyuanImage 3.0 (HunyuanImage 3.0) is a native multimodal image generation model released and open-sourced by Tencent. The model parameter size of 80B, is currently the best evaluation results, the largest number of parameters of the open source image generation model. Hybrid Image 3.0 supports real-time image generation, users can side...
Hunyuan3D-Part - Tencent open source free 3D components to generate models
Hunyuan3D-Part (Hybrid 3D-Part) is a 3D generation model released and open-sourced by Tencent. Composed of P3 - SAM and X - Part, it realizes high-precision and controllable component-based 3D generation for the first time, and supports 50 + components to be generated automatically. Users can first use...
AudioFly - KU Xunfei open source text generation sound AI models
AudioFly is KDDI open source AI model for text to generate sound effects. Based on the potential diffusion model architecture, with 1 billion parameters, trained on large-scale, diverse audio text datasets, covering AudioSet, AudioCaps, TUT and other public datasets and internal...
Hunyuan3D-Omni - Tencent Mixed-Year Open Source 3D Model Generation Framework
Hunyuan3D-Omni (Hybrid 3D-Omni) is an open source 3D asset generation framework by Tencent's Hybrid 3D team, which realizes accurate 3D model generation through multiple control signals. Based on Hunyuan3D 2.1 architecture, it introduces a unified control encoder that can handle point...









