MobileLLM-R1 - Meta's open source series of efficient reasoning models
MobileLLM-R1 is Meta's open source series of efficient reasoning models designed for mathematical, programming and scientific reasoning. The series includes base models and final models, each in 140 million, 360 million and 950 million parameter versions. The models are not general-purpose chat models and are supervised fine-tuned (SFT...
ERNIE-4.5-21B-A3B-Thinking - Baidu's open source reasoning model
ERNIE-4.5-21B-A3B-Thinking is Baidu's open source large language model focused on reasoning tasks. It uses a Mixture of Experts (MoE) architecture with 21 billion total parameters, of which 3 billion are activated per token, and supports a 128K long context window ...
MobiAgent - Shanghai Jiao Tong University's open source full-stack framework for building mobile agents
MobiAgent is an open source mobile agent toolchain from the IPADS Lab at Shanghai Jiao Tong University that helps users build their own mobile assistants. By recording the user's operation trajectories and generating high-quality data, it trains an agent that can understand natural language commands. Core features include efficient...
ZipVoice - Xiaomi's open source speech synthesis model series
ZipVoice is a series of text-to-speech (TTS) models based on the Flow Matching architecture released by Xiaomi, including ZipVoice (a zero-shot single-speaker speech synthesis model) and ZipVoice-Dialog (a zero-shot conversational speech synthesis...
PP-OCRv5 - Baidu's open source next-generation text recognition model
PP-OCRv5 is the latest generation of text recognition model released by Baidu. With a lightweight design and a parameter count of only 0.07B, it is suited to efficient operation on CPUs and edge devices, and can process more than 370 characters per second. The model supports Simplified Chinese, Traditional Chinese, English, Japanese and Pinyin...
Youtu-GraphRAG - Tencent Youtu Lab's open source graph retrieval-augmented generation framework
Youtu-GraphRAG is an open source graph retrieval-augmented generation framework from Tencent's Youtu Lab that helps large language models handle complex Q&A tasks more accurately. It constructs a four-layer knowledge tree that breaks knowledge down into four levels (attributes, relationships, keywords and communities) to realize the self-directed performance of cross-domain knowledge...
Stand-In - Tencent WeChat Vision's open source lightweight video generation framework
Stand-In is a lightweight, plug-and-play identity-preserving video generation framework from Tencent's WeChat Vision team. It focuses on preserving specific identity features in video generation, and only needs to train additional parameters amounting to 1% of the base model to achieve excellent results in face similarity and naturalness.
IndexTTS2 - Bilibili's free open source TTS model, the first to support precise duration control
IndexTTS2 is a new free text-to-speech (TTS) model open-sourced by Bilibili's speech team. It achieves a major breakthrough in emotional expression and duration control, and is the first autoregressive TTS model to support precise duration control. It supports zero-shot voice cloning, so a single audio file is enough to accurately clone a voice...
HuMo - Multimodal video generation framework open-sourced by Tsinghua University and ByteDance
HuMo is a multimodal video generation framework jointly open-sourced by Tsinghua University and ByteDance's Intelligent Creation Lab, focusing on human-centered video generation. It can generate high-quality, fine-grained and controllable human videos from multiple input modalities such as text, images and audio. HuMo supports a powerful text prompt following capability...
AntSK FileChunk - Free AI semantic document chunking tool with dynamic chunk size adjustment
AntSK FileChunk is a free intelligent document chunking tool designed for RAG (Retrieval-Augmented Generation) applications. Built around semantics, it splits documents into semantically complete, coherent segments, supports multiple languages, and can dynamically adjust the chunk size to keep the context coherent.