PaddleOCR-VL - Baidu's open-source ultra-lightweight vision-language model
PaddleOCR-VL is Baidu's open-source ultra-lightweight vision-language model, optimized for document parsing. With only 0.9B parameters, it combines a dynamic high-resolution visual encoder with a lightweight ERNIE language model, maintaining high accuracy while significantly reducing computational overhead.
UniPixel - Pixel-level multimodal model open-sourced by Hong Kong Polytechnic University, Tencent, the Chinese Academy of Sciences, and others
UniPixel is a novel multimodal model jointly proposed by Hong Kong Polytechnic University, Tencent, the Chinese Academy of Sciences, and vivo for pixel-level vision-language understanding. By unifying object referring and segmentation capabilities, it supports a variety of fine-grained tasks such as image segmentation, video segmentation, region understanding, and pi...
DiaMoE-TTS - Multi-dialect speech synthesis framework open-sourced by Tsinghua University and Giant Network
DiaMoE-TTS is a multi-dialect speech synthesis framework jointly open-sourced by Tsinghua University and Giant Network. Built on the International Phonetic Alphabet (IPA), it addresses dialect data scarcity, orthographic inconsistency, and complex phonological variation. It uses a unified IPA front end to standardize phoneme representation and eliminate cross-dialect differences ...
Kandinsky 5.0 - Open-source video generation model series from a Russian AI team
Kandinsky 5.0 is the latest video generation model series developed by a Russian AI team, focusing on lightweight design and high performance. The first model in the series, Kandinsky 5.0 Video Lite, has only 2 billion parameters but surpasses similar 14B models, especially...
SongBloom - Open-source song generation model from Tencent, CUHK-Shenzhen, and Nanjing University
SongBloom is an open-source song generation model developed by Tencent AI Lab in collaboration with The Chinese University of Hong Kong (Shenzhen) and Nanjing University. It addresses the artificial, "plastic" sound of AI-generated music and produces high-quality, structurally complete songs. Simply enter 10 seconds of reference audio and corresponding lyrics, and you can...
Pyscn - Free, open-source AI code quality analysis tool for Python developers
Pyscn is an intelligent code quality analysis tool designed for Python developers that detects potential problems in code to improve maintainability. It finds dead code through control flow graphs, identifies duplicate code using the APTED and LSH algorithms, calculates metrics such as module coupling and cyclomatic complexity...
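To make the cyclomatic complexity metric mentioned above concrete, here is a minimal sketch that approximates it with Python's standard `ast` module. It is not Pyscn's implementation: the function name `cyclomatic_complexity` and the choice of which syntax nodes count as decision points are assumptions made for illustration only.

```python
import ast

# Syntax nodes treated as decision points in this sketch (an assumption;
# real tools may count additional constructs such as comprehensions).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    tree = ast.parse(source)
    complexity = 1  # a single straight-line path through the code
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one extra branch per additional operand
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for i in range(n):
        if i % 2 == 0:
            continue
    return "done"
'''

score = cyclomatic_complexity(SAMPLE)
print(score)
```

The sample function has two `if` statements, one `for` loop, and one `and`, so the sketch counts four decision points on top of the base path.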
Youtu-Embedding - General-purpose text representation model open-sourced by Tencent Youtu Lab
Youtu-Embedding is a general-purpose text representation model open-sourced by Tencent's Youtu Lab, designed for enterprise-level applications. It uses deep neural networks to map text into a high-dimensional vector space in which semantically similar sentences lie closer together, enabling accurate semantic retrieval.
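The idea that semantically similar sentences sit closer together in vector space can be illustrated with a small retrieval sketch. The three-dimensional vectors below are hand-made toy values, not output from Youtu-Embedding, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical; a real system would obtain the vectors from the model itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in corpus.items()]
    return sorted(scored, reverse=True)


# Toy embeddings (assumed values for illustration only).
corpus = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover account access": [0.8, 0.2, 0.1],
    "Quarterly revenue report": [0.0, 0.1, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # stands in for an embedded user query

ranking = rank_by_similarity(query_vec, corpus)
for score, text in ranking:
    print(f"{score:.3f}  {text}")
```

With these toy values, the two account-related sentences score close to 1 while the unrelated revenue sentence ranks last, which is the behavior semantic retrieval relies on.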
SAIL-VL2 - ByteDance's open-source multimodal vision-language model
SAIL-VL2 is an open-source multimodal vision-language model from the ByteDance team, focusing on joint modeling of multimodal inputs such as images and text. Using a sparse mixture-of-experts (MoE) architecture and a progressive training strategy, it achieves high performance at parameter scales from 2B to 8B, especially in the areas of image-text comprehension, math...
MineContext - ByteDance's open-source proactive context-aware AI partner
MineContext is a proactive context-aware AI partner open-sourced by the ByteDance Viking team to help users manage massive amounts of information and work more efficiently on knowledge tasks. Using screenshots and content understanding technology, it automatically records the user's daily activities (such as browsing the web and editing documents), supports...
nanochat - Karpathy's free, open-source project for low-cost model training
nanochat is an open-source project released by Andrej Karpathy, AI researcher and former Tesla AI Director, that lets individuals quickly and simply train a small ChatGPT-like language model at very low cost. The entire project uses only about 800...








