UnifiedTTS - One-stop TTS API Service Platform with Real-time Performance Monitoring
UnifiedTTS is a one-stop platform for text-to-speech (TTS) services. It supports multiple languages, including Chinese, English, Japanese and Korean, to serve global business needs. Through a single unified API, it integrates many mainstream TTS services, including Micro...
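The "unified API over many TTS services" idea is commonly built as an adapter layer. Below is a minimal sketch of that pattern, assuming a per-provider language list and simple latency logging as a stand-in for performance monitoring. All class names (`TTSProvider`, `EchoProvider`, `UnifiedTTS`, `SynthesisRequest`) are hypothetical and are not UnifiedTTS's actual API; the echo provider returns fake bytes instead of calling any vendor.

```python
from __future__ import annotations

import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SynthesisRequest:
    text: str
    language: str  # e.g. "zh", "en", "ja", "ko"
    voice: str = "default"


class TTSProvider(ABC):
    """Hypothetical adapter that hides one vendor's SDK behind a common interface."""

    def __init__(self, name: str, languages: set[str]):
        self.name = name
        self.languages = languages

    @abstractmethod
    def synthesize(self, req: SynthesisRequest) -> bytes: ...


class EchoProvider(TTSProvider):
    """Illustrative stand-in; a real adapter would call the vendor's API here."""

    def synthesize(self, req: SynthesisRequest) -> bytes:
        return f"{self.name}|{req.language}|{req.voice}|{req.text}".encode()


@dataclass
class UnifiedTTS:
    providers: list[TTSProvider]
    # Per-provider call latencies in seconds: a minimal form of performance monitoring.
    latencies: dict[str, list[float]] = field(default_factory=dict)

    def synthesize(self, req: SynthesisRequest, provider: str | None = None) -> bytes:
        # Keep providers that match the requested name (if any) and support the language.
        candidates = [
            p for p in self.providers
            if provider in (None, p.name) and req.language in p.languages
        ]
        if not candidates:
            raise ValueError(f"no provider supports language {req.language!r}")
        chosen = candidates[0]
        start = time.perf_counter()
        audio = chosen.synthesize(req)
        self.latencies.setdefault(chosen.name, []).append(time.perf_counter() - start)
        return audio
```

Callers only ever see `SynthesisRequest` and `UnifiedTTS.synthesize`, so swapping or adding a vendor means writing one new adapter class.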
MiniCPM 4.1 - Ultra-efficient On-device Large Model from ModelBest
MiniCPM 4.1 is an ultra-efficient on-device large language model released by ModelBest. With the InfLLM v2 sparse attention architecture, each token only needs to compute relevance against fewer than 5% of the tokens, which significantly reduces the cost of processing long text. In a 128K long text scenario...
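To make "each token only computes relevance against fewer than 5% of tokens" concrete, here is a generic block-sparse top-k attention sketch in NumPy. It is not InfLLM v2's actual algorithm: the mean-pooled block summaries, the block size, the non-causal setting and the function names are all assumptions chosen for illustration. Each query first scores cheap per-block summaries, then runs ordinary softmax attention only over the tokens in its best few blocks.

```python
import numpy as np


def attended_fraction(seq_len: int, block_size: int, top_k_blocks: int) -> float:
    """Share of the sequence each query actually scores against."""
    return min(top_k_blocks * block_size, seq_len) / seq_len


def block_sparse_attention(q, k, v, block_size=64, top_k_blocks=2):
    """Single-head, non-causal attention in which each query attends only to the
    top_k_blocks key blocks whose mean-pooled key scores highest against it."""
    n, d = k.shape
    n_blocks = -(-n // block_size)  # ceiling division
    # One summary vector per block: the mean of the keys in that block.
    block_keys = np.stack(
        [k[b * block_size:(b + 1) * block_size].mean(axis=0) for b in range(n_blocks)]
    )
    out = np.empty((q.shape[0], v.shape[1]))
    for i, qi in enumerate(q):
        # Coarse step: rank blocks by query-vs-summary score and keep the best few.
        chosen = np.argsort(block_keys @ qi)[-top_k_blocks:]
        idx = np.concatenate(
            [np.arange(b * block_size, min((b + 1) * block_size, n)) for b in chosen]
        )
        # Fine step: ordinary softmax attention restricted to the selected tokens.
        scores = k[idx] @ qi / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[idx]
    return out


# With 64-token blocks and 3 selected blocks, a 4096-token context is under 5% attended.
print(attended_fraction(4096, 64, 3))  # 0.046875
```

When every block is selected, the result matches dense attention exactly; the savings come from keeping `top_k_blocks * block_size` small relative to the sequence length.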
WeKnora - Tencent WeChat's Open-source Document Understanding and Semantic Retrieval Framework
WeKnora is a document understanding and semantic retrieval framework based on large language models (LLMs), open-sourced by Tencent's WeChat team. It targets scenarios with complex, heterogeneous document structures and content, and adopts a modular architecture that integrates multimodal preprocessing, semantic vector indexing, intelligent recall and LLM-based generative reasoning ...
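The index, recall and generation stages listed above follow the usual retrieval-augmented pattern. The sketch below shows that flow with only the standard library, assuming a toy bag-of-words vector in place of a real embedding model and a prompt string in place of an actual LLM call. `VectorIndex`, `embed` and `build_prompt` are hypothetical names, not WeKnora's API.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words term-frequency vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Minimal in-memory index: one vector per document chunk."""

    def __init__(self):
        self.chunks = []
        self.vectors = []

    def add(self, chunk):
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query, top_k=2):
        """Recall step: return the top_k (score, chunk) pairs by cosine similarity."""
        qv = embed(query)
        scored = [(cosine(qv, v), c) for v, c in zip(self.vectors, self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def build_prompt(question, hits):
    """Generation step: hand the recalled chunks to an LLM as grounding context."""
    context = "\n".join(f"- {chunk}" for _, chunk in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a production system the toy `embed` would be replaced by a neural embedding model and the linear scan by an approximate nearest-neighbour index; the three-stage shape stays the same.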
XTuner V1 - Shanghai AI Lab's Open-source Large Model Training Engine
XTuner V1 is a new-generation large model training engine open-sourced by the Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab), designed for training ultra-large-scale sparse Mixture-of-Experts (MoE) models. Built on PyTorch FSDP, it achieves high performance through multi-dimensional optimization of memory, communication and load ...
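For readers unfamiliar with sparse MoE, the core mechanism is top-k routing: a gate scores every expert per token, and only the k best experts process that token. The NumPy sketch below shows that routing plus the per-expert token count, the kind of quantity load balancing in MoE training generally tries to even out. It is a generic illustration, not XTuner's code or API; the linear "experts" and function names are simplifying assumptions.

```python
import numpy as np


def top_k_route(tokens, gate_w, k=2):
    """Pick the k highest-scoring experts per token and softmax-normalise their gate scores.
    tokens: (n_tokens, d_model); gate_w: (d_model, n_experts)."""
    logits = tokens @ gate_w
    top = np.argsort(logits, axis=1)[:, -k:]  # indices of the k best experts per token
    top_logits = np.take_along_axis(logits, top, axis=1)
    weights = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over the selected experts only
    return top, weights


def moe_forward(tokens, gate_w, experts, k=2):
    """Sparse MoE layer: each token only runs through its k selected experts.
    Experts are plain (d_model, d_model) matrices here to keep the sketch short."""
    top, weights = top_k_route(tokens, gate_w, k)
    out = np.zeros_like(tokens)
    for e, w_e in enumerate(experts):
        rows, slots = np.nonzero(top == e)  # tokens routed to expert e, and which top-k slot
        if rows.size:
            out[rows] += weights[rows, slots][:, None] * (tokens[rows] @ w_e)
    # Tokens per expert: uneven counts mean some devices idle while others are overloaded.
    load = np.bincount(top.ravel(), minlength=len(experts))
    return out, load
```

Because each token touches only k experts, compute per token stays roughly constant as the total number of experts (and parameters) grows.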
Qwen3-ASR-Flash - Speech Recognition Model Series from Alibaba's Tongyi Qianwen
Qwen3-ASR-Flash is Alibaba's latest high-precision speech recognition model. Built on the Qwen3 foundation model and trained on massive multimodal data, it supports 11 languages and multiple accents, including Mandarin and dialects such as Sichuanese, Minnan, Wu and Cantonese...
Qwen3-Max-Preview - Flagship Large Language Model from Tongyi Qianwen
Qwen3-Max-Preview is the latest flagship large language model released by Tongyi Qianwen. It is the model with the most parameters in the Qwen3 family, at over 1 trillion. The model shows significant improvements in reasoning, instruction following, multilingual support and long-tail knowledge coverage...
OneCAT - Open-source Multimodal Model from Meituan and Shanghai Jiao Tong University
OneCAT is a new unified multimodal model launched by Meituan together with Shanghai Jiao Tong University. It adopts a decoder-only architecture that seamlessly integrates multimodal understanding, text-to-image generation and image editing. The model abandons the traditional multimodal design that relies on external vision encoders and tokenizers through modality-specific...
Claudable - Open-source AI Web App Builder that Generates Code from Natural Language
Claudable is an open-source web application builder based on Next.js. It combines the advanced AI agent capabilities of Claude Code and Cursor CLI with Lovable's simple, intuitive app-building experience...
FineVision - Open-source Vision-Language Dataset from Hugging Face
FineVision is Hugging Face's open-source vision-language dataset for training advanced vision-language models. It contains 17.3 million images, 24.3 million samples, 88.9 million dialogue turns and 9.5 billion answer tokens. The dataset aggregates...
HunyuanWorld-Voyager - Tencent's Open-source Ultra-long-range Roaming World Model
HunyuanWorld-Voyager (Hunyuan Voyager for short), released by Tencent, is described as the industry's first ultra-long-range roaming world model to support native 3D reconstruction. It is a novel video diffusion framework that generates 3D point-cloud sequences following user-defined camera paths from a single image, supporting...