AudioGen-Omni - Multimodal Audio Generation Model from Kuaishou
AudioGen-Omni is a multimodal audio generation model from Kuaishou that generates high-quality audio, speech, and songs from inputs such as video and text. AudioGen-Omni is based on advanced techniques such as a multimodal diffusion Transformer and phase-aligned...
RedOne - The Latest Large Model for Social Networking from Xiaohongshu
RedOne is a large language model customized for social networking, introduced by Xiaohongshu (Little Red Book). The model is trained with a three-stage strategy that incorporates social and cultural knowledge, strengthens multitask capabilities, and aligns the model with human preferences. RedOne significantly outperforms the base model on social tasks, in harmful content detection and browsing...
FastDeploy - Baidu's High-Performance Inference and Deployment Tool for Large Models
FastDeploy is a high-performance inference and deployment tool from Baidu, designed for large language models (LLMs) and vision-language models (VLMs). FastDeploy is built on the PaddlePaddle framework and supports a variety of hardware platforms...
InteriorGS - 3D Gaussian Semantic Dataset Launched by Manycore Tech
InteriorGS is a high-quality 3D Gaussian semantic dataset introduced by Manycore Tech. The dataset contains 1,000 3D scenes covering more than 80 types of indoor environments, such as homes, convenience stores, wedding halls, and museums. It includes more than 554,000 object instances across 755 categories...
DragonV2.1 - Zero-Shot Speech Synthesis Model from Microsoft
DragonV2.1 is an advanced zero-shot text-to-speech (TTS) model from Microsoft. Based on the Transformer architecture, the model supports multiple languages and zero-shot voice cloning, and generates natural, expressive speech from a voice prompt of only 5-90 seconds.
ScreenCoder - Open-Source Tool That Generates Front-End Code from UI Screenshots
ScreenCoder is an open-source intelligent tool that quickly converts UI design screenshots into high-quality HTML/CSS code. The tool is built on a modular multi-agent architecture that combines visual understanding, layout planning, and code synthesis to generate high-precision, semantic front-end ...
Kimi K2 High-Speed Edition - High-Speed Version of the Kimi K2 Language Model from Moonshot AI
Kimi K2 High-Speed Edition (kimi-k2-turbo-preview) is a high-performance language model introduced by Moonshot AI, the company behind Kimi. The model is an optimized version of Kimi K2 with greatly increased output speed, generating 40 tokens per second...
dots.ocr - Open-Source Multilingual Document Parsing Model from Xiaohongshu hi lab
dots.ocr is a multilingual document parsing model open-sourced by Xiaohongshu hi lab. It is based on a 1.7-billion-parameter vision-language model (VLM) and efficiently performs document layout detection and content recognition while preserving the correct reading order.
HYPIR - A New Large Model for Image Restoration from a Chinese Academy of Sciences Team
HYPIR is a large model for image restoration introduced by Dong Chao's team at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. The model combines the score prior of diffusion models with generative adversarial networks to achieve efficient, high-quality image restoration. HYPIR can quickly restore old photos and improve resolution while keeping text clear...
FLUX.1 Krea [dev] - Text-to-Image Model Jointly Launched by Black Forest Labs and Krea AI
FLUX.1 Krea [dev] is a text-to-image model from Black Forest Labs and Krea AI. The model generates high-quality, photorealistic images from input text descriptions, with a unique aesthetic style that avoids traditional A...









![FLUX.1 Krea [dev] - Text-to-image model jointly launched by Black Forest Labs and Krea AI](https://aisharenet.com/wp-content/uploads/2025/08/1754032748-1754032748-FLUX.1-Krea-dev-website-2.png)