Fun-Audio-Chat-8B - Alibaba Tongyi's Open Source End-to-End Speech Interaction Large Model
Fun-Audio-Chat-8B is an open source, 8-billion-parameter end-to-end large speech model from Alibaba's Tongyi team. It takes speech in and produces speech out directly, with no need to chain ASR, LLM, and TTS components. It is fluent in both Chinese and English, with low latency and natural timbre. It uses a dual-resolution shared LLM with 25Hz...
PromptFill - Open Source Structured Prompt Generation Tool Designed for AI Drawing
PromptFill is a structured prompt generation tool designed specifically for AI image generation. It helps users quickly build, manage, and iterate on complex prompts through visual "fill-in-the-blank" interactions, improving the efficiency and quality of AI image generation. PromptFill's core features...
GLM-4.7 - Zhipu AI's Open Source Latest-Generation Flagship Large Model
GLM-4.7 is the latest-generation flagship large model released and open-sourced by Zhipu AI. It is deeply optimized for AI programming, complex reasoning, and agent tasks. The model supports a 200k context length and a 128k maximum output, with multi-language coding, long-horizon task planning, and tool collaboration capabilities...
NitroGen - NVIDIA's open source gaming AI model, developed with Stanford, Caltech, and others
NitroGen is an open source gaming AI model developed by NVIDIA together with Stanford University, Caltech, and other institutions, capable of playing more than 1,000 different games. The model is based on the GR00T N1.5 architecture and is trained on 40,000 hours of gameplay video data (including gamepad action annotations)...
Qwen-Image-Layered - AI image editing model open-sourced by the Alibaba team
Qwen-Image-Layered is an open source AI image editing model from the Alibaba team. It can intelligently decompose an ordinary image into independent transparent layers, enabling precise, Photoshop-like editing. The model is released under the Apache 2.0 license and supports flexible control of layers...
VTP - MiniMax Hailuo Video Team's Open Source Visual Generative Modeling Technology
VTP (Visual Tokenizer Pre-training) is a key technology for visual generative models proposed by the MiniMax Hailuo Video team. It improves the performance of generative systems by improving how the visual tokenizer is pre-trained. The traditional method...
T5Gemma 2 - Google's open source next-generation encoder-decoder model
T5Gemma 2 is a new-generation encoder-decoder model open-sourced by Google. It is based on the Gemma 3 architecture and adds multimodal and long-context processing capabilities. It supports multiple input types, including text and images, and is capable of handling very long contexts (up to 128K) in generating...
FunctionGemma - Google's open source lightweight AI model optimized for function calling
FunctionGemma is a lightweight AI model optimized for function calling, launched by Google and built on the Gemma 3 base model with 270 million parameters. It can convert natural language into executable API calls in real time on mobile phones, browsers, and other devices. The core feature is support for local off...
SHARP - Apple's open source monocular view 3D scene synthesis technology
SHARP (Sharp Monocular View Synthesis in Less Than a Second) is Apple's open source monocular view synthesis technology. It can generate a realistic 3D representation of a scene from a single photo in less than a second...
TRELLIS.2 - Microsoft's Open Source Large-Scale 3D Generative Model
TRELLIS.2 is Microsoft's open source large-scale 3D generative model, with 4 billion parameters, focused on high-fidelity image-to-3D generation. Using the innovative "O-Voxel" sparse voxel structure, it can efficiently handle complex topology and sharp features to generate high-quality 3D content with full PBR materials...









