Step-GUI - StepFun's Open Source AI Agent Model Series
Step-GUI is StepFun's open source series of AI agent models, including the cloud model Step-GUI, the first MCP protocol for GUI agents, and Step-GUI Edge, the industry's first open source on-device model that supports deployment on mobile phones. Specialized...
A2UI - Google's open source declarative protocol for agent-driven user interfaces
A2UI (Agent-to-User Interface) is Google's open source protocol for agent-driven interfaces, which addresses the problem of AI agents generating complex interactive interfaces. Through a declarative JSON format that lets AI agents describe the structure of the user interface, client applications ...
SAM Audio - Open Source Multimodal Audio Segmentation Model from Meta
SAM Audio is an open source multimodal audio segmentation model introduced by Meta that accurately separates arbitrary target sounds from complex audio mixtures. By combining textual, visual, and temporal cues, it enables flexible and efficient audio processing for tasks such as audio editing, denoising, sound extraction, and...
Hunyuan World Model 1.5 - Tencent Hunyuan's open source real-time world model framework
Hunyuan World Model 1.5 (Tencent HY WorldPlay) is the industry's first open source real-time world model framework, released by Tencent, covering the entire pipeline of data, training, and streaming inference deployment. Its core is the WorldPlay autoregressive diffusion model, which uses Next-F...
Molmo 2 - Ai2's open source multimodal model series for video and image understanding
Molmo 2 is an open source multimodal model released by the Allen Institute for AI (Ai2) to improve video and multi-image understanding. It comes in three variants: Molmo 2 (8B), Molmo 2 (4B) and Molmo 2-O...
LongCat-Video-Avatar - Meituan's open source avatar video generation model
LongCat-Video-Avatar is an audio-driven video generation model open-sourced by Meituan and built on LongCat-Video. It focuses on generating hyper-realistic, lip-synchronized long videos with natural dynamics and consistent identity.
MiMo-V2-Flash - an open source MoE large model released by Xiaomi
MiMo-V2-Flash is an open source large model with an MoE architecture released by Xiaomi, with 309 billion total parameters and 15 billion active parameters, focusing on efficient inference and agent applications. The model adopts a hybrid attention architecture and multi-token prediction, with an inference speed of 150 tokens/second, into...
Nemotron 3 - A family of open source AI models released by NVIDIA
Nemotron 3 is a family of open source AI models released by NVIDIA in Nano, Super and Ultra sizes. It adopts a hybrid latent mixture-of-experts (latent MoE) architecture to significantly improve inference efficiency and reduce operating costs. Among them...
Wan-Move - an open source AI video generation framework from Alibaba's Tongyi Lab, Tsinghua University and others
Wan-Move is an open source AI video generation framework jointly developed by Alibaba's Tongyi Lab, Tsinghua University and other organizations, focusing on high-quality video synthesis through precise motion control. The core technology is "latent trajectory guidance", which can seamlessly add point-level motion control to existing image-to-video models...
PaCoRe - StepFun's open source parallel coordinated reasoning framework
PaCoRe (Parallel Coordinated Reasoning) is StepFun's open source parallel coordinated reasoning framework. Through a massively parallel thinking mechanism, it explores solutions to a problem from multiple perspectives simultaneously, breaking through the traditional...








