Z-Image - Open-source image generation model from Alibaba's Tongyi Lab
Z-Image is an open-source image generation model from Alibaba's Tongyi Lab that generates high-quality images quickly and efficiently. It uses a single-stream diffusion Transformer architecture (S3-DiT) that integrates text, visual semantic, and image VAE tokens into a unified input stream...
ROCK - Alibaba's open-source sandbox for agent training environments
ROCK (Reinforcement Open Construction Kit) is Alibaba's open-source sandbox for agent training environments. It addresses the problem that agents cannot be trained at scale in real environments. ROCK provides a highly stable sandbox management service...
ViMax - Open-source multi-agent video generation framework from the University of Hong Kong
ViMax is an open-source multi-agent video generation framework from the Data Science Laboratory of the University of Hong Kong that automates the whole process from creative input to video output. It integrates script generation, scene design, shot planning, video rendering, and other functions, letting users generate coherent, film-and-television-grade video from natural language descriptions...
FLUX.2 - Black Forest Labs' open-source image generation and editing model
FLUX.2 is an open-source image generation and editing model released by Black Forest Labs. It supports text-to-image generation, multi-image referencing, and image editing, with richer details, clearer textures, and more stable lighting. There are four versions: FLUX.2 [pro] (comparable to the top closed source...
Fara-7B - Microsoft's open-source computer-use agent model
Fara-7B is a 7-billion-parameter computer-use agent (CUA) model open-sourced by Microsoft and based on the Qwen2.5-VL-7B architecture. It visually parses web page screenshots and performs clicks, typing, and other on-screen actions, without relying on additional accessibility trees or multiple large models...
HunyuanOCR - Tencent's open-source expert model for optical character recognition
HunyuanOCR is a high-performance optical character recognition model open-sourced by the Tencent Hunyuan team, with a parameter count of only 1 billion. Built on the Hunyuan multimodal architecture, it adopts an end-to-end design and efficiently handles text detection, recognition, and document parsing tasks. The model scored 94.1 points in the complex document test, surpassing...
Supertonic - Open-source, high-performance AI text-to-speech system that runs offline at high speed
Supertonic is an open-source, high-performance text-to-speech (TTS) system focused on rapid speech generation on local devices. Built on ONNX Runtime, it runs on devices such as mobile phones, computers, and even the Raspberry Pi, supports 23 languages and voice cloning, and requires no network...
MiMo-Embodied - Xiaomi's open-source cross-domain embodied intelligence foundation model
MiMo-Embodied, open-sourced by Xiaomi Group, is the world's first cross-embodiment foundation model to successfully integrate embodied AI and autonomous driving. It addresses knowledge transfer between embodied AI and autonomous driving and unifies the modeling of tasks across the two fields.
MOSS-Speech - Fudan University's open-source speech-to-speech large model
MOSS-Speech is an open-source speech-to-speech large model from Prof. Qiu Xipeng's team at Fudan University. It departs from traditional speech processing: it understands and generates speech directly, without text guidance, and can capture non-text elements such as intonation and emotion, making...
Parallax - The world's first fully autonomous AI operating system, open-sourced by Gradient
Parallax is the world's first "fully autonomous AI operating system", open-sourced by Gradient, a distributed AI lab. It supports cross-platform deployment of large models on Mac, Windows, and other heterogeneous devices, giving users full control over the model, data, and AI memory. The system has built-in network-aware...









