AI Sharing Circle

AI is changing the world!
Mistral 3 - Mistral AI's Newly Released Open-Source Multimodal Large Model Series

Mistral 3 is the latest multimodal large model series open-sourced by Mistral AI. It includes the flagship model Mistral Large 3 (675B total parameters) and the lighter Ministral series (3B/8B/14B), both supporting image understanding...
6mos ago
Vidi2 - ByteDance's Open-Source Multimodal Large Model for Video Understanding and Generation

Vidi2 is a second-generation multimodal large model for video understanding and generation, open-sourced by ByteDance and focused on understanding, analyzing, and creating video content. It accepts joint text, video, and audio input, and can simultaneously understand visual content, audio information, and natural-language instructions to achieve cross-modal interaction and push...
6mos ago
Alpamayo-R1 - NVIDIA's Open-Source Vision-Language-Action Model with Reasoning Capabilities

Alpamayo-R1 is an NVIDIA-developed Vision-Language-Action (VLA) model with reasoning capability, designed to improve autonomous-driving decision-making in complex scenarios. By introducing a causal-chain reasoning mechanism, it lets the vehicle analyze scene causality (e.g., "cause before...
6mos ago
Ovis-Image - Text-to-Image Model Open-Sourced by Alibaba's AIDC-AI Team

Ovis-Image is a 7-billion-parameter text-to-image model open-sourced by the AIDC-AI team of Alibaba International Digital Commerce Group, focusing on high-quality text rendering. Based on the Ovis-U1 architecture, it inherits the advanced visual decoder and bidirectional token refiner ...
6mos ago
Wujie·Emu3.5 - Open-Source Multimodal World Model from the Beijing Academy of Artificial Intelligence (BAAI)

Wujie·Emu3.5 is an open-source multimodal world model from the Beijing Academy of Artificial Intelligence (BAAI), with 34 billion parameters and native world-modeling capability. Trained on 10 trillion multimodal tokens (including 790 years of video data), it can simulate the laws of physics and perform image and text generation, visual guidance...
6mos ago
GELab-Zero - Open-Source On-Device Multimodal GUI Agent Model from the StepFun Team

GELab-Zero is an open-source on-device multimodal GUI agent model from the StepFun team, built on the Qwen3-VL-4B-Instruct base model with 4B parameters. It can recognize UI elements, perform operations such as tapping and swiping, and supports cross-application tasks ...
6mos ago
Depth Anything 3 - 3D Visual Reconstruction Model Open-Sourced by ByteDance Seed

Depth Anything 3 (DA3) is a 3D visual reconstruction model developed and open-sourced by the ByteDance Seed team. It uses a single Transformer architecture to reconstruct spatial geometry from arbitrary viewpoints, and needs to predict only a depth map and a ray map to recover the 3D scene. Compared to...
6mos ago
DeepSeek-Math-V2 - DeepSeek's Open-Source Mathematical Reasoning Model

DeepSeek-Math-V2 is an open-source mathematical reasoning model from DeepSeek, an AI company under High-Flyer. The latest version builds on DeepSeek-V3.2-Exp-Base, with performance surpassing that of Gemini DeepThink to reach the international number...
6mos ago
Z-Image - Open-Source Image Generation Model from Alibaba's Tongyi Lab

Z-Image is an open-source image generation model from Alibaba's Tongyi Lab, offering efficient, fast, and powerful image generation. Using a single-stream diffusion Transformer architecture (S3-DiT), it integrates text, visual semantics, and image VAE tokens into a unified input stream...
6mos ago
ROCK - Alibaba's Open-Source Sandbox for Agent Training Environments

ROCK (Reinforcement Open Construction Kit) is Alibaba's open-source sandbox for agent training environments, which addresses the problem that agents cannot be trained at scale in real environments. ROCK provides a highly stable sandbox management service...
6mos ago