AI Sharing Circle

AI is changing the world!
olmOCR 2 - AI2开源的多模态文档解析模型

olmOCR 2 - AI2 open source multimodal document parsing model

olmOCR 2 is an open source multimodal document parsing model from the Allen Institute for Artificial Intelligence (AI2) and is an upgraded version of olmOCR. The digitized printed documents (e.g. PDF) will be high...
7mos ago
039.8K
ValueCell - 开源的多智能体金融平台,多个Agent分工协作

ValueCell - Open Source Multi-Intelligence Financial Platform with Multiple Agents to Divide the Work

ValueCell is an open source multi-intelligent body financial application platform that improves the efficiency of financial analysis and investment management through AI technology. Simulating a professional investment team, multiple AI intelligences work together, covering market analysis, sentiment analysis, fundamental research, automated trading and other functions, to provide users with a comprehensive...
7mos ago
059.5K
Dexbotic - 原力灵机开源的具身智能VLA模型一站式科研服务平台

Dexbotic - The Force Spirit machine open source body intelligence VLA model one-stop research service platform

Dexbotic is the open source Visual-Linguistic-Action (VLA) model of embodied intelligence one-stop scientific research service platform of Dexmal, which solves the problems of fragmentation and low efficiency of research in the field of embodied intelligence. Based on PyTorch, Dexbotic is a one-stop research service platform to solve the problems of fragmentation and inefficiency in the field of embodied intelligence...
7mos ago
030.2K
LongCat-Video - 美团LongCat开源的视频生成模型

LongCat-Video - LongCat open source video generation model of the Mission

LongCat-Video is a 1.36 billion parameter video generation model open source by the LongCat team, using the MIT open source protocol, supporting three major tasks: text-generated video, graph-generated video and video continuation. The model through the "coarse to fine" generation strategy and block sparse attention mechanism, can be in a number of minutes ...
7mos ago
053K
DreamOmni2 - 港科大开源的多模态AI图像编辑与生成模型

DreamOmni2 - HKUST open source multimodal AI image editing and generation models

DreamOmni2 is a multimodal AI image editing and generation model open-sourced by Jiajia's team at HKUST. Can handle both text and image commands, supports multiple reference images, providing creators with more flexible ways of creation. The model is trained using a three-stage data synthesis process , joint training generation/editing...
7mos ago
038K
混元世界模型1.1 - 腾讯混元发布的开源3D重建大模型

Mixed World Model 1.1 - Tencent Mixed World Released Open Source 3D Reconstructed Large Model

WorldMirror 1.1 (WorldMirror) is an open source 3D reconstruction of large models released by Tencent's WorldMirror team, which is an upgraded version of the WorldMirror series. It supports multi-view images, videos, and multi-modal a priori inputs such as camera position, internal reference, depth map, etc. It breaks through the traditional 3D reconstruction that only relies on...
7mos ago
037K
DeepSeek-OCR - DeepSeek开源的光学字符识别模型

DeepSeek-OCR - DeepSeek open source optical character recognition model

DeepSeek-OCR is an advanced optical character recognition (OCR) model open-sourced by the DeepSeek team, which converts text into images through "contextual optical compression" technology, and utilizes visual tokens for compression and decoding to achieve efficient long text processing.
7mos ago
042.4K
VitaBench - 美团LongCat开源的交互式Agent评测基准

VitaBench - MMT LongCat Open Source Interactive Agent Review Benchmarks

VitaBench is the first interactive Agent evaluation benchmark for complex life scenarios released by the LongCat team of Meituan, assessing the comprehensive capabilities of large model intelligences in real life scenarios. The three high-frequency life scenarios of take-away ordering, restaurant dining, and traveling are used as the carrier to build the package...
7mos ago
033.6K
MinerU2.5 - 上海AI Lab联合北大开源的文档解析模型

MinerU2.5 - Shanghai AI Lab and Peking University open source document parsing model

MinerU2.5 is a decoupled visual language model jointly developed by Shanghai Artificial Intelligence Laboratory (AIL) and Peking University, focusing on efficiently processing high-resolution document image parsing. The core innovation lies in the two-phase design of "global layout detection followed by local content recognition": the first phase is a low-resolution...
7mos ago
047.9K
LongCat-Audio-Codec - 美团LongCat开源的语音编解码方案

LongCat-Audio-Codec - LongCat open source voice codec solution for Meituan

LongCat-Audio-Codec is an open source speech codec solution from the LongCat team of Meituan. The program is designed for Speech Large Language Model (Speech LLM), through the semantic and acoustic dual Token parallel extraction mechanism , taking into account the semantic and acoustic features of speech ...
7mos ago
031.3K