meso- (chemistry)Ming-flash-omni-Preview - Ant Group's open source fully modal large models
Ming-flash-omni-Preview is an open-source full-modal macromodel released by Ant Group inclusionAI, with a parameter scale of hundreds of billions, based on the sparse MoE architecture of Ling 2.0, with total parameters of 103B and activations of 9B. in full-modal understanding and generating...
meso- (chemistry)OmniVinci - NVIDIA's Open Source Omnimodal Large Language Model
OmniVinci is an open-source, fully modal large-scale language model developed by NVIDIA that solves the problem of modal fragmentation in multimodal models through architectural innovation and data optimization. Alignment of visual and audio embeddings is enhanced by OmniAlignNet, which utilizes temporally embedded group capture...
meso- (chemistry)olmOCR 2 - AI2 open source multimodal document parsing model
olmOCR 2 is an open source multimodal document parsing model from the Allen Institute for Artificial Intelligence (AI2) and is an upgraded version of olmOCR. The digitized printed documents (e.g. PDF) will be high...
meso- (chemistry)ValueCell - Open Source Multi-Intelligence Financial Platform with Multiple Agents to Divide the Work
ValueCell is an open source multi-intelligent body financial application platform that improves the efficiency of financial analysis and investment management through AI technology. Simulating a professional investment team, multiple AI intelligences work together, covering market analysis, sentiment analysis, fundamental research, automated trading and other functions, to provide users with a comprehensive...
meso- (chemistry)Dexbotic - The Force Spirit machine open source body intelligence VLA model one-stop research service platform
Dexbotic is the open source Visual-Linguistic-Action (VLA) model of embodied intelligence one-stop scientific research service platform of Dexmal, which solves the problems of fragmentation and low efficiency of research in the field of embodied intelligence. Based on PyTorch, Dexbotic is a one-stop research service platform to solve the problems of fragmentation and inefficiency in the field of embodied intelligence...
LongCat-Video - LongCat open source video generation model of the Mission
LongCat-Video is a 1.36 billion parameter video generation model open source by the LongCat team, using the MIT open source protocol, supporting three major tasks: text-generated video, graph-generated video and video continuation. The model through the "coarse to fine" generation strategy and block sparse attention mechanism, can be in a number of minutes ...
DreamOmni2 - HKUST open source multimodal AI image editing and generation models
DreamOmni2 is a multimodal AI image editing and generation model open-sourced by Jiajia's team at HKUST. Can handle both text and image commands, supports multiple reference images, providing creators with more flexible ways of creation. The model is trained using a three-stage data synthesis process , joint training generation/editing...
Mixed World Model 1.1 - Tencent Mixed World Released Open Source 3D Reconstructed Large Model
WorldMirror 1.1 (WorldMirror) is an open source 3D reconstruction of large models released by Tencent's WorldMirror team, which is an upgraded version of the WorldMirror series. It supports multi-view images, videos, and multi-modal a priori inputs such as camera position, internal reference, depth map, etc. It breaks through the traditional 3D reconstruction that only relies on...
DeepSeek-OCR - DeepSeek open source optical character recognition model
DeepSeek-OCR is an advanced optical character recognition (OCR) model open-sourced by the DeepSeek team, which converts text into images through "contextual optical compression" technology, and utilizes visual tokens for compression and decoding to achieve efficient long text processing.
VitaBench - MMT LongCat Open Source Interactive Agent Review Benchmarks
VitaBench is the first interactive Agent evaluation benchmark for complex life scenarios released by the LongCat team of Meituan, assessing the comprehensive capabilities of large model intelligences in real life scenarios. The three high-frequency life scenarios of take-away ordering, restaurant dining, and traveling are used as the carrier to build the package...









