SoulX-Podcast - Soul AI Lab's Open Source Conversational Speech Synthesis Model
SoulX-Podcast is an open-source multi-speaker conversational speech synthesis model from Soul AI Lab, designed for generating high-quality podcast content. It can generate multi-turn dialogue that simulates the natural flow of a real podcast, and supports Mandarin, English, and multiple Chinese...
GigaBrain-0 - Open-Source Embodied Foundation Model Driven by World-Model-Generated Data
GigaBrain-0 is China's first end-to-end Vision-Language-Action (VLA) embodied foundation model to use world-model-generated data to achieve generalization on real robots. It is jointly open-sourced by GigaVision and the Hubei Humanoid Robot Innovation Center. It adopts a hybrid Transformer architecture, integrating ...
Ming-flash-omni-Preview - Ant Group's Open-Source Omni-Modal Large Model
Ming-flash-omni-Preview is an open-source omni-modal large model at the hundred-billion-parameter scale, released by Ant Group's inclusionAI. It is built on the sparse MoE architecture of Ling 2.0, with 103B total parameters and 9B activated parameters. In omni-modal understanding and generation...
OmniVinci - NVIDIA's Open Source Omnimodal Large Language Model
OmniVinci is an open-source omni-modal large language model developed by NVIDIA that addresses modality fragmentation in multimodal models through architectural innovation and data optimization. OmniAlignNet strengthens the alignment of visual and audio embeddings, and temporal embedding grouping is used to capture...
olmOCR 2 - AI2's Open-Source Multimodal Document Parsing Model
olmOCR 2 is an open-source multimodal document parsing model from the Allen Institute for Artificial Intelligence (AI2) and an upgraded version of olmOCR. It converts digitized print documents (e.g. PDFs) into high...
ValueCell - Open-Source Multi-Agent Financial Platform Where Multiple Agents Divide the Work
ValueCell is an open-source multi-agent financial application platform that uses AI to improve the efficiency of financial analysis and investment management. It simulates a professional investment team: multiple AI agents work together on market analysis, sentiment analysis, fundamental research, automated trading, and other functions, to provide users with a comprehensive...
Dexbotic - Dexmal's Open-Source One-Stop Research Platform for Embodied-Intelligence VLA Models
Dexbotic is Dexmal's open-source, one-stop research service platform for Vision-Language-Action (VLA) models in embodied intelligence. Built on PyTorch, it addresses the fragmentation and low efficiency of research in the field of embodied intelligence...
LongCat-Video - Meituan LongCat's Open-Source Video Generation Model
LongCat-Video is a 13.6-billion-parameter video generation model open-sourced by Meituan's LongCat team under the MIT license. It supports three main tasks: text-to-video, image-to-video, and video continuation. Using a coarse-to-fine generation strategy and a block sparse attention mechanism, the model can, within minutes ...
DreamOmni2 - HKUST's Open-Source Multimodal AI Image Editing and Generation Model
DreamOmni2 is a multimodal AI image editing and generation model open-sourced by Jiaya Jia's team at HKUST. It handles both text and image instructions and supports multiple reference images, giving creators more flexible ways to work. The model is trained on data from a three-stage synthesis pipeline, with joint training of generation/editing...
Hunyuan World Model 1.1 - Tencent Hunyuan's Open-Source Large 3D Reconstruction Model
Hunyuan World Model 1.1 (WorldMirror) is an open-source large 3D reconstruction model released by Tencent's Hunyuan team and an upgraded version in the Hunyuan World Model series. It supports multi-view images, video, and multimodal prior inputs such as camera poses, camera intrinsics, and depth maps. It breaks through the limitation of traditional 3D reconstruction that only relies on...