Latest AI Resources

Total: 2787 articles
OmniVinci - NVIDIA's Open-Source Omni-Modal Large Language Model

OmniVinci is an open-source omni-modal large language model developed by NVIDIA that addresses modality fragmentation in multimodal models through architectural innovation and data optimization. OmniAlignNet strengthens the alignment of visual and audio embeddings, and temporal embedding grouping is used to capture...
7 days ago
09.2K
ValueCell - An Open-Source Multi-Agent Financial Platform with Collaborating Agents

ValueCell is an open-source multi-agent financial application platform that uses AI to improve the efficiency of financial analysis and investment management. It simulates a professional investment team: multiple AI agents work together on market analysis, sentiment analysis, fundamental research, automated trading, and other functions to provide users with a comprehensive...
1 week ago
013.3K
Dexbotic - Dexmal's Open-Source One-Stop Research Platform for Embodied-Intelligence VLA Models

Dexbotic is an open-source, one-stop research platform from Dexmal for embodied-intelligence Vision-Language-Action (VLA) models. Built on PyTorch, it addresses the fragmentation and low efficiency of research in the field of embodied intelligence...
1 week ago
07.3K
LongCat-Video - Meituan LongCat's Open-Source Video Generation Model

LongCat-Video is a 13.6-billion-parameter video generation model open-sourced by the LongCat team under the MIT license. It supports three main tasks: text-to-video, image-to-video, and video continuation. With a coarse-to-fine generation strategy and a block sparse attention mechanism, the model can, within minutes ...
1 week ago
014.1K
Hunyuan World Model 1.1 - Tencent Hunyuan's Open-Source Large 3D Reconstruction Model

Hunyuan World Model 1.1 (WorldMirror) is an open-source large 3D reconstruction model released by the Tencent Hunyuan team and an upgraded version of the Hunyuan World Model series. It accepts multi-view images, video, and multimodal priors such as camera poses, camera intrinsics, and depth maps as input, moving beyond traditional 3D reconstruction that relies only on...
2 weeks ago
013.8K
VitaBench - Meituan LongCat's Open-Source Interactive Agent Evaluation Benchmark

VitaBench is the first interactive agent evaluation benchmark for complex real-life scenarios, released by Meituan's LongCat team. It assesses the overall capabilities of large-model agents in real-life settings. Using three high-frequency scenarios (food delivery ordering, restaurant dining, and travel) as its setting, it builds...
2 weeks ago
011.6K
UniPixel - A Pixel-Level Multimodal Model Open-Sourced by Hong Kong Polytechnic University, Tencent, the Chinese Academy of Sciences, and Others

UniPixel is a multimodal model jointly proposed by Hong Kong Polytechnic University, Tencent, the Chinese Academy of Sciences, and vivo for pixel-level vision-language understanding. By unifying object referring and segmentation, it supports a variety of fine-grained tasks such as image segmentation, video segmentation, region understanding, and pi...
3 weeks ago
014.8K
DiaMoE-TTS - A Multi-Dialect Speech Synthesis Framework Open-Sourced by Tsinghua University and Giant Network

DiaMoE-TTS is a multi-dialect speech synthesis framework jointly open-sourced by Tsinghua University and Giant Network. Built on the International Phonetic Alphabet (IPA), it addresses dialect data scarcity, inconsistent orthography, and complex phonological variation. A unified IPA front end standardizes phoneme representations to eliminate cross-dialect differences ...
3 weeks ago
015.7K
SongBloom - A Song Generation Model Open-Sourced by Tencent with CUHK and Nanjing University

SongBloom is an open-source song generation model developed by Tencent AI Lab in collaboration with The Chinese University of Hong Kong (Shenzhen) and Nanjing University. It addresses the "plastic" (artificial-sounding) quality of AI-generated music and produces high-quality, structurally complete songs. Simply enter 10 seconds of reference audio and the corresponding lyrics, and you can...
3 weeks ago
012.7K
SAIL-VL2 - ByteDance's Open-Source Multimodal Vision-Language Model

SAIL-VL2 is an open-source multimodal vision-language model from the ByteDance team that focuses on joint modeling of multimodal inputs such as images and text. Using a sparse mixture-of-experts (MoE) architecture and a progressive training strategy, it achieves strong performance at parameter scales from 2B to 8B, especially in image-text understanding, math...
3 weeks ago
09.7K
MineContext - ByteDance's Open-Source Proactive Context-Aware AI Partner

MineContext is a proactive, context-aware AI partner open-sourced by the ByteDance Viking team to help users manage large amounts of information and work more efficiently on knowledge tasks. Using screenshots and content understanding, it automatically records the user's daily activity (such as browsing the web or editing documents) and supports...
3 weeks ago
014.6K
Agentic AI - Andrew Ng's Latest Free Course on Agents

Agentic AI is the latest course on agents launched by Andrew Ng. The course focuses on designing and building agents and covers four major design patterns: reflection, tool use, planning, and multi-agent collaboration. Learners will learn how to make agents check their outputs and adjust autonomously, through theoretical explanations and coding practice...
4 weeks ago
015.5K
EchoCare - An Open-Source Ultrasound Foundation Model from the Chinese Academy of Sciences' Hong Kong Institute

EchoCare is an ultrasound foundation model developed by the Centre for Artificial Intelligence and Robotics (CAIR) at the Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences (CAS). It is trained on the world's largest ultrasound image dataset (more than 4.5 million images), covering multiple centers, regions, and ethnicities, and more than 50...
4 weeks ago
011.5K
RoboBrain-X0 - BAAI's Open-Source Embodied Model with Zero-Shot Cross-Embodiment Generalization

RoboBrain-X0, open-sourced by the Beijing Academy of Artificial Intelligence (BAAI), is the world's first open-source embodied model to support zero-shot cross-embodiment generalization, which gives it considerable industrial significance. Without fine-tuning, it can drive multiple real robots of different configurations to complete basic manipulation tasks, and after fine-tuning on a small number of samples, it demonstrates the ability to replicate ...
1 month ago
013.9K
CWM - Meta FAIR's Open-Source Code World Model

CWM (Code World Model) is a 32-billion-parameter open-source language model released by the Meta FAIR team for code generation and reasoning. By introducing the concept of a "world model", it can simulate code execution, predict changes in variable state, and advance...
1 month ago
015.7K
Neovate Code - Ant Group's Open-Source Intelligent Programming Assistant

Neovate Code is an open-source intelligent programming assistant from Ant Group's Alipay Experience Technology Department that uses AI to improve development efficiency. With its conversational development features, developers describe requirements in natural language, and Neovate Code understands them and generates the corresponding...
1 month ago
018K
Qwen3Guard - Alibaba Qwen's Open-Source Safety Model

Qwen3Guard is a safety moderation model fine-tuned from the Qwen3 base model and designed for safety detection. It classifies the safety of prompts and responses, assigns risk levels, and supports English, Chinese, and other languages. Qwen3Guard comes with two pro...
1 month ago
019.5K
Qwen3-TTS-Flash - A Speech Synthesis Model from Alibaba Tongyi

Qwen3-TTS-Flash is a speech synthesis model introduced by Alibaba Tongyi. It supports 17 voices and 10 languages, covering Mandarin, English, dialects, and more. It offers excellent stability and highly expressive Chinese and English speech, and it automatically adjusts its tone to make the speech more vivid.
1 month ago
019.8K
InternVLA-A1 - Shanghai AI Lab's Open-Source Large Embodied Model with Integrated Manipulation Capabilities

InternVLA-A1 is a large embodied manipulation model open-sourced by Shanghai Artificial Intelligence Laboratory. It integrates understanding, imagination, and execution, and can complete tasks accurately. The model fuses real and simulated manipulation data, and uses large-scale hybrid virtual-real scene assets to automatically construct massive multimodal...
2 months ago
014.8K
VoxCPM - An End-to-End TTS Model Open-Sourced by ModelBest and Tsinghua University

VoxCPM is a speech generation model jointly open-sourced by ModelBest and the Shenzhen International Graduate School of Tsinghua University. VoxCPM adopts an end-to-end diffusion autoregressive architecture that generates continuous speech representations directly from text, moving past the limitations of traditional discrete tokenization. Through hierarchical language modeling and finite scalar quantization...
2 months ago
018.4K
InternVLA-N1 - Shanghai AI Lab's Open-Source End-to-End Dual-System Navigation Model

InternVLA-N1 is an open-source, end-to-end, dual-system large navigation model from Shanghai Artificial Intelligence Laboratory. In its dual-system architecture, System 2 understands language instructions and plans long-range paths, while System 1 focuses on high-frequency response and agile obstacle avoidance. The model is trained entirely on synthetic data through large-scale digital ...
2 months ago
014.5K
VLAC - Shanghai AI Lab's Open-Source Large Embodied Reward Model

VLAC is an open-source large embodied reward model from Shanghai Artificial Intelligence Laboratory. Built on the InternVL multimodal large model, it integrates Internet video data and robot manipulation data to provide process rewards and task-completion estimates for real-world robot reinforcement learning. VLAC can effectively ...
2 months ago
013.3K
Fundamentals of Large Models - Free PDF from Zhejiang University, with Download Link

Fundamentals of Large Models provides an in-depth analysis of the core technologies and practical paths of large language models (LLMs). Starting from the fundamental theory of language modeling, it systematically explains the design principles of models based on statistics, recurrent neural networks (RNNs), and the Transformer architecture, focusing on the three major large language model...
2 months ago
015.4K
MobiAgent - Shanghai Jiao Tong University's Open-Source Full-Stack Framework for Building Mobile Agents

MobiAgent is an open-source mobile agent toolchain from the IPADS Lab at Shanghai Jiao Tong University that helps users build their own mobile assistants. By recording users' operation trajectories and generating high-quality data, it trains agents that can understand natural-language instructions. Core features include efficient...
2 months ago
014.3K
Youtu-GraphRAG - Tencent Youtu Lab's Open-Source Graph Retrieval-Augmented Generation Framework

Youtu-GraphRAG is an open-source graph retrieval-augmented generation framework from Tencent's Youtu Lab that helps large language models handle complex question-answering tasks more accurately. It constructs a four-level knowledge tree that breaks knowledge down into attributes, relations, keywords, and communities to realize the self-directed performance of cross-domain knowledge...
2 months ago
014.1K