Latest AI Resources

Total: 2920 articles
InternVLA-A1 - Shanghai AI Lab's Open-Source Embodied Large Model with Integrated Manipulation Capabilities

InternVLA-A1 is an embodied manipulation large model open-sourced by Shanghai Artificial Intelligence Laboratory. It integrates understanding, imagination, and execution, which lets it complete tasks accurately. The model fuses real and simulated manipulation data, and, using large-scale hybrid virtual-real scene assets, automatically constructs massive multimodal...
4mos ago
VoxCPM - End-to-End TTS Model Open-Sourced by ModelBest and Tsinghua University

VoxCPM is a speech generation model jointly open-sourced by ModelBest and the Shenzhen International Graduate School of Tsinghua University. VoxCPM adopts an end-to-end diffusion-autoregressive architecture that generates continuous speech representations directly from text, overcoming the limitations of traditional discrete tokenization. Through hierarchical language modeling and finite scalar quantization (FSQ)...
4mos ago
InternVLA-N1 - Shanghai AI Lab's Open-Source End-to-End Dual-System Navigation Large Model

InternVLA-N1 is an open-source end-to-end dual-system navigation large model from Shanghai Artificial Intelligence Laboratory. In its dual-system architecture, System 2 understands language instructions and plans long-range paths, while System 1 handles high-frequency responses and agile obstacle avoidance. The model is trained entirely on synthetic data through large-scale digital ...
4mos ago
VLAC - Shanghai AI Lab's Open-Source Embodied Reward Large Model

VLAC is an open-source embodied reward large model from Shanghai Artificial Intelligence Laboratory. Built on the InternVL multimodal large model, it combines Internet video data with robot manipulation data to provide process rewards and task-completion estimates for real-world robot reinforcement learning. VLAC can effectively ...
4mos ago
Zhejiang University's Free PDF "Fundamentals of Large Models" (with Download Link)

"Fundamentals of Large Models" provides an in-depth analysis of the core technologies and practical paths of Large Language Models (LLMs). Starting from the fundamental theory of language modeling, it systematically explains the principles of model design based on statistics, recurrent neural networks (RNNs), and the Transformer architecture, focusing on the three major large language model...
4mos ago
MobiAgent - Shanghai Jiao Tong University's Open-Source Full-Stack Framework for Building Mobile Agents

MobiAgent is an open-source mobile agent toolchain from the IPADS Lab at Shanghai Jiao Tong University that helps users build their own mobile intelligent assistants. By recording users' operation trajectories and generating high-quality data from them, it trains an agent that can understand natural-language commands. Core features include efficient...
4mos ago
Youtu-GraphRAG - Tencent Youtu Lab's Open-Source Graph Retrieval-Augmented Generation Framework

Youtu-GraphRAG is an open-source graph retrieval-augmented generation (GraphRAG) framework from Tencent's Youtu Lab that helps large language models handle complex Q&A tasks more accurately. It constructs a four-layer knowledge tree that organizes knowledge into four levels (attributes, relations, keywords, and communities) to achieve self-directed performance on cross-domain knowledge...
4mos ago
MiniMax Music 1.5 - MiniMax's Latest AI Music Generation Model

MiniMax Music 1.5 is an advanced AI music generation model that can generate up to 4 minutes of music from a user's natural-language description. It supports customization across a variety of music styles and moods, producing natural, full vocal timbre, smooth transitions, and richly layered arrangements...
4mos ago
ERNIE (Wenxin Large Model) X1.1 - Baidu's Deep-Thinking Model with Stronger Understanding

Wenxin Large Model X1.1 is a deep-thinking model launched by Baidu. Built on a hybrid reinforcement learning framework, it focuses on improving language understanding and generation. The model excels at handling complex questions, following instructions, and simulating agent behavior, and it provides accurate, knowledgeable answers and high-quality text.
4mos ago
WeKnora - Tencent WeChat's Open-Source Document Understanding and Semantic Retrieval Framework

WeKnora is a document understanding and semantic retrieval framework based on large language models (LLMs), open-sourced by Tencent's WeChat team. Designed for scenarios with structurally complex, heterogeneous document content, it uses a modular architecture that integrates multimodal preprocessing, semantic vector indexing, intelligent retrieval, and LLM generative reasoning ...
4mos ago
XTuner V1 - Shanghai AI Lab's Open-Source Large Model Training Engine

XTuner V1 is a new-generation large model training engine open-sourced by Shanghai Artificial Intelligence Laboratory, designed for training ultra-large-scale sparse Mixture-of-Experts (MoE) models. Built on PyTorch FSDP, it achieves high performance through multi-dimensional optimization of memory, communication, and load ...
4mos ago
OneCAT - Multimodal Model Open-Sourced by Meituan and Shanghai Jiao Tong University

OneCAT is a new unified multimodal model launched by Meituan together with Shanghai Jiao Tong University. It adopts a pure decoder-only architecture and seamlessly integrates multimodal understanding, text-to-image generation, and image editing. The model abandons the traditional multimodal design that relies on external visual encoders and tokenizers through modality-specific...
4mos ago
Step-Audio 2 mini - StepFun's Open-Source Large Speech Model

Step-Audio 2 mini is an open-source end-to-end large speech model from StepFun. Departing from traditional speech model structures, it adopts a true end-to-end multimodal architecture that converts raw audio input directly into spoken responses with lower latency, and it understands paralinguistic information and non-speech signals.
4mos ago
InternVL3.5 - Shanghai AI Lab's Open-Source Multimodal Large Model

InternVL3.5 (Shusheng-Wanxiang 3.5) is an open-source multimodal large model from Shanghai Artificial Intelligence Laboratory. It is comprehensively upgraded in general capability, reasoning, and deployment efficiency, and comes in nine sizes ranging from 1 billion to 241 billion parameters to cover different resource requirements, including dense...
4mos ago
FastVLM - Apple's Vision Language Model

FastVLM (Fast Vision Language Model) is an efficient vision language model from Apple Inc. Its core is the FastViTHD hybrid vision encoder, which combines convolutional and Transformer architectures to significantly reduce visual...
4mos ago
Meeseeks - Meituan's Open-Source Benchmark for Evaluating Models' Instruction-Following Ability

Meeseeks is an open-source large model evaluation set from Meituan's M17 team for assessing how well models follow instructions. Meeseeks uses a three-tier evaluation framework to comprehensively measure, from the macro to the micro level, whether a model generates answers that strictly follow the user's instructions, without evaluating the factual correctness of the answer content ...
5mos ago
gpt-realtime - OpenAI's Newest AI Speech Model

gpt-realtime is an advanced speech model from OpenAI that processes audio directly to generate natural, fluent speech. The model supports multiple languages and speaking styles, understands non-verbal cues such as laughter, and can switch between languages.
5mos ago
HunyuanVideo-Foley - Tencent's Open-Source Video Sound Effect Generation Model

HunyuanVideo-Foley is an open-source video sound effect generation model from Tencent's Hunyuan team that adds accurately matched sound effects to silent videos. Trained on a large-scale dataset, it uses a multimodal diffusion Transformer architecture combined with a representation alignment loss and audio VAE optimization techniques ...
5mos ago
Wenxiaobai 5 - Wenxiaobai's All-in-One AI Model

Wenxiaobai 5 is Wenxiaobai's flagship "All in One" model, with a very high level of intelligence. It performs strongly across many benchmarks, scoring 64.7 on the AA-Index composite evaluation and 86 on the STEM ability evaluation, close to the world-leading GPT-5.
5mos ago
Wenxiaobai o4 - Wenxiaobai's Parallel-Thinking Model That Runs 8 Reasoning Paths Simultaneously

Wenxiaobai o4 is an innovative parallel-thinking model that opens 8 reasoning paths at once, analyzes a problem from multiple perspectives, and automatically selects the best solution. It incorporates advanced long chain-of-thought (Long-CoT) reinforcement learning and process reward learning, giving it strong deep-reasoning capabilities and good performance on complex tasks.
5mos ago
VibeVoice - Microsoft's Text-to-Speech Model

VibeVoice is a new text-to-speech (TTS) model from Microsoft. It generates conversational audio with up to four distinct speakers and supports up to 90 minutes of continuous speech output, overcoming the length limitations of traditional TTS systems.
5mos ago
Fun-ASR - A New-Generation Speech Recognition Model Jointly Launched by DingTalk and Tongyi

Fun-ASR is a large speech recognition model jointly launched by DingTalk and Tongyi Lab. Trained on massive amounts of audio data, it accurately recognizes terminology across industries such as the Internet, technology, and home decoration, significantly improving recognition accuracy. The model also draws on DingTalk enterprise information for inference optimization to reduce hallucination...
5mos ago
Grok 2.5 - Musk's xAI Open-Source AI Model

Grok 2.5 is an open-source AI model from Elon Musk's xAI. With 269 billion parameters, it uses a Mixture-of-Experts (MoE) architecture for strong performance and reasoning. The model has been tested on graduate-level scientific knowledge (GPQA), general knowledge (MMLU, MM...
5mos ago
ToonComposer - Tencent's Open-Source Generative AI Animation Tool

ToonComposer is a generative AI animation tool jointly launched by The Chinese University of Hong Kong, Tencent PCG ARC Lab, and Peking University. Using generative post-keyframing, it merges inbetweening and colorization into a single automated process, requiring only a sketch and a...
5mos ago
Seed-OSS - A New AI Model Open-Sourced by ByteDance's Seed Team

Seed-OSS is a family of large language models open-sourced by ByteDance's Seed team, focused on long-context and reasoning tasks. The models perform well on complex logical and multi-step reasoning, with high accuracy and efficient problem solving. Seed-OSS supports long contexts of up to 512K...
5mos ago
CombatVLA - An Efficient VLA Model from Taotian Group

CombatVLA is an innovative model specialized for 3D action role-playing games (ARPGs), developed by the Future Life Lab team at Taotian Group. CombatVLA is a vision-language-action (VLA) model at the 3B-parameter scale that collects human players' ... through a motion tracker...
5mos ago
DeepSeek V3.1 - DeepSeek's Latest Open-Source AI Model

DeepSeek V3.1 is a new-generation AI model from DeepSeek with major upgrades over its predecessor, V3. It introduces a hybrid reasoning architecture that lets the model switch flexibly between thinking and non-thinking modes, significantly improving thinking...
5mos ago
Qwen-Image-Edit - Alibaba Tongyi's Open-Source Image Editing Model

Qwen-Image-Edit is a general-purpose image editing model from Alibaba Tongyi, built on the 20-billion-parameter Qwen-Image architecture. It combines semantic editing and appearance editing, and can perform low-level visual appearance edits on images (e.g., adding, deleting...
5mos ago
MoE-TTS - Kunlun Tech's Latest Speech Generation Framework

MoE-TTS is a speech synthesis framework from Kunlun Tech (Kunlun Wanwei). Based on a Mixture-of-Experts (MoE) architecture, it combines a pre-trained large language model (LLM) with speech expert modules. MoE-TTS preserves the LLM's strong text reasoning by freezing the text module parameters and updating only the speech module parameters...
5mos ago