Latest AI Resources

Total: 3143 posts

nanochat - Karpathy's free, open source project for low-cost model training

nanochat is an open source project released by former Tesla AI Director Andrej Karpathy that lets individuals quickly and simply train a small ChatGPT-like language model at very low cost. The entire project uses only about 800...

Latest AI Resources

10mos ago

049K

LLaVA-OneVision-1.5 - Free, open source multimodal model with high-performance multimodal understanding

LLaVA-OneVision-1.5 is an open source multimodal model from the EvolvingLMMS-Lab team. At the 8B-parameter scale, it goes through a compact three-phase training process (language-image alignment, concept balancing and knowledge injection, and instruction fine-tuning) on 128 A800...

Latest AI Resources

10mos ago

045.3K

Paper2Video - NUS open source project that automatically generates presentation videos from academic papers

Paper2Video is an open source project from Show Lab at the National University of Singapore that automatically generates presentation videos from academic papers. Using the PaperTalker multi-agent framework, it transforms papers into full presentation videos containing slides, subtitles, voiceover and speaker avatar...

Latest AI Resources

10mos ago

048.5K

NeuTTS Air - Free, lightweight speech synthesis model that runs offline on CPU

NeuTTS Air is an open source lightweight speech synthesis model developed by the Neuphonic team that can run in real time on local devices (e.g. phones, laptops, Raspberry Pi) without relying on the cloud. Using a 0.5B-parameter Qwen architecture and the self-developed NeuCodec codec...

Latest AI Resources

10mos ago

054.9K

KAT-Dev-72B-Exp - Kuaishou's free, open source programming model

KAT-Dev-72B-Exp is an open source large language model for programming launched by the Kuaishou team and optimized with reinforcement learning. It achieved 74.6% accuracy on the SWE-Bench Verified benchmark, the best performance of any open source model at the time. The model uses innovative...

Latest AI Resources

10mos ago

044.5K

Jamba Reasoning 3B - Open source lightweight reasoning model from Israel's AI21 Labs

Jamba Reasoning 3B is a lightweight reasoning model open-sourced by Israeli AI startup AI21 Labs, with strong performance and potential for a wide range of applications. It utilizes a hybrid SSM-Transformer architecture that combines Trans...

Latest AI Resources

10mos ago

041.7K

Agentic AI - Andrew Ng's latest free course on agents

Agentic AI is the newest course on agents launched by Andrew Ng. The course focuses on the design and construction of agents, covering four major design patterns: reflection, tool use, planning, and multi-agent collaboration. Through theoretical explanations and coding practice, learners will master how to make agents check their outputs and adjust autonomously...

Latest AI Resources Course materials

10mos ago

076K

OpenAgents - Free, open source project for building openly collaborating AI agent networks

OpenAgents is an open source project for creating networks of AI agents and facilitating open collaboration between them. It provides basic network infrastructure so that AI agents can connect and collaborate seamlessly. Users can quickly start their own agent network, extend functionality through a modular architecture, support...

Latest AI Resources

10mos ago

045.8K

Androidify - Google's free, open source resource on building AI apps on Android

Androidify is Google's open source project to help developers learn how to build AI-driven apps on Android. The project uses Google's latest technologies such as Jetpack Compose, Gemini API (via Fire...

Latest AI Resources

10mos ago

045.1K

Ling-1T - Ant Group's open source trillion-parameter general-purpose language model

Ling-1T is a trillion-parameter general-purpose language model open-sourced by Ant Group and the flagship of the Ling 2.0 series of Ant's Bailing large models. The model adopts an efficient MoE architecture, supports a 128K context window, and surpasses GPT in 7 benchmarks including code generation, mathematical reasoning, and logic test...

Latest AI Resources

10mos ago

074.8K

EchoCare - Open source ultrasound foundation model from the Chinese Academy of Sciences' Hong Kong institute

EchoCare is an ultrasound foundation model developed by the Center for Artificial Intelligence and Robotics (CAIR) at the Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences (CAS). It was trained on what is described as the world's largest ultrasound image dataset (more than 4.5 million images), covering multiple centers, regions, and ethnicities, and more than 50 individuals...

Latest AI Resources

10mos ago

050.4K

Code2Video - Show Lab's open source AI framework for generating teaching videos

Code2Video is an innovative open source project that automatically converts code snippets into high-quality video content (mp4 format). Following a code-centric paradigm, the project uses the carbon-now-cli tool to render code as attractive images, then uses ffmpeg to turn these ...

Latest AI Resources

10mos ago

053.9K

SceneGen - Shanghai Jiao Tong University's open source framework for generating 3D scenes from a single image

SceneGen is an open source method from Shanghai Jiao Tong University for generating 3D scenes from a single image. From a single scene image and target asset masks, it efficiently generates a complete scene containing multiple 3D assets, including their geometry, texture and relative spatial positions.

Latest AI Resources

10mos ago

045.1K

Ming-UniAudio - Ant Group's open source unified audio multimodal generation model

Ming-UniAudio is Ant Group's open source unified audio multimodal generation model that supports mixed input and output of text, audio, image and video. Using a multi-scale Transformer and a mixture-of-experts (MoE) architecture, it relies on a modality-aware routing mechanism to efficiently handle cross-modal ...

Latest AI Resources

10mos ago

051K

AIMangaStudio - Free AI manga creation tool with a complete creation workflow

AIMangaStudio is a free AI manga creation tool that provides creators with a complete manga creation pipeline, including plot generation, storyboard design, character setting and other functions, simplifying the production process from script to manga page. It supports natural language generation of comic scripts, including plot, dialog...

Latest AI Resources

10mos ago

064.7K

FireRedChat - Xiaohongshu's open source full-duplex voice interaction system

FireRedChat is an open source full-duplex voice interaction system from Xiaohongshu with real-time bidirectional dialog capabilities and support for controlled interruptions. It adopts a modular design, including a transcription control module, an interaction module and a dialogue manager, supports cascaded and semi-cascaded architectures, and can be flexibly deployed.

Latest AI Resources

10mos ago

059.2K

Logics-Parsing - Alibaba's open source document parsing model

Logics-Parsing is an open source end-to-end document parsing model from Alibaba, based on Qwen2.5-VL-7B. It optimizes document layout analysis and reading-order inference through reinforcement learning, and can convert PDF images into structured HTML output, supporting a variety of content ...

Latest AI Resources

10mos ago

058K

Ring-1T-preview - Ant Group's open source trillion-parameter large model

Ring-1T-preview is an open source trillion-parameter large model from Ant Group, based on the Ling 2.0 MoE architecture, pre-trained on a 20T corpus, with its reasoning ability trained by the self-developed reinforcement learning system ASystem. In natural language reasoning ...

Latest AI Resources

10mos ago

065K

RoboBrain-X0 - BAAI's open source embodied model with zero-shot cross-embodiment generalization

RoboBrain-X0 is an embodied model open-sourced by the Beijing Academy of Artificial Intelligence (BAAI), described as the world's first open source embodied model to support zero-shot cross-embodiment generalization and as being of great industrial significance. It can drive multiple real robots of different configurations to complete basic manipulation tasks without fine-tuning, and after few-shot fine-tuning, it demonstrates the ability to replicate ...

Latest AI Resources

10mos ago

049.7K

Lynx - ByteDance's open source high-fidelity video generation model

Lynx is a high-fidelity personalized video generation model open-sourced by ByteDance that can generate identity-consistent videos from a single portrait photo. Built on a Diffusion Transformer (DiT) base model, it introduces ID-adapter and Ref-adapte...

Latest AI Resources

10mos ago

052.2K

Claude Sonnet 4.5 - Anthropic's most powerful AI programming model

Claude Sonnet 4.5 is an artificial intelligence model from Anthropic designed for programming, computer operations, and complex task automation. The model excels in code generation, long-duration task processing, reasoning, and mathematical computation, supporting everything from initial planning...

Latest AI Resources

10mos ago

055.3K

DeepSeek-V3.2-Exp - DeepSeek's latest open source experimental AI model

DeepSeek-V3.2-Exp is an open source experimental AI model from DeepSeek that significantly improves the efficiency of long-text processing by introducing the DeepSeek Sparse Attention (DSA) mechanism. The model is based on DeepSeek...

Latest AI Resources

10mos ago

051.6K

HunyuanImage 3.0 - Tencent's free, open source multimodal image generation model

HunyuanImage 3.0 is a native multimodal image generation model released and open-sourced by Tencent. With 80B parameters, it is described as the open source image generation model with the best evaluation results and the largest parameter count to date. HunyuanImage 3.0 supports real-time image generation; users can...

Latest AI Resources

10mos ago

062.9K

Hunyuan3D-Part - Tencent's free, open source model for generating 3D components

Hunyuan3D-Part is a 3D generation model released and open-sourced by Tencent. Composed of P3-SAM and X-Part, it is described as the first to achieve high-precision, controllable component-based 3D generation, and supports automatic generation of 50+ components. Users can first use...

Latest AI Resources

10mos ago

069.1K

AudioFly - iFlytek's open source text-to-sound-effects AI model

AudioFly is an open source AI model from iFlytek that generates sound effects from text. It is based on a latent diffusion model architecture with 1 billion parameters, and was trained on large-scale, diverse audio-text datasets, covering AudioSet, AudioCaps, TUT and other public datasets and internal...

Latest AI Resources

10mos ago

061.2K

Hunyuan3D-Omni - Tencent Hunyuan's open source 3D model generation framework

Hunyuan3D-Omni is an open source 3D asset generation framework from Tencent's Hunyuan 3D team that achieves accurate 3D model generation through multiple control signals. Based on the Hunyuan3D 2.1 architecture, it introduces a unified control encoder that can handle point...

Latest AI Resources

10mos ago

062.3K

FLM-Audio - Open source full-duplex audio dialog model from BAAI and Nanyang Technological University

FLM-Audio is a native full-duplex audio dialog large model released by the Beijing Academy of Artificial Intelligence (BAAI) together with Spin Matrix and Nanyang Technological University, Singapore, supporting both Chinese and English. Adopting a native full-duplex architecture, it can merge listening, speaking and monologue at each time step...

Latest AI Resources

10mos ago

055.2K

CWM - Meta FAIR's open source code world model

CWM (Code World Model) is a 32-billion-parameter open source language model released by the Meta FAIR team, designed for code generation and reasoning. Introducing the concept of a "world model", it can simulate code execution, predict changes in variable state, and advance...

Latest AI Resources

10mos ago

054.2K

Neovate Code - Ant Group's open source intelligent programming assistant

Neovate Code is an open source intelligent programming assistant from Ant Group's Alipay Experience Technology Department that uses AI to improve development efficiency. With conversational development features, developers can describe requirements in natural language, and Neovate Code can understand them and generate the corresponding...

Latest AI Resources

10mos ago

054K

Audio2Face - NVIDIA's open source AI model for 3D facial animation generation

Audio2Face is NVIDIA's open source AI tool capable of transforming audio input into realistic 3D facial animation. By analyzing speech features in the audio, such as phonemes and intonation, it generates precise lip synchronization and subtle emotional expressions to give vivid human expressions to virtual characters.

Latest AI Resources

10mos ago

056.2K

Qwen3-VL - Open source multimodal vision-language large model from Alibaba Cloud's Tongyi Qianwen

Qwen3-VL is an open source multimodal vision-language large model from Alibaba Cloud's Tongyi Qianwen team, with 235 billion parameters and model files of about 471GB. Available in instruction and thinking versions, it adopts an enhanced MRoPE interleaved layout, DeepStack and other techniques, which can effectively utilize the visual transform...

Latest AI Resources

10mos ago

071.4K

Qwen3Guard - Alibaba Qwen's open source safety model

Qwen3Guard is a safety guardrail model fine-tuned from the Qwen3 base model and designed for safety detection. It provides accurate safety classification of prompts and responses, assigns risk levels, and supports English, Chinese, and other languages. Qwen3Guard comes with two pro...

Latest AI Resources

10mos ago

059.1K

Qwen3-TTS-Flash - Speech synthesis model from Alibaba Tongyi

Qwen3-TTS-Flash is an advanced speech synthesis model introduced by Alibaba Tongyi, supporting 17 voices and 10 languages, covering Mandarin, English, dialects, and more. It offers excellent stability and highly expressive Chinese and English speech, and the model can automatically adjust its tone of voice to sound more vivid.

Latest AI Resources

10mos ago

067.1K

Qwen3-Omni - Omni-modal AI model from Alibaba Tongyi

Qwen3-Omni is an omni-modal AI model introduced by the Alibaba Tongyi team that can handle multiple data types such as text, images, audio and video, and supports text interaction in 119 languages with low latency and high controllability.

Latest AI Resources

10mos ago

056K

DeepSeek-V3.1-Terminus - The latest version of DeepSeek's AI model

DeepSeek-V3.1-Terminus is an upgraded version of DeepSeek-V3.1, an artificial intelligence language model from the DeepSeek team. The model is optimized in terms of language consistency, code generation, and search capabilities to more accurately...

Latest AI Resources

10mos ago

054.3K

Granite-Docling-258M - IBM's open source vision-language model

Granite-Docling-258M is an ultra-compact open source vision-language model from IBM designed for efficient document conversion. The model converts documents into machine-readable formats while leaving layout, tables, formulas, and other elements intact.

Latest AI Resources

10mos ago

050.3K

Lucy Edit - Open source AI video editing tool driven by natural language descriptions

Lucy Edit is an open source AI video editing tool developed by Decart AI. It lets users edit video through simple natural language descriptions, such as "change the character into a polar bear" or "turn the scene into a 2D cartoon style", without the need for complex fine-tuning or the use of masks ...

Latest AI Resources

10mos ago

061.9K

LongCat-Flash-Thinking - Meituan's open source efficient reasoning model

LongCat-Flash-Thinking is a highly efficient reasoning model released by Meituan's LongCat team that has become more powerful and specialized while maintaining the extreme speed of LongCat-Flash-Chat. The model is based on logic, math, code, agent...

Latest AI Resources

10mos ago

047.6K

Ling-V2 - Ant Bailing's open source MoE language model series

Ling-V2 is a family of large language models based on the MoE architecture introduced by the Ant Bailing team. The first version, Ling-mini-2.0, has 16 billion total parameters, with only 1.4 billion parameters activated per input token.

Latest AI Resources

10mos ago

051.2K

Kronos - Financial K-line (candlestick) foundation model jointly open-sourced by Tsinghua and Microsoft

Kronos is described as the first K-line (candlestick) foundation model for financial markets, jointly open-sourced by Tsinghua University and Microsoft Research Asia. It analyzes K-line data of stocks, cryptocurrencies and other assets, including open, high, low, close and volume, to predict future price movements.

Latest AI Resources

10mos ago

080.6K

Wan2.2-Animate - Tongyi Wanxiang's open source motion generation model

Wan2.2-Animate is an open source motion generation model that supports motion imitation and role-playing modes. Users only need to input a character image and a reference video, and the model transfers the movements and expressions of the character in the video to the character in the image, giving the image character dynamic expression ...

Latest AI Resources

10mos ago

053.5K

Xiaomi-MiMo-Audio - Xiaomi's first open source native end-to-end speech large model

Xiaomi-MiMo-Audio is Xiaomi's open source 7-billion-parameter end-to-end speech large model with powerful features such as multilingual dialog, speech continuation, few-shot generalization, and audio understanding. It reaches SOTA level on speech intelligence and audio understanding benchmarks, surpassing Google Gemi...

Latest AI Resources

10mos ago

058.8K

InternVLA-A1 - Shanghai AI Lab's open source embodied manipulation large model with integrated capabilities

InternVLA-A1 is an embodied manipulation large model open-sourced by Shanghai Artificial Intelligence Laboratory. It integrates understanding, imagination, and execution, and can complete tasks accurately. The model fuses real and simulated manipulation data and, using large-scale hybrid virtual-real scene assets, automates the construction of massive multimodal...

Latest AI Resources

10mos ago

062.9K

VoxCPM - End-to-end TTS model open-sourced by ModelBest and Tsinghua

VoxCPM is a speech generation model jointly open-sourced by ModelBest and the Shenzhen International Graduate School of Tsinghua University. VoxCPM adopts an end-to-end diffusion autoregressive architecture to generate continuous speech representations directly from text, breaking through the limitations of traditional discrete tokenization. Through hierarchical language modeling and finite scalar quantization...

Latest AI Resources

10mos ago

063.3K

InternVLA-N1 - Shanghai AI Lab's open source end-to-end dual-system navigation large model

InternVLA-N1 is an open source end-to-end dual-system navigation large model from Shanghai Artificial Intelligence Laboratory. In its dual-system architecture, System 2 is responsible for understanding language instructions and planning long-range paths, while System 1 focuses on high-frequency response and agile obstacle avoidance. The model is trained entirely on synthetic data through large-scale digital ...

Latest AI Resources

10mos ago

063.1K

WebWeaver - Alibaba Tongyi's new open source dual-agent framework

WebWeaver is a new dual-agent framework introduced by the Alibaba Tongyi team, mainly used for open-ended deep research. It simulates the human research process, which it divides between two agents: planning and writing.

Latest AI Resources

10mos ago

057.2K

MCP Registry - GitHub's official MCP server management platform

The MCP Registry is a centralized platform from GitHub that helps developers discover and install MCP servers more easily. It lets developers quickly find the AI tools they need in one place, greatly simplifying...

Latest AI Resources

10mos ago

054.6K

VLAC - Shanghai AI Lab's open source embodied reward large model

VLAC is an open source embodied reward large model from Shanghai Artificial Intelligence Laboratory. Based on the InternVL multimodal large model, it integrates Internet video data and robot manipulation data to provide process rewards and task completion estimation for robot reinforcement learning in the real world. VLAC can effectively ...

Latest AI Resources

10mos ago

054.7K

Tongyi DeepResearch - Alibaba Tongyi's open source deep research agent

Tongyi DeepResearch is an open source agent launched by Alibaba, designed for deep information retrieval and complex task reasoning, with 30 billion parameters, supporting multiple reasoning modes, including ReAct mode and deep mode...

Latest AI Resources

10mos ago

060.7K

InternVLA-M1 - Shanghai AI Lab's open source embodied dual-system manipulation "brain"

InternVLA-M1 is an open source embodied manipulation "brain" from Shanghai Artificial Intelligence Laboratory, a dual-system manipulation large model oriented to instruction following. It builds a complete closed loop covering "think-act-learn" and is responsible for high-level spatial reasoning and task planning. The model adopts a two-phase training cur...

Latest AI Resources

10mos ago

048.1K

OpenAI's PDF Guide to Staying Ahead in the Age of AI - with download link

Staying ahead in the age of AI is an AI leadership guide from OpenAI that helps business leaders maintain a competitive edge in the age of AI. The guide points to the rapid growth of AI, with faster model releases, lower costs, and faster enterprise adoption...

Latest AI Resources Course materials

11mos ago

061.5K

Fundamentals of Large Models - Free PDF from Zhejiang University, with download link

Fundamentals of Large Models provides an in-depth analysis of the core technologies and practical paths of Large Language Models (LLMs). Starting from the fundamental theory of language modeling, it systematically explains the principles of model design based on statistics, recurrent neural networks (RNNs), and the Transformer architecture, focusing on the three major large language model...

Latest AI Resources Course materials

11mos ago

066.4K

PromptEnhancer - Tencent Hunyuan's open source AI prompt enhancement tool

PromptEnhancer is an open source prompt enhancement tool from Tencent's Hunyuan team for improving the output of text-to-image (T2I) models. Using a Chain-of-Thought (CoT) approach ...

Latest AI Resources

11mos ago

055.5K

LLaSO - The industry's first fully open source speech model, from Logic Intelligence

LLaSO is an open source speech model launched by Beijing Depth Logic Intelligence Technology Co. Ltd. It addresses data fragmentation and insufficient task coverage in large speech-language modeling by integrating speech and text data and providing alignment datasets, instruction fine-tuning datasets and evaluation benchmarks.

Latest AI Resources

11mos ago

048.8K

Hunyuan 3D 3.0 - Tencent's 3D generation model with ultra-high-definition modeling support

Hunyuan 3D 3.0 is an advanced 3D generation model launched by Tencent, based on 3D-DiT hierarchical sculpting technology, with a geometric resolution of up to 1536³. It can generate ultra-high-definition, detail-rich 3D models and excels at character modeling, accurately shaping facial features and body shape.

Latest AI Resources

11mos ago

067.2K

UnifoLM-WMA-0 - Unitree's open source world-model-action architecture

UnifoLM-WMA-0 is an open source world-model-action architecture from Unitree that spans multiple types of robot embodiments and is designed for general-purpose robot learning. It is composed of a world model and an action architecture: the world model understands the physical laws of robot-environment interaction, and the action architecture is responsible for specific...

Latest AI Resources

11mos ago

070.2K

InfiniteTalk - Meituan Vision AI's open source audio-driven video generation tool

InfiniteTalk is an audio-driven video generation tool developed by the MeiGen-AI team that generates talking videos of unlimited length from input audio. Its core advantage lies in precise lip synchronization, which closely matches the audio to the character's lip movements to generate natural and smooth...

Latest AI Resources

11mos ago

081.2K

Mini-o3 - Visual reasoning model jointly open-sourced by ByteDance and HKU

Mini-o3 is an open source model jointly launched by ByteDance and the University of Hong Kong, focused on solving complex visual search problems. The model has strong multi-turn interactive reasoning capabilities and can locate a target through deep exploration and trial and error.

Latest AI Resources

11mos ago

052.7K

GPT-5-Codex - OpenAI's most powerful programming model

GPT-5-Codex is a powerful coding-optimized model from OpenAI, built on GPT-5 with further enhancements and designed for software engineers. The model generates high-quality code quickly, supports multiple programming languages, and optimizes existing code to improve performance.

Latest AI Resources

11mos ago

051K

ROMA - Open source meta-agent framework that automatically decomposes complex tasks for parallel processing

ROMA (Recursive-Open-Meta-Agent) is an open source meta-agent framework developed by Sentient AGI to efficiently solve complex problems through recursive task decomposition and parallel processing. It supports Python 3.12+, Docker and ...

Latest AI Resources

11mos ago

066.9K

Lumina-DiMOO - Multimodal large model open-sourced by Shanghai AI Lab and Huawei Ascend

Lumina-DiMOO is a new-generation unified model for multimodal generation and understanding, launched by Shanghai Artificial Intelligence Laboratory together with Huawei Ascend at the World Artificial Intelligence Conference 2025. Based on the Ascend AI hardware and software platform and the MindSpeed MM multimodal large model suite, it accomplishes...

Latest AI Resources

11mos ago

060.3K

Hyprnote - Open source, local-first AI meeting note-taking tool

Hyprnote is an open source, local-first AI meeting note-taking tool designed for professionals to protect user privacy and improve meeting efficiency. Adopting the "local first" principle, all data storage and processing is done on the user's local device to ensure data security and support offline operation.

Latest AI Resources

11mos ago

060.3K

MobileLLM-R1 - Meta's open source series of specialized, efficient reasoning models

MobileLLM-R1 is Meta's open source series of efficient reasoning models designed for mathematical, programming and scientific reasoning. It includes base and final models in 140 million, 360 million and 950 million parameter versions. The models are not general-purpose chat models and are supervised fine-tuned (SFT...

Latest AI Resources

11mos ago

048.1K

ERNIE-4.5-21B-A3B-Thinking - Baidu's open source reasoning model

ERNIE-4.5-21B-A3B-Thinking is Baidu's open source large language model focused on reasoning tasks. Using a Mixture-of-Experts (MoE) architecture with 21 billion total parameters, it activates 3 billion parameters per token and supports a 128K long context window ...

Latest AI Resources

11mos ago

047.6K

MobiAgent - Shanghai Jiao Tong University's open source full-stack framework for building mobile agents

MobiAgent is an open source mobile agent toolchain from the IPADS Lab of Shanghai Jiao Tong University that helps users build their own mobile intelligent assistants. By recording the user's operation trajectories and generating high-quality data, it trains an agent that can understand natural language commands. Core features include efficient...

Latest AI Resources

11mos ago

057.1K

ZipVoice - Xiaomi's open source speech synthesis model series

ZipVoice is a series of text-to-speech (TTS) models based on the Flow Matching architecture released by Xiaomi, including ZipVoice (zero-shot single-speaker speech synthesis model) and ZipVoice-Dialog (zero-shot conversational speech synthesis...

Latest AI Resources

11mos ago

068.8K

PP-OCRv5 - Baidu's open source next-generation text recognition AI model

PP-OCRv5 is the latest generation of text recognition AI model released by Baidu. With a lightweight design and only 0.07B parameters, it is suited to efficient operation on CPUs and edge devices, and can process more than 370 characters per second. The model supports Simplified Chinese, Traditional Chinese, English, Japanese and Pinyin...

Latest AI Resources

11mos ago

084.7K

Youtu-GraphRAG - Tencent Youtu Lab's open source graph retrieval-augmented generation framework

Youtu-GraphRAG is an open source graph retrieval-augmented generation framework from Tencent's Youtu Lab that helps large language models handle complex Q&A tasks more accurately. By constructing a four-layer knowledge tree, it breaks knowledge down into four levels (attributes, relationships, keywords and communities) to realize the self-directed performance of cross-domain knowledge...

Latest AI Resources

11mos ago

058.1K

Stand-In - 腾讯微信视觉开源的轻量级视频生成框架

Stand-In - Tencent WeChat Vision's open-source lightweight video generation framework

Stand-In is a lightweight, plug-and-play identity-preserving video generation framework from Tencent's WeChat Vision team. It focuses on preserving a specific identity's features during video generation: it trains only about 1% additional parameters relative to the base model, yet achieves excellent results in face similarity and naturalness.

Latest AI Resources

11mos ago

057.5K

IndexTTS2 - B站开源的免费TTS模型，首个支持精确时长控制

IndexTTS2 - Bilibili's free open-source TTS model, the first to support precise duration control

IndexTTS2 is a new free text-to-speech (TTS) model open-sourced by Bilibili's speech team. It makes a major advance in emotional expression and duration control and is the first autoregressive TTS model to support precise duration control. It supports zero-shot voice cloning: a single audio file is enough to accurately reproduce a voice...

Latest AI Resources

11mos ago

0125.6K

MiniMax Music 1.5 - MiniMax最新推出的AI音乐生成模型

MiniMax Music 1.5 - MiniMax's latest AI music generation model

MiniMax Music 1.5 is an advanced AI music generation tool that generates up to 4 minutes of music from users' natural language descriptions. The model supports customization across a variety of musical styles and moods, producing natural, full vocals, smooth transitions, richly layered arrangements...

Latest AI Resources

11mos ago

058.9K

HuMo - 清华大学联合字节开源的多模态视频生成框架

HuMo - Multimodal video generation framework open-sourced by Tsinghua University and ByteDance

HuMo is a multimodal video generation framework jointly open-sourced by Tsinghua University and ByteDance Intelligent Creation Lab, focusing on human-centered video generation. It can generate high-quality, fine-grained and controllable human videos from multiple input modalities such as text, images and audio. HuMo supports a powerful text prompt-following capability...

Latest AI Resources

11mos ago

0144.6K

AnyI2V - 复旦联合阿里达摩院等开源的智能图像动画生成框架

AnyI2V - Open-source image animation generation framework from Fudan University, Alibaba DAMO Academy and others

AnyI2V is an image animation generation framework jointly launched by Fudan University, Alibaba DAMO Academy and others. It converts static conditional images (e.g., meshes and point clouds) into dynamic videos without complex training processes or large amounts of data.

Latest AI Resources

11mos ago

055.8K

SRPO - 腾讯混元推出的文本到图像生成模型

SRPO - Text-to-image generation model launched by Tencent Hunyuan

SRPO (Semantic Relative Preference Optimization) is a text-to-image generation model introduced by Tencent Hunyuan. It optimizes the reward mechanism through text-conditioned signals, allowing rewards to be adjusted online and reducing dependence on offline fine-tuning.

Latest AI Resources

11mos ago

071.2K

Qwen3-Next - 阿里通义推出的最新基础模型

Qwen3-Next - The latest base model from Alibaba's Tongyi

Qwen3-Next is a new-generation hybrid-architecture large model open-sourced by Alibaba's Tongyi team. It combines Gated DeltaNet and Gated Attention, and it is good at handling long text, fast at inference, and economical with computing resources.

Latest AI Resources

11mos ago

053.6K

文心大模型X1.1 - 百度推出的深度思考模型，理解能力更强

Wenxin Large Model X1.1 - Baidu's deep thinking model with stronger understanding

Wenxin Large Model X1.1 is a deep thinking model launched by Baidu, based on a hybrid reinforcement learning framework that focuses on improving language understanding and generation. The model excels at handling complex questions, following instructions and acting as an agent, and can accurately provide knowledgeable answers and high-quality text content.

Latest AI Resources

11mos ago

058.9K

混元图像2.1 - 腾讯推出的开源文生图模型

HunyuanImage 2.1 - Tencent's open-source text-to-image model

HunyuanImage 2.1 is Tencent's open-source text-to-image model, designed for high-quality image generation. The model supports native 2K resolution and can accurately render complex scenes and details, vividly reproducing characters' expressions and movements.

Latest AI Resources

11mos ago

055.2K

AntSK FileChunk - 免费的AI语义文档切片工具，动态切片调整

AntSK FileChunk - Free AI Semantic Document Slicing Tool, Dynamic Slicing Adjustment

AntSK FileChunk is a free intelligent document slicing tool designed for RAG (Retrieval Augmented Generation) applications. Built around semantics, it slices documents into semantically complete, coherent segments, supports multiple languages, and can dynamically adjust slice size to keep the context coherent.

Latest AI Resources

11mos ago

059.9K

UnifiedTTS - 一站式TTS API服务平台，实时性能监控

UnifiedTTS - One-stop TTS API Service Platform, Real-time Performance Monitoring

UnifiedTTS is a one-stop platform for text-to-speech (TTS) services. It supports multiple languages, including Chinese, English, Japanese and Korean, to meet the needs of global business. Through a unified API interface, it integrates many mainstream TTS services, including Micro...

Latest AI Resources

11mos ago

064.3K

MiniCPM 4.1 - 面壁智能推出的超高效端侧大模型

MiniCPM 4.1 - Ultra-efficient on-device large model introduced by ModelBest

MiniCPM 4.1 is an ultra-efficient on-device large language model introduced by ModelBest. With the InfLLM v2 sparse attention architecture, each token only needs to compute its relevance to fewer than 5% of the other tokens, which significantly reduces the processing overhead of long text. In a 128K long text scenario...

Latest AI Resources

11mos ago

055.7K

WeKnora - 腾讯微信开源的文档理解与语义检索框架

WeKnora - Tencent WeChat's open-source document understanding and semantic retrieval framework

WeKnora is a document understanding and semantic retrieval framework based on large language models (LLMs), open-sourced by the Tencent WeChat team. It is designed for scenarios involving structurally complex, heterogeneous document content, and uses a modular architecture that integrates multimodal preprocessing, semantic vector indexing, intelligent recall and large-model generative reasoning...

Latest AI Resources

11mos ago

0104K

XTuner V1 - 上海AI Lab开源的大模型训练引擎

XTuner V1 - Shanghai AI Lab's open-source large model training engine

XTuner V1 is a new-generation large model training engine open-sourced by Shanghai Artificial Intelligence Laboratory, designed for training ultra-large-scale sparse mixture-of-experts (MoE) models. Built on PyTorch FSDP, it achieves high performance through multi-dimensional optimization of memory, communication and load...

Latest AI Resources

11mos ago

054.1K

Qwen3-ASR-Flash - 阿里通义千问推出的系列语音识别模型

Qwen3-ASR-Flash - A series of speech recognition models launched by Alibaba's Tongyi Qianwen

Qwen3-ASR-Flash is Alibaba's latest high-precision speech recognition model, based on the Qwen3 base model and trained on massive multimodal data. It supports 11 languages and multiple accents, including Mandarin, Sichuanese, Minnan, Wu, Cantonese and other dialects...

Latest AI Resources

11mos ago

067.4K

Seedream 4.0 - 字节推出的最新一代图像创作模型

Seedream 4.0 - The latest generation of image creation models launched by ByteDance

Seedream 4.0 is an advanced image generation and editing tool launched by ByteDance. It unifies generation and editing, with features such as precise instruction-based editing, high feature retention, and deep intent understanding.

Latest AI Resources

11mos ago

0103.8K

rStar2-Agent - 微软开源的高效AI推理模型

rStar2-Agent - Microsoft's Open Source Efficient AI Reasoning Model

rStar2-Agent is an advanced AI mathematical reasoning model open-sourced by Microsoft that demonstrates strong mathematical problem-solving capabilities, achieving an accuracy of 80.6% on the AIME24 test. The model also has scientific reasoning capabilities, achieving in the GPQA-Diamond benchmark...

Latest AI Resources

11mos ago

052.5K

Qwen3-Max-Preview - 通义千问推出的旗舰大语言模型

Qwen3-Max-Preview - The flagship large language model from Tongyi Qianwen

Qwen3-Max-Preview is the latest flagship large language model released by Tongyi Qianwen. It is the model with the largest number of parameters in the Qwen3 family, with over 1 trillion parameters. The model shows significant improvements in reasoning, instruction following, multi-language support and long-tail knowledge coverage...

Latest AI Resources

11mos ago

058.4K

OneCAT - 美团联合上海交大开源的多模态模型

OneCAT - Open-source multimodal model from Meituan and Shanghai Jiao Tong University

OneCAT is a new unified multimodal model launched by Meituan together with Shanghai Jiao Tong University. It adopts a decoder-only architecture and seamlessly integrates multimodal understanding, text-to-image generation and image editing. The model abandons the traditional multimodal design that relies on external vision encoders and tokenizers, through modality-specific...

Latest AI Resources

11mos ago

056.5K

Claudable - 开源AI Web应用构建器，自然语言生成代码

Claudable - Open-source AI web application builder that generates code from natural language

Claudable is an open source web application builder based on Next.js that combines the advanced AI agent capabilities of Claude Code and Cursor CLI with Lovable's simple and intuitive application building experience....

Latest AI Resources

11mos ago

064.5K

FineVision - Hugging Face推出的开源视觉语言数据集

FineVision - Open-source vision-language dataset from Hugging Face

FineVision is Hugging Face's open-source vision-language dataset for training advanced vision-language models. It contains 17.3 million images, 24.3 million samples, 88.9 million dialogue turns, and 9.5 billion answer tokens. The dataset aggregates...

Latest AI Resources

11mos ago

061.1K

InfinityHuman - 字节联合浙大推出的长视频数字人生成模型

InfinityHuman - Long-video digital human generation model launched by ByteDance in collaboration with Zhejiang University

InfinityHuman is a commercial-grade, audio-driven model for generating long-duration character videos, jointly launched by ByteDance and Zhejiang University. Driven by audio, it can generate high-resolution, long-duration and visually consistent character videos.

Latest AI Resources

11mos ago

056.2K

HunyuanWorld-Voyager - 腾讯开源的超长漫游世界模型

HunyuanWorld-Voyager - Tencent's open-source ultra-long-range roaming world model

HunyuanWorld-Voyager (Hunyuan Voyager for short) is an ultra-long-range roaming world model released by Tencent, billed as the industry's first to support native 3D reconstruction. It is a novel video diffusion framework that generates 3D point cloud sequences along user-defined camera paths from a single image, supporting...

Latest AI Resources

11mos ago

056.9K

Hunyuan-MT-7B - 腾讯混元开源的轻量级翻译模型

Hunyuan-MT-7B - Tencent Hunyuan's open-source lightweight translation model

Hunyuan-MT-7B is a lightweight translation model introduced by Tencent's Hunyuan team. It has 7 billion parameters and supports mutual translation among 33 languages plus 5 Chinese ethnic minority languages and dialects, including Cantonese, Uyghur, and Tibetan. In the Association for Computational Linguistics (ACL) WMT2025 competition...

Latest AI Resources

11mos ago

052.6K

Step-Audio 2 mini - 阶跃星辰开源的语音大模型

Step-Audio 2 mini - StepFun's open-source large speech model

Step-Audio 2 mini is an open-source end-to-end large speech model from StepFun. It departs from the traditional speech model structure and adopts a true end-to-end multimodal architecture that converts raw audio input directly into spoken responses with lower latency, and it understands paralinguistic information and non-vocal signals.

Latest AI Resources

11mos ago

062K

MobileCLIP2 - 苹果公司开源的高效端侧多模态模型

MobileCLIP2 - Apple's open-source efficient on-device multimodal model

MobileCLIP2 is an upgraded version of MobileCLIP, an efficient on-device multimodal model introduced by Apple researchers. It improves multimodal reinforced training by training better-performing CLIP teacher model ensembles on DFN datasets and improved image-text...

Latest AI Resources

11mos ago

071.1K

InternVL3.5 - 上海AI实验室开源的多模态大模型

InternVL3.5 - Shanghai AI Lab's open-source multimodal large model

InternVL3.5 (Shusheng-Wanxiang 3.5) is an open-source multimodal large model from Shanghai Artificial Intelligence Laboratory. The model is comprehensively upgraded in general ability, reasoning ability and deployment efficiency, and comes in nine sizes from 1 billion to 241 billion parameters to cover scenarios with different resource requirements, including dense...

Latest AI Resources

11mos ago

066.8K

FastVLM - 苹果公司推出的视觉语言模型

FastVLM - Vision-language model from Apple

FastVLM (Fast Vision Language Model) is an efficient vision-language model introduced by Apple. Built around the FastViTHD hybrid vision encoder, it combines convolutional and Transformer architectures to significantly reduce visual...

Latest AI Resources

11mos ago

065.1K

Meeseeks - 美团开源的评估模型指令遵循能力的评测集

Meeseeks - Meituan's open-source benchmark for evaluating models' instruction-following ability

Meeseeks is an open-source large model benchmark from Meituan's M17 team for evaluating a model's ability to follow instructions. Meeseeks uses a three-tiered evaluation framework to measure comprehensively, from the macro to the micro level, whether the model generates answers in strict accordance with the user's instructions, without judging the factual correctness of the answer content...

Latest AI Resources

11mos ago

058.9K

gpt-realtime - OpenAI最新推出的AI语音模型

gpt-realtime - OpenAI's newest AI speech model

gpt-realtime is an advanced speech model from OpenAI that supports direct audio processing to generate natural and smooth speech. The model supports multiple languages and styles, understands non-verbal cues such as laughter, and can switch between languages.

Latest AI Resources

11mos ago

063.1K

Youtu-agent - 腾讯开源的高效智能体框架

Youtu-agent - Tencent's open-source efficient agent framework

Youtu-agent is an open-source framework from Tencent Youtu Lab for building and running autonomous agents. The framework performs well on the WebWalkerQA and GAIA benchmarks, with accuracies of 71.47% and 72.8% respectively. The framework...

Latest AI Resources

11mos ago

073.1K