Latest AI Resources

Total 3143 articles posts

Course materials Latest AI Resources AI Knowledge Base AI News

Sorting

InfinityHuman - 字节联合浙大推出的长视频数字人生成模型

InfinityHuman - Long video digital human generation model launched by Bytes in collaboration with ZJU

InfinityHuman is a commercial-grade long time-series audio-driven character video generation model jointly launched by ByteDance and Zhejiang University. The model is audio-driven and can generate high-resolution, long duration and visually consistent character videos.

Latest AI Resources

11mos ago

056.4K

AutoMV - M-A-P联合北邮、南大等开源的免费音乐视频生成系统

AutoMV - M-A-P open source free music video generation system in conjunction with the North Post, South University, etc.

AutoMV is an open source music video generation system developed by the M-A-P team in collaboration with several universities, which can automatically generate coherent music videos based on complete songs without training.It adopts a multi-intelligence body collaboration model, including music analysis, scriptwriting, directing, and quality control modules, and can accurately analyze the lyrics, beats, and...

Latest AI Resources

7mos ago

056.3K

Audio2Face - NVIDIA开源的AI 3D面部动画生成模型

Audio2Face - NVIDIA open source AI 3D facial animation generation model

Audio2Face is NVIDIA's open source AI tool capable of transforming audio input into realistic 3D facial animation. By analyzing speech features in the audio, such as phonemes and intonation, it generates precise lip synchronization and subtle emotional expressions to give vivid human expressions to virtual characters.

Latest AI Resources

10mos ago

056.3K

Wide Research - Manus平台推出的多智能体协同功能

Wide Research - Multi-Intelligence Collaboration Introduced on the Manus Platform

Wide Research is a powerful feature of the Manus platform designed to handle complex and large-scale tasks. The platform supports hundreds of general-purpose intelligences working simultaneously through system-level parallel processing mechanisms and intelligence collaboration protocols.

Latest AI Resources

1yrs ago

056.3K

10Kh RealOmni-Open - 简智机器人开源的具身智能数据集

10Kh RealOmni-Open - 简智机器人开源的具身智能数据集

10Kh RealOmni-Open是简智机器人开源的具身智能数据集，是行业内规模最大的开源具身智能数据集。数据集累计拥有超10000小时数据、100万+片段，覆盖10大场景任务、超过30项技能。数据...

Latest AI Resources

7mos ago

056.2K

SCAIL - 智谱联合清华开源的影视级角色动画生成框架

SCAIL - Smart Spectrum and Tsinghua open source film and television character animation generation framework

SCAIL (Studio-Grade Character Animation via In-Context Learning) is a film and television grade character animation generation framework proposed by Smart Spectrum in collaboration with Prof. Liu Yongjin's group at Tsinghua University. Through...

Latest AI Resources

8mos ago

056.2K

VibeVoice-ASR - 微软开源的统一语音转文本（ASR）模型

VibeVoice-ASR - 微软开源的统一语音转文本（ASR）模型

VibeVoice-ASR是微软开源的统一语音转文本（ASR）模型，专为处理长音频设计，可一次性处理长达60分钟的连续音频，确保语义连贯性和说话人追踪的一致性。支持自定义热词功能，用户可输入特定词汇或...

Latest AI Resources

6mos ago

056.2K

AnyI2V - 复旦联合阿里达摩院等开源的智能图像动画生成框架

AnyI2V - Fudan, Ali Dharma Institute and other open source framework for intelligent image animation generation

AnyI2V is an image animation generation framework jointly launched by Fudan University, Alibaba Dharma Institute and others, which supports the conversion of static conditional images (e.g., grids, point clouds, etc.) into dynamic videos without the need for complex training processes and large amounts of data.

Latest AI Resources

11mos ago

056.2K

MiMo-V2-Flash - 小米发布的开源MoE架构大模型

MiMo-V2-Flash - a large model of the open source MoE architecture released by Xiaomi

MiMo-V2-Flash is an open source MoE architecture large model released by Xiaomi, with 309 billion total parameters and 15 billion active parameters, focusing on efficient reasoning and intelligent body applications. The model adopts hybrid attention architecture and multi-word meta-prediction technology, with an inference speed of 150 tokens/second, into...

Latest AI Resources

8mos ago

056.1K

Clawra - 基于OpenClaw框架开源的AI女友程序

Clawra - 基于OpenClaw框架开源的AI女友程序

Clawra是一个基于OpenClaw框架开发的AI女友程序，由韩国开发者David Im制作，具有完整人设和交互功能。通过Persona Engineering技术赋予AI“18岁亚裔女性练习生”的...

Latest AI Resources

6mos ago

056.1K

OpenReasoning-Nemotron - 英伟达推出的开源系列推理模型

OpenReasoning-Nemotron - Open Source Series of Reasoning Models from NVIDIA

OpenReasoning-Nemotron is a series of large-scale language models open-sourced by NVIDIA to support processing of reasoning tasks in math, science and code. The models are distilled based on the DeepSeek R1 0528 model with parameter scales of 1.5B...

Latest AI Resources

1yrs ago

056.1K

FLUX.2 [klein] - Black Forest Labs 开源的轻量级图像生成与编辑模型

FLUX.2 [klein] - Black Forest Labs 开源的轻量级图像生成与编辑模型

FLUX.2 [klein] 是 Black Forest Labs 推出的开源轻量级图像生成与编辑模型，专为快速推理和低延迟应用场景设计。支持文本生成图像、图像编辑以及多参考图像生成，能在不到1秒内...

Latest AI Resources

6mos ago

056K

GLM-ASR - 智谱AI开源的高性能语音识别模型系列

GLM-ASR - Wisdom Spectrum AI open source high-performance speech recognition model series

GLM-ASR is a family of high-performance speech recognition models open-sourced by Smart Spectrum AI, including the cloud-based model GLM-ASR-2512 and the open-source end-side model GLM-ASR-Nano-2512.GLM-ASR-2512 is the world's leading cloud-based speech recognition model, supporting multiple...

Latest AI Resources

8mos ago

056K

GLM-OCR - 智谱开源的 0.9B 轻量级专业 OCR 模型

GLM-OCR - 智谱开源的 0.9B 轻量级专业 OCR 模型

GLM-OCR 是智谱开源的 0.9B 轻量级专业 OCR 模型，在 OmniDocBench V1.5 以 94.6 分刷新 SOTA。兼顾“小体积”与“全场景”，扫描、手写、印章、多语混排、复杂表...

Latest AI Resources

6mos ago

056K

MiniCPM 4.1 - 面壁智能推出的超高效端侧大模型

MiniCPM 4.1 - Ultra-efficient end-side grand model introduced by Facing Face Intelligence

MiniCPM 4.1 is an ultra-efficient end-side large language model introduced by Facade Intelligence. With InfLLM v2 sparse attention architecture, each lexeme only needs to calculate the relevance to less than 5% lexemes, which significantly reduces the processing overhead of long text. In a 128K long text scenario...

Latest AI Resources

11mos ago

055.8K

MiniCPM-o 4.5 - 面壁智能开源的 9B 全模态旗舰模型

MiniCPM-o 4.5 - 面壁智能开源的 9B 全模态旗舰模型

MiniCPM-o 4.5 是面壁智能开源的 9B 全模态旗舰模型，以“边看边听主动说”的端到端架构，在手机端即可跑出 GPT-4o 级体验：支持单图、多图、高帧率长视频、实时语音双工对话，首 tok...

Latest AI Resources

6mos ago

055.8K

Protenix-v1 - 字节Seed团队推出的首个开源蛋白质结构预测模型

Protenix-v1 - 字节Seed团队推出的首个开源蛋白质结构预测模型

Protenix-v1是字节跳动ByteDance Seed团队推出的首个开源蛋白质结构预测模型，性能在严格对齐训练数据和模型规模后超越AlphaFold 3。模型具备显著的推理时扩展特性：通过增加采...

Latest AI Resources

6mos ago

055.8K

Kaleido - 智谱AI联合清华大学等开源的多主体参考视频生成模型

Kaleido - A multi-subject reference video generation model open-sourced by Smart Spectrum AI in collaboration with Tsinghua University and others

Kaleido is an open source multi-subject reference video generation model jointly developed by Hefei University of Technology, Tsinghua University and Smart Spectrum AI. It generates subject-consistent videos through multiple reference images, solving the deficiencies of existing models in multi-subject consistency and background decoupling.Kaleido generates videos through specialized data...

Latest AI Resources

8mos ago

055.8K

阶跃深研 - 阶跃星辰推出的AI深入研究工具

Steps Deep Research - AI Deep Research Tool by Steps Star

Steps Deep Research is an efficient AI research tool launched by Steps Star, which can autonomously complete research on complex issues and generate professional reports in a short period of time. The tool is designed for finance, consulting, healthcare, law and other fields, and excels in industry reviews with its in-depth search and information integration capabilities.

Latest AI Resources

1yrs ago

055.8K

Step-Audio-R1.1 - 阶跃星辰开源的全球首个原生语音推理模型

Step-Audio-R1.1 - 阶跃星辰开源的全球首个原生语音推理模型

Step-Audio-R1.1是阶跃星辰开源的全球首个原生语音推理模型，最新升级版本在权威评测榜单Artificial Analysis Speech Reasoning中以96.4%准确率登顶。模型...

Latest AI Resources

7mos ago

055.8K

Claude Sonnet 4.5 - Anthropic推出的最强AI编程模型

Claude Sonnet 4.5 - The Most Powerful AI Programming Model from Anthropic

Claude Sonnet 4.5 is an artificial intelligence model from Anthropic designed for programming, computer operations, and complex task automation. The model excels in code generation, long-duration task processing, reasoning, and mathematical computation, supporting everything from initial planning...

Latest AI Resources

10mos ago

055.7K

ZeroSearch - 阿里通义推出的开源大模型搜索引擎框架

ZeroSearch - Ali Tongyi launched the open source large model search engine framework

ZeroSearch is Alibaba Tongyi Lab open source innovative large model search engine framework. The framework does not need to interact with real search engines , based on the simulation of the search engine , with a large model of its own pre-training knowledge to generate relevant or noise documents , significantly reducing the training cost ( reduce 80% or more ...

Latest AI Resources

1yrs ago

055.6K

PromptEnhancer - 腾讯混元开源的AI提示词增强工具

PromptEnhancer - Tencent Mixed Meta Open Source AI Prompt Word Enhancement Tool

PromptEnhancer is an open source prompt word enhancement tool from Tencent's Mixed Meta team to improve the generation of text-to-image (Text-to-Image, T2I) models. Through the chain of reasoning (Chain-of-Thought, CoT) approach to the use of ...

Latest AI Resources

11mos ago

055.5K

FLM-Audio - 智源联合南洋理工开源的全双工音频对话模型

FLM-Audio - Wisdom Source and Nanyang Polytechnic Open Source Full-Duplex Audio Dialog Modeling

FLM-Audio is a native full-duplex audio dialog grand model released by Beijing Zhiyuan Artificial Intelligence Research Institute in conjunction with Spin Matrix and Nanyang Technological University of Singapore, supporting both Chinese and English. Adopting native full-duplex architecture, it can merge listening, speaking and monologue at each time step...

Latest AI Resources

10mos ago

055.5K

New API - 开源的AI模型接口管理与分发系统，统一为标准化接口

New API - 开源的AI模型接口管理与分发系统，统一为标准化接口

New API是基于Go语言开发的开源AI聚合网关工具，可统一管理30+种主流大模型（如OpenAI、Claude、Midjourney等），将不同模型接口转换为标准化OpenAI格式。

Latest AI Resources

7mos ago

055.4K

Kimi Linear - 月之暗面开源的新型混合线性注意力架构

Kimi Linear - A New Hybrid Linear Attention Architecture Open-Sourced by Dark Side of the Moon

Kimi Linear is a new hybrid linear attention architecture open-sourced by Dark Side of the Moon, with Kimi Delta Attention (KDA) as the core, optimizing the traditional attention model through a finer-grained gating mechanism, which significantly improves the hardware efficiency and memory control ability ...

Latest AI Resources

9mos ago

055.4K

混元图像2.1 - 腾讯推出的开源文生图模型

Hybrid Image 2.1 - Tencent's Open Source Vendor Graph Model

HunyuanImage 2.1 is Tencent's open source graphic model, designed for high-quality image generation. The model supports native 2K resolution, can accurately render complex scenes and details, so that the character's expression and movement can be vividly reproduced.

Latest AI Resources

11mos ago

055.4K

SoulX-Podcast - Soul AI Lab开源的对话式语音合成模型

SoulX-Podcast - Soul AI Lab's Open Source Conversational Speech Synthesis Model

SoulX-Podcast is Soul AI Lab's open source advanced multi-speaker conversational speech synthesis model designed for generating high quality podcast content. SoulX-Podcast has the ability to generate multiple rounds of conversations, which can simulate smooth conversations in real podcasting scenarios, and supports Mandarin, English, and multiple Chinese...

Latest AI Resources

9mos ago

055.3K

NeuTTS Air - 支持离线CPU运行的免费轻量级语音合成模型

NeuTTS Air - Free and Lightweight Speech Synthesis Model with Offline CPU Running Support

NeuTTS Air is open source lightweight speech synthesis model, developed by Neuphonic team, which can run in real time on local devices (e.g. cell phones, laptops, Raspberry Pi) without relying on the cloud. Using 0.5B parameter Qwen architecture and self-developed NeuCodec codec...

Latest AI Resources

10mos ago

055.3K

ClawWork - 香港大学数据科学实验室开源的AI经济压力测试框架

ClawWork - 香港大学数据科学实验室开源的AI经济压力测试框架

ClawWork是香港大学数据科学实验室开发的AI经济压力测试框架，允许AI在模拟经济环境中完成真实工作任务并获得报酬。核心逻辑是让初始资金仅10美元的AI通过完成220个专业任务（覆盖制造、金融、医...

Latest AI Resources

5mos ago

055.3K

XVERSE-Ent - 元象科技开源的泛娱乐领域中英大模型

XVERSE-Ent - 元象科技开源的泛娱乐领域中英大模型

XVERSE-Ent是元象科技推出的专注于泛娱乐领域的开源大模型，包含中英文双版本，支持社交互动、游戏叙事和文化创作等场景。模型通过角色一致性强化、长剧情理解等技术优化，能在虚拟角色人设稳定性、复杂故...

Latest AI Resources

7mos ago

055.3K

SoulX-FlashTalk - Soul App AI团队开源的实时数字人生成模型

SoulX-FlashTalk - Soul App AI团队开源的实时数字人生成模型

SoulX-FlashTalk是Soul App AI团队开源的实时数字人生成模型，拥有140亿参数量，实现了0.87秒超低延迟和32帧/秒的高帧率。模型通过双向蒸馏技术解决了传统数字人延迟高、画面易...

Latest AI Resources

6mos ago

055.1K

VLAC - 上海AI Lab开源的具身奖励大模型

VLAC - Shanghai AI Lab's Open Source Large Model of Embodied Reward

VLAC is an open source embodied reward macromodel from Shanghai Artificial Intelligence Laboratory. Based on InternVL multimodal macromodel, it integrates Internet video data and robot operation data to provide process reward and task completion estimation for robot reinforcement learning in the real world.VLAC can effectively ...

Latest AI Resources

11mos ago

054.9K

Wan-Move - 阿里通义联合清华等开源的AI视频生成框架

Wan-Move - Ali Tongyi's open source AI video generation framework with Tsinghua and others

Wan-Move is an open source AI video generation framework jointly developed by Ali Tongyi Labs, Tsinghua University and other organizations, focusing on high-quality video synthesis through precise motion control technology. The core technology is "potential trajectory guidance", which can seamlessly add point-level motion control to the existing image-to-video model...

Latest AI Resources

8mos ago

054.9K

MCP Registry - GitHub推出的官方MCP服务器管理平台

MCP Registry - The official MCP server management platform from GitHub.

The MCP Registry is a centralized platform from GitHub that helps developers discover and install MCP servers more easily.The MCP Registry is here to help developers quickly find the AI tools they need in one place, greatly simplifying...

Latest AI Resources

11mos ago

054.9K

SHARP - 苹果开源的单目视图3D场景合成技术

SHARP - Apple's open source monocular view 3D scene synthesis technology

SHARP (Sharp Monocular View Synthesis in Less Than a Second) is Apple's open source monocular view synthesis technology. It can quickly generate a realistic 3D representation of a scene from a single photo in less than a second...

Latest AI Resources

7mos ago

054.9K

Ring-2.5-1T - 蚂蚁百灵开源的万亿参数混合线性架构思考模型

Ring-2.5-1T - 蚂蚁百灵开源的万亿参数混合线性架构思考模型

Ring-2.5-1T 是蚂蚁集团百灵大模型团队开源的全球首个万亿参数混合线性架构思考模型，采用1:7 MLA与Lightning Linear Attention混合设计，激活参数量达63B。模型在...

Latest AI Resources

5mos ago

054.8K

Xiaomi-Robotics-0 - 小米开源的首代具身智能大模型

Xiaomi-Robotics-0 - 小米开源的首代具身智能大模型

Xiaomi-Robotics-0 是小米开源的首代具身智能大模型，拥有47亿参数，采用"大脑+小脑"混合架构设计。视觉语言大脑基于多模态大模型，负责理解人类模糊指令与空间推理；动作执行小脑则通过Di...

Latest AI Resources

5mos ago

054.7K

LingBot-World - 蚂蚁旗下灵波科技开源的交互式世界模型

LingBot-World - 蚂蚁旗下灵波科技开源的交互式世界模型

LingBot-World 是蚂蚁集团旗下具身智能公司灵波科技（Robbyant）开源的交互式世界模型，专为具身智能、自动驾驶及游戏开发打造高保真“数字演练场”。模型通过可扩展数据引擎从大规模游戏环境...

Latest AI Resources

6mos ago

054.6K

混元Motion1.0 - 腾讯混元团队开源的文本生成3D动作模型

Mixed Motion 1.0 - Tencent Mixed Motion team open source text to generate 3D action models

Hybrid Motion1.0 (HY-Motion1.0) is Tencent Hybrid team open source text generated 3D action model , using 1 billion parameters Diffusion Transformer architecture , can be directly generated through natural language description of high-quality 3D character animation .

Latest AI Resources

7mos ago

054.6K

Kimi K2.5 - 月之暗面开源的新一代旗舰模型

Kimi K2.5 - 月之暗面开源的新一代旗舰模型

Kimi K2.5 是月之暗面发布的开源旗舰模型，采用 1T MoE 架构、激活 32B、上下文 256K token，原生支持图文视频多模态输入。在 Agent、代码、视觉理解三大基准均列开源第一...

Latest AI Resources

6mos ago

054.5K

DeepSeek-V3.1-Terminus - DeepSeek推出的最新版AI模型

DeepSeek-V3.1-Terminus - The latest version of the AI model introduced by DeepSeek

DeepSeek-V3.1-Terminus is an upgraded version of DeepSeek-V3.1, an artificial intelligence language model from the DeepSeek team. The model is optimized in terms of language consistency, code generation, and search capabilities to more accurately...

Latest AI Resources

10mos ago

054.5K

HeyGen - AI 数字人视频创作平台，支持多语言翻译配音

HeyGen - AI Digital Human Video Creation Platform with Multi-Language Translation and Dubbing Support

HeyGen is an AI-driven digital human video creation platform that supports a streamlined video production process, allowing users to quickly generate professional-level digital human videos. The platform is based on advanced AI technology, giving users full control over the image and voice of digital people, providing a rich library of material, including diverse background...

Latest AI Resources

1yrs ago

054.5K

olmOCR 2 - AI2开源的多模态文档解析模型

olmOCR 2 - AI2 open source multimodal document parsing model

olmOCR 2 is an open source multimodal document parsing model from the Allen Institute for Artificial Intelligence (AI2) and is an upgraded version of olmOCR. The digitized printed documents (e.g. PDF) will be high...

Latest AI Resources

9mos ago

054.5K

LingBot-VA - 蚂蚁灵波开源的首个“自回归视频-动作世界模型”

LingBot-VA - 蚂蚁灵波开源的首个“自回归视频-动作世界模型”

LingBot-VA 是蚂蚁灵波开源的全球首个“自回归视频-动作世界模型”，把视频生成与机器人控制塞进同一 Transformer，每一步同时输出下一帧世界画面和对应动作，实现“边想边干”。

Latest AI Resources

6mos ago

054.4K

CWM - Meta FAIR开源的代码世界语言模型

CWM - Meta FAIR open source code world language model

CWM (Code World Model) is a 32-billion-parameter open-source world language model released by the Meta FAIR team, designed for code generation and reasoning. Introducing the concept of "world model", it can simulate the code execution process, predict the variable state changes, and advance...

Latest AI Resources

10mos ago

054.4K

XTuner V1 - 上海AI Lab开源的大模型训练引擎

XTuner V1 - Shanghai AI Lab open source large model training engine

XTuner V1 is a new generation of large model training engine open-sourced by Shanghai Artificial Intelligence Laboratory (SAL), designed for ultra-large scale sparse Mixed Expert (MoE) model training. Developed based on PyTorch FSDP, it achieves high performance through multi-dimensional optimization of memory, communication and load ...

Latest AI Resources

11mos ago

054.3K

Ouro - 字节跳动Seed团队开源的新型循环语言模型

Ouro - A new cyclic language model open-sourced by the ByteHopper Seed team

Ouro is a new type of Looped Language Models (LLMs) developed by the ByteDance Seed team, with the core innovation of directly building inference capabilities in the pre-training phase through a parameter-sharing recurrent computation structure. The model uses 24 layers as the base block through...

Latest AI Resources

9mos ago

054.1K

Code2Video - Show Lab开源的AI教学视频生成框架

Code2Video - Show Lab open source AI teaching video generation framework

Code2Video is innovative open source project that automatically converts code snippets into high quality video content (mp4 format). The project through a unique code-centric paradigm , the use of carbon-now-cli tools to generate code into beautiful images , the use of ffmpeg will be these ...

Latest AI Resources

10mos ago

054.1K

Neovate Code - 蚂蚁开源的智能编程助手

Neovate Code - Ant Open Source's Intelligent Programming Assistant

Neovate Code is an open source intelligent programming assistant from Ant Group's Alipay Experience Technology Department, which improves development efficiency through artificial intelligence technology. With conversational development features, developers can describe the requirements through natural language, Neovate Code can understand and generate the corresponding generation...

Latest AI Resources

10mos ago

054K

SkyReels-V3 - 昆仑万维Skywork AI开源的多模态视频生成模型

SkyReels-V3 - 昆仑万维Skywork AI开源的多模态视频生成模型

SkyReels-V3是昆仑万维Skywork AI开源的多模态视频生成模型，被誉为视频生成领域的"全能型"标杆。模型基于"一核多支"的统一架构，在单一建模框架内集成三大核心能力：参考图像转视频、智能...

Latest AI Resources

6mos ago

053.7K

Wan2.2-Animate - 通义万相开源的动作生成模型

Wan2.2-Animate - A Generative Model for Action Generation of the Tongyi Wanphase Open Source

Wan2.2-Animate is an open source action generation model , support for action imitation and role-playing mode . Users only need to input a character picture and a reference video , the model can migrate the video character's movements and expressions to the picture character , giving the picture character dynamic expression ...

Latest AI Resources

10mos ago

053.7K

Qwen3-Next - 阿里通义推出的最新基础模型

Qwen3-Next - the latest base model from Ali Tongyi

Qwen3-Next is a new generation of hybrid architecture big model open source by Ali Tongyi, combining Gated DeltaNet and Gated Attention technology, good at dealing with long text, fast inference and saving computing resources.

Latest AI Resources

11mos ago

053.7K

Mistral Vibe - Mistral AI推出的开源命令行编码助手

Mistral Vibe - Open Source Command Line Coding Assistant from Mistral AI

Mistral Vibe is an open source command line coding assistant from Mistral AI, developed based on the Devstral model, which supports natural language interaction to complete code search, file manipulation, version control and other tasks. Can automatically scan the project structure and Git status through the @ symbol...

Latest AI Resources

8mos ago

053.5K

SAM Audio - Meta推出的开源多模态音频分割模型

SAM Audio - Open Source Multimodal Audio Segmentation Model from Meta

SAM Audio is an open source multimodal audio segmentation model introduced by Meta to accurately separate arbitrary target sounds from complex audio mixes. By combining textual, visual, and temporal dimensional cues, it enables flexible and efficient audio processing for tasks such as audio editing, denoising, sound extraction, and...

Latest AI Resources

7mos ago

053.5K

UltraEval-Audio - 清华、OpenBMB联合面壁智能开源的音频模型评测框架

UltraEval-Audio - 清华、OpenBMB联合面壁智能开源的音频模型评测框架

UltraEval-Audio是清华大学NLP实验室、OpenBMB和面壁智能联合开发的音频模型评测框架，最新版本为v1.1.0。专注于解决音频模型复现难、依赖冲突等问题，提供一键复现热门模型（如Vo...

Latest AI Resources

7mos ago

053.4K

Chatterbox-Turbo - Resemble AI开源的文本到语音模型

Chatterbox-Turbo - Resemble AI开源的文本到语音模型

Chatterbox-Turbo 是 Resemble AI 推出的开源文本到语音（TTS）模型，专为高效、低延迟的语音合成而设计。基于350M参数的精简架构，单步推理生成音频，时间延迟极低，在150...

Latest AI Resources

7mos ago

053.3K

DreamOmni2 - 港科大开源的多模态AI图像编辑与生成模型

DreamOmni2 - HKUST open source multimodal AI image editing and generation models

DreamOmni2 is a multimodal AI image editing and generation model open-sourced by Jiajia's team at HKUST. Can handle both text and image commands, supports multiple reference images, providing creators with more flexible ways of creation. The model is trained using a three-stage data synthesis process , joint training generation/editing...

Latest AI Resources

9mos ago

053.3K

PaperBanana - 北大与谷歌联合开源的AI学术插图自动生成框架

PaperBanana - 北大与谷歌联合开源的AI学术插图自动生成框架

PaperBanana是北大与谷歌团队联合开源的AI学术插图自动生成框架，专门解决科研人员绘制方法示意图和统计图表的痛点。框架通过五个智能体协作（检索、规划、造型、渲染和批评），实现从文本描述到Neu...

Latest AI Resources

6mos ago

053K

SongBloom - 腾讯联合港中文、南大开源的歌曲生成模型

SongBloom - Tencent's open source song generation model with HKCNU and NTU.

SongBloom is an open source song generation model developed by Tencent AI Lab in collaboration with The Chinese University of Hong Kong (Shenzhen) and Nanjing University, which solves the problem of "plasticity" in AI music generation, and realizes high-quality, structurally complete song generation. Simply enter 10 seconds of reference audio and corresponding lyrics, and you can...

Latest AI Resources

10mos ago

053K

UniPixel - 香港理工、腾讯、中科院等开源的像素级多模态模型

UniPixel - Pixel-level multimodal model open-sourced by Hong Kong Polytechnic, Tencent, Chinese Academy of Sciences and others

UniPixel is a novel multimodal model jointly proposed by Hong Kong Polytechnic University, Tencent, Chinese Academy of Sciences and Vivo to achieve pixel-level visual language understanding. By unifying object referencing and segmentation capabilities, it supports a variety of fine-grained tasks such as image segmentation, video segmentation, region understanding, and pi...

Latest AI Resources

10mos ago

052.9K

Mini-o3 - 字节、港大联合开源的视觉推理模型

Mini-o3 - Bytes, HKU Joint Open Source Visual Reasoning Model

Mini-o3 is an open source model jointly launched by ByteDance and the University of Hong Kong, focusing on solving complex visual search problems. The model has a powerful multi-round interactive reasoning capability, and can locate the target through deep exploration and trial-and-error.

Latest AI Resources

11mos ago

052.9K

Hunyuan-MT-7B - 腾讯混元开源的轻量级翻译模型

Hunyuan-MT-7B - Tencent Mixed Meta Open Source Lightweight Translation Model

Hunyuan-MT-7B is a lightweight translation model introduced by Tencent's Mixed Meta Team, with 7 billion references, supporting the mutual translation of 33 languages and 5 folk-Chinese languages/dialects, including Cantonese, Uyghur, and Tibetan. In the International Association for Computational Linguistics (ACL) WMT2025 competition...

Latest AI Resources

11mos ago

052.8K

Chroma 1.0 - FlashLabs开源的全球首个实时端到端语音对话模型

Chroma 1.0 - FlashLabs开源的全球首个实时端到端语音对话模型

Chroma 1.0是FlashLabs发布的全球首个开源的实时端到端语音对话模型，兼具低延迟交互、高保真个性化语音克隆和强对话能力。通过紧密耦合语音理解与生成，采用1:2文本-音频token调度策略...

Latest AI Resources

6mos ago

052.7K

rStar2-Agent - 微软开源的高效AI推理模型

rStar2-Agent - Microsoft's Open Source Efficient AI Reasoning Model

rStar2-Agent is an advanced AI mathematical reasoning model open-sourced by Microsoft that demonstrates strong mathematical problem solving capabilities by achieving an accuracy of 80.61 TP3T in the AIME24 test. The model is equipped with scientific reasoning capabilities, achieving in the GPQA-Diamond benchmark...

Latest AI Resources

11mos ago

052.6K

DeepSearchQA - 谷歌开源的AI研究Agent测试基准

DeepSearchQA - Google's Open Source AI Research Agent Testing Benchmarks

DeepSearchQA is Google's open-source AI research Agent test benchmark, specifically designed to evaluate the performance of intelligences on complex multi-step query tasks. It consists of 900 hand-designed "causal chain" tasks covering 17 domains, requiring the AI to act like a human researcher and push through multi-step...

Latest AI Resources

8mos ago

052.6K

MAI-UI - 阿里通义实验室开源的通用GUI智能体基座模型

MAI-UI - Ali Tongyi Labs Open Source Universal GUI Intelligent Body Base Model

MAI-UI is an open source generalized GUI intelligent body base model from Alibaba Tongyi Labs, with four major capabilities: cross-application operation, fuzzy semantic understanding, active user interaction and multi-step process coordination. Adopting end-cloud collaboration architecture, the lightweight model resides in the device to handle daily tasks, and complex tasks can call the cloud big...

Latest AI Resources

7mos ago

052.6K

Lynx - 字节跳动开源的高保真视频生成模型

Lynx - ByteHop's open source high-fidelity video generation model

Lynx is a high-fidelity personalized video generation model open-sourced by ByteDance that can generate identity-consistent videos with only a single portrait photo. Built on the diffusion Transformer (DiT) base model , the introduction of ID-adapter and Ref-adapte...

Latest AI Resources

10mos ago

052.5K

问小白o4 - 问小白推出的并行思考模型，同时开启8条思考路径

Ask Whitey o4 - A parallel thinking model introduced by Ask Whitey that opens 8 thinking paths at the same time

Ask White o4 is an innovative parallel thinking model that opens 8 thinking paths at the same time, analyzes the problem from multiple perspectives and automatically filters out the optimal solution. The model incorporates advanced Long-CoT reinforcement learning and process reward learning techniques, has powerful deep reasoning capabilities, and performs well in complex tasks.

Latest AI Resources

11mos ago

052.4K

美间：在线软装（家装）设计工具，快速生成设计方案，软装辅助AI工具箱

Meiman: online soft furnishing (home furnishing) design tools, rapid generation of design plans, soft furnishing auxiliary AI toolkit

Comprehensive Introduction Meiman is an online platform specializing in home design and marketing negotiation. The site provides a wealth of design materials, soft furnishings and proposal PPT templates, poster templates, etc. to help designers and homeowners quickly generate high-quality design solutions. Meiman's online soft furnishing design tool can be used in as little as 10 seconds...

Latest AI Resources # AI image editing # AI Generated Presentation/PPT

1yrs ago

052.2K

Qwen3-VL-Reranker - 阿里巴巴推出的多模态重排序模型

Qwen3-VL-Reranker - 阿里巴巴推出的多模态重排序模型

Qwen3-VL-Reranker是阿里巴巴推出的多模态重排序模型，专门用于提升跨模态检索的精准度。与Qwen3-VL-Embedding协同工作：前者负责快速召回候选结果，后者通过深度跨模态交互（如...

Latest AI Resources

7mos ago

052.2K

GELab-Zero - 阶跃团队开源的端侧多模态GUI Agent模型

GELab-Zero - Open source end-side multimodal GUI Agent model by Steps team

GELab-Zero is an open source end-side multimodal GUI Agent model by Step Leap Team , built on Qwen3-VL-4B-Instruct base model with 4B parameters.It can recognize UI elements and perform operations such as clicking and sliding, and supports cross-application tasking ...

Latest AI Resources

8mos ago

052.2K

DiaMoE-TTS - 清华联合巨人网络开源的多方言语音合成框架

DiaMoE-TTS - Tsinghua and Giant Networks open source multi-dialect speech synthesis framework

DiaMoE-TTS is a multi-dialect speech synthesis framework jointly open-sourced by Tsinghua University and Giant Network, based on the International Phonetic Alphabet (IPA), to solve the problems of dialect data scarcity, orthographic inconsistency, and complex phonological changes. Through a unified IPA front-end standardized phoneme representation to eliminate cross-dialect differences ...

Latest AI Resources

10mos ago

052.2K

Zen Browser - 基于Firefox内核的开源AI网页浏览器

Zen Browser - Open source AI web browser based on Firefox kernel

Zen Browser is an open source browser based on the Firefox kernel, focusing on a simple and efficient browsing experience, with the core features of vertical tab bar and workspace isolation. With the sidebar design, it can clearly display the full titles of 50+ tabs and supports multi-window split-screen browsing.

Latest AI Resources

7mos ago

051.9K

DeepSeek-V3.2-Exp - DeepSeek最新开源的实验性AI模型

DeepSeek-V3.2-Exp - DeepSeek's latest open source experimental AI model

DeepSeek-V3.2-Exp is a DeepSeek open source experimental AI model that significantly improves the efficiency of long text processing by introducing the DeepSeek Sparse Attention (DSA) mechanism. The model is based on DeepSeek...

Latest AI Resources

10mos ago

051.9K

MedASR - 谷歌开源的医疗语音识别模型

MedASR - Google's open source medical speech recognition model

MedASR is a 105 million parameter medical speech recognition model open-sourced by Google, fine-tuned on a 5,000-hour desensitized clinical corpus, optimized for drug, dosage, and anatomical terminology, with a built-in 6-gram medical language model, and a word error rate of only 4.6 on the private radiology dataset RAD-DICT...

Latest AI Resources

7mos ago

051.7K

openPangu-VL-7B - 华为开源的7B参数多模态模型

openPangu-VL-7B - 华为开源的7B参数多模态模型

openPangu-VL-7B是华为开源的7B参数规模的多模态模型，专为昇腾端侧设备优化设计。模型在视觉定位、OCR识别、文档理解等任务中表现出色，支持实时推理（5FPS），单卡延迟仅160毫秒。

Latest AI Resources

7mos ago

051.6K

Ming-UniAudio - 蚂蚁开源的统一音频多模态生成模型

Ming-UniAudio - Ant open source unified audio multimodal generation model

Ming-UniAudio is Ant Group's open source unified audio multimodal generation model that supports mixed input and output of text, audio, image and video. Using multi-scale Transformer and hybrid expert (MoE) architecture , through modality-aware routing mechanism to efficiently handle cross-modal ...

Latest AI Resources

10mos ago

051.6K

Ling-V2 - 蚂蚁百灵开源的MoE架构语言模型系列

Ling-V2 - The MoE Architecture Language Model Series of Ant Centurion Open Source

Ling-V2 is a family of large-scale language models based on the MoE architecture introduced by the Ant-Belling team. The first version, Ling-mini-2.0, has 16 billion total parameters, with only 1.4 billion parameters activated per input token.

Latest AI Resources

10mos ago

051.4K

GLM-4.7-Flash - 智谱开源的混合专家架构语言模型

GLM-4.7-Flash - 智谱开源的混合专家架构语言模型

GLM-4.7-Flash是智谱开源的混合专家架构语言模型，参数规模为30B，激活参数量3B，上下文窗口达200K，最大输出令牌为128K。在编程能力上表现出色，SWE-bench验证集分数达59.2...

Latest AI Resources

6mos ago

051.3K

MOVA - 创智学院联合模思智能开源的端到端音视频生成模型

MOVA - 创智学院联合模思智能开源的端到端音视频生成模型

MOVA（MOSS-Video-and-Audio）是上海创智学院 OpenMOSS 团队联合模思智能（MOSI）开源的端到端音视频生成模型，是中国首个高性能开源音视频模型。突破了传统"先画面后配音...

Latest AI Resources

6mos ago

051.1K

GPT-5-Codex - OpenAI推出的最强编程模型

GPT-5-Codex - The Most Powerful Programming Model Introduced by OpenAI

GPT-5-Codex is a powerful programming optimization model from OpenAI, further enhanced by GPT-5 and designed for software engineers. The model generates high-quality code quickly, supports multiple programming languages, and optimizes existing code to improve performance.

Latest AI Resources

11mos ago

051.1K

FunctionGemma - 谷歌开源专为函数调用优化的轻量级AI模型

FunctionGemma - Google open source lightweight AI model optimized for function calls

FunctionGemma is a lightweight AI model optimized for function calls launched by Google, developed based on the Gemma 3 base model with 270 million parameters, which can convert natural language into executable API instructions in real time on cell phones, browsers and other devices. The core feature is support for local off...

Latest AI Resources

7mos ago

050.9K

Yume1.5 - 上海AI Lab联合复旦大学开源的交互式世界生成模型

Yume1.5 - An Interactive World Generation Model Open-Sourced by Shanghai AI Lab and Fudan University

Yume 1.5 is an open source interactive world generation model, jointly developed by Shanghai Artificial Intelligence Laboratory, Fudan University, and Shanghai Innovation Research Institute, which is capable of real-time interactive rendering (12 FPS on a single card). It adopts the joint spatio-temporal channel modeling (TSCM) technology, even if the context length increases...

Latest AI Resources

7mos ago

050.9K

AgentCPM-Report - 清华联合面壁智能等开源的深度调研智能体工具

AgentCPM-Report - 清华联合面壁智能等开源的深度调研智能体工具

AgentCPM-Report 是清华大学自然语言处理实验室、中国人民大学、面壁智能与 OpenBMB 开源社区联合研发的深度调研智能体工具。基于 8 亿参数的模型，通过深度检索和推理，能生成万字长篇...

Latest AI Resources

6mos ago

050.9K

Step3-VL-10B - 阶跃星辰开源的100亿参数多模态AI模型

Step3-VL-10B - 阶跃星辰开源的100亿参数多模态AI模型

Step3-VL-10B是阶跃星辰团队开源的100亿参数多模态AI模型，核心突破在于以轻量化设计实现顶级性能。模型通过统一预训练策略（1.2T多模态令牌数据）和创新的并行协同推理技术（PACORE...

Latest AI Resources

6mos ago

050.9K

聆音EchoCare - 香港科学院开源的超声基座大模型

EchoCare - Hong Kong Academy of Sciences open source ultrasound base large model

EchoCare is a large model of ultrasound base developed by the Center for Artificial Intelligence and Robotics Innovation (CAIR) at the Hong Kong Institute of Innovation and Research of the Chinese Academy of Sciences (CAS), trained based on the world's largest ultrasound image dataset (more than 4.5 million images), covering multi-center, multi-region, multi-ethnicity, and more than 50 individuals...

Latest AI Resources

10mos ago

050.7K

TurboDiffusion - 生数科技联合清华等开源的视频生成加速框架

TurboDiffusion - Raw Digital Technology, Tsinghua and other open source video generation acceleration framework

TurboDiffusion is a video generation acceleration framework jointly open-sourced by Tsinghua University, BioDigital Technology, and UC Berkeley, which is able to improve video generation speed by 100-200 times while maintaining nearly lossless picture quality. Through sparse linear attention, sample step distillation and 8-bit...

Latest AI Resources

7mos ago

050.5K

EverMemOS - 盛大团队推出的开源长期记忆操作系统

EverMemOS - Open Source Long-Term Memory Operating System by Team Shanda

EverMemOS is an open source long-term memory operating system launched by the Shanda team led by Chen Tianqiao, designed for AI intelligences to solve the problem of memory breakage caused by the fixed context window of large language models. The system is based on the human brain memory mechanism, using a four-layer architecture (agent layer, memory layer, index layer...

Latest AI Resources

9mos ago

050.4K

GLM-TTS - 智谱AI推出的开源工业级语音合成系统

GLM-TTS - Open Source Industrial Grade Speech Synthesis System by Smart Spectrum AI

GLM-TTS is an open source industrial-grade speech synthesis system with powerful speech synthesis capabilities. Adopting a two-stage generation architecture: the first stage will be converted to text into speech token sequences, and the second stage will be converted into high-quality audio token sequences. The system supports only 3 seconds of voice samples to complete the sound...

Latest AI Resources

8mos ago

050.4K

Granite-Docling-258M - IBM开源的视觉语言模型

Granite-Docling-258M - IBM Open Source Visual Language Modeling

Granite-Docling-258M is an ultra-compact open source visual language model from IBM designed for efficient document conversion. The model converts documents into machine-readable formats while leaving layout, tables, formulas, and other elements intact.

Latest AI Resources

10mos ago

050.3K

LingBot-Depth - 蚂蚁灵波科技开源的高精度空间感知模型

LingBot-Depth - 蚂蚁灵波科技开源的高精度空间感知模型

LingBot-Depth是蚂蚁灵波科技开源的高精度空间感知模型，专门解决机器人在透明玻璃、反光物体等复杂场景中的深度识别难题。模型通过创新的"掩码深度建模"技术，在RGB图像基础上预测缺失的深度值

Latest AI Resources

6mos ago

050.3K

T5Gemma 2 - 谷歌开源的新一代编码器-解码器模型

T5Gemma 2 - Google's open source next generation encoder-decoder model

T5Gemma 2 is a new generation encoder-decoder model open-sourced by Google, based on the Gemma 3 architecture upgraded with multimodal and long context processing capabilities. It supports a wide range of data types, including text and images, and is capable of handling very long contexts (up to 128K) in generating...

Latest AI Resources

7mos ago

050.3K

RealVideo - 智谱 AI 开源的实时流式视频生成系统

RealVideo - Wisdom Spectrum AI's open source real-time streaming video generation system

RealVideo is an open-source real-time streaming video generation system from Smart Spectrum AI that can quickly generate natural and smooth video responses in 2 to 3 seconds. Users only need to upload a photo and enter text, and the system can generate corresponding voice and video, realizing real-time conversations with AI characters...

Latest AI Resources

8mos ago

050.3K

LazyCraft - 开源AI Agent应用开发与管理平台，基于LazyLLM构建

LazyCraft - Open Source AI Agent Application Development and Management Platform, built on LazyLLM

LazyCraft is an open source AI Agent application development and management platform built by Shangtang based on the open source framework LazyLLM, which provides one-stop AI application development solutions for enterprises and developers. It helps developers to quickly build and release large model applications with low threshold and low cost...

Latest AI Resources

9mos ago

050.2K

PromptFill - 开源的结构化提示词生成AI工具，专为AI绘画设计

PromptFill - Open Source Structured Prompt Word Generation AI Tool Designed for AI Drawing

PromptFill is a structured cue generation tool designed specifically for AI painting, which helps users quickly build, manage and iterate complex prompts through visual "fill-in-the-blank" interactions, improving the efficiency and quality of AI image generation.PromptFill's core features...

Latest AI Resources

7mos ago

050.1K

Youtu-LLM - 腾讯 Youtu 团队开源的轻量级语言模型

Youtu-LLM - 腾讯 Youtu 团队开源的轻量级语言模型

Youtu-LLM 是腾讯 Youtu 团队开源的轻量级语言模型，参数规模为 19.6 亿。专为智能体任务设计，具备强大的“原生智能体能力”，在多项任务中超越同规模甚至更大模型。

Latest AI Resources

7mos ago

050K

Depth Anything 3 - 字节跳动Seed开源的3D视觉重建模型

Depth Anything 3 - 3D Visual Reconstruction Models for ByteHop Seed Open Source

Depth Anything 3 (DA3) is a 3D visual reconstruction model developed and open-sourced by the Byte Jump Seed team. Through a single Transformer architecture to realize the spatial geometry of any viewpoint reconstruction, only need to predict the depth map and ray map can restore the three-dimensional scene, compared to...

Latest AI Resources

8mos ago

050K

VoiceSculptor - 西北工业大学联合语图智能开源的音色设计模型

VoiceSculptor - 西北工业大学联合语图智能开源的音色设计模型

VoiceSculptor 是西北工业大学联合多家机构开源的音色设计模型，基于 LLaSA-3B 和 CosyVoice2 开发，专注于通过自然语言指令生成多样化音色的语音合成。支持对语速、音量、基频...

Latest AI Resources

7mos ago

050K