Latest AI Resources

Total 3143 posts

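PaperBanana - An AI framework for automatically generating academic illustrations, open-sourced by Peking University and Google

PaperBanana is an AI framework for automatically generating academic illustrations, jointly open-sourced by teams at Peking University and Google. It targets a pain point for researchers: drawing method diagrams and statistical charts. Five agents collaborate (retrieval, planning, styling, rendering, and critique) to go from a text description to Neu...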

6mos ago

052.7K

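Intern-S1-Pro - The first trillion-parameter scientific multimodal large model, open-sourced by Shanghai AI Lab

Intern-S1-Pro is the world's first trillion-parameter scientific multimodal large model, open-sourced by Shanghai Artificial Intelligence Laboratory. It uses a 512-expert MoE architecture that activates only 8 experts (22B parameters), balancing performance and efficiency. The model is built on the SAGE architecture, introduces Fourier position encoding, and unifies...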

6mos ago

047.6K

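LingBot-VA - The first "autoregressive video-action world model", open-sourced by Ant's Robbyant

LingBot-VA is the world's first "autoregressive video-action world model", open-sourced by Ant's Robbyant. It puts video generation and robot control into a single Transformer that outputs, at every step, both the next frame of the world and the corresponding action, so the model "thinks while acting".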

6mos ago

053.9K

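MiniCPM-o 4.5 - A 9B omni-modal flagship model open-sourced by ModelBest

MiniCPM-o 4.5 is a 9B omni-modal flagship model open-sourced by ModelBest. Its end-to-end architecture, which watches and listens while speaking proactively, delivers a GPT-4o-level experience on a phone: it supports single images, multiple images, high-frame-rate long video, real-time full-duplex voice conversation, first tok...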

6mos ago

055.2K

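SoulX-FlashTalk - A real-time digital human generation model open-sourced by the Soul App AI team

SoulX-FlashTalk is a real-time digital human generation model open-sourced by the Soul App AI team. It has 14 billion parameters and achieves an ultra-low latency of 0.87 seconds at a high frame rate of 32 frames per second. Using bidirectional distillation, the model addresses the high latency of traditional digital humans and the tendency of their visuals to...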

6mos ago

054.6K

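Qwen3-Coder-Next - A hybrid model for coding agents, open-sourced by Alibaba's Qwen team

Qwen3-Coder-Next is an efficient hybrid model designed for coding agents, open-sourced by Alibaba's Qwen (Tongyi Qianwen) team. It is based on the Qwen3-Next architecture with 80B total parameters and activates only 3B parameters at inference. Its core innovation is the use of environment-interaction and reinforcement-learning training...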

6mos ago

057.6K

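GLM-OCR - A 0.9B lightweight professional OCR model open-sourced by Zhipu

GLM-OCR is a 0.9B lightweight professional OCR model open-sourced by Zhipu. It sets a new SOTA on OmniDocBench V1.5 with a score of 94.6. It combines a small footprint with full-scenario coverage: scans, handwriting, seals, mixed-language text, complex tables...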

6mos ago

055.7K

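Step 3.5 Flash - A 196-billion-parameter sparse MoE model open-sourced by StepFun

Step 3.5 Flash is a 196-billion-parameter sparse MoE model open-sourced by StepFun. It activates only 11 billion parameters per token and reaches a real-time speed of 350 tokens/s on coding tasks. Based on the in-house MTP-3 multi...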

6mos ago

045.4K

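UnifoLM-VLA-0 - The first manipulation-oriented large model open-sourced by Unitree

UnifoLM-VLA-0 is the first manipulation-oriented large model in Unitree's UnifoLM series. It moves beyond the limitation of traditional vision-language models (VLMs), which can only understand images and text: through continued pre-training on robot manipulation data, it evolves from "image-text understanding" toward possessing physical common...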

6mos ago

048.8K

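SenseNova-MARS - A multimodal search-and-reasoning agent language model open-sourced by SenseTime

SenseNova-MARS is an agentic vision-language model (Agentic VLM) open-sourced by SenseTime, the first to support deep integration of dynamic visual reasoning with image-text search. It comes in 8B and 32B versions. The model can autonomously plan task steps and call multiple tools (such as...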

6mos ago

046.1K

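MOVA - An end-to-end audio-video generation model open-sourced by Shanghai Innovation Institute and MOSI

MOVA (MOSS-Video-and-Audio) is an end-to-end audio-video generation model open-sourced by the OpenMOSS team at Shanghai Innovation Institute together with MOSI, and is China's first high-performance open-source audio-video model. It moves beyond the traditional "visuals first, dubbing later"...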

6mos ago

050.7K

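LingBot-World - An interactive world model open-sourced by Ant Group's Robbyant

LingBot-World is an interactive world model open-sourced by Robbyant, Ant Group's embodied-intelligence company. It builds a high-fidelity "digital training ground" for embodied intelligence, autonomous driving, and game development. Through a scalable data engine working from large-scale game environments...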

6mos ago

054K

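SkyReels-V3 - A multimodal video generation model open-sourced by Kunlun Tech's Skywork AI

SkyReels-V3 is a multimodal video generation model open-sourced by Kunlun Tech's Skywork AI, billed as an "all-rounder" benchmark in video generation. Built on a unified "one core, multiple branches" architecture, it integrates three core capabilities in a single modeling framework: reference-image-to-video, intelligent...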

6mos ago

053.4K

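LingBot-Depth - A high-precision spatial perception model open-sourced by Ant's Robbyant

LingBot-Depth is a high-precision spatial perception model open-sourced by Ant's Robbyant, designed to solve robots' depth-perception difficulties in complex scenes such as transparent glass and reflective objects. Using a novel "masked depth modeling" technique, the model predicts missing depth values from RGB images.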

6mos ago

050K

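DeepSeek-OCR 2 - A next-generation OCR model open-sourced by the DeepSeek team

DeepSeek-OCR 2 is a next-generation OCR model open-sourced by the DeepSeek team. Its core innovation is the DeepEncoder V2 architecture, which upgrades the traditional fixed raster-scan visual encoding into dynamic processing based on semantic reasoning. Through causal flow...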

6mos ago

060.7K

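Kimi K2.5 - A next-generation flagship model open-sourced by Moonshot AI

Kimi K2.5 is an open-source flagship model released by Moonshot AI. It uses a 1T MoE architecture with 32B activated parameters and a 256K-token context, and natively supports multimodal image, text, and video input. It ranks first among open-source models on the three major benchmarks for agents, code, and visual understanding...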

6mos ago

054.2K

Moltbot - An open-source, local-first AI assistant that interacts with users across multiple channels

Moltbot (formerly Clawdbot) is a "local-first" AI assistant open-sourced by Austrian developer Peter Steinberger. WhatsApp, Telegram, Discord, Slack, iMessage...

6mos ago

062.4K

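json-render - A tool for AI-generated UI, open-sourced by Vercel Labs

json-render is a tool for AI-generated UI open-sourced by Vercel Labs. It produces structured, controllable interfaces through an "AI → JSON → UI" pipeline: the AI is required to output only JSON that conforms to a predefined schema, and the front end then...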

6mos ago

060.9K

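FlowAct-R1 - A real-time interactive digital human video generation framework open-sourced by ByteDance

FlowAct-R1 is a real-time interactive digital human video generation framework open-sourced by ByteDance. It streams high-fidelity full-body video of unlimited length from a single reference image and audio. Its core innovation is chunked streaming generation, which splits the video into 0.5-second segments processed in relay, combined with structured...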

6mos ago

060.1K

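VibeVoice-ASR - A unified speech-to-text (ASR) model open-sourced by Microsoft

VibeVoice-ASR is a unified speech-to-text (ASR) model open-sourced by Microsoft and designed for long audio. It can process up to 60 minutes of continuous audio in a single pass, keeping semantics coherent and speaker tracking consistent. It supports custom hotwords: users can supply specific terms or...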

6mos ago

055.7K

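Chroma 1.0 - The world's first open-source real-time end-to-end spoken dialogue model, from FlashLabs

Chroma 1.0, released by FlashLabs, is the world's first open-source real-time end-to-end spoken dialogue model, combining low-latency interaction, high-fidelity personalized voice cloning, and strong conversational ability. It tightly couples speech understanding and generation and uses a 1:2 text-to-audio token scheduling strategy...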

6mos ago

052.3K

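AgentCPM-Report - A deep-research agent tool open-sourced by Tsinghua, ModelBest, and others

AgentCPM-Report is a deep-research agent tool jointly developed by the Natural Language Processing Lab at Tsinghua University, Renmin University of China, ModelBest, and the OpenBMB open-source community. Based on a model with 800 million parameters, it uses deep retrieval and reasoning to generate long-form, ten-thousand-character...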

6mos ago

050.4K

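EmbodiChain - An open-source embodied-intelligence development platform from DexForce

EmbodiChain is an open-source embodied-intelligence development platform from DexForce, focused on the data scarcity problem in training embodied-intelligence models. Its data engine performs large-scale generation of scene-relevant data, Real2Sim mapping of data trajectories, and multimodal data augmentation, fundamentally...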

6mos ago

061.7K

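Step3-VL-10B - A 10-billion-parameter multimodal AI model open-sourced by StepFun

Step3-VL-10B is a 10-billion-parameter multimodal AI model open-sourced by the StepFun team. Its core breakthrough is achieving top-tier performance with a lightweight design. The model uses a unified pre-training strategy (1.2T multimodal tokens) and a novel parallel coordinated reasoning technique (PACORE...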

6mos ago

050.3K

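PersonaPlex - A full-duplex spoken dialogue model open-sourced by NVIDIA

PersonaPlex is a full-duplex spoken dialogue model open-sourced by NVIDIA, with 7 billion parameters. It drops the traditional cascaded pipeline of speech recognition → language model → text-to-speech in favor of a unified Transformer architecture that handles speech understanding and generation simultaneously. The model supports full...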

6mos ago

056.4K

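GLM-4.7-Flash - A mixture-of-experts language model open-sourced by Zhipu

GLM-4.7-Flash is a mixture-of-experts language model open-sourced by Zhipu, with 30B total parameters, 3B activated parameters, a 200K context window, and a maximum output of 128K tokens. It performs well on coding, with a SWE-bench Verified score of 59.2...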

6mos ago

050.9K

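NovaSR - An open-source audio super-resolution model for raising audio sample rates

NovaSR is an open-source audio super-resolution model, mainly used to upgrade low-quality audio (such as 16 kHz telephone-quality audio) to high-quality audio (such as 48 kHz studio-quality audio). The model is only 52 KB, smaller than a WeChat sticker, and can easily be deployed on resource...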

6mos ago

049.2K

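FLUX.2 [klein] - A lightweight image generation and editing model open-sourced by Black Forest Labs

FLUX.2 [klein] is an open-source lightweight image generation and editing model from Black Forest Labs, designed for fast inference and low-latency applications. It supports text-to-image generation, image editing, and multi-reference image generation, and can, in under 1 second...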

6mos ago

055.5K

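TranslateGemma - A family of machine translation models open-sourced by Google

TranslateGemma is a family of open-source machine translation models from Google based on Gemma 3, designed to improve translation quality. It is tuned through two-stage fine-tuning (supervised fine-tuning and reinforcement learning), comes in 4B, 12B, and 27B sizes, and supports 5...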

6mos ago

044.1K

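OpenWork - An open-source desktop app for AI agent workflows, a free alternative to Claude Cowork

OpenWork is an open-source desktop application for agent workflows. As a free alternative to Claude Cowork, it offers a visual interface and the ability to run locally. The project uses a Tauri + Rust + Node.js stack and supports skill-plugin extensions and...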

6mos ago

098.6K

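ArenaRL - A comparative reinforcement learning method open-sourced by Amap and Alibaba Tongyi

ArenaRL is a comparative reinforcement learning method jointly open-sourced by Amap and the Alibaba Tongyi team, designed for open-ended tasks (such as trip planning) that lack a reference answer. Its core innovation is replacing the traditional "absolute scoring" mechanism with "relative ranking": the agent automatically generates multiple candidate plans...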

6mos ago

042.1K

Step-Audio-R1.1 - The world's first native speech reasoning model, open-sourced by StepFun

Step-Audio-R1.1 is the world's first native speech reasoning model, open-sourced by StepFun. The latest upgrade tops the Artificial Analysis Speech Reasoning leaderboard with 96.4% accuracy. The model...

6mos ago

055.6K

OctoCodingBench - An evaluation set targeting coding-agent standards, open-sourced by MiniMax

OctoCodingBench is the first evaluation set targeting production-grade standards for coding agents, open-sourced by MiniMax. Its core innovation is using check-level accuracy (CSR) and instance-level success rate (IS...

6mos ago

043.1K

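GLM-Image - A multimodal image generation model open-sourced by Zhipu and Huawei

GLM-Image is a multimodal image generation model jointly open-sourced by Zhipu and Huawei. It is trained on Ascend Atlas 800T A2 chips with the MindSpore framework and uses a novel hybrid "autoregressive + diffusion decoder" architecture. Its core breakthrough lies in achieving, on domestically produced chips...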

7mos ago

048.9K

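Baichuan-M3 - A next-generation medical large language model open-sourced by Baichuan AI

Baichuan-M3 is a next-generation open-source medical large language model from Baichuan AI, deeply optimized for medical scenarios, with strong medical reasoning and consultation capabilities. On the HealthBench evaluation it ranks first worldwide with an overall score of 65.1, surpassing GPT...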

7mos ago

046.8K

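Nuwax Agent OS - A general-purpose agent operating system open-sourced by Southwestern University of Finance and Economics

Nuwax Agent OS (女娲智能体OS) is the world's first open-source general-purpose agent operating system, released by Professor Zhao Yu's team at Southwestern University of Finance and Economics. It has an autonomous execution engine that automates the full chain from requirement breakdown to task planning and execution. The system supports visual workflow orchestration...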

7mos ago

069K

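Nemotron Speech ASR - A real-time speech recognition model open-sourced by NVIDIA

Nemotron Speech ASR is a real-time speech recognition model open-sourced by NVIDIA and optimized for low-latency scenarios, supporting 24-millisecond transcription and concurrent multi-speaker conversations. Its core is a hybrid Mamba-Transformer MoE architecture, and through a fixed-state cache...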

7mos ago

048.9K

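Qwen3-VL-Reranker - A multimodal reranking model from Alibaba

Qwen3-VL-Reranker is a multimodal reranking model from Alibaba, built to improve the precision of cross-modal retrieval. It works together with Qwen3-VL-Embedding: the embedding model quickly recalls candidate results, and the reranker then uses deep cross-modal interaction (such as...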

7mos ago

052K

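Qwen3-VL-Embedding - A multimodal embedding model open-sourced by Alibaba's Tongyi team

Qwen3-VL-Embedding is a multimodal embedding model open-sourced by Alibaba's Tongyi team. It is part of the Qwen3-VL series and is mainly used for cross-modal retrieval. The model maps text, images, video, and other modalities into a shared semantic space and uses a dual-tower architecture to generate vector representations...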

7mos ago

056.5K

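AntAngelMed - A medical large model open-sourced by Ant and the Health Information Center of Zhejiang Province

AntAngelMed (the Ant Anzhen'er medical large model) is an open-source medical large model jointly developed by the Health Information Center of Zhejiang Province, Ant Health, and Zhejiang Anzhen'er Medical AI Technology Co., Ltd. The model uses a mixture-of-experts (MoE) architecture with 100 billion total parameters...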

7mos ago

059.5K

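VoiceSculptor - A voice design model open-sourced by Northwestern Polytechnical University and Yutu Intelligence (语图智能)

VoiceSculptor is a voice design model open-sourced by Northwestern Polytechnical University together with several institutions. Built on LLaSA-3B and CosyVoice2, it focuses on speech synthesis that generates diverse voice timbres from natural-language instructions. It supports control over speaking rate, volume, fundamental frequency...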

7mos ago

049.6K

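10Kh RealOmni-Open - An embodied-intelligence dataset open-sourced by Jianzhi Robotics (简智机器人)

10Kh RealOmni-Open is an embodied-intelligence dataset open-sourced by Jianzhi Robotics (简智机器人), and is the largest open-source embodied-intelligence dataset in the industry. It contains more than 10,000 hours of data and over 1 million clips, covering 10 scenario task categories and more than 30 skills. The data...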

7mos ago

055.8K

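Youtu-LLM - A lightweight language model open-sourced by Tencent's Youtu team

Youtu-LLM is a lightweight language model open-sourced by Tencent's Youtu team, with 1.96 billion parameters. Designed for agent tasks, it has strong "native agent capabilities" and outperforms models of the same size, and even larger ones, on multiple tasks.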

7mos ago

049.6K

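Genie Sim 3.0 - The first LLM-driven simulation platform, open-sourced by AgiBot

Genie Sim 3.0 is the first open-source simulation platform driven by a large language model, released by AgiBot. Built on NVIDIA Isaac Sim, it combines 3D reconstruction, visual generation, and a physics engine to replicate real environments with millimeter-level accuracy, and through natural-language instr...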

7mos ago

045.1K

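LandPPT - A free, open-source AI PPT generator supporting local deployment and cloud collaboration

LandPPT is an open-source AI PPT generator based on large language models. It generates professional presentations in one click from a topic or an uploaded document (PDF/Word/Excel). It integrates multi-model support, real-time web search, and AI image generation, and provides a rich set of templates and scen...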

7mos ago

067.2K

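TuriX-CUA - An open-source AI desktop automation tool that lets AI operate the desktop directly

TuriX-CUA is an open-source AI desktop automation tool that interacts with the computer through screenshots, multimodal model decisions, and automated actions, letting an AI model operate the desktop environment directly. It supports macOS and Windows, and through advanced computer...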

7mos ago

063.6K

MiroThinker 1.5 - A search agent model open-sourced by the MiroMind team

MiroThinker 1.5 is a search agent model open-sourced by the MiroMind team. It is based on the Qwen3 series and comes in 30B and 235B versions. The model uses interactive scaling and supports a 256K context...

7mos ago

071.6K

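UltraEval-Audio - An audio model evaluation framework open-sourced by Tsinghua, OpenBMB, and ModelBest

UltraEval-Audio is an audio model evaluation framework jointly developed by the Tsinghua University NLP Lab, OpenBMB, and ModelBest; the latest version is v1.1.0. It focuses on problems such as hard-to-reproduce audio models and dependency conflicts, and offers one-click reproduction of popular models (such as Vo...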

7mos ago

053K

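openPangu-VL-7B - A 7B-parameter multimodal model open-sourced by Huawei

openPangu-VL-7B is a 7B-parameter multimodal model open-sourced by Huawei and optimized for Ascend edge devices. It performs well on visual grounding, OCR, and document understanding, and supports real-time inference (5 FPS) with a single-card latency of only 160 milliseconds.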

7mos ago

051.4K

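New API - An open-source system for managing and distributing AI model APIs through a standardized interface

New API is an open-source AI aggregation gateway written in Go. It manages 30+ mainstream large models (such as OpenAI, Claude, and Midjourney) in one place and converts their different interfaces into the standardized OpenAI format.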

7mos ago

054.7K

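Paper2Any - An AI platform for research content and presentation generation, open-sourced by Peking University's DCAI group

Paper2Any is a multimodal assistant platform open-sourced by the DCAI research group at Peking University. It focuses on quickly generating research content from paper PDFs, images, and text. It offers one-click generation of scientific figures and can produce model architecture diagrams, technical roadmaps, experimental data charts, and more from multiple input sources...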

7mos ago

063.8K

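StoryMem - An AI video generation system open-sourced by ByteDance and Nanyang Technological University

StoryMem is an AI video generation system jointly open-sourced by ByteDance and Nanyang Technological University, designed to keep characters and environments consistent across multi-scene videos. At its core, a "visual memory bank" automatically stores key frames and references them in later generation, keeping characters' appearance, clothing, and scene elements...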

7mos ago

049.4K

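XVERSE-Ent - Chinese and English large models for the entertainment domain, open-sourced by XVERSE Technology

XVERSE-Ent is an open-source large model from XVERSE Technology focused on the broader entertainment domain. It comes in Chinese and English versions and supports scenarios such as social interaction, game narrative, and cultural creation. With optimizations such as stronger character consistency and long-plot understanding, the model can, in virtual character persona stability, complex sto...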

7mos ago

054.5K

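Vibe Kanban - A free, open-source task management tool for AI coding agents

Vibe Kanban is an open-source task management tool for AI coding agents, designed for developers who use several AI coding assistants at once (such as Claude Code, Gemini CLI, and Codex). It manages task progress on a kanban board and supports parallel...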

7mos ago

057.8K

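Chatterbox-Turbo - A text-to-speech model open-sourced by Resemble AI

Chatterbox-Turbo is an open-source text-to-speech (TTS) model from Resemble AI, designed for efficient, low-latency speech synthesis. Built on a streamlined 350M-parameter architecture, it generates audio in a single inference step with very low latency, at 150...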

7mos ago

052.9K

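IQuest-Coder-V1 - A family of code large models open-sourced by the Zhizhi Innovation Research Institute (至知创新研究院)

IQuest-Coder-V1 is a family of open-source code large models developed by the Zhizhi Innovation Research Institute (至知创新研究院), part of Ubiquant (九坤投资). It focuses on code intelligence, with capabilities such as automatic programming, bug fixing, and code explanation. The model uses a novel Code-Flow training paradigm, from code repository evolu...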

7mos ago

058.6K

HY-Motion 1.0 - A text-to-3D-motion generation model open-sourced by the Tencent Hunyuan team

HY-Motion 1.0 (混元Motion1.0) is a text-to-3D-motion model open-sourced by the Tencent Hunyuan team. It uses a Diffusion Transformer architecture with 1 billion parameters and generates high-quality 3D character animation directly from natural-language descriptions.

7mos ago

054.4K

Yume1.5 - An interactive world generation model open-sourced by Shanghai AI Lab and Fudan University

Yume 1.5 is an open-source interactive world generation model jointly developed by Shanghai Artificial Intelligence Laboratory, Fudan University, and Shanghai Innovation Research Institute. It supports real-time interactive rendering (12 FPS on a single GPU). It uses joint temporal-spatial-channel modeling (TSCM), so that even as the context length increases...

7mos ago

050.3K

AutoMV - A free music video generation system open-sourced by M-A-P with BUPT, Nanjing University, and others

AutoMV is an open-source music video generation system developed by the M-A-P team with several universities. It automatically generates coherent music videos from complete songs without any training. It uses multi-agent collaboration, with modules for music analysis, scriptwriting, directing, and quality control, and can accurately analyze the lyrics, beats, and...

7mos ago

055.8K

Tencent-HY-MT1.5 - A family of translation models open-sourced by Tencent Hunyuan

Tencent-HY-MT1.5 is version 1.5 of Tencent Hunyuan's open-source translation models. It includes two models, 1.8B and 7B, and supports translation among 33 languages plus 5 Chinese ethnic-minority languages and dialects. The 1.8B model is optimized for phones and other consumer-grade devices; with only 1 GB of RAM it can achieve on-device...

7mos ago

061.4K

PersonaLive - A real-time AI portrait animation framework for live streaming, open-sourced by the University of Macau and others

PersonaLive is an open-source real-time AI face-swapping live-streaming framework jointly developed by the University of Macau, dzine.ai, and the GVC Lab at Great Bay University. It drives digital humans at low latency and high frame rate on ordinary consumer-grade graphics cards (12 GB of video memory), and supports real-time through the camera...

7mos ago

046.9K

Computer Use Preview - An AI browser automation tool open-sourced by Google

Computer Use Preview is an AI browser automation tool open-sourced by Google and based on the Gemini model. It interacts with web pages through natural-language commands, using a "screenshot → analysis → execution" visual recognition loop, and supports Playwrigh...

7mos ago

041K

ClipSketch AI - An open-source AI tool that turns videos into hand-drawn storyboards, with support for Bilibili and Xiaohongshu

ClipSketch AI is an open-source tool for short-video creators that turns videos into hand-drawn storyboards. It converts videos from Bilibili, Xiaohongshu, and other platforms into hand-drawn-style storyboards in one click, supports marking key frames and automatically generating storyboard panels and social media copy, and can incorporate user-defined characters.

7mos ago

047.5K

MAI-UI - A general-purpose GUI agent foundation model open-sourced by Alibaba's Tongyi Lab

MAI-UI is an open-source general-purpose GUI agent foundation model from Alibaba's Tongyi Lab, with four main capabilities: cross-application operation, fuzzy semantic understanding, proactive user interaction, and multi-step process coordination. It uses a device-cloud collaborative architecture: a lightweight model resides on the device to handle everyday tasks, and complex tasks can call the large cloud...

7mos ago

052.4K

MiniMax M2.1 - A coding and agent model open-sourced by MiniMax

MiniMax M2.1 is MiniMax's open-source coding and agent model, with 10 billion activated parameters and support for many major programming languages such as Rust, Java, Golang, C++, Kotlin, Objective-C, TypeS...

7mos ago

037.8K

InstanceAssemble - A layout-controlled generation technique open-sourced by Xiaohongshu and Fudan University

InstanceAssemble is a layout-controlled generation technique jointly open-sourced by Xiaohongshu and Fudan University. Through an "Instance Assemble Attention" mechanism, it achieves accurate image generation across layouts from simple to complex and from sparse to dense. It uses a two-stage cascade architecture, first generating the image background and then, one by one ...

7mos ago

034.8K

Zen Browser - An open-source AI web browser based on Firefox

Zen Browser is an open-source browser built on the Firefox engine, focused on a simple, efficient browsing experience. Its core features are a vertical tab bar and workspace isolation. The sidebar design clearly displays the full titles of 50+ tabs, and the browser supports multi-window split-screen browsing.

7mos ago

051.4K

QwenLong-L1.5 - A long-context reasoning model open-sourced by Alibaba's Tongyi Lab

QwenLong-L1.5 is an open-source long-context reasoning model from Alibaba's Tongyi Lab, focused on complex reasoning over ultra-long contexts (e.g., 1M-4M tokens). Its core breakthrough lies in three innovations in the post-training phase: through knowledge graphs, SQL parsing, and multi-agent...

7mos ago

040K

Infographic - An infographic generation framework open-sourced by Alibaba's AntV team

Infographic is a new-generation open-source framework from Alibaba's AntV team, built on G2 and Ant Design. It focuses on quickly generating high-quality infographics and provides 30+ layout templates, 120+ preset themes, and AI generation capabilities.

7mos ago

046.9K

opcode - An open-source graphical desktop application designed for Claude Code

opcode is an open-source graphical desktop application designed for Claude Code, built by developer winfunc with Tauri 2 + React 18 + Rust. It provides a visual interface for managing Claude Code projects and supports creating ...

7mos ago

045.1K

TurboDiffusion - A video generation acceleration framework open-sourced by Shengshu Technology, Tsinghua, and others

TurboDiffusion is a video generation acceleration framework jointly open-sourced by Tsinghua University, Shengshu Technology, and UC Berkeley. It speeds up video generation by 100-200 times while keeping picture quality nearly lossless. Through sparse linear attention, sampling-step distillation, and 8-bit...

7mos ago

050K

MedASR - A medical speech recognition model open-sourced by Google

MedASR is a 105-million-parameter medical speech recognition model open-sourced by Google. It is fine-tuned on a 5,000-hour de-identified clinical corpus, optimized for drug, dosage, and anatomical terminology, and includes a built-in 6-gram medical language model. Its word error rate is only 4.6 on the private radiology dataset RAD-DICT...

7mos ago

051.5K

Fun-Audio-Chat-8B - An end-to-end speech interaction large model open-sourced by Alibaba Tongyi

Fun-Audio-Chat-8B is an open-source 8-billion-parameter end-to-end speech large model from Alibaba's Tongyi team. It takes speech in and produces speech out directly, with no need to chain ASR + LLM + TTS, is fluent in both Chinese and English, and offers low latency and natural timbre. It uses a dual-resolution shared LLM with 25Hz...

7mos ago

047.5K

PromptFill - An open-source structured prompt generation tool designed for AI image generation

PromptFill is a structured prompt generation tool designed for AI image generation. Through visual "fill-in-the-blank" interactions, it helps users quickly build, manage, and iterate on complex prompts, improving the efficiency and quality of AI image generation. PromptFill's core features...

7mos ago

049.5K

GLM-4.7 - The latest flagship large model open-sourced by Zhipu AI

GLM-4.7 is the latest flagship large model released and open-sourced by Zhipu AI, deeply optimized for AI programming, complex reasoning, and agent tasks. The model supports a 200K context length and a 128K maximum output, with multi-language coding, long-horizon task planning, and tool collaboration capabilities...

7mos ago

067.3K

NitroGen - A gaming AI model open-sourced by NVIDIA with Stanford, Caltech, and others

NitroGen is an open-source gaming AI model developed by NVIDIA with Stanford University, Caltech, and other institutions, capable of playing more than 1,000 different games. The model is based on the GR00T N1.5 architecture and is built by analyzing 40,000 hours of gameplay video (including gamepad action annotations)...

7mos ago

059.4K

Qwen-Image-Layered - An AI image editing model open-sourced by Alibaba

Qwen-Image-Layered is an open-source AI image editing model from an Alibaba team. It decomposes ordinary images into independent transparent layers for precise, Photoshop-like editing. The model is released under the Apache 2.0 license and supports flexible control of layers...

7mos ago

060.3K

VTP - A visual generation model technique open-sourced by MiniMax's Hailuo Video team

VTP (Visual Tokenizer Pre-training) is a key technique for visual generative models proposed by MiniMax's Hailuo Video team. It improves generative system performance by improving how the visual tokenizer is pre-trained. The traditional method...

7mos ago

056.1K

T5Gemma 2 - A next-generation encoder-decoder model open-sourced by Google

T5Gemma 2 is a new-generation encoder-decoder model open-sourced by Google, based on the Gemma 3 architecture and upgraded with multimodal and long-context capabilities. It supports multiple data types, including text and images, and can handle very long contexts (up to 128K) in generating...

7mos ago

050.1K

FunctionGemma - A lightweight AI model optimized for function calling, open-sourced by Google

FunctionGemma is a lightweight AI model from Google optimized for function calling. Built on the Gemma 3 base model with 270 million parameters, it converts natural language into executable API calls in real time on phones, in browsers, and on other devices. Its core feature is support for local off...

7mos ago

050.5K

SHARP - A monocular-view 3D scene synthesis technique open-sourced by Apple

SHARP (Sharp Monocular View Synthesis in Less Than a Second) is a monocular view synthesis technique open-sourced by Apple. It generates a realistic 3D representation of a scene from a single photo in less than a second...

7mos ago

054.5K

TRELLIS.2 - A large-scale 3D generative model open-sourced by Microsoft

TRELLIS.2 is a large-scale 3D generative model open-sourced by Microsoft, with 4 billion parameters, focused on high-fidelity image-to-3D generation. Its "O-Voxel" sparse voxel structure efficiently handles complex topology and sharp features to generate high-quality 3D output with full PBR materials ...

7mos ago

062.2K

Step-GUI - A family of AI agent models open-sourced by StepFun

Step-GUI is StepFun's open-source family of AI agent models. It includes the cloud model Step-GUI, the first MCP protocol for GUI agents, and Step-GUI Edge, the industry's first open-source on-device model that can be deployed on phones. Specialized...

7mos ago

061.4K

A2UI - A declarative protocol for agent-driven user interfaces, open-sourced by Google

A2UI (Agent-to-User Interface) is Google's open-source protocol for agent-driven interfaces, which addresses the problem of AI agents generating complex interactive interfaces. AI agents describe the structure of the user interface in a declarative JSON format, and client applications ...

7mos ago

066.1K

SAM Audio - An open-source multimodal audio segmentation model from Meta

SAM Audio is an open-source multimodal audio segmentation model from Meta that accurately separates arbitrary target sounds from complex audio mixtures. By combining text, visual, and temporal cues, it enables flexible, efficient audio processing for tasks such as audio editing, denoising, sound extraction, and...

7mos ago

053.1K

Hunyuan World Model 1.5 - A real-time world model generation framework open-sourced by Tencent Hunyuan

Hunyuan World Model 1.5 (Tencent HY WorldPlay) is the industry's first open-source real-time world model framework, released by Tencent. It covers the full pipeline of data, training, and streaming inference deployment. Its core is the WorldPlay autoregressive diffusion model, which uses Next-F...

7mos ago

056.1K

Molmo 2 - A family of multimodal video and image understanding models open-sourced by Ai2

Molmo 2 is an open-source multimodal model released by the Allen Institute for AI (Ai2) to improve video and multi-image understanding. It comes in three variants: Molmo 2 (8B), Molmo 2 (4B), and Molmo 2-O...

7mos ago

061.7K

LongCat-Video-Avatar - An avatar video generation model open-sourced by Meituan

LongCat-Video-Avatar is an audio-driven video generation model open-sourced by Meituan and built on LongCat-Video. It focuses on generating hyper-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

7mos ago

062.2K

MiMo-V2-Flash - An open-source MoE large model released by Xiaomi

MiMo-V2-Flash is an open-source MoE large model released by Xiaomi, with 309 billion total parameters and 15 billion active parameters, focused on efficient inference and agent applications. The model uses a hybrid attention architecture and multi-token prediction, with an inference speed of 150 tokens/second...

7mos ago

055.5K

Nemotron 3 - A family of open-source AI models released by NVIDIA

Nemotron 3 is a family of open-source AI models released by NVIDIA in Nano, Super, and Ultra sizes. It uses a hybrid latent mixture-of-experts (latent MoE) architecture to significantly improve inference efficiency and reduce operating costs. Among them...

7mos ago

056.6K

Wan-Move - An AI video generation framework open-sourced by Alibaba Tongyi with Tsinghua and others

Wan-Move is an open-source AI video generation framework jointly developed by Alibaba's Tongyi Lab, Tsinghua University, and other organizations, focused on high-quality video synthesis through precise motion control. Its core technique is "latent trajectory guidance", which adds point-level motion control to existing image-to-video models...

7mos ago

054.3K

PaCoRe - A parallel coordinated AI reasoning framework open-sourced by StepFun

PaCoRe (Parallel Coordinated Reasoning) is StepFun's open-source parallel coordinated reasoning framework. Through a massively parallel thinking mechanism, it explores solutions to a problem from multiple perspectives at once, breaking through the traditional...

7mos ago

057.9K

Banana Slides - An open-source AI PPT generator based on the Nano Banana Pro model

Banana Slides is an open-source PPT generator based on the Nano Banana Pro AI model that creates professional presentations from natural-language commands. Users describe the topic in a single sentence (e.g., "Human impact on the ecosystem"), and the tool then automatically...

7mos ago

066.6K

Kaleido - A multi-subject reference video generation model open-sourced by Zhipu AI with Tsinghua University and others

Kaleido is an open-source multi-subject reference video generation model jointly developed by Hefei University of Technology, Tsinghua University, and Zhipu AI. It generates subject-consistent videos from multiple reference images, addressing the weaknesses of existing models in multi-subject consistency and background decoupling. Kaleido generates videos through specialized data...

8mos ago

055.3K

Paper2Slides - An AI tool that turns academic papers into slides, open-sourced by the University of Hong Kong

Paper2Slides is an open-source AI tool from the Data Intelligence Lab at the University of Hong Kong that converts academic papers into professional slides or posters in one click. It uses RAG (retrieval-augmented generation) to parse the document content directly rather than relying on information from the web, so that the generated slides stay highly consistent with the original...

8mos ago

060.9K

RealVideo - A real-time streaming video generation system open-sourced by Zhipu AI

RealVideo is an open-source real-time streaming video generation system from Zhipu AI that generates natural, smooth video responses in 2 to 3 seconds. Users upload a photo and enter text, and the system generates the corresponding voice and video, enabling real-time conversations with AI characters...

8mos ago

050K

OpenScreen - A free, open-source screen recording tool for Mac and Windows

OpenScreen is a free, open-source screen recording tool that offers an easy-to-use, full-featured alternative to Screen Studio. It supports both Mac and Windows, is completely free under the MIT license, and can be used for individual...

8mos ago

060.9K

SCAIL - A studio-grade character animation generation framework open-sourced by Zhipu and Tsinghua

SCAIL (Studio-Grade Character Animation via In-Context Learning) is a studio-grade character animation generation framework proposed by Zhipu with Prof. Liu Yongjin's group at Tsinghua University. Through...

8mos ago

055.8K

DeepSearchQA - A benchmark for AI research agents, open-sourced by Google

DeepSearchQA is Google's open-source benchmark for AI research agents, designed to evaluate agents on complex multi-step query tasks. It consists of 900 hand-designed "causal chain" tasks covering 17 domains, requiring the AI to act like a human researcher and push through multi-step...

8mos ago

052.3K