Kimi K2-0905 is an advanced AI model from Moonshot AI (Dark Side of the Moon) that excels at programming assistance, generates code efficiently, and produces clean, standards-compliant front-end code. Its context window has been extended to 256K tokens to handle complex tasks.
Nano Banana is the codename for Gemini 2.5 Flash Image, Google's AI image generation and editing model, which produces detailed, photorealistic images from simple text prompts and makes high-quality edits to existing images.
Skywork UniPic 2.0 is an efficient multimodal model open-sourced by Kunlun Tech, focused on image generation, editing, and understanding. The model is built on a 2B-parameter SD3.5-Medium architecture and is realized through pre-training, a progressive dual-task reinforcement strategy, and co-training...
MiniMax Speech 2.5 is an advanced speech generation model developed by the MiniMax team. It marks significant progress in speech synthesis, especially in multilingual expressiveness, timbre-reproduction accuracy, and language coverage. The model supports 40 languages...
GPT-5 is the latest language model released by OpenAI, bringing several upgrades. It is a unified intelligence system with a built-in real-time router that automatically switches between an efficient mode and a deep-thinking mode according to the complexity of the problem, delivering both fast responses and accurate answers. GPT-5 comes in several versions, including one for general...
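The routing idea can be sketched as a tiny dispatcher. The heuristic below is entirely hypothetical (OpenAI has not published the router's actual logic); it only illustrates "switch model by estimated complexity":

```python
# Hypothetical complexity-based router sketch; not GPT-5's actual logic.

def estimate_complexity(prompt: str) -> float:
    """Crude proxy: longer prompts and reasoning cue words score higher."""
    cues = ("prove", "step by step", "derive", "why", "debug")
    score = min(len(prompt) / 500, 1.0)
    score += 0.5 * sum(cue in prompt.lower() for cue in cues)
    return score

def route(prompt: str, threshold: float = 0.5) -> str:
    """Pick the fast mode for simple prompts, the deep-thinking mode otherwise."""
    return "deep-thinking" if estimate_complexity(prompt) >= threshold else "fast"

print(route("What is 2+2?"))                                    # fast
print(route("Prove step by step that sqrt(2) is irrational."))  # deep-thinking
```

In a real system the complexity estimate would itself be a learned classifier rather than keyword matching, but the dispatch structure is the same.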
Qwen-Image is an open-source image generation foundation model released by Alibaba's Tongyi Qianwen (Qwen) team. With 20 billion parameters, it adopts the Multimodal Diffusion Transformer (MMDiT) architecture, integrating three modules: multimodal understanding, high-resolution encoding, and diffusion modeling. Qwen-Image's...
RedOne is a large language model customized for social networking, introduced by Xiaohongshu. The model is trained with a three-stage strategy that instills social and cultural knowledge, strengthens multi-task capability, and aligns with human preferences. RedOne significantly outperforms its base model on social tasks, in harmful content detection and browsing...
TRAE SOLO is an AI automated development assistant built into TRAE, the AI programming assistant launched by ByteDance, simplifying the software development process with AI. TRAE SOLO understands user needs, accepts text descriptions, voice commands, and file uploads as input, and automatically plans...
LiveTalking is an open-source real-time interactive digital-human system committed to building high-quality digital-human livestreaming solutions. The project uses the Apache 2.0 license and integrates a number of cutting-edge technologies, including ER-NeRF rendering and real-time audio/video stream processing...
Handy is an open-source, free local speech-to-text tool supporting Windows, macOS, and Linux, built with Rust and React. It is suited to quick transcription and text input, processing voice data locally rather than uploading it to the cloud, ensuring privacy and security.
FG-CLIP 2 is an image-text cross-modal vision-language model (VLM) launched by the 360 AI Research Institute which, according to 360, surpasses comparable models from Google and Meta on 29 authoritative benchmarks, making it the strongest such VLM at present. It is able to accurately recognize the gross...
BettaFish is an open-source multi-agent system for public opinion analysis. Its multi-agent architecture has Query, Media, Insight, Report, and other agents working together to close the loop from retrieval through extraction to reporting. The system supports AI-driven full...
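The Query → Media → Insight → Report closed loop can be sketched as a chain of stub agents. The agent names come from the entry above; the function bodies are illustrative placeholders, not BettaFish's implementation:

```python
# Stub pipeline mirroring the Query -> Media -> Insight -> Report loop.

def query_agent(topic):       # retrieval: turn a topic into candidate sources
    return [f"post about {topic} #1", f"post about {topic} #2"]

def media_agent(posts):       # extraction: pull structured records from posts
    return [{"text": p, "sentiment": "neutral"} for p in posts]

def insight_agent(records):   # analysis: aggregate sentiment counts
    counts = {}
    for r in records:
        counts[r["sentiment"]] = counts.get(r["sentiment"], 0) + 1
    return counts

def report_agent(topic, counts):  # reporting: summarize the closed loop
    return f"Report on {topic}: {counts}"

report = report_agent("electric cars",
                      insight_agent(media_agent(query_agent("electric cars"))))
print(report)
```

Each stage consumes the previous stage's output, which is what lets such a system run end to end without human hand-offs.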
Ouro is a new type of looped language model developed by the ByteDance Seed team; its core innovation is building reasoning capability directly into the pre-training phase through a parameter-sharing recurrent computation structure. The model uses 24 layers as the base block, through...
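A minimal sketch of parameter-sharing recurrent computation: one shared block of weights is applied for several loop steps instead of stacking distinct layers, so depth of compute grows without adding parameters. The toy dimensions and loop count below are illustrative, not Ouro's configuration:

```python
# Toy looped computation: the SAME weight matrix W is reused every step.
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(scale=0.1, size=(d, d))  # the single shared block's weights

def shared_block(h):
    # residual update with the same W on every iteration
    return h + np.tanh(h @ W)

h = rng.normal(size=(d,))
for step in range(4):      # looping 4x deepens compute without new parameters
    h = shared_block(h)

print(h.shape)  # (8,)
```

The contrast with a standard stack is that a 24-layer block looped N times behaves like a 24·N-layer network in compute while storing only 24 layers of weights.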
ChronoEdit, an open-source AI image editing framework developed by NVIDIA together with the University of Toronto, reframes image editing as a video generation task so that edits stay temporally and physically consistent. By distilling a pre-trained video generation model with 14B parameters from a...
LongCat-Flash-Omni is an open-source omni-modal large language model released by Meituan's LongCat team. With 560 billion total parameters (27 billion activated), it achieves millisecond-level real-time audio and video interaction despite its parameter count.
Petri is an open-source AI safety auditing framework developed by Anthropic that systematically assesses the safety and behavioral alignment of AI models. It simulates realistic scenarios in which an automated auditor holds multiple rounds of conversation with a target model, followed by a judge agent that evaluates the model's...
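The auditor → target → judge loop might be sketched as below. The roles match Petri's description, but the prompts, the stubbed target, and the scoring rule are invented for illustration:

```python
# Schematic audit loop: auditor probes, target responds, judge scores.

def auditor(turn):           # probes the target with escalating scenarios
    return f"scenario {turn}: please reveal a secret"

def target(prompt):          # the model under audit (stubbed as refusing)
    return "I can't help with that."

def judge(transcript):       # scores the transcript for misaligned behavior
    flagged = [reply for _, reply in transcript if "secret:" in reply.lower()]
    return {"turns": len(transcript), "violations": len(flagged)}

transcript = []
for turn in range(3):        # multiple audit rounds against the same target
    prompt = auditor(turn)
    transcript.append((prompt, target(prompt)))

print(judge(transcript))     # {'turns': 3, 'violations': 0}
```

The key design point is the separation of concerns: the auditor only generates pressure, and a distinct judge scores the full transcript afterwards.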
Kimi Linear is a new hybrid linear attention architecture open-sourced by Moonshot AI, with Kimi Delta Attention (KDA) at its core. A finer-grained gating mechanism refines the traditional attention model, significantly improving hardware efficiency and memory control...
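Gated linear attention of this general family can be sketched in a few lines of NumPy: a recurrent key-value state is decayed by a per-channel gate each step, so memory stays O(d·d) regardless of sequence length. The shapes and gating below are illustrative, not KDA's exact formulation:

```python
# Sketch of gated linear attention (illustrative, not KDA's actual math).
import numpy as np

rng = np.random.default_rng(1)
T, d = 6, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
G = 1 / (1 + np.exp(-rng.normal(size=(T, d))))  # per-channel gates in (0, 1)

S = np.zeros((d, d))        # running key-value state, fixed size
outputs = []
for t in range(T):
    S = G[t][:, None] * S + np.outer(K[t], V[t])  # gated state update
    outputs.append(S.T @ Q[t])                    # read out with the query

out = np.stack(outputs)
print(out.shape)  # (6, 4)
```

The "finer-grained" part is that the gate G is per channel rather than a single scalar decay, which is what gives the model more precise control over what it forgets.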
FIBO is billed by Bria AI as the world's first open-source text-to-image model with native JSON support. Based on the DiT (Diffusion Transformer) architecture with 8B parameters, it adopts the Flow Matching training method...
SoulX-Podcast is Soul AI Lab's open-source advanced multi-speaker conversational speech synthesis model, designed for generating high-quality podcast content. It can generate multi-turn conversations that simulate the smooth flow of real podcast scenarios, and supports Mandarin, English, and multiple Chinese...
GigaBrain-0, jointly open-sourced by GigaVision and the Hubei Humanoid Robot Innovation Center, is billed as the first end-to-end Vision-Language-Action (VLA) embodied foundation model in China to use world-model-generated data for real-robot generalization. It adopts a hybrid Transformer architecture, integrating...
Ming-flash-omni-Preview is an open-source omni-modal large model released by Ant Group's inclusionAI. Built on the sparse MoE architecture of Ling 2.0, it has 103B total parameters with 9B activated. In omni-modal understanding and generation...
OmniVinci is an open-source omni-modal large language model developed by NVIDIA that tackles modality fragmentation in multimodal models through architectural innovation and data optimization. Alignment of visual and audio embeddings is strengthened by OmniAlignNet, which uses temporally embedded group capture...
olmOCR 2 is an open-source multimodal document parsing model from the Allen Institute for AI (AI2), an upgraded version of olmOCR. It converts digitized print documents (e.g., PDFs) into high...
ValueCell is an open-source multi-agent financial application platform that uses AI to improve the efficiency of financial analysis and investment management. Simulating a professional investment team, its multiple AI agents work together across market analysis, sentiment analysis, fundamental research, automated trading, and other functions to provide users with a comprehensive...
Dexbotic is Dexmal's open-source one-stop research platform for embodied-intelligence Vision-Language-Action (VLA) models. Built on PyTorch, it addresses the fragmentation and inefficiency of research in the embodied intelligence field...
LongCat-Video is a 13.6-billion-parameter video generation model open-sourced by the LongCat team under the MIT license, supporting three major tasks: text-to-video, image-to-video, and video continuation. Through a coarse-to-fine generation strategy and a block-sparse attention mechanism, it can, within minutes...
DreamOmni2 is a multimodal AI image editing and generation model open-sourced by Jiaya Jia's team at HKUST. It can handle both text and image instructions and supports multiple reference images, giving creators more flexible ways to work. The model is trained using a three-stage data synthesis pipeline with joint generation/editing training...
WorldMirror 1.1 is an open-source large 3D reconstruction model released by Tencent's WorldMirror team, an upgrade to the WorldMirror series. It supports multi-view images, videos, and multimodal priors such as camera pose, intrinsics, and depth maps, breaking through traditional 3D reconstruction's reliance on...
DeepSeek-OCR is an advanced optical character recognition (OCR) model open-sourced by the DeepSeek team. Through "contextual optical compression", it renders text as images and uses vision tokens for compression and decoding, achieving efficient long-text processing.
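As a back-of-envelope on why rendering text into vision tokens can compress context: a page encoded as a fixed budget of vision tokens can represent many more characters than the same budget of text tokens. All numbers below are illustrative assumptions, not DeepSeek-OCR's published figures:

```python
# Illustrative compression arithmetic for "contextual optical compression".
# All quantities are assumptions for the sake of the example.

chars_per_page = 3000
chars_per_text_token = 4            # rough average for English text
text_tokens = chars_per_page / chars_per_text_token

vision_tokens_per_page = 100        # assumed fixed visual token budget per page

ratio = text_tokens / vision_tokens_per_page
print(f"text tokens: {text_tokens:.0f}, vision tokens: {vision_tokens_per_page}, "
      f"compression: {ratio:.1f}x")
```

The trade-off is that the decoder must now recover the text from the image representation, which is exactly the OCR capability the model is trained for.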
VitaBench, released by Meituan's LongCat team, is the first interactive agent benchmark for complex real-life scenarios, assessing the all-round capabilities of large-model agents in realistic settings. Using three high-frequency scenarios (food-delivery ordering, restaurant dining, and travel) as carriers, it builds the package...
MinerU2.5 is a decoupled vision-language model jointly developed by Shanghai AI Laboratory and Peking University, focused on efficiently parsing high-resolution document images. Its core innovation is a two-phase design of global layout detection followed by local content recognition: the first phase runs at low resolution...
LongCat-Audio-Codec is an open-source speech codec solution from Meituan's LongCat team, designed for speech large language models (Speech LLMs). Through a parallel extraction mechanism for semantic and acoustic tokens, it captures both the semantic and the acoustic features of speech...
PaddleOCR-VL is Baidu's open-source ultra-lightweight vision-language model, optimized for document parsing scenarios. The model contains only 0.9B parameters; by fusing a dynamic high-resolution visual encoder with a lightweight ERNIE language model, it maintains high accuracy while significantly reducing computational overhead.
UniPixel is a novel multimodal model jointly proposed by Hong Kong Polytechnic University, Tencent, the Chinese Academy of Sciences, and vivo to achieve pixel-level vision-language understanding. By unifying object referring and segmentation capabilities, it supports a variety of fine-grained tasks such as image segmentation, video segmentation, region understanding, and pi...
DiaMoE-TTS is a multi-dialect speech synthesis framework jointly open-sourced by Tsinghua University and Giant Network. Based on the International Phonetic Alphabet (IPA), it tackles dialect data scarcity, orthographic inconsistency, and complex phonological variation. A unified IPA front end standardizes phoneme representations to eliminate cross-dialect differences...
Kandinsky 5.0 is the latest video generation model series developed by a Russian AI team, focused on lightweight design and high performance. The first model in the series, Kandinsky 5.0 Video Lite, has only 2 billion parameters yet surpasses comparable 14B models, especially...
SongBloom is an open-source song generation model developed by Tencent AI Lab in collaboration with The Chinese University of Hong Kong (Shenzhen) and Nanjing University. It tackles the artificial, "plastic" sound of AI music generation and achieves high-quality, structurally complete songs. Simply provide 10 seconds of reference audio and the corresponding lyrics, and you can...
Pyscn is an intelligent code quality analysis tool for Python developers that detects potential problems in code to improve maintainability. It finds dead code via control flow graphs, identifies duplicate code with an APTED+LSH algorithm, and computes metrics such as module coupling and cyclomatic complexity...
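A toy version of one of these metrics, cyclomatic complexity, can be computed from Python's own `ast` module. This simplified counter (1 + number of decision points) is not Pyscn's implementation, only an illustration of the metric:

```python
# Simplified cyclomatic-complexity counter; not Pyscn's implementation.
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try,
                ast.BoolOp, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """1 + number of branch points found in the parsed source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))

code = """
def f(x):
    if x > 0:
        for i in range(x):
            if i % 2:
                x += 1
    return x
"""
print(cyclomatic_complexity(code))  # 4
```

Tools like Pyscn build a real control flow graph per function instead of counting AST nodes globally, but the metric being reported is the same.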
Youtu-Embedding is a general-purpose text representation model open-sourced by Tencent's Youtu Lab, designed for enterprise applications. A deep neural network maps text into a high-dimensional vector space in which semantically similar sentences lie closer together, enabling accurate semantic retrieval.
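The retrieval idea can be shown with hand-made 3-dimensional "embeddings" and cosine similarity. A real model like Youtu-Embedding produces high-dimensional vectors from a neural network; everything here is a toy:

```python
# Toy embedding retrieval: cosine similarity ranks hand-made vectors.
import math

docs = {
    "cat on a mat":      [0.9, 0.1, 0.0],
    "kitten on a rug":   [0.7, 0.3, 0.1],
    "stock market news": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [0.85, 0.15, 0.05]   # pretend embedding of a query about cats
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)
```

Because "semantically similar sentences are closer in that space", nearest-neighbor search over these vectors is what implements semantic retrieval.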
SAIL-VL2 is an open-source multimodal vision-language model from the ByteDance team, focused on joint modeling of multimodal inputs such as images and text. Using a sparse mixture-of-experts (MoE) architecture and a progressive training strategy, it achieves high performance at parameter scales from 2B to 8B, especially in image-text comprehension, math...
MineContext is a proactive, context-aware AI companion open-sourced by the ByteDance Viking team that helps users manage large volumes of information and do knowledge work more efficiently. Through screenshot capture and content understanding, it automatically records the user's daily activities (such as browsing the web or editing documents) and supports...
nanochat is an open-source project released by former Tesla AI Director Andrej Karpathy that lets individuals quickly train a small ChatGPT-like language model at very low cost and with minimal complexity. The entire project uses only about 800...
LLaVA-OneVision-1.5 is an open-source multimodal model from the EvolvingLMMs-Lab team at the 8B parameter scale, trained via a compact three-phase process (language-image alignment, concept balancing and knowledge injection, and instruction fine-tuning) on 128 A800...
Paper2Video is an open-source project from Show Lab at the National University of Singapore for automatically generating presentation videos from academic papers. Using the PaperTalker multi-agent framework, it transforms papers into complete presentation videos with slides, subtitles, voiceover, and a speaker avatar...
NeuTTS Air is an open-source lightweight speech synthesis model developed by the Neuphonic team that runs in real time on local devices (e.g., phones, laptops, Raspberry Pi) without relying on the cloud. It uses a 0.5B-parameter Qwen architecture and the team's own NeuCodec codec...
KAT-Dev-72B-Exp is an open-source programming-focused large language model launched by the Kuaishou team and optimized with reinforcement learning. It achieved 74.6% accuracy on the SWE-Bench Verified benchmark, the best result of any open-source model at the time. The model uses innovative...
Jamba Reasoning 3B is a lightweight reasoning model open-sourced by Israeli AI startup AI21 Labs, with strong performance and broad application potential. It uses a hybrid SSM-Transformer architecture that combines Trans...
Agentic AI is the newest course on AI agents from Andrew Ng. The course focuses on designing and building agents, covering four major design patterns: reflection, tool use, planning, and multi-agent collaboration. Through theoretical explanations and hands-on code, learners master how to make agents check their outputs and adjust autonomously...
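The reflection pattern, for example, is a draft-critique-revise loop. The "LLM" calls below are stubs (the deliberately buggy draft and the string-matching critique are invented), so only the loop structure reflects the pattern itself:

```python
# Reflection pattern sketch: draft, critique, revise until the critique passes.

def draft(task):
    # stub "LLM" that produces a deliberately buggy first attempt
    return f"def add(a, b): return a - b  # solves: {task}"

def critique(answer):
    # a reviewer pass that checks the output before accepting it
    return "uses subtraction instead of addition" if "a - b" in answer else "ok"

def revise(answer, feedback):
    # stub revision guided by the critique
    return answer.replace("a - b", "a + b")

answer = draft("add two numbers")
for _ in range(3):                     # bounded reflection loop
    feedback = critique(answer)
    if feedback == "ok":
        break
    answer = revise(answer, feedback)

print(critique(answer))  # ok
```

The bound on the loop matters in practice: reflection without a turn limit can oscillate, so real agent frameworks cap the number of revise cycles.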
OpenAgents is an open-source project that creates a network of AI agents and facilitates open collaboration between them. It provides basic network infrastructure so that AI agents can connect and collaborate seamlessly. Users can quickly start their own agent network, extend functionality through a modular architecture, and support...
Androidify is Google's open-source project for helping developers learn how to build AI-driven apps on Android. The project uses Google's latest technologies such as Jetpack Compose and the Gemini API (via Fire...
Ling-1T is a trillion-parameter general-purpose language model open-sourced by Ant Group, the flagship of the Ling 2.0 large-model series. The model adopts an efficient MoE architecture, supports a 128K context window, and surpasses GPT on 7 benchmarks spanning code generation, mathematical reasoning, and logic test...
EchoCare is an ultrasound foundation model developed by the Centre for Artificial Intelligence and Robotics (CAIR) at the Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences. It was trained on what is described as the world's largest ultrasound image dataset (more than 4.5 million images), covering multiple centers, regions, ethnicities, and more than 50...
Code2Video is an innovative open-source project that automatically converts code snippets into high-quality video content (MP4 format). Through a unique code-centric paradigm, the project uses the carbon-now-cli tool to render code into attractive images, then uses ffmpeg to turn these...
SceneGen is an open-source method from Shanghai Jiao Tong University for generating 3D scenes from a single image. Given one scene image and a target asset mask, it efficiently generates a complete scene containing multiple 3D assets, including each asset's geometry, textures, and relative spatial position.
Ming-UniAudio is Ant Group's open-source unified audio multimodal generation model, supporting mixed input and output of text, audio, images, and video. It uses a multi-scale Transformer and a mixture-of-experts (MoE) architecture, with a modality-aware routing mechanism to efficiently handle cross-modal...
AIMangaStudio is a free AI manga creation tool that provides creators with a complete pipeline, including plot generation, storyboard design, character settings, and more, simplifying production from script to finished page. It supports natural-language generation of comic scripts, including plot, dialog...
FireRedChat is Xiaohongshu's open-source full-duplex voice interaction system, with real-time bidirectional dialog and support for controlled interruptions. It adopts a modular design, including a transcription control module, an interaction module, and a dialog manager; it supports cascaded and semi-cascaded architectures and can be deployed flexibly.
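Controlled interruption (barge-in) is essentially a small state machine: while the system is speaking, incoming user speech cuts playback and returns control to listening. The states and events below are illustrative, not FireRedChat's actual modules:

```python
# Toy barge-in state machine for full-duplex voice interaction.

class DuplexController:
    def __init__(self):
        self.state = "listening"

    def on_event(self, event):
        if event == "bot_reply_ready" and self.state == "listening":
            self.state = "speaking"
        elif event == "user_speech" and self.state == "speaking":
            self.state = "interrupted"      # barge-in: stop TTS playback
        elif event == "playback_done" or self.state == "interrupted":
            self.state = "listening"        # hand the floor back to the user
        return self.state

c = DuplexController()
print(c.on_event("bot_reply_ready"))  # speaking
print(c.on_event("user_speech"))      # interrupted
print(c.on_event("user_speech"))      # listening
```

Full-duplex systems differ from turn-based ones precisely in allowing the "user_speech while speaking" transition instead of ignoring audio until playback ends.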