Latest AI Resources

Total: 2,972 articles
Kaleido - A multi-subject reference video generation model open-sourced by Zhipu AI with Tsinghua University and others

Kaleido is an open-source multi-subject reference video generation model jointly developed by Hefei University of Technology, Tsinghua University, and Zhipu AI. It generates subject-consistent videos from multiple reference images, addressing the weaknesses of existing models in multi-subject consistency and background decoupling. Kaleido generates videos through specialized data...
3mos ago
Paper2Slides - HKU's open-source AI tool for turning academic papers into slides

Paper2Slides is an open-source AI tool from the Data Intelligence Laboratory of the University of Hong Kong that converts academic papers into professional slides or posters in one click. It uses RAG (retrieval-augmented generation) to parse the document content directly rather than relying on information from the web, so that the generated slides stay highly consistent with the original...
3mos ago
VoxCPM 1.5 - ModelBest's open-source end-to-end text-to-speech model

VoxCPM 1.5 is an open-source speech generation model released by ModelBest. It is a tokenizer-free text-to-speech (TTS) model with several innovations and improvements. Using an end-to-end diffusion autoregressive architecture, it generates continuous speech waveforms directly from text, avoiding the limitations of traditional tokenization methods...
3mos ago
GLM-TTS - Zhipu AI's open-source industrial-grade speech synthesis system

GLM-TTS is an open-source, industrial-grade speech synthesis system. It uses a two-stage generation architecture: the first stage converts text into speech token sequences, and the second stage converts those tokens into high-quality audio. With a voice sample of only 3 seconds, the system can complete the voice...
3mos ago
OpenAutoGLM - Zhipu AI's open-source mobile phone AI agent model

OpenAutoGLM is an open-source agent model with "phone use" capability: it understands on-screen content through multimodal perception and automatically generates the sequence of operations needed to complete a user-specified task. Users only need to describe what they want in natural language, such as "open Meituan to search for nearby hot pot ...
3mos ago
InkSight - Google's open-source AI handwriting recognition tool

InkSight is Google's open-source AI handwriting recognition tool that converts handwritten notes on paper into editable digital ink files (e.g., SVG format). Unlike traditional OCR, it not only recognizes the text content but also preserves handwriting style, paragraph structure, and emphasis marks, and it supports multiple languages.
3mos ago
RoboCOIN - A real-robot dataset for dual-arm robots open-sourced by BAAI with several universities

RoboCOIN is the world's first large-scale real-robot dataset for dual-arm robots, open-sourced by the Beijing Academy of Artificial Intelligence (BAAI) together with a number of companies and universities. It contains 15 types of robot platforms, 180,000 real manipulation trajectories, and 421 types of task scenarios. Its most notable feature is a hierarchical annotation system that breaks the task down ...
3mos ago
MemMachine - Open-source AI memory system by MemVerge

MemMachine is an open-source AI memory system developed by MemVerge for AI models and agents. It stores and recalls interaction data much like the human brain, addressing the "stateless amnesia" problem of AI. It uses a layered architecture (short-term memory, long-term memory, user profile...
3mos ago
Vidi2 - ByteDance's open-source multimodal large model for video understanding and generation

Vidi2 is a second-generation multimodal large model for video understanding and generation, open-sourced by ByteDance and focused on understanding, analyzing, and creating video content. It accepts joint text, video, and audio input and can simultaneously interpret visual content, audio information, and natural language instructions to achieve cross-modal interaction and push...
3mos ago
ViMax - The University of Hong Kong's open-source multi-agent video generation framework

ViMax is an open-source multi-agent video generation framework from the Data Science Laboratory of the University of Hong Kong that automates the whole process from creative input to video output. It integrates script generation, scene design, shot planning, video rendering, and other functions, letting users generate coherent, film-grade video from natural language descriptions ...
3mos ago
HunyuanOCR - Tencent Hunyuan's open-source expert model for optical character recognition

HunyuanOCR is a high-performance optical character recognition model open-sourced by the Tencent Hunyuan team, with only 1 billion parameters. Built on the Hunyuan multimodal architecture, it uses an end-to-end design and efficiently handles text detection, recognition, and document parsing tasks. The model scored 94.1 points in the complex document test, surpassing...
3mos ago
Awex - Ant Group's open-source high-performance weight exchange framework

Awex is a high-performance weight exchange framework open-sourced by Ant Group, designed for large-scale parameter synchronization in reinforcement learning. It can complete terabyte-scale parameter exchange in seconds, significantly improving the efficiency of training and inference. Awex synchronizes very quickly: on a thousand-card cluster, trillion-parameter models can complete within 6 seconds the full...
4mos ago
LoopTool - An automated tool-calling data evolution framework open-sourced by Shanghai Jiao Tong University and Xiaohongshu

LoopTool is an automated tool-calling data evolution framework open-sourced by Shanghai Jiao Tong University and the Xiaohongshu team, designed to improve the tool-calling capability of large language models. It optimizes data generation and model training through closed-loop iteration, using open-source models (e.g., Qwen3-32B) as data generation...
4mos ago
ChatTutor - Open-source AI teaching aid for visual, interactive learning

ChatTutor is an open-source AI teaching aid focused on visual, interactive learning of STEM subjects. Its multi-agent architecture provides conversational Q&A and dynamic drawing: it can draw math graphs, physics circuits, or mind maps on a whiteboard in real time to help users intuitively understand abstract concepts ...
4mos ago
EverMemOS - Open-source long-term memory operating system by the Shanda team

EverMemOS is an open-source long-term memory operating system launched by the Shanda team led by Chen Tianqiao. It is designed for AI agents and addresses the memory discontinuity caused by the fixed context window of large language models. The system is modeled on human brain memory mechanisms and uses a four-layer architecture (agent layer, memory layer, index layer...
4mos ago
Kosong - Moonshot AI's new open-source AI agent development framework

Kosong is a new AI agent development framework open-sourced by Moonshot AI that gives developers a lightweight, flexible, and highly extensible foundation for building next-generation agent applications. With an asynchronous tool orchestration engine that efficiently schedules multiple tools...
4mos ago
SenseNova-SI - SenseTime's open-source series of spatial intelligence large models

SenseNova-SI is an open-source spatial intelligence large model series released by SenseTime, focused on improving AI's spatial understanding and reasoning. The model excels in six core dimensions, including spatial measurement, reconstruction, relationship judgment, perspective transformation, deformation analysis, and spatial reasoning, significantly outperforming other...
4mos ago
NocoBase - Free, open-source AI no-code development platform for building apps visually

NocoBase is an AI-powered, open-source no-code development platform for rapidly building business systems; applications are developed through configuration rather than programming. The project is released under the Apache-2.0 license, offers private deployment and flexible extensibility, and suits enterprise management, collaboration platforms, and other fields ...
4mos ago
UniWorld V2 - A new-generation image editing model launched by Rabbitpre with Peking University

UniWorld V2 is a new-generation image editing model jointly launched by Rabbitpre and the UniWorld team at Peking University. It has significant advantages in image editing, especially in understanding Chinese and executing complex instructions. The model can accurately render artistic Chinese fonts and support fine...
4mos ago
Handy - Free, open-source local AI speech-to-text tool

Handy is a free, open-source local speech-to-text tool that supports Windows, macOS, and Linux and is built with Rust and React. It processes voice data locally without uploading it to the cloud, which protects privacy, and is suited to quick transcription and text input.
4mos ago