Latest AI Resources

Total 3094 articles posts
Yume1.5 - 上海AI Lab联合复旦大学开源的交互式世界生成模型

Yume1.5 - An Interactive World Generation Model Open-Sourced by Shanghai AI Lab and Fudan University

Yume 1.5 is an open source interactive world generation model, jointly developed by Shanghai Artificial Intelligence Laboratory, Fudan University, and Shanghai Innovation Research Institute, which is capable of real-time interactive rendering (12 FPS on a single card). It adopts the joint spatio-temporal channel modeling (TSCM) technology, even if the context length increases...
4mos ago
033.4K
AutoMV - M-A-P联合北邮、南大等开源的免费音乐视频生成系统

AutoMV - M-A-P open source free music video generation system in conjunction with the North Post, South University, etc.

AutoMV is an open source music video generation system developed by the M-A-P team in collaboration with several universities, which can automatically generate coherent music videos based on complete songs without training.It adopts a multi-intelligence body collaboration model, including music analysis, scriptwriting, directing, and quality control modules, and can accurately analyze the lyrics, beats, and...
4mos ago
036.3K
PersonaLive - 澳门大学等开源的实时AI人像动画生成直播框架

PersonaLive - The University of Macau and other open source real-time AI portrait animation generation live framework

PersonaLive is an open source real-time AI face-swapping live streaming framework, jointly developed by the University of Macau, dzine.ai, and the GVC Lab at the University of the Greater Bay Area. It can realize low-latency and high frame rate digital person drive on ordinary consumer-grade graphics cards (12GB video memory), and support real-time through the camera...
4mos ago
033.8K
MAI-UI - 阿里通义实验室开源的通用GUI智能体基座模型

MAI-UI - Ali Tongyi Labs Open Source Universal GUI Intelligent Body Base Model

MAI-UI is an open source generalized GUI intelligent body base model from Alibaba Tongyi Labs, with four major capabilities: cross-application operation, fuzzy semantic understanding, active user interaction and multi-step process coordination. Adopting end-cloud collaboration architecture, the lightweight model resides in the device to handle daily tasks, and complex tasks can call the cloud big...
4mos ago
038.3K
InstanceAssemble - 小红书联合复旦大学开源的布局控制生成技术

InstanceAssemble - Little Red Book and Fudan University open source layout control generation technology

InstanceAssemble is a layout control generation technology jointly open-sourced by Little Red Book and Fudan University, which realizes accurate image generation from simple to complex and from sparse to dense layout through the mechanism of "Instance Assemble Attention". Adopting a two-stage cascade architecture, Mr. Mr. into the image background, and then one by one ...
4mos ago
023.1K
MedASR - 谷歌开源的医疗语音识别模型

MedASR - Google's open source medical speech recognition model

MedASR is a 105 million parameter medical speech recognition model open-sourced by Google, fine-tuned on a 5,000-hour desensitized clinical corpus, optimized for drug, dosage, and anatomical terminology, with a built-in 6-gram medical language model, and a word error rate of only 4.6 on the private radiology dataset RAD-DICT...
4mos ago
034.5K
Kaleido - 智谱AI联合清华大学等开源的多主体参考视频生成模型

Kaleido - A multi-subject reference video generation model open-sourced by Smart Spectrum AI in collaboration with Tsinghua University and others

Kaleido is an open source multi-subject reference video generation model jointly developed by Hefei University of Technology, Tsinghua University and Smart Spectrum AI. It generates subject-consistent videos through multiple reference images, solving the deficiencies of existing models in multi-subject consistency and background decoupling.Kaleido generates videos through specialized data...
4mos ago
034.7K
Paper2Slides - 香港大学开源的学术论文转为幻灯片AI工具

Paper2Slides - HKU open source academic papers into slides AI tool

Paper2Slides is an open source AI tool from the Data Intelligence Laboratory of the University of Hong Kong that converts academic papers into professional slides or posters in one click. Using RAG (Retrieval Augmented Generation) technology, directly parsing the document content rather than relying on network information, to ensure that the generated PPT is highly consistent with the original...
4mos ago
038.3K
VoxCPM 1.5 - 面壁智能开源的端到端文本到语音模型

VoxCPM 1.5 - Faceted Intelligence Open Source End-to-End Text-to-Speech Modeling

VoxCPM 1.5 is an open source speech generation model released by Facade Intelligence, based on text-to-speech (TTS) technology without the need for a splitter, featuring several innovations and improvements. Adopting an end-to-end diffusion autoregressive architecture, it generates continuous speech waveforms directly from text, avoiding the limitations of traditional segmentation methods...
5mos ago
043.5K
GLM-TTS - 智谱AI推出的开源工业级语音合成系统

GLM-TTS - Open Source Industrial Grade Speech Synthesis System by Smart Spectrum AI

GLM-TTS is an open source industrial-grade speech synthesis system with powerful speech synthesis capabilities. Adopting a two-stage generation architecture: the first stage will be converted to text into speech token sequences, and the second stage will be converted into high-quality audio token sequences. The system supports only 3 seconds of voice samples to complete the sound...
5mos ago
033.6K
OpenAutoGLM - 智谱AI开源的手机AI Agent模型

OpenAutoGLM - Smart Spectrum AI open source cell phone AI Agent model

OpenAutoGLM is an open source intelligent body model with the ability of "cell phone use", which can understand the content of the cell phone screen through multi-modal perception, and automatically generate the operation flow to complete the user-specified tasks. Users only need to use natural language to describe the needs, such as "open Meituan to search for nearby hot pot ...
5mos ago
035.2K
InkSight - Google开源的AI手写识别工具

InkSight - Google's open source AI handwriting recognition tool

InkSight is Google's open source AI handwriting recognition tool that converts paper handwritten notes into editable digital inked files (e.g. SVG format). Unlike traditional OCR , can recognize text content , can restore the handwriting style , paragraph structure and focus marking , support for multi-language processing .
5mos ago
028.9K
RoboCOIN - 智源联合多所高校开源的双臂机器人真机数据集

RoboCOIN - A real robot dataset for dual-armed robots open-sourced by Wisdom Source in collaboration with several universities

RoboCOIN is the world's first large-scale dual-arm robot real machine dataset open-sourced by Beijing Zhiyuan Artificial Intelligence Research Institute in conjunction with a number of enterprises and colleges and universities, which contains 15 types of robot platforms, 180,000 real operation trajectories, and 421 types of task scenarios. The most important feature is the use of hierarchical annotation system to disassemble the task ...
5mos ago
029.1K
MemMachine - MemVerge推出的开源AI记忆系统

MemMachine - Open Source AI Memory System by MemVerge

MemMachine is an open source AI memory system developed by MemVerge, designed for AI models and intelligences, which can store and recall interaction data like the human brain, solving the problem of AI "stateless memory loss". It adopts a layered architecture (short-term memory, long-term memory, user image...
5mos ago
032.2K
Vidi2 - 字节跳动开源的多模态视频理解与生成大模型

Vidi2 - ByteHop's open source multimodal video understanding and generation of large models

Vidi2 is a second-generation multimodal video understanding and generation big model open-sourced by ByteDance, focusing on video content understanding, analysis and creation. It supports joint input of text, video, and audio modalities, and can simultaneously understand picture content, sound information, and natural language commands to achieve cross-modal interaction and push...
5mos ago
030.2K
ViMax - 香港大学开源的多智能体视频生成框架

ViMax - Open Source Multi-intelligent Body Video Generation Framework at the University of Hong Kong

ViMax is an open source multi-intelligence body video generation framework from the Data Science Laboratory of the University of Hong Kong, which can automate the whole process from creative input to video output. Integration of script generation , scene design , shot planning and video rendering and other functions , to support users to generate coherent film and television grade video through natural language description ...
5mos ago
049.5K
HunyuanOCR - 腾讯混元开源的光学字符识别专家模型

HunyuanOCR - Tencent's open source expert model for optical character recognition

HunyuanOCR is a high-performance optical character recognition model open-sourced by the Tencent hybrid team, with a reference number of only 1 billion. Developed based on the hybrid multimodal architecture, it adopts an end-to-end design and can efficiently handle text detection, recognition and document parsing tasks. The model scored 94.1 points in the complex document test, surpassing...
5mos ago
037.7K
Awex - 蚂蚁集团开源的高性能权重交换框架

Awex - Ant Group open source high performance weight exchange framework

Awex is the Ant Group open source high performance weight exchange framework, designed for large-scale parameter synchronization in reinforcement learning. It can complete terabytes of parameter exchange in seconds, significantly improving the efficiency of training and inference.Awex has a very fast synchronization performance, in a thousand card cluster, trillion parameter models can be completed within 6 seconds of the full amount of...
5mos ago
084.7K
ChatTutor - 开源的AI教学辅助工具,可视化互动学习

ChatTutor - Open source AI teaching aid to visualize interactive learning

ChatTutor is an open source AI teaching aid focused on visual and interactive learning of STEM subjects. Through the multi-intelligent body architecture to achieve dialogical Q&A and dynamic drawing function, can draw math graphs, physics circuits or mind maps on the whiteboard in real time, to help users intuitively understand the abstract generalization ...
5mos ago
026K
EverMemOS - 盛大团队推出的开源长期记忆操作系统

EverMemOS - Open Source Long-Term Memory Operating System by Team Shanda

EverMemOS is an open source long-term memory operating system launched by the Shanda team led by Chen Tianqiao, designed for AI intelligences to solve the problem of memory breakage caused by the fixed context window of large language models. The system is based on the human brain memory mechanism, using a four-layer architecture (agent layer, memory layer, index layer...
5mos ago
038.8K