AI open source project

1020 posts in total
ColorFlow: automatic colorization of black-and-white comics and image sequences with improved color consistency and quality

ColorFlow is an automatic colorization tool for image sequences, developed by Tencent's ARC team to colorize black-and-white image sequences. It uses a retrieval-augmented colorization pipeline that draws on a pool of reference images to accurately reproduce the colors of individual elements, including the character's hair color and service...
1 year ago
070.5K
Genesis: an open source generative physics engine for physically realistic 4D dynamic world simulation

Genesis is a generative physics world designed for general-purpose robotics and embodied AI learning. It provides a unified simulation platform that supports a wide range of materials and physical phenomena. Genesis aims to unlock generative AI and physics simulation by combining...
1 year ago
070.4K
Fay Digital Human Framework: integrates language models and 3D digital characters to support multiple application scenarios

Fay is an open source 3D digital human framework that integrates language models with digital characters for a variety of application scenarios, such as virtual shopping guides, virtual streamers, assistants, waiters, teachers, and voice- or text-based mobile assistants. The Fay framework supports fully offline use, providing m...
1 year ago
070.3K
Perplexica: an open source AI search engine that replicates Perplexity AI's features and interface one-to-one

Perplexica is an open source AI-driven search engine designed to dig deep into the Internet to find answers. It uses machine learning techniques such as similarity search and embeddings to refine search results and provide clear answers with cited sources. Perple...
1 year ago
070.2K
UltraRAG: A One-Stop RAG System Solution That Simplifies Data Construction and Model Fine-Tuning

UltraRAG is a RAG (Retrieval-Augmented Generation) system solution jointly proposed by the THUNLP group at Tsinghua University, the NEUIR group at Northeastern University, ModelBest Inc. and the 9#AISoft team. The framework is based on agile deployment and modularized building...
1 year ago
070.1K
PaddlePaddle PP-TableMagic: Structured Information Extraction for Complex Tables

The goal of table recognition is to parse tables in images, accurately identify the table structure and cell locations, and convert them into a structured format (e.g., HTML). Even in today's information age, a large amount of important tabular data still exists in unstructured form (e.g., scanned documents with pictures of statistical tables...).
1 year ago
070.1K
MedRAX: An AI Agent for Chest X-ray Analysis Using Multimodal Large Models

MedRAX is a state-of-the-art AI agent designed for chest X-ray (CXR) analysis. It integrates state-of-the-art CXR analysis tools and multimodal large language models to handle complex medical queries dynamically without additional training. MedRAX, through its modular design...
1 year ago
070.1K
Sana: fast high-resolution image generation with an ultra-small 0.6B model that runs on a low-end laptop GPU

Sana is an efficient high-resolution image generation framework developed by NVIDIA Labs, capable of generating images at up to 4096 × 4096 resolution in a matter of seconds. Sana uses a linear diffusion transformer and a deep compression autoencoder to significantly...
1 year ago
069.7K
Ant Design X: a toolkit for rapidly building AI chat interfaces, with support for model integration and data flow management

Ant Design X is a toolkit open-sourced by Ant Group to help developers quickly build AI-driven conversational interfaces. It provides a rich set of components and templates, supports integration with models compatible with the OpenAI standard, and is suitable for a variety of applications such as intelligent customer service, AI assistants, and other...
1 year ago
069K
Hibiki: a real-time speech translation model for streaming translation that preserves the characteristics of the original voice

Hibiki is a high-fidelity real-time speech translation model developed by Kyutai Labs. Unlike traditional offline translation, Hibiki generates natural speech in the target language and provides a text translation in real time while the user is speaking. The model...
1 year ago
069K
Flow (Laminar): a lightweight task engine for building AI agents that keeps task management simple and flexible

Flow is a lightweight task engine designed for building AI agents, emphasizing simplicity and flexibility. Unlike traditional node- and edge-based workflows, Flow uses a dynamic task queue that supports parallel execution, dynamic scheduling, and intelligent dependency management. Its core concept is ...
1 year ago
068.9K
NodeRAG: A Heterogeneous Graph-Based Tool for Accurate Information Retrieval and Generation

NodeRAG is an open source Retrieval-Augmented Generation (RAG) system hosted on GitHub and developed by Terry-Xu-666. It improves information retrieval and generation through heterogeneous graph structures, significantly increasing retrieval accuracy and contextual relevance. Nod...
1 year ago
068.7K
TxAgent: an AI tool that helps doctors analyze drug effects and treatment options

TxAgent is an open source AI tool developed by Harvard University's Medical and Scientific Artificial Intelligence Team (MIMS) to help physicians analyze drug interactions and develop personalized treatment plans. It takes patient-specific circumstances into account through multi-step reasoning and real-time retrieval of biomedical knowledge...
1 year ago
068.5K
Ultravox: a multimodal large audio model for real-time end-to-end voice dialog, an open source implementation of GPT-4o-style voice interaction

Ultravox is a multimodal Large Language Model (LLM) designed for real-time speech processing. Unlike traditional speech recognition systems, Ultravox eliminates the need for a separate Automatic Speech Recognition (ASR) stage and can map audio directly into a high-dimensional space in...
1 year ago
068.3K
AI2SRT: create short narrated videos or video summaries from long videos with one click using Gemini models

AI2SRT is an open source project that uses the Gemini large model to generate short narrated videos and video summaries from long videos with one click, and also supports transcribing audio and video into subtitles. The project aims to simplify video content creation and provide efficient subtitle generation and translation. Users can pass...
1 year ago
068.3K
MMAudio: a video-to-audio tool with multimodal joint training that generates synchronized sound effects and soundtracks for video footage

MMAudio is an open source project that aims to generate high-quality synchronized audio through multimodal joint training. Developed by Ho Kei Cheng et al. at the Chinese University of Hong Kong, the project's main function is to generate synchronized audio from video and/or text input. MM...
1 year ago
068.3K
ModelBest (面壁智能): World-Leading Lightweight, High-Performance On-Device Large Models

ModelBest is a company that specializes in developing lightweight, high-performance large models and is dedicated to bringing advanced AI technologies to mainstream consumer electronics and other everyday end devices. Its MiniCPM series of on-device models is characterized by highly efficient use of compute and memory...
2 years ago
068.2K
CogAgent: Zhipu's open source visual language model for automating graphical user interfaces

CogAgent is an open source visual language model developed by the Tsinghua University Data Mining Research Group (THUDM) to automate operations on cross-platform graphical user interfaces (GUIs). The model is based on CogVLM (GLM-4V-9B) and supports both Chinese and English...
1 year ago
068K
LazyLLM: SenseTime's open source low-code development tool for building multi-agent applications

LazyLLM is an open source tool developed by the LazyAGI team that focuses on simplifying the development of multi-agent large model applications. It helps developers quickly build complex AI applications through one-click deployment and a lightweight gateway mechanism, sparing them tedious engineering configuration...
1 year ago
067.8K
Infinity: bitwise autoregressive modeling for unrestricted high-resolution image generation

Infinity is a high-resolution image generation framework developed by the FoundationVision team. The project overcomes the limitations of traditional image generation models through a bit-level visual autoregressive modeling approach. The core features of Infinity...
1 year ago
067.7K
VideoRAG: A RAG framework for understanding ultra-long videos, with support for multimodal retrieval and knowledge graph construction

VideoRAG is a retrieval-augmented generation framework designed for processing and understanding extremely long-context videos. The tool combines a graph-driven textual knowledge base with hierarchical multimodal context encoding to efficiently process on a single NVIDIA RTX 3090 GPU...
1 year ago
067.6K
SegAnyMo: an open source tool that automatically segments arbitrary moving objects in video

SegAnyMo is an open source project developed by a team of researchers from UC Berkeley and Peking University, including Nan Huang. The tool focuses on video processing and can automatically recognize and segment arbitrary moving objects in a video, such as people, animals or...
1 year ago
067.5K
YuE: a foundation model that turns lyrics into complete songs, supporting a wide range of musical styles

YuE is an open source foundation model for full-song generation that focuses on turning lyrics into complete songs. Unlike other models that can only generate short snippets of non-vocal music, YuE can generate full songs several minutes long, with lead and backing vocals. The model addresses music generation in...
1 year ago
067.3K
xyks: reverse-engineering notes for Xiaoyuan Kousuan (小猿口算), covering reverse engineering and decryption algorithms

The Xiaoyuan Kousuan reverse-engineering notes are an open source project that documents and shares the process and methods of reverse engineering the Xiaoyuan Kousuan application. The project contains usage instructions for a variety of reverse-engineering tools and techniques, such as Frida and dexdump, to help users understand and crack Xiaoyuan Kousuan's add...
2 years ago
067.2K
Amurex: an open source AI meeting assistant that automatically records meeting content and generates summaries

Amurex is an open source AI meeting assistant developed by The Personal AI Company that aims to make meetings more efficient. Amurex can provide real-time suggestions, generate intelligent summaries, record meeting content, and automatically send follow...
1 year ago
067.2K
Deep Recall: an open source tool that provides an enterprise-grade memory framework for large models

Deep Recall is an open source, enterprise-grade memory framework designed for large language models (LLMs). It delivers hyper-personalized responses through efficient contextual retrieval and integration. The framework uses a three-tier architecture consisting of a memory service, a reasoning service, and a coordinator, supporting...
12 months ago
066.8K
DeepRant: An Open Source Client for Real-Time Translation of Game Chat Content

DeepRant is an open source translation tool for gamers, designed to overcome language barriers on international servers. It translates in-game text instantly via shortcut keys, supports translation between multiple languages, and lets players quickly understand and reply to chat messages without leaving the game...
1 year ago
066.7K
AnimeGamer: An Open Source Tool for Generating Anime Videos and Character Interactions with Language Commands

AnimeGamer is an open source tool launched by Tencent ARC Lab. Users can generate anime videos with simple language commands, such as "Sousuke drive around in a purple car", and can have characters from different anime interact with each other, such as Kiki from Kiki's Delivery Service, and Castle in the Sky...
1 year ago
066.7K