AI open source projects

1,020 articles in total
Story-Adapter: Generating continuous, stylistically consistent illustrations from long stories

Story-Adapter is a story visualization framework that converts textual stories into coherent image sequences. Developed by researchers, the project uses a training-free iterative approach to generate high-quality story illustrations. The framework is characterized by its ability to handle long...
1yrs ago
045.9K
Deep Recall: An open source enterprise-grade memory framework for large language models

Deep Recall is an open source, enterprise-grade memory framework designed for large language models (LLMs). It delivers hyper-personalized responses through efficient context retrieval and integration. The framework uses a three-tier architecture consisting of a memory service, a reasoning service, and a coordinator, supporting...
8mos ago
045.9K
RLAMA: A command-line RAG system for question answering over local documents

RLAMA is an open source RAG (Retrieval-Augmented Generation) system for document question answering, developed by DonTizi and hosted on GitHub. Its core feature is that all functionality is driven from the command line. Users can use simple terminal commands to connect to local ...
10mos ago
045.7K
VideoRAG: A RAG framework for understanding extremely long videos, with support for multimodal retrieval and knowledge graph construction

VideoRAG is a retrieval-augmented generation framework designed for processing and understanding extremely long-context videos. The tool combines a graph-driven textual knowledge base with hierarchical multimodal context encoding to efficiently process on a single NVIDIA RTX 3090 GPU...
11mos ago
045.6K
Sketch-Gen: Generate high-quality line art and sketches and infer prompts from images, with a one-click installer

Sketch-Gen is an AI-based tool that helps artists and designers quickly generate high-quality line art and sketches. The tool is derived from the Paints-UNDO project and utilizes advanced machine learning models that can...
1yrs ago
045.5K
Fay Digital Human Framework: Integrating language models with 3D digital characters for multiple application scenarios

Fay is an open source 3D virtual digital human framework that integrates language models and digital characters for a variety of application scenarios, such as virtual shopping guides, virtual hosts, assistants, waiters, teachers, and voice- or text-based mobile assistants. The Fay framework supports fully offline use, providing m...
1yrs ago
045.4K
FiveThirtyNine: Predicting the probability of future events from search-based knowledge

Forecast AI is a forecasting platform based on artificial intelligence. It uses data analytics and machine learning algorithms to give users predictions of future events. Whether it's political elections, economic trends or social events, Forecast ...
1yrs ago
045.4K
Leffa: High-fidelity virtual try-on and pose adjustment with Meta's open source controllable person image generation model

Leffa is a unified framework for controllable person image generation that enables precise manipulation of a person's appearance (e.g., virtual try-on) and pose (e.g., pose transfer). The framework significantly reduces distortion of fine-grained details by guiding the target query to attend to the correct reference key in the attention layer, with ...
1yrs ago
045.3K
LangGraph Supervisor: A tool for managing multi-agent collaboration with a supervisor agent

LangGraph Supervisor is a Python library built on the LangGraph framework for creating and managing multi-agent systems. The library coordinates the work of multiple specialized agents through a central supervisor agent, ensuring that communication flows and tasks are divided...
11mos ago
045.2K
MindSearch: An open source AI search engine framework for deploying your own Perplexity-style search engine

MindSearch is an open source AI search engine framework launched by Shanghai Artificial Intelligence Laboratory that aims to simulate the human thought process for complex information gathering and integration. The tool combines large language models (LLMs) with search engines through multi-agent...
1yrs ago
045.2K
MoneyPrinterTurbo: Generate video copy and short HD videos in one click from a video topic

MoneyPrinterTurbo is an open source project that uses large AI models to generate short HD videos with one click. Users only need to provide a video topic or keywords, and the system automatically generates the video copy, video clips, video subtitles and...
10mos ago
045.1K
DCT-Net: An open source tool for converting photos and videos to anime style

DCT-Net is an open source project developed by DAMO Academy and the Wang Xuan Institute of Computer Technology, Peking University, for anime-style transformation of images. The project applies deep learning through Domain-Calibrated Translation (Domain-Calibrat...
1yrs ago
045.1K
Agentic Security: An open source LLM vulnerability scanner with comprehensive fuzz testing and attack techniques

Agentic Security is an open source LLM (Large Language Model) vulnerability scanner designed to provide developers and security professionals with comprehensive fuzz testing and attack techniques. The tool supports custom rule sets or agent-based attacks and is able to integrate LLM AP...
11mos ago
045K
StreamingT2V: A dynamic and scalable technique for generating long videos from text

StreamingT2V is a publicly available project developed by the Picsart AI research team that focuses on generating coherent, dynamic and scalable long videos from textual descriptions. The technique uses an advanced autoregressive approach that guarantees temporal consistency of the video with the description text tightly...
1yrs ago
044.9K
DeepRant: An open source client for real-time translation of in-game chat

DeepRant is an open source translation tool for gamers, designed to overcome language barriers on international servers. It translates in-game text instantly via shortcut keys, supports translation between multiple languages, and allows players to quickly understand and reply to chat messages without exiting the game...
11mos ago
044.9K
HealthGPT: A large medical model for medical image analysis and diagnostic Q&A

HealthGPT is a state-of-the-art medical large vision-language model designed to unify medical visual understanding and generation through heterogeneous knowledge adaptation. The goal of the project is to integrate medical visual understanding and generation capabilities into a unified autoregressive framework that significantly improves the medical graph...
11mos ago
044.9K
Moondream: An open source lightweight vision-language model for batch inference of prompts from images

Moondream is an open source lightweight vision-language model designed for image description using deep learning and computer vision techniques. The model runs efficiently on a variety of platforms and is particularly suitable for edge devices. Moondream uses advanced techniques and...
1yrs ago
044.8K
VideoChat: A real-time voice-interactive digital human with customizable appearance and voice cloning, supporting end-to-end and cascaded voice pipelines

VideoChat is a real-time voice-interactive digital human project built on open source technology. It supports both an end-to-end voice scheme (GLM-4-Voice - THG) and a cascaded scheme (ASR-LLM-TTS-THG). The project allows users to customize the digital ...
1yrs ago
044.8K
TxAgent: An AI tool that helps doctors analyze drug effects and treatment options

TxAgent is an open source AI tool developed by Harvard University's Medical and Scientific Artificial Intelligence Team (MIMS) to help physicians analyze drug interactions and develop personalized treatment plans. It combines patient-specific situations through multi-step reasoning and real-time retrieval of biomedical knowledge...
10mos ago
044.8K
MagicArticulate: Generating skeleton-rigged animation assets from static 3D models

MagicArticulate is an AI framework developed by ByteDance in collaboration with Nanyang Technological University that focuses on rapidly turning static 3D models into animation-ready digital assets. It does this through an advanced autoregressive Transformer and functional diffusion modeling, self...
11mos ago
044.8K
xyks: Reverse-engineering notes on 小猿口算 (Xiaoyuan Kousuan), covering reverse engineering and decryption algorithms

Xiaoyuan Kousuan Reverse Notes is an open source project that documents and shares the process and methods of reverse engineering the 小猿口算 (Xiaoyuan Kousuan) application. The project includes usage instructions for a variety of reverse engineering tools and techniques, such as Frida and dexdump, to help users understand and crack the Xiaoyuan Kousuan add...
1yrs ago
044.7K
AnimeGamer: An open source tool for generating anime videos and character interactions from language instructions

AnimeGamer is an open source tool launched by Tencent ARC Lab. Users can generate anime videos with simple language instructions, such as "Sousuke drive around in a purple car", and have characters from different anime interact with each other, such as Kiki from Kiki's Delivery Service, and Castle in the Sky...
9mos ago
044.7K
Magic 1-For-1: An open source project for efficient video generation that claims to generate one minute of video in one minute

Magic 1-For-1 is an efficient video generation model designed to optimize memory usage and reduce inference latency. The model decomposes the text-to-video generation task into two subtasks, text-to-image generation and image-to-video generation, enabling more efficient training and distillation...
11mos ago
044.7K
XRAG: A visual evaluation tool for optimizing retrieval-augmented generation systems

XRAG (eXamining the Core) is a benchmarking framework for evaluating the foundational components of advanced retrieval-augmented generation (RAG) systems. By profiling and analyzing each core module, XRAG provides information on how different configurations and components affect RAG...
12mos ago
044.6K
CogVLM2: An open source multimodal model with support for video understanding and multi-turn dialogue

CogVLM2 is an open source multimodal model developed by the Tsinghua University Data Mining Research Group (THUDM). It is based on the Llama3-8B architecture and designed to provide performance comparable to or even better than GPT-4V. The model supports image understanding, multi-turn dialogue, and visual ...
11mos ago
044.6K
Harbor: A containerized toolset for one-click deployment of local LLM development environments and easy management of AI services

Harbor is a containerized LLM toolset focused on simplifying the deployment and management of local AI development environments. Through a clean command-line interface (CLI) and a companion application, it lets developers launch and manage components with a single click, including LLM backends, API interfaces, front...
1yrs ago
044.5K
PromptWizard: An open source framework that optimizes prompt engineering to improve task performance

PromptWizard is an open source framework developed by Microsoft. It uses a self-evolving mechanism in which the model generates, evaluates, and improves prompts and examples on its own, raising output quality through continuous feedback. It can autonomously optimize prompts, generate and select appropriate examples, and...
1yrs ago
044.4K