AI open source project

1020 articles in total
XiaoYuanKouSuan_Auto: an automatic answering tool for XiaoYuanKouSuan (小猿口算) that solves mental arithmetic questions efficiently

The XiaoYuanKouSuan automatic answering tool is a Python-based open source project designed to efficiently solve the questions in the XiaoYuanKouSuan app through OCR and automation scripts. The tool uses technologies such as OpenCV and Tesseract to recognize the questions on the screen in real time...
10mos ago
02.7K
InstantID: upload a single image and transfer its portrait features to generate images in different styles

InstantID is an advanced technique for generating images with personalized styles or poses in seconds from a single reference ID image while maintaining a high level of fidelity. It employs a diffusion model-based solution by integrating facial images, landmark maps...
12mos ago
02.7K
Agentic Security: an open source LLM vulnerability scanner that provides comprehensive fuzz testing and attack techniques

Agentic Security is an open source LLM (Large Language Model) vulnerability scanning tool designed to provide developers and security professionals with comprehensive fuzz testing and attack techniques. The tool supports custom rule sets or agent-based attacks and is able to integrate LLM AP...
6mos ago
02.7K
ExtractThinker: extract and classify documents into structured data to streamline document processing

ExtractThinker is a flexible document intelligence tool that uses Large Language Models (LLMs) to extract and classify structured data from documents, providing a seamless ORM-like document processing workflow. It supports a variety of document loaders, including Tess...
7mos ago
02.7K
Harbor: a containerized toolset for one-click deployment of local LLM development environments and easy management of AI services

Harbor is a containerized LLM toolset focused on simplifying the deployment and management of local AI development environments. With a clean command line interface (CLI) and a companion application, it lets developers launch and manage components with a single click, including LLM backends, API interfaces, front...
7mos ago
02.7K
Flow (Laminar): a lightweight task engine for building agents, with simple and flexible task management

Flow is a lightweight task engine designed for building AI agents, emphasizing simplicity and flexibility. Unlike traditional node- and edge-based workflows, Flow uses a dynamic task queuing system that supports parallel execution, dynamic scheduling, and intelligent dependency management. Its core concept is ...
8mos ago
02.7K
Orion: Xiaomi's open source end-to-end autonomous driving reasoning and planning framework

Orion is an open source project developed by Xiaomi Labs, focusing on end-to-end (E2E) autonomous driving. It uses a vision-language model (VLM) and a generative planner to address the insufficient causal reasoning of traditional autonomous driving approaches in complex scenarios. Orion integrates long...
4mos ago
02.7K
MedRAX: an agent for chest X-ray analysis using multimodal large models

MedRAX is an AI agent designed for chest X-ray (CXR) analysis. It integrates state-of-the-art CXR analysis tools and multimodal large language models to dynamically process complex medical queries without additional training. MedRAX, through its modular design...
5mos ago
02.7K
FramePack: an open source project for fast long-video generation with as little as 6 GB of VRAM

FramePack is an open source video generation tool focused on making video diffusion more practical. It uses a next-frame prediction neural network that compresses the input frames to a fixed length, which decouples the generation workload from the video length. This means that even when generating long videos, the video memory requirements...
3mos ago
02.7K
Magic 1-For-1: an open source project for efficient video generation that claims to generate one minute of video in one minute

Magic 1-For-1 is an efficient video generation model designed to optimize memory usage and reduce inference latency. The model decomposes the text-to-video generation task into two subtasks: text-to-image generation and image-to-video generation, enabling more efficient training and distillation...
6mos ago
02.7K
SmartRead: automatically annotate technical PDF documents and provide relevant citation sources

SmartRead is an AI-based open source tool designed for technical documents. It automatically analyzes PDF files and marks key content, such as important terms, titles, or core ideas, to help users quickly understand complex documents. At the same time, it can also provide with the main document...
5mos ago
02.7K
MM-EUREKA: a multimodal reinforcement learning tool for exploring visual reasoning

MM-EUREKA is an open source project developed by Shanghai Artificial Intelligence Laboratory, Shanghai Jiao Tong University, and other parties. It extends textual reasoning capabilities to multimodal scenarios through rule-based reinforcement learning, helping models process image and text information. The core of this tool...
5mos ago
02.7K
RAG Web UI: build an intelligent document Q&A system and a private web-based knowledge base with ease

RAG Web UI is an intelligent dialog system based on RAG (Retrieval-Augmented Generation) technology. It helps organizations and individuals build intelligent Q&A systems on top of their own knowledge bases. By combining document retrieval with large language models, RAG Web UI provides accurate and reliable...
7mos ago
02.7K
FinGPT: an open source financial large language model platform for financial analysis and prediction

FinGPT is an open source financial large language model platform developed by the AI4Finance Foundation, designed for the financial sector to solve complex financial tasks and drive innovation in fintech. FinGPT utilizes lightweight adaptation techniques and reinforcement learning approaches...
7mos ago
02.7K
VoAPI: a visually polished AI model API forwarding and management system; the official website provides a free daily API quota

VoAPI is a new, visually polished, high-performance AI model interface management and distribution system, mainly used for managing and distributing channels for personal use or within an enterprise. Developed on top of NewAPI, the system provides rich functional modules and an optimized user interface, aiming to enhance...
9mos ago
02.7K
AgentClientDemo: a Python client that demonstrates how an agent runs, with an intuitive graphical user interface

AgentClientDemo is a comprehensive Python project that integrates agent and client functionality. The project is based on the PyQt framework and provides an intuitive and easy-to-use graphical user interface (G...
8mos ago
02.7K
opensource_notebooklm: an open source implementation of NotebookLM based on Deepseek-V3 and PlayHT TTS

Open Source NotebookLM is an artificial intelligence project that combines Deepseek-V3's language understanding capabilities with PlayHT's speech synthesis technology, aiming to create an intelligent note-taking conversation system. The project was developed by Build Fast w...
7mos ago
02.7K
TryOffAnyone: an AI tool that extracts garments from photos of people as flat-lay garment display images

TryOffAnyone is an AI image processing tool that addresses the challenges of clothing display in e-commerce. It converts photos of clothes worn by real people into flat-lay display images. The technology is based on the latest Latent Dif...
7mos ago
02.7K
muAgent: a new agent orchestration framework driven by LLMs and EKG (industry knowledge)

muAgent is an innovative multi-agent framework developed by Ant Group. Through canvas drag-and-drop and simple text writing, the framework combines multi-agent collaboration, function calls, code interpreters, and other technologies to help users execute various complex standard operating procedures (SOPs) under human guidance...
9mos ago
02.7K
LangGraph Supervisor: a tool for managing multi-agent collaboration with a supervisor agent

LangGraph Supervisor is a Python library based on the LangGraph framework, designed for creating and managing multi-agent systems. The library coordinates the work of multiple specialized agents through a central supervisory agent, ensuring that communication flows and tasks are divided...
6mos ago
02.7K
NVIDIA Garak: an open source tool for detecting LLM vulnerabilities and securing generative AI

NVIDIA Garak is an open source tool that specializes in detecting vulnerabilities in Large Language Models (LLMs). It checks the model for multiple weaknesses such as hallucinations, data leakage, prompt injection, misinformation generation, harmful content generation, etc. through static, dynamic, and adaptive probing...
9mos ago
02.7K
Mini-Cover: an online cover maker for generating personalized covers for blogs, short videos, social media, and more

Mini-Cover is an open source online cover generation tool designed to generate personalized covers for platforms such as blogs, short videos, and social media. Developed by JLinMr, the tool aims to provide a simple and efficient solution to help users quickly generate covers that meet their needs...
8mos ago
02.6K
Xiaoban (小半) WordPress AI Assistant: a WordPress AI assistant plugin for conversation, article generation, and translation

WordPress AI Assistant Plugin (wp-ai-chat) is an open source WordPress plugin designed to provide users with a variety of AI features, including AI conversations, article generation, article summarization, article translation, and content reading. The plugin supports integration with multiple ...
6mos ago
02.6K
AI Investment System: an automated A-share investment decision system that uses a multi-agent system to analyze market data

A_Share_investment_Agent is an A-share investment decision aid based on a multi-agent system. The system uses multiple collaborating agents to analyze market data, calculate the intrinsic value of stocks, and analyze market sentiment and fundamental data to...
7mos ago
02.6K
Megrez-3B-Omni: an on-device multimodal understanding model supporting text, image, and audio understanding and analysis

Infini-Megrez is an edge intelligence solution developed by Infinigence AI (无问芯穹), aiming to achieve efficient multimodal understanding and analysis through hardware and software co-design. At the core of the project is the Megrez-3B model, which supports graph...
7mos ago
02.6K
wdoc: retrieve content and summarize knowledge from massive, multi-source document collections

wdoc is a powerful RAG (Retrieval-Augmented Generation) system designed for processing and analyzing large and diverse document collections. It can retrieve from a wide range of document types, including PDFs, web pages, YouTube videos, audio files, etc. wdoc is particularly well suited for processing...
6mos ago
02.6K
Sketch-Gen: generate high-quality line art and sketches, infer prompts from images, with a one-click installer

Sketch-Gen is an AI-based line art and sketch generation tool designed to help artists and designers quickly generate high-quality line drawings and sketches. The tool is derived from the Paints-UNDO project and utilizes advanced machine learning models that can...
8mos ago
02.6K
Chinese distillation dataset based on the full-size DeepSeek-R1: a Chinese R1-distilled SFT dataset

The Chinese DeepSeek-R1 distillation dataset is an open source Chinese dataset containing 110K samples, designed to support machine learning and natural language processing research. The dataset is released by Cong Liu's NLP team. It contains not only mathematical data, but also a large number of general types...
6mos ago
02.6K
Linly-Talker: a digital human conversational system that combines large language models with visual models for a new interactive experience

Linly-Talker is an innovative digital human dialog system that combines Large Language Models (LLMs) with visual models to create a novel approach to human-computer interaction. The system integrates a variety of technologies such as Whisper, Linly, Micros...
6mos ago
02.6K
zChunk: a general-purpose semantic chunking strategy based on Llama-70B

zChunk is a novel chunking strategy developed by ZeroEntropy that aims to provide a solution for general-purpose semantic chunking. The strategy is based on the Llama-70B model and optimizes document chunking by prompting the model to generate the chunks, ensuring that information retrieval is maintained at a high...
6mos ago
02.6K
FoloUp: an open source AI voice interview platform that generates customized interview questions and performs intelligent analysis

FoloUp is an open source platform that specializes in AI-powered voice interview solutions for enterprises. With FoloUp, enterprises can quickly generate customized interview questions from job descriptions and conduct natural conversational interviews with AI. The platform also provides detailed interview analysis...
5mos ago
02.6K
MegaTTS3: a lightweight model for synthesizing Chinese and English speech

MegaTTS3 is an open source speech synthesis tool developed by ByteDance in cooperation with Zhejiang University, focusing on generating high-quality Chinese and English speech. Its core model has only 0.45B parameters, is lightweight and efficient, and supports mixed Chinese-English speech generation and voice cloning. The project is hosted on ...
5mos ago
02.6K