AI open source project

1020 articles in total
TransRouter: A Real-Time Chinese-English Speech Translation Tool Based on the Gemini Multimodal Model

TransRouter is a real-time voice translation tool based on Google's Gemini model, designed specifically for translating speech between English and Chinese. The tool can be integrated seamlessly into video conferencing software such as Zoom, providing an easy way for cross-language...
1 yr ago
058.9K
GenXD: An open source framework for generating videos of arbitrary 3D and 4D scenes

General Introduction: GenXD is an open source project developed by the National University of Singapore (NUS) and a Microsoft team. It focuses on generating arbitrary 3D and 4D scenes, addressing the difficulties that insufficient data and complex model design pose for real-world 3D and 4D generation. The project was developed by ...
1 yr ago
058.9K
VoAPI: A polished AI model API forwarding and management system; the official website offers a free daily API quota

Comprehensive Introduction: VoAPI is a new, visually polished, high-performance AI model interface management and distribution system, used mainly to manage and distribute channels for personal or internal enterprise use. Built on NewAPI, the system provides rich functional modules and an optimized user interface, aiming to enhance...
1 yr ago
058.9K
HealthGPT: A Medical Large Model Supporting Medical Image Analysis and Diagnostic Q&A

Comprehensive Introduction: HealthGPT is a state-of-the-art medical large vision-language model designed to unify medical visual comprehension and generation through heterogeneous knowledge adaptation. The goal of the project is to integrate medical visual comprehension and generation into a unified autoregressive framework that significantly improves the medical graph...
1 yr ago
058.8K
Knowledge Table: An open source tool for efficient extraction and exploration of structured data

Comprehensive Introduction: Knowledge Table is an open source project designed to simplify extracting and exploring structured data from unstructured documents. Users can create structured knowledge representations such as tables and graphs through a natural language query interface. The tool supports customizing the extraction ...
1 yr ago
058.5K
Diffbot GraphRAG LLM: An LLM inference service that relies on external real-time knowledge graph data

Comprehensive Introduction: Diffbot LLM Inference Server is a large language model system with special optimizations and improvements based on the Llama model architecture. The most important feature of the project is the integration of a real-time knowledge graph with retrieval-augmented generation...
1 yr ago
058.5K
StreamingT2V: A Dynamic and Scalable Generation Technique from Text to Long Video

Comprehensive Introduction: StreamingT2V is a public project developed by the Picsart AI research team, focused on generating coherent, dynamic and scalable long videos from textual descriptions. The technology uses an advanced autoregressive approach that guarantees temporal consistency of the video with the description text tightly...
1 yr ago
058.4K
VideoMind: An open source project for timestamp-based video content localization and Q&A

General Introduction: VideoMind is an open source multimodal AI tool focused on reasoning, Q&A and summary generation for long videos. It was developed by Ye Liu of the Hong Kong Polytechnic University and a team from Show Lab at the National University of Singapore. The tool mimics human understanding of video...
10 mos ago
058.3K
Confident AI: An Automated Large Language Model Evaluation Framework for Comparing the Prompt Output Quality of Different Large Models

Comprehensive Introduction: DeepEval is an easy-to-use open source LLM evaluation framework for evaluating and testing large language model systems. It is similar to Pytest, but focuses on unit testing LLM output. DeepEval combines the latest research results through G-Eval, hallucination...
1 yr ago
058.3K
ScrapeGraphAI: Web scraping from a single prompt, an intelligent web content extraction tool that needs no hand-written rules

Comprehensive Introduction: ScrapeGraphAI is a Python web scraping library that combines large language models (LLMs) and direct graph logic to create scraping pipelines for websites and local documents. The uniqueness of this tool lies in its combination of simplicity and power...
1 yr ago
058.3K
AI Podcast Generator: Automatically Scraping News to Generate Audio Podcasts

General Introduction: AI Podcast Generator is a podcast generation tool that uses AI to automatically create engaging audio content from web sources. The system scrapes news content and converts it into audio podcasts with naturally flowing narration. The project is based on Next...
1 yr ago
058.3K
JoyGen: An Audio-Driven, 3D Depth-Aware Talking-Portrait Video Editing Tool

Comprehensive Introduction: JoyGen is a two-stage video generation framework for talking faces that focuses on audio-driven facial expression generation. Developed by a team from Jingdong Technology, the project uses advanced 3D reconstruction techniques and audio feature extraction methods to accurately capture the identity characteristics of the speaker and the expression...
1 yr ago
058.1K
AIEvo: An Efficient Framework for Creating Multi-Agent Collaborative Applications

General Introduction: AIEvo is Ant Group's open source multi-agent framework, designed for efficiently creating multi-agent applications. The framework strictly follows an SOP task graph to improve the execution success rate of complex tasks, and uses feedback and monitoring mechanisms to ensure high flexibility and scalability. AIEvo has been produced within Ant Group ...
1 yr ago
058.1K
CR-Mentor: A Knowledge Base + LLM Driven Intelligent Code Review Mentor for GitHub

Comprehensive Introduction: CR-Mentor is an intelligent code review tool that combines a specialized knowledge base with the power of large language models (LLMs). It not only supports code review for all programming languages, but also tailors review criteria and focus areas to each team based on best practices accumulated in the knowledge base. Through...
1 yr ago
058K
TankWork: An agent that operates computers via voice and text and provides real-time voice feedback

General Introduction: TankWork is an open source desktop agent framework designed to let AI perceive and control your computer through computer vision and system-level interaction. The framework allows agents to control computers directly through voice and text commands, process real-time screen content, and provide continuous audio visual...
1 yr ago
058K
Mini-Cover: Online cover creation, designed to generate personalized covers for blogs, short videos, social media and more

General Introduction: Mini-Cover is an open source online cover generation tool for creating personalized covers for platforms such as blogs, short videos and social media. Developed by JLinMr, the tool aims to provide a simple and efficient way to help users quickly generate covers that meet their needs...
1 yr ago
058K
Swarm: An experimental educational project for learning lightweight multi-agent systems (OpenAI example)

General Introduction: Swarm is an experimental educational framework developed by OpenAI to explore lightweight, controllable, and easy-to-test interfaces for multi-agent systems. The framework is primarily used to demonstrate handoff and routine patterns between agents, helping developers understand and implement the coordination and execution of multi-agent systems...
1 yr ago
058K
TryOffAnyone: An AI tool that extracts garments worn by a person into flat-lay garment display images

Comprehensive Introduction: TryOffAnyone is an AI image processing tool specialized in solving the challenges of clothing display in e-commerce. It converts photos of clothes worn by real people into flat-lay display images. The technology is based on the latest Latent Dif...
1 yr ago
057.9K
MM-EUREKA: A Multimodal Reinforcement Learning Tool for Exploring Visual Reasoning

Comprehensive Introduction: MM-EUREKA is an open source project developed by Shanghai Artificial Intelligence Laboratory, Shanghai Jiao Tong University and other parties. It extends textual reasoning capabilities to multimodal scenarios through rule-based reinforcement learning, helping models process image and text information. The core of this tool...
1 yr ago
057.8K
Bambo: A lightweight and flexible agent framework with simple configuration of roles and tools for handling a variety of workloads

Comprehensive Introduction: Bambo is a new type of agent framework that is lighter and more flexible than the mainstream frameworks and can handle a variety of workloads. Bambo achieves efficient agent functionality by defining all the tools in the tool catalog and using asynchronous custom functions. Users can use the llm_c...
1 yr ago
057.7K
GPTme: An Intelligent Programming Assistant That Runs in the Command Line Terminal, a Local Alternative to ChatGPT Code Interpreter

Comprehensive Introduction: GPTme is a terminal AI assistant designed to improve developers' efficiency. It combines AI capabilities with the terminal environment, supporting functions such as code execution, file editing, web browsing and visual recognition. As ChatGPT code solving...
1 yr ago
057.6K
Agent Laboratory: An automated code and research report writing assistant for researchers

Comprehensive Introduction: Agent Laboratory is an end-to-end autonomous research workflow designed to help researchers realize their research ideas. The system consists of dedicated agents driven by large language models that support the entire research workflow, from conducting literature reviews and developing plans to executing...
1 yr ago
057.5K
HivisionIDPhotos: An open source AI ID photo creation tool

Comprehensive Introduction: HivisionIDPhotos is an open source, lightweight AI ID photo creation tool that intelligently recognizes the scene in a user's photo and cuts out the subject to generate standard ID photos that meet a variety of specifications. The tool supports custom background colors and sizes, and future versions will also introduce beauty and...
2 yrs ago
057.5K
IMS Toucan: Fast and Controllable Multilingual (7000+ languages supported) Text-to-Speech Tool

General Introduction: IMS Toucan is a state-of-the-art text-to-speech (TTS) toolkit developed by the Institute for Natural Language Processing (IMS) at the University of Stuttgart, Germany. The toolkit supports more than 7000 languages and is fast, controllable and light on computational resources. IMS...
1 yr ago
057.5K
OmAgent: An agent framework for building multimodal smart devices

Comprehensive Introduction: OmAgent is a multimodal agent framework developed by Om AI Lab, aiming to provide powerful AI-powered features for smart devices. By integrating state-of-the-art multimodal foundation models and agent algorithms, the project enables developers to create efficient smart devices on a variety of...
1 yr ago
057.3K
Crawl4LLM: An Efficient Web Crawling Tool for LLM Pretraining

Comprehensive Introduction: Crawl4LLM is an open source project jointly developed by Tsinghua University and Carnegie Mellon University, focused on making web crawling for large language model (LLM) pre-training more efficient. It significantly reduces ineffective crawling by intelligently selecting high-quality web page data, claiming that work which originally required crawling 1...
1 yr ago
057.3K
TripoSF: A practical tool for quickly generating high-resolution 3D models

Comprehensive Introduction: TripoSF is an open source project built by the VAST-AI-Research team, designed to quickly generate high-resolution 3D models from a single image. It uses a technique called SparseFlex, which is highly efficient and can generate high-resolution 3D models from a single image in a general...
1 yr ago
057.3K
BotSharp: A .NET-based multi-agent AI application development and management platform

Comprehensive Introduction: BotSharp is an open source project based on .NET Core that provides a comprehensive toolkit for building AI chatbot platforms. It is written in C#, runs cross-platform, and aims to simplify the application of machine learning algorithms, enabling enterprise-level developers to efficiently ...
1 yr ago
057.2K
LangManus: An open source AI automation framework supporting multi-agent collaboration

General Introduction: LangManus is an open source AI automation framework hosted on GitHub. Developed by a group of former colleagues in their spare time, it is an academically driven project that aims to combine language models with specialized tools to accomplish web search, data crawling, and code execution...
1 yr ago
057.1K
dsRAG: A Retrieval Engine for Unstructured Data and Complex Queries

Comprehensive Introduction: dsRAG is a high-performance retrieval engine designed to handle complex queries over unstructured data. It performs particularly well on challenging queries against dense text such as financial reports, legal documents, and academic papers. dsRAG employs three key approaches to improve performance: language...
1 yr ago
056.6K