AI open source project

1020 articles in total
TextDistiller: summarize an entire book in one click, distill its content efficiently, and quickly grasp the core ideas

Comprehensive Introduction TextDistiller is an advanced AI-driven tool that summarizes books chapter by chapter or as a whole, providing a concise yet comprehensive overview. With TextDistiller, users can quickly grasp the core ideas and key points of any book...
8mos ago
03K
DocsGPT: a document chat assistant that gets reliable answers from individual documents and web sources, with support for local deployment

General Introduction DocsGPT is an open source documentation assistant designed to simplify querying project documentation. By integrating a powerful GPT model, it lets developers easily ask questions about a project and get accurate answers. DocsGPT supports local deployment to ensure data privacy while...
9mos ago
03K
MegaParse: parse all types of documents into LLM-ready data, fully preserving tables, images, and all other information in the document

Comprehensive Introduction MegaParse is a powerful and versatile document parsing tool designed to optimize data processing for Large Language Models (LLMs). Whether you are working with text, PDF, PowerPoint presentations or Word documents, MegaParse...
8mos ago
03K
VideoRAG: a RAG framework for understanding ultra-long videos, with support for multimodal retrieval and knowledge graph construction

Comprehensive Introduction VideoRAG is a retrieval-augmented generation (RAG) framework designed for processing and understanding extremely long-context videos. The tool combines a graph-driven textual knowledge base with hierarchical multimodal context encoding to efficiently process on a single NVIDIA RTX 3090 GPU...
6mos ago
03K
Moondream: an open source lightweight vision language model for batch reverse-engineering of image prompts

Comprehensive Introduction Moondream is an open source lightweight vision language model designed to provide image description capabilities through deep learning and computer vision techniques. The model runs efficiently on a variety of platforms and is particularly suitable for edge devices. Moondream uses advanced techniques and...
7mos ago
02.9K
AnyText: generate and edit multilingual text in images, with highly controllable generation of multi-line Chinese text

Comprehensive Introduction AnyText is a revolutionary multilingual visual text generation and editing tool built on a diffusion model. It generates natural, high-quality multilingual text in images and supports flexible text editing. It was developed by a team of researchers and presented at ICLR 2024...
7mos ago
02.9K
SP-MangaEditer: a professional four-panel manga illustration tool for generating images and editing manga pages

General Introduction SP-MangaEditer is an independent manga editing platform designed for manga creators. The platform supports image generation, layer editing, image adjustment, filter application and many other functions to help users easily create high-quality manga illustrations. Users can simply manipulate...
7mos ago
02.9K
Amurex: an open source AI meeting assistant that automatically records meetings and generates summaries

General Introduction Amurex is an open source AI meeting assistant developed by The Personal AI Company that aims to improve meeting efficiency through intelligent features. Amurex can provide real-time suggestions, generate intelligent summaries, record meeting content, and automatically send follow...
7mos ago
02.9K
AIHawk: an intelligent job search assistant that automates resume submission (English only)

General Introduction Auto_Jobs_Applier_AIHawk is a tool that automates job searching using artificial intelligence. It helps users automatically submit a large number of resumes in a short period of time and personalizes them according to their personal information and job search goals. The tool is designed to raise...
8mos ago
02.9K
Refly: an AI writing platform built on workflow orchestration on a free-form canvas, for automated article generation

Comprehensive Introduction Refly is an AI-native authoring engine built on a free-form canvas, designed to help users turn ideas into high-quality content through multi-threaded conversations, knowledge base integration, contextual memory and intelligent search. The platform covers over 20 professional scenario templates, including learning...
6mos ago
02.9K
SciToolAgent: an agent that integrates 500+ scientific research tools and automates research tasks

Comprehensive Introduction SciToolAgent is an open source tool platform developed by the Innovation Center of Zhejiang University in Hangzhou (HICAI-ZJU). It integrates more than 500 scientific tools through a knowledge graph (SciToolKG) and large language model technology to help researchers deal with...
5mos ago
02.9K
AI ContentCraft: a versatile AI content creation tool for generating short stories, dialogue scripts, voiceovers, and illustrations

General Introduction AI ContentCraft is a versatile content creation tool that integrates text generation, speech synthesis, image generation and more. It helps creators quickly generate stories, podcast scripts, and accompanying audio and video content. The tool supports multiple language conversions and can batch...
7mos ago
02.9K
Fish Agent: an end-to-end AI voice cloning and real-time voice conversation assistant, a Fish Speech spin-off project

Comprehensive Introduction Fish Agent, a Fish Speech derivative project, is a revolutionary end-to-end AI voice cloning system built on the V0.1 3B model architecture. As a fully end-to-end voice cloning system, its most important feature is the use of innovative speechless...
7mos ago
02.9K
Vision is All You Need: building an intelligent document retrieval system with vision language models (Vision RAG)

Comprehensive Introduction Vision-is-all-you-need is an innovative visual RAG (Retrieval Augmented Generation) demonstration project that breaks new ground in applying Vision Language Models (VLMs) to document processing. Unlike traditional text chunking methods, the system directly makes...
7mos ago
02.9K
MakeSense: a free image annotation tool that makes computer vision projects more efficient

General Introduction Make Sense is a free online image annotation tool designed to help users quickly prepare datasets for computer vision projects. It requires no complicated installation and runs directly in the browser, supports multiple operating systems, and is well suited to small deep learning projects. Users can...
6mos ago
02.9K
Kolors: a text-to-image model for generating high-quality images, with support for generating Chinese posters

Comprehensive Introduction Kolors is a large-scale text-to-image generation model developed by the Kuaishou Kolors team, based on latent diffusion. The model is trained on billions of text-image pairs and can generate high-quality images that render complex semantics accurately, with support for both Chinese and English input. Kolors in visual quality...
8mos ago
02.9K
TankWork: an agent that operates a computer via voice and text commands and provides real-time voice feedback

General Introduction TankWork is an open source desktop agent framework designed to enable AI to perceive and control your computer through computer vision and system-level interaction. The framework allows agents to directly control computers through voice and text commands, process real-time screen content, and provide continuous audio visual...
7mos ago
02.9K
CR-Mentor: an intelligent code review mentor for GitHub, driven by a knowledge base and an LLM

Comprehensive Introduction CR-Mentor is an intelligent code review tool that combines a specialized knowledge base with the power of Large Language Models (LLMs). It not only supports code review for all programming languages, but also customizes review criteria and focus areas for teams based on best practices accumulated in the knowledge base. Through...
9mos ago
02.9K
Qlib: an AI quantitative investment research tool developed by Microsoft

Comprehensive Introduction Qlib is an open source platform developed by Microsoft that focuses on using AI technology to help users research quantitative investments. It starts from basic data processing and helps users explore investment ideas and turn them into usable strategies. The platform is easy to use and is suited to those who want to use machine learning to improve their investment research...
5mos ago
02.8K
VideoChat: a real-time voice-interactive digital human with customizable appearance and voice cloning, supporting both end-to-end and cascaded voice pipelines

Comprehensive Introduction VideoChat is a real-time voice-interactive digital human project based on open source technology, supporting both an end-to-end voice scheme (GLM-4-Voice - THG) and a cascaded scheme (ASR-LLM-TTS-THG). The project allows users to customize the digital ...
9mos ago
02.8K
StreamingT2V: a dynamic and scalable technique for generating long videos from text

Comprehensive Introduction StreamingT2V is a public project developed by the Picsart AI research team focused on generating coherent, dynamic and scalable long videos based on textual descriptions. This technology uses an advanced autoregressive approach that guarantees temporal consistency of the video with the description text tightly...
9mos ago
02.8K
Fay Digital Human Framework: integrates language models with 3D digital characters to support multiple application scenarios

Comprehensive Introduction Fay is an open source 3D virtual digital human framework that integrates language models and digital characters for a variety of application scenarios, such as virtual shopping guides, virtual anchors, assistants, waiters, teachers, and voice- or text-based mobile assistants. The Fay framework supports full offline use, providing m...
7mos ago
02.8K
Devika: an open source AI software engineer agent that understands instructions, splits them into subtasks, and writes code

General Introduction Devika is an advanced AI software engineer that understands high-level human instructions, breaks them down into steps, researches relevant information, and writes code to achieve a given goal. It intelligently develops software using large language models, planning and reasoning algorithms, and web browsing capabilities. D...
5mos ago
02.8K
Aggregator: a one-stop proxy crawling and aggregation platform with a free proxy pool (please use it in a compliant manner)

Comprehensive Introduction Aggregator is an open source project aimed at creating a free proxy pool that can crawl a variety of available proxy nodes. The platform has a flexible plug-in system, so users can implement specific functions through plug-ins according to the particular needs of the target site. The project is mainly used to learn to crawl ...
9mos ago
02.8K