AI Open Source Projects

1020 articles in total
Refly: an AI writing platform built on free-canvas workflow orchestration for automated article generation

Refly is an AI-native authoring engine built on a free-form canvas, designed to help users turn ideas into high-quality content through multi-threaded conversations, knowledge base integration, contextual memory, and intelligent search. The platform offers more than 20 professional scenario templates, including learning...
6mos ago
03.2K
DeOldify: a classic open source tool for colorizing black-and-white photos and videos with AI

DeOldify is an open source deep learning project designed to colorize and restore black-and-white photos and videos. The project uses a NoGAN training method to address common defects of conventional GAN training in image colorization...
8mos ago
03.7K
Browser-Use: an intelligent web automation tool that lets AI agents operate browsers easily

Browser-Use is an open source web automation tool designed to let large language models (LLMs) interact with websites naturally. It provides a flexible framework that supports a wide range of mainstream language models, including GPT-4, Claud...
8mos ago
03.9K
PromptWizard: an open source framework for optimizing prompt engineering to improve task performance

PromptWizard is an open source framework developed by Microsoft. It uses a self-evolving mechanism in which the model generates, evaluates, and refines its own prompts and examples, improving output quality through continuous feedback. It can autonomously optimize prompts, generate and select appropriate examples, and...
8mos ago
03.2K
Genesis: an open source generative physics engine for physically realistic 4D dynamic world simulation

Genesis is a generative physics engine designed for general-purpose robotics and embodied AI learning. It provides a unified simulation platform that supports a wide range of materials and physical phenomena. Genesis aims to unlock generative AI and physics simulation by combining...
8mos ago
03.5K
Kolors: a text-to-image model that generates high-quality images and supports Chinese posters

Kolors is a large-scale text-to-image generation model developed by the Kuaishou Kolors team, based on latent diffusion. The model is trained on billions of text-image pairs and generates high-quality, semantically accurate images for complex prompts, with support for both Chinese and English input. In terms of visual quality, Kolors...
8mos ago
03.1K
ColorFlow: automatic colorization of black-and-white comics and image sequences with improved color consistency and quality

ColorFlow is an automatic colorization tool for image sequences, developed by Tencent's ARC team to colorize black-and-white image sequences. It uses a retrieval-augmented colorization pipeline that draws on a pool of reference images to color individual elements accurately, including the character's hair color and service...
8mos ago
02.6K
R2R: an advanced AI retrieval (RAG) system combining multimodal content parsing, knowledge graphs, and hybrid search

R2R (RAG to Riches) is an advanced AI retrieval system that supports Retrieval-Augmented Generation (RAG) with production-ready features. Built around a containerized RESTful API, the system provides multimodal content parsing, hybrid search...
8mos ago
03K
Megrez-3B-Omni: an on-device multimodal model for understanding and analyzing text, images, and audio

Infini-Megrez is an edge intelligence solution developed by Infinigence AI that aims to achieve efficient multimodal understanding and analysis through hardware-software co-design. At the core of the project is the Megrez-3B model, which supports graph...
7mos ago
02.7K
RAGFlow: an open source RAG engine based on deep document understanding, providing efficient retrieval-augmented generation workflows

RAGFlow is an open source Retrieval-Augmented Generation (RAG) engine based on deep document understanding. It provides an efficient RAG workflow for organizations of all sizes, using large language models (LLMs) to work over data in complex formats, based on real...
7mos ago
03.8K
CrewAI: a framework for collaborative, role-playing AI agents that simplifies complex tasks

CrewAI is a framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI enables agents to work together seamlessly on complex tasks. Whether you're building an intelligent assistant platform, automating customer service teams, or multi-agent...
8mos ago
03.5K
Leffa: Meta's open source controllable person image generation model for high-fidelity virtual try-on and pose transfer

Leffa is a unified framework for controllable person image generation, enabling precise manipulation of appearance (e.g., virtual try-on) and pose (e.g., pose transfer). The framework significantly reduces distortion of fine-grained details by guiding the target query to attend to the correct reference key in the attention layer, with ...
8mos ago
03.8K
MMAudio: a video-to-audio tool that uses multimodal joint training to generate synchronized sound effects and soundtracks for video

MMAudio is an open source project that aims to generate high-quality synchronized audio through multimodal joint training. Developed by Ho Kei Cheng et al. at the Chinese University of Hong Kong, the project generates synchronized audio from video and/or text input. MM...
8mos ago
04K
Qwen-Agent: a Qwen-based agent application framework with tool calling, a code interpreter, RAG, and a Chrome extension

Qwen-Agent is an agent application framework built on Qwen 2.0 and above, with capabilities such as instruction following, tool use, planning, and memory. The framework provides a variety of sample applications such as browser assistants, code interpreters, and custom assistants...
8mos ago
03.5K
Mini-Cover: an online cover maker that generates personalized covers for blogs, short videos, social media, and more

Mini-Cover is an open source online cover generator for platforms such as blogs, short videos, and social media. Developed by JLinMr, the tool aims to provide a simple, efficient way to quickly generate covers that meet users' needs...
8mos ago
02.7K
Swarms: an enterprise-grade multi-agent orchestration framework for production use

Swarms is an enterprise-grade, production-ready multi-agent orchestration framework designed to boost business productivity through efficient agent management and task processing. With support for multiple models, multiple memory systems, and custom agent creation, the framework provides a modular design and comprehensive logging to ensure that the system...
8mos ago
02.6K
Sonic: audio-driven portrait animation that generates digital-human talking videos with vivid facial expressions

Sonic is a platform focused on global audio perception, designed to generate vivid portrait animations driven by audio. Developed by a team of researchers from Tencent and Zhejiang University, it uses audio information to control facial expressions and head movements and generate natural, smooth animated videos. S...
4mos ago
03K
Ultravox: a multimodal large audio model for real-time end-to-end voice conversation, an open source implementation of GPT-4o-style voice interaction

Ultravox is a multimodal large language model (LLM) designed for real-time speech processing. Unlike traditional speech recognition systems, Ultravox needs no separate automatic speech recognition (ASR) stage and can convert audio directly into a high-dimensional space in...
8mos ago
02.9K
Research Rabbit: web research and report writing with a local LLM, automatically digging into user-specified topics and generating summaries

Research Rabbit is a web research and summarization assistant based on a local LLM (large language model). Given a research topic from the user, Research Rabbit generates a search query, retrieves relevant web results, and summarizes those results...
4mos ago
02.6K
AgentClientDemo: a Python client that demonstrates how an agent runs, with an intuitive graphical user interface

AgentClientDemo is a Python project that integrates agent and client functionality. The project is built on the PyQt framework and provides an intuitive, easy-to-use graphical user interface (G...
8mos ago
02.7K
ChatFree (ChatAnywhere-2): a local Copilot built with the GPT API that supports completion and chat in any window

ChatFree is an open source project that aims to free users' AI apps from the browser so they can run locally. It is a Copilot built with the GPT API, designed to support a wide range of office software such as Office, Word, and WPS. The project was developed by ...
8mos ago
02.5K
Sketch-Gen: generates high-quality line art and sketches, infers prompts from images, and ships as a one-click installer

Sketch-Gen is an AI-based line art and sketch generation tool designed to help artists and designers quickly produce high-quality line drawings and sketches. The tool is derived from the Paints-UNDO project and uses machine learning models that can...
8mos ago
02.7K
Hunyuan text-to-video: Tencent's open source large video generation model for high-quality, realistic, cinematic video

Tencent Hunyuan text-to-video (available in the Yuanbao app) is an AI video generation platform launched by Tencent. It uses the Tencent Hunyuan large model, with strong cross-domain knowledge and natural language understanding, to generate high-quality videos from users' text descriptions...
7mos ago
03.5K
Director: an intelligent video agent framework for running video search, editing, and generation workflows from natural-language descriptions

Director is an open source framework designed to simplify and optimize video interactions and workflows by building intelligent video agents. The framework is built on VideoDB's "video-as-data" infrastructure and can handle complex video tasks such as searching, editing, compiling, and generating...
8mos ago
03K
MoneyPrinterTurbo: enter a video topic to generate video copy and a short HD video in one click

MoneyPrinterTurbo is an open source project that uses large AI models to generate short HD videos in one click. Users only need to provide a video topic or keywords, and the system automatically generates the video copy, video clips, subtitles, and...
5mos ago
02.8K
TRELLIS: a 3D asset generation model developed by Microsoft, with support for multiple formats and flexible editing

TRELLIS is a large-scale 3D asset generation model developed by Microsoft. It takes text or image prompts and generates high-quality 3D assets in a variety of formats, such as radiance fields, 3D Gaussians, and meshes. At the heart of TRELLIS is a unified structured latent...
8mos ago
03.9K
Bambo: a lightweight, flexible agent framework with simple role and tool configuration for handling a variety of workloads

Bambo is a new agent framework that is lighter and more flexible than mainstream frameworks and can handle a variety of workloads. Bambo implements efficient agent functionality by defining all tools in a tool catalog and using asynchronous custom functions. Users can use the llm_c...
8mos ago
02.9K
Marco-o1: an open source o1-style model fine-tuned from Qwen2-7B-Instruct, exploring open-ended reasoning for complex problems

Marco-o1 is an open reasoning model developed by Alibaba International Digital Commerce Group (AIDC-AI) to solve complex real-world problems. The model combines Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), and novel reasoning strategies...
8mos ago
03.3K
Flow (Laminar): a lightweight task engine for building agents, with simple and flexible task management

Flow is a lightweight task engine for building AI agents that emphasizes simplicity and flexibility. Unlike traditional node- and edge-based workflows, Flow uses a dynamic task queue that supports parallel execution, dynamic scheduling, and intelligent dependency management. Its core concept is ...
8mos ago
02.8K
MegaParse: parses all types of documents into LLM-ready data while preserving tables, images, and all other information in the document

MegaParse is a versatile document parsing tool designed to optimize data processing for large language models (LLMs). Whether you are working with text, PDFs, PowerPoint presentations, or Word documents, MegaParse...
8mos ago
03.1K
RMBG-2-Studio: an open source program for batch removal of image and video backgrounds, built on and optimized from RMBG 2.0

RMBG-2-Studio is an enhanced background removal and replacement application built on the BRIA-RMBG-2.0 model. It is designed to provide efficient, accurate background processing for a variety of image types, including e-commerce, gaming, and...
8mos ago
03.6K
OpenAlternative: a curated selection of open source alternatives to commonly used SaaS products

OpenAlternative is a platform dedicated to open source software alternatives, helping users find suitable open source tools to replace the commercial SaaS products they use every day. Through a carefully curated collection of open source tools, the site helps users save money and improve...
8mos ago
02.3K
TextDistiller: summarize an entire book in one click, distilling its content so readers can quickly grasp the core ideas

TextDistiller is an AI-driven tool that summarizes books chapter by chapter or as a whole, providing a concise yet comprehensive overview. With TextDistiller, users can quickly grasp the core ideas and key points of any book...
8mos ago
03K
ChainForge: an open source visual programming environment for testing and evaluating large language model prompts

ChainForge is an open source visual programming environment for testing and evaluating prompts for large language models (LLMs). It provides a data-flow prompt engineering environment in which users can quickly explore and analyze how different prompts affect LLM response...
8mos ago
02.7K
AI Hedge Fund: an open source automated trading system that uses multiple agents for complex hedge fund trading decisions

AI Hedge Fund is an AI-powered hedge fund that uses a multi-agent system to make trading decisions. Several specialized agents, including market data, quantitative, risk management, and portfolio management agents, work together to achieve complex trading...
7mos ago
04.1K