AI open source project

Total 1020 articles posts
RealtimeVoiceChat:低延迟与AI进行自然口语对话

RealtimeVoiceChat: low-latency natural spoken conversation with AI

General Introduction RealtimeVoiceChat is an open source project focused on real-time, natural conversations with artificial intelligence via voice. Users use a microphone to input their voice, and the system captures the audio through a browser, quickly converts it to text, and a large-scale language model (LLM) generates back...
8mos ago
056.8K
MetaGPT:多智能体协作框架,构建 AI 软件开发团队实现自然语言编程

MetaGPT: A Multi-Intelligence Collaboration Framework for Building AI Software Development Teams for Natural Language Programming

Comprehensive Introduction MetaGPT is an innovative multi-intelligence body framework designed to model the operations of a complete AI software company. Created by geekan (Alexander Wu), the goal of the project is to combine GPT models with different roles into a collaborative entity...
10mos ago
056.7K
NeoAI:让AI接管电脑远程操作,使用自然语言控制电脑的开源项目

NeoAI: Open source project that lets AI take over remote operation of computers and control them using natural language

General Introduction NeoAI is an innovative open source AI assistant tool that allows users to easily control and manage their computers through natural language conversations. Without writing any code, users can simply use everyday conversations to find files, automate tasks, manage devices, etc.NeoAI...
1yrs ago
056.7K
InstantID:上传一张图片,迁移人像特征来生成不同风格图片

InstantID: upload an image and migrate the portrait features to generate different styles of images

Comprehensive Introduction InstantID is an advanced technology focused on generating images with personalized styles or poses in seconds while ensuring a high level of fidelity using a single reference ID picture. The technology employs a diffusion model-based solution by integrating facial images, landmark maps...
1yrs ago
056.3K
A2A:谷歌发布AI智能间通信的开放协议

A2A: Google releases open protocol for communication between AI intelligences

General Introduction A2A (Agent2Agent) is an open source protocol developed by Google to allow AI intelligences developed by different frameworks or vendors to communicate and collaborate with each other. It provides a standardized set of methods for intelligences to discover each other's capabilities, share tasks, and complete work...
9mos ago
055.8K
ChatFree(ChatAnywhere-2):使用GPT API创建的本地Copilot,支持任意窗口中补全对话

ChatFree (ChatAnywhere-2): Native Copilot created using the GPT API to support complementary conversations in any window.

General Introduction ChatFree is an open source project that aims to free users' AI apps from the constraints of browsers to run locally. Created using GPT API, Copilot is designed to support a wide range of office software such as Office, Word, WPS, and more. The project was developed by ...
1yrs ago
055K
TRELLIS:Microsoft开发的3D资产生成模型,支持多种格式和灵活编辑

TRELLIS: Microsoft-developed 3D asset generation model with multiple format support and flexible editing

General Introduction TRELLIS is a large-scale 3D asset generation model developed by Microsoft. It is capable of receiving text or image prompts and generating high-quality 3D assets in a variety of formats, such as radial fields, 3D Gaussians, and meshes.At the heart of TRELLIS is a unified structured latent...
1yrs ago
054.9K
CrewAI:多角色扮演协作智能框架,简化复杂任务

CrewAI: A Multi-Roleplay Collaborative Intelligence Framework to Simplify Complex Tasks

Comprehensive Introduction CrewAI is an advanced framework designed to orchestrate collaboration between role-playing and autonomous AI agents. By facilitating collaborative intelligence, CrewAI enables agents to work together seamlessly to solve complex tasks. Whether you're building an intelligent assistant platform, automating customer service teams, or multi-agent...
1yrs ago
054.6K
腾讯混元3D(Hunyuan3D):生成高分辨率3D资产,多种3D素材生成工作流

Tencent Hybrid 3D (Hunyuan3D): Generate high-resolution 3D assets, multiple 3D material generation workflows

Comprehensive Introduction Tencent Hunyuan3D (Hunyuan3D 2.0) is an advanced large-scale 3D synthesis system from Tencent designed to generate high-resolution textured 3D assets. The system consists of two core components: Hunyuan3D-DiT, a large-scale shape generation model, and Hunyuan3D-DiT, a large-scale texture...
12mos ago
054.4K
FinRobot:提升金融数据分析效率和投资研究的的智能体

FinRobot: An Intelligent Body to Improve Financial Data Analysis Efficiency and Investment Research

Comprehensive Introduction FinRobot is an open source AI intelligence platform developed by AI4Finance Foundation and designed for financial analytics. It not only covers traditional language models, but also incorporates a variety of AI technologies, aiming to provide a comprehensive solution for the financial industry.F...
11mos ago
054.3K
Agent S:像人类一样操作电脑的开源智能体框架

Agent S: An Open Source Framework for Intelligent Bodies to Operate Computers Like Humans

General Introduction Agent S is an open-source framework developed by Simular AI that lets intelligences operate computers like humans through a graphical user interface (GUI). It uses a multimodal large language model and empirical learning techniques to accomplish tasks such as browsing the web, editing documents, using software...
9mos ago
054.1K
Sonic:音频驱动肖像图片生成面部表情生动的数字人口播视频

Sonic: Audio-driven portrait images generate digital demo videos with vivid facial expressions

General Introduction Sonic is an innovative platform focusing on global audio perception designed to generate vivid portrait animations driven by audio. Developed by a team of researchers from Tencent and Zhejiang University, the platform utilizes audio information to control facial expressions and head movements to generate natural and smooth animated videos.S...
10mos ago
053.6K
Llasa 1~8B:高品质语音生成和克隆的开源文本转语音模型

Llasa 1~8B: an open source text-to-speech model for high quality speech generation and cloning

General Introduction Llasa-3B is an open source text-to-speech (TTS) model developed by the Audio Lab of the Hong Kong University of Science and Technology (HKUST Audio). The model is based on the Llama 3.2B architecture, which has been carefully tuned to provide high-quality speech generation that not only supports multiple...
11mos ago
053.4K
DeOldify:使用AI技术为黑白照片和视频上色的经典开源工具

DeOldify: the classic open-source tool for colorizing black-and-white photos and videos using AI technology

Comprehensive Introduction DeOldify is an open source project based on deep learning technology, specifically designed for intelligent colorization and restoration of black and white photos and videos. The project uses an innovative NoGAN training method to successfully solve the common defects of traditional GAN networks in the image coloring process...
1yrs ago
053.4K
InstantIR:受损图像修复与图像高清放大开源项目,最低16G显存

InstantIR: damaged image repair and image high-definition zoom open source project, minimum 16G video memory

General Description InstantIR is an innovative single-image restoration model developed by the InstantX team, designed to resurrect your damaged images with extremely high-quality and realistic details, capable of high-quality restoration of damaged images. The tool not only restores the details of the image...
1yrs ago
053.2K
Fish Agent:端到端AI语音克隆助手,实时语音对话助理,Fish Speech衍生项目

Fish Agent: end-to-end AI voice cloning assistant, real-time voice conversation assistant, Fish Speech spin-off project

Comprehensive Introduction Fish Speech Derivative Project Fish Agent is a revolutionary end-to-end AI speech cloning system developed based on the V0.1 3B model architecture. As a fully end-to-end speech clone processing system, its most important feature is the use of innovative speechless...
1yrs ago
053.2K
AsrTools:语音转字幕工具,内置剪映、快手、必剪接口的轻量客户端

AsrTools: speech-to-subtitle tool, lightweight client with built-in interfaces to Cutscene, Racer, and Must-Cut

Comprehensive Introduction AsrTools is an intelligent speech-to-text tool with built-in interfaces from big players such as Cutscene, Racer, Must Cut, etc. It does not require GPU or cumbersome configuration, and supports efficient multi-threaded batch processing. It is based on PyQt5 development, beautiful and user-friendly interface, able to output SRT and TXT format words...
1yrs ago
052.9K
WeClone:用微信聊天记录和语音训练数字分身

WeClone: training digital doppelgangers with WeChat chats and voices

Comprehensive introduction WeClone is an open source project that uses WeChat chat logs and voice messages, combined with large language models and speech synthesis technology, to allow users to create personalized digital doppelgangers. The project can analyze the user's chat habits to train the model , but also a small number of voice samples to generate realistic sound...
9mos ago
052.9K
RD-Agent:自动化数据驱动研发工具,通过AI技术推动以数据为导向的研发过程

RD-Agent: an automated data-driven R&D tool to drive data-driven R&D processes through AI technology

Comprehensive Introduction RD-Agent is an open source tool from Microsoft designed to automate and optimize the research and development (R&D) process. The tool focuses on data-driven scenarios to improve the efficiency of model and data development through artificial intelligence techniques.RD-Agent integrates research...
10mos ago
052.9K
Step-Audio:多模态语音交互框架,识别语音并使用克隆语音交流等功能

Step-Audio: a multimodal voice interaction framework that recognizes speech and communicates using cloned speech, among other features

Comprehensive Introduction Step-Audio is an open source intelligent speech interaction framework designed to provide out-of-the-box speech understanding and generation capabilities for production environments. The framework supports multi-language dialog (e.g., Chinese, English, Japanese), emotional speech (e.g., happy, sad), regional dialects (e.g., Cantonese, Szechuan ...
11mos ago
052.4K
Goose:开源可扩展的编程智能体,自动化执行编程全流程任务

Goose: open source scalable programming intelligences that automate the full range of programming tasks

General Introduction Goose is an open source AI agent tool developed by Block, Inc. designed to help developers automate everyday development tasks. It supports a wide range of Large Language Models (LLMs) and interacts with users via the command line or desktop application interfaces.Goose can perform a wide range of tasks from agent...
12mos ago
052.4K
tldraw:开源无限画布白板SDK,AI生成简约线框图和UML图

tldraw: open source unlimited canvas whiteboard SDK, AI to generate minimalist wireframe diagrams and UML diagrams

General Description tldraw is a free and instant collaborative drawing tool that provides an unlimited canvas where users can quickly draw graphics, write text and collaborate instantly. Featuring an intuitive interface and excellent performance, it is suitable for team collaboration and remote work. Supported through the open source community, tldr...
1yrs ago
052.3K
AI reads books:AI逐页阅读PDF书籍,自动提取知识要点并生成总结

AI reads books: AI reads PDF books page by page, automatically extracts the main points of knowledge and generates summaries.

Comprehensive Introduction AI-reads-books-page-by-page is a Python-based development of intelligent PDF book analysis tool, which can automate the page-by-page analysis of PDF books, extract the key knowledge points, and after the specified page interval to generate stage...
1yrs ago
052.2K
Qwen-Agent:基于Qwen的智能代理应用框架,包括工具调用、代码解释器、RAG和Chrome扩展。

Qwen-Agent: Qwen-based framework for intelligent agent applications, including tool calls, code interpreters, RAGs and Chrome extensions.

Comprehensive Introduction Qwen-Agent is an intelligent agent application framework developed based on Qwen 2.0 and above, with capabilities such as command following, tool usage, planning and memorization. The framework provides a variety of sample applications such as browser assistants, code interpreters and custom assistants...
1yrs ago
052K