AI Personal Learning and Practical Guidance
Total 910 articles

Tags: AI open source projects

MMAudio: Generating Synchronized Sound Effects and Soundtracks for Video, a Video-to-Audio Multimodal Joint Training Tool

Overview: MMAudio is an open-source project that generates high-quality, synchronized audio through joint multimodal training. Developed by Ho Kei Cheng et al. at the Chinese University of Hong Kong, it generates audio synchronized to video and/or text input. The core innovation of MMAudio is...

LocalGPT: Chat with Multiple Documents on Your Local Device While Keeping Your Data Private

Overview: LocalGPT is an open-source project that lets users chat with their documents on a local device while keeping their data private. Because it runs open-source models locally, LocalGPT can process and understand document content without uploading any data to the cloud. The project supports a variety of platforms, including GPU, C...

AutoGPT: An Agent-Building Platform for Workflow Automation and Autonomous Task Execution

Overview: AutoGPT is a platform for creating, deploying, and managing continuously running AI agents that automate complex workflows. Developed by Significant Gravitas, the platform offers a wide range of tools and features that let users focus on important tasks without worrying about technical...

Qwen-Agent: A Qwen-Based Framework for Agent Applications, Including Tool Calling, a Code Interpreter, RAG, and a Chrome Extension

Overview: Qwen-Agent is an agent application framework built on Qwen 2.0 and later models, with capabilities such as instruction following, tool use, planning, and memory. The framework ships a variety of sample applications, such as a browser assistant, a code interpreter, and custom assistants, to help developers quickly construct...

Claude Engineer: An Agentic Conversational Assistant That Uses Claude Models to Autonomously Generate and Manage AI Tools

Overview: Claude Engineer is an interactive command-line interface (CLI) developed by Doriandarko that uses Anthropic's Claude-3.5-Sonnet model to assist with software development tasks. The framework lets Claude generate and manage its own tools, continuously extending its capabilities through dialog...

Swarms: A Multi-Agent Orchestration Framework and Enterprise-Grade Production Tool

Overview: Swarms is an enterprise-grade, production-ready multi-agent orchestration framework designed to boost business productivity through efficient agent management and task processing. It supports multiple models, multiple memory systems, and custom agent creation, and provides a modular design and comprehensive logging to ensure system...

Sonic: Generating Talking Digital-Human Videos with Vivid Facial Expressions from Audio and a Portrait Image

Overview: Sonic is a platform focused on global audio perception that generates vivid, audio-driven portrait animations. Developed by researchers from Tencent and Zhejiang University, it uses audio information to control facial expressions and head movements, producing natural, smooth animated videos. Sonic ...

Ultravox: A Multimodal Audio LLM for Real-Time End-to-End Voice Dialogue and an Open-Source Implementation of GPT-4o-Style Voice Interaction

Overview: Ultravox is a multimodal large language model (LLM) designed for real-time speech processing. Unlike traditional speech pipelines, Ultravox does not need a separate automatic speech recognition (ASR) stage; it maps audio directly into the LLM's high-dimensional embedding space. This feature makes...

Research Rabbit: Web Research and Report Writing with a Local LLM That Automatically Digs into a User-Specified Topic and Generates a Summary

Overview: Research Rabbit is a web research and summarization assistant built on a local LLM (large language model). Given a research topic from the user, Research Rabbit generates a search query, retrieves relevant web results, and summarizes them. It repeats this process to fill the knowledge gap...
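The query, search, and summarize cycle described above can be sketched as a small loop. This is a minimal sketch, not Research Rabbit's own code: the callables `generate_query`, `web_search`, `summarize`, and `find_gap` are hypothetical stand-ins for a local LLM and a search backend, and the toy versions exist only so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical interfaces standing in for a local LLM and a web search backend.
# They are NOT Research Rabbit's actual API.
QueryFn = Callable[[str], str]
SearchFn = Callable[[str], List[str]]
SummarizeFn = Callable[[str, str, List[str]], str]
GapFn = Callable[[str, str], Optional[str]]


@dataclass
class ResearchState:
    topic: str
    summary: str = ""
    sources: List[str] = field(default_factory=list)
    loops: int = 0


def research(topic: str, generate_query: QueryFn, web_search: SearchFn,
             summarize: SummarizeFn, find_gap: GapFn,
             max_loops: int = 3) -> ResearchState:
    """Query -> search -> summarize, then iterate on the identified knowledge gap."""
    state = ResearchState(topic)
    query = generate_query(topic)
    while state.loops < max_loops:
        results = web_search(query)
        state.sources.extend(results)
        # Fold the new results into the running summary.
        state.summary = summarize(topic, state.summary, results)
        state.loops += 1
        # Ask for a follow-up query covering what is still missing; None means done.
        follow_up = find_gap(topic, state.summary)
        if follow_up is None:
            break
        query = follow_up
    return state


# Toy stand-ins (hypothetical) so the sketch runs without a model or network access.
def toy_query(topic: str) -> str:
    return f"{topic} overview"


def toy_search(query: str) -> List[str]:
    return [f"page about {query}"]


def toy_summarize(topic: str, summary: str, results: List[str]) -> str:
    return (summary + " " + "; ".join(results)).strip()


def toy_gap(topic: str, summary: str) -> Optional[str]:
    # Request one follow-up search, then report that no gap remains.
    return None if "details" in summary else f"{topic} details"


if __name__ == "__main__":
    state = research("diffusion models", toy_query, toy_search, toy_summarize, toy_gap)
    print(state.loops, state.sources)
```

The `max_loops` cap bounds the number of iterations even if the gap-finding step never reports that the topic is covered, which matters when each step is a slow local model call.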

AgentClientDemo: A Python Client with an Intuitive Graphical User Interface That Demonstrates How an Agent Runs

Overview: AgentClientDemo is a Python project that integrates agent and client functionality. Built on the PyQt framework, it provides an intuitive, easy-to-use graphical user interface (GUI). With this project, users can experience the intelligent...

HelloMeme: Generating High-Fidelity Images or Videos with Consistent Local Expressions and Motions, an Open-Source Alternative to Runway Act-One

Overview: HelloMeme is an open-source project developed by HelloVision that generates high-quality images and videos by integrating Spatial Knitting Attentions, which embed high-level, high-fidelity conditions into diffusion models. The project's code and models ...