AI open source project

1,020 articles in total
AI ContentCraft: a versatile AI content creation tool for generating short stories, dialogue scripts, voiceovers, and illustrations

General introduction: AI ContentCraft is a versatile content creation tool that integrates text generation, speech synthesis, image generation, and more. It helps creators quickly generate stories, podcast scripts, and accompanying audio and video content. The tool supports conversion between multiple languages and can batch...
7mos ago
Aggregator: a one-stop proxy crawling and aggregation platform with a free proxy pool (please use it in compliance with applicable rules)

Comprehensive introduction: Aggregator is an open source project that aims to build a free proxy pool by crawling a variety of available proxy nodes. The platform has a flexible plug-in system, so users can implement site-specific functionality through plug-ins when a target site has special requirements. The project is mainly intended for learning to crawl ...
9mos ago
TransRouter: a real-time Chinese-English audio translation tool based on the Gemini multimodal model

TransRouter is a real-time voice translation tool based on Google's Gemini model, designed specifically for translating between English and Chinese. It can be integrated seamlessly into video conferencing software such as Zoom, providing an easy way for cross-language...
7mos ago
StreamingT2V: a dynamic and scalable technique for generating long videos from text

Comprehensive introduction: StreamingT2V is a public project from the Picsart AI Research team that focuses on generating coherent, dynamic, and scalable long videos from text descriptions. The technique uses an autoregressive approach designed to keep the video temporally consistent, with the description text tightly...
9mos ago
TankWork: an agent that operates a computer via voice and text and provides real-time voice feedback

General introduction: TankWork is an open source desktop agent framework designed to let AI perceive and control your computer through computer vision and system-level interaction. Agents can control the computer directly through voice and text commands, process real-time screen content, and provide continuous audio-visual...
7mos ago
AI2SRT: create narrated short videos or video summaries from long videos in one click with the Gemini model

Comprehensive introduction: AI2SRT is an open source project that uses the Gemini large model to generate narrated short videos and video summaries from long videos in one click, and also supports subtitle transcription for audio and video. The project aims to simplify video content creation and provide efficient subtitle generation and translation. Users can pass...
8mos ago
VideoChat: a real-time voice-interactive digital human with a customizable avatar and voice cloning, supporting both end-to-end and cascaded voice pipelines

Comprehensive introduction: VideoChat is a real-time voice-interactive digital human project built on open source technology. It supports both an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). The project allows users to customize the digital ...
9mos ago
Diffbot GraphRAG LLM: an LLM inference service that relies on external real-time knowledge graph data

Comprehensive introduction: Diffbot LLM Inference Server is a large language model system with special optimizations and improvements built on the Llama model architecture. The project's most important feature is its integration of a real-time knowledge graph with retrieval-augmented generation...
7mos ago
CogAgent: Zhipu's open source visual language model for automating graphical interface operations

Comprehensive introduction: CogAgent is an open source visual language model developed by the Tsinghua University Data Mining Research Group (THUDM) that aims to automate operations on cross-platform graphical user interfaces (GUIs). The model is based on CogVLM (GLM-4V-9B) and supports bilingual Chinese and English...
8mos ago
WeClone: train a digital avatar from WeChat chat logs and voice messages

Comprehensive introduction: WeClone is an open source project that combines WeChat chat logs and voice messages with large language models and speech synthesis so that users can create a personalized digital avatar. The project analyzes the user's chat habits to train the model and can also use a small number of voice samples to generate realistic sound...
4mos ago
Step-Audio: a multimodal voice interaction framework with speech recognition, conversation in cloned voices, and other features

Comprehensive introduction: Step-Audio is an open source intelligent speech interaction framework designed to provide out-of-the-box speech understanding and generation for production environments. The framework supports multilingual dialogue (e.g., Chinese, English, Japanese), emotional speech (e.g., happy, sad), regional dialects (e.g., Cantonese, Sichuanese ...
6mos ago
BotSharp: a .NET-based platform for developing and managing multi-agent AI applications

Comprehensive introduction: BotSharp is an open source project based on .NET Core that provides a comprehensive toolkit for building AI chatbot platforms. It is written in C#, runs cross-platform, and aims to simplify the application of machine learning algorithms, enabling enterprise developers to efficiently ...
7mos ago
Swarm: an experimental educational project for learning lightweight multi-agent systems (OpenAI example)

General introduction: Swarm is an experimental educational framework developed by OpenAI to explore lightweight, controllable, and easy-to-test interfaces for multi-agent systems. The framework is primarily used to demonstrate handoff and routine patterns between agents, helping developers understand and implement the coordination and execution of multi-agent systems...
7mos ago
AigoTools: an open source AI tool navigation site with automatic site inclusion and multilingual support

General introduction: AigoTools is an open source AI site navigation project designed to help users quickly create and manage navigation sites. It has built-in site management and AI-based automatic site inclusion, and supports multiple languages, dark/light theme switching, and SEO optimization. AigoTools proposes ...
10mos ago
ChatTTS: a speech generation model that mimics natural human speech (ChatTTS one-click acceleration package)

General introduction: ChatTTS is a generative speech model designed for conversational scenarios. It generates natural and expressive speech, supports multiple languages and multiple speakers, and is suitable for interactive conversations. The model does this by predicting and controlling fine-grained prosodic features such as laughter, pauses, and interjections, sup...
6mos ago
Confident AI: an automated large language model evaluation framework for comparing the output quality of prompts across different large models

Comprehensive introduction: DeepEval is an easy-to-use open source LLM evaluation framework for evaluating and testing large language model systems. It is similar to Pytest, but focuses on unit testing of LLM output. DeepEval incorporates the latest research results through G-Eval, hallucination...
6mos ago
Pyramid Flow: an open source version of "Kling" launched by Kuaishou, based on SD3 and running on GPUs with less than 8GB of memory (one-click deployment version)

Comprehensive introduction: Pyramid Flow is an efficient autoregressive video generation method based on flow matching. The method achieves higher computational efficiency in generating and decompressing video content by interpolating between different resolutions and noise levels...
9mos ago
ModelBest (面壁智能): a world-leading lightweight, high-performance on-device large model

General introduction: ModelBest is a company specializing in lightweight, high-performance large models, dedicated to bringing advanced AI technologies to mainstream consumer electronics and other everyday end devices. Its MiniCPM series of on-device models is characterized by highly efficient use of compute and memory...
10mos ago
MindSearch: an open source AI search engine framework for deploying your own Perplexity-style search engine!

Comprehensive introduction: MindSearch is an open source AI search engine framework launched by Shanghai Artificial Intelligence Laboratory (SAL) that aims to mimic the human thought process for complex information gathering and integration. The tool combines large language models (LLMs) and search engines through multi-agent...
8mos ago
SpeechGPT 2.0-preview: an end-to-end, human-like spoken dialogue large model for real-time interaction

SpeechGPT 2.0-preview is the first human-like real-time interaction system released by OpenMOSS, trained on millions of hours of speech data. It offers human-like spoken expression and 100 ms low-latency responses, supporting natural and smooth real...
6mos ago