Comprehensive Introduction Muyan-TTS is an open source text-to-speech (TTS) model designed for podcasting scenarios. It is pre-trained on over 100,000 hours of podcast audio data and supports zero-shot speech synthesis that produces high-quality, natural speech. The model is built on Llama-3.2-3B, combined with SoVITS decoding ...
General Introduction CAD-MCP is an open source project that lets users control CAD software through natural language commands. It combines natural language processing with CAD automation so that users can create and modify drawings by entering simple text commands, without manually operating the CAD interface. Project ...
Comprehensive Introduction manga-image-translator (the open source version of Cotrans Translator) translates the text in manga and other images. It provides a command line interface and an online demo, with a batch conversion mode, a web server mode, and other usage options. Multiple target languages can be set for translation and ...
Comprehensive Introduction GraphGen is an open source framework developed by OpenScienceLab, an artificial intelligence lab in Shanghai, hosted on GitHub, focused on optimizing supervised fine-tuning of Large Language Models (LLMs) by guiding synthetic data generation through knowledge graphs. It constructs fine-grained knowledge graphs from source text, utilizing pre...
General Introduction ACI.dev is an open source infrastructure platform designed to give AI agents rapid integration with over 600 tools. It ensures that agents have secure access to tools such as Google Calendar, Slack, and Brave Search through multi-tenant authentication and fine-grained permissions management. Developers can...
General Introduction llm.pdf is an open source project that allows users to run large language models (LLMs) directly inside PDF files. Developed by EvanZhouDev and hosted on GitHub, this project demonstrates an innovative approach: compiling llama.cpp to asm.js via Emscripten,...
General Introduction Abogen is an open source tool designed to quickly convert ePub, PDF or plain text files to high quality audio. It uses the Kokoro-82M model to generate natural and smooth speech, and also supports synchronized subtitle generation, which is suitable for producing audiobooks, video dubbing or study aids. Use...
General Introduction Local Deep Research is an open source AI research assistant designed to help users conduct deep research on complex problems and generate detailed reports. It runs locally, allowing users to accomplish research tasks without relying on cloud services. The tool combines local large language models...
General Introduction DeepWiki is a free tool from Cognition AI focused on generating structured, Wikipedia-like documentation for GitHub repositories. It analyzes code, README files, and configuration files to automatically create detailed documentation and interactive diagrams that help developers quickly understand...
General Introduction Trackers is an open source Python library focused on multi-object tracking in video. It integrates several leading tracking algorithms, such as SORT and DeepSORT, and lets users pair them with different object detection models (e.g., YOLO, RT-DETR) for flexible video analysis. Users can ...
Comprehensive Introduction Kimi-Audio is an open source audio foundation model developed by Moonshot AI that focuses on audio understanding, generation, and dialog. It supports a variety of audio processing tasks such as speech recognition, audio Q&A, and speech emotion recognition. The model has been pre-trained on over 13 million hours of audio data,...
General Description Describe Anything is an open source project developed by NVIDIA and several universities, with the Describe Anything Model (DAM) at its core. The tool generates detailed descriptions of regions that the user marks in an image or video with points, boxes, scribbles, or masks. It does not ...
General Introduction Cooragent is an open source AI agent collaboration framework developed by LeapLab at Tsinghua University and hosted on GitHub. It allows users to create intelligent AI agents from a one-sentence description and supports collaboration among multiple agents on complex tasks. The framework provides two modes: Agent Factory automatically generates customized...
General Introduction InstantCharacter is an open source project developed by Tencent Hunyuan and the InstantX team, hosted on GitHub. It uses a reference image and a text description to generate character images that stay consistent across a wide range of scenarios and styles. The project is based on a diffusion transformer...
Comprehensive Introduction MCP Server Deep Research is an open source tool that automatically generates structured research reports for complex problems through artificial intelligence and web search. Users enter a research question, and the tool breaks down the question, searches for authoritative information, evaluates the credibility of the source, and generates Markdo...
Comprehensive Introduction Deep Recall is an open source, enterprise-grade memory framework designed for large language models (LLMs). It provides hyper-personalized responses through efficient contextual retrieval and integration. The framework uses a three-tier architecture, consisting of a memory service, an inference service, and a coordinator, and supports GPU-optimized inference...
General Introduction CleverBee is an open source AI research assistant hosted on GitHub and developed by SureScaleAI. It helps users quickly collect, analyze, and summarize information by combining web browsing technology with large language models (such as Gemini and Claude) to generate research reports with citations....
General Introduction FantasyTalking is an open source project developed by the Fantasy-AMAP team that focuses on generating realistic, audio-driven talking portrait videos. The project is based on the advanced video diffusion model Wan2.1, combined with the audio encoder Wav2Vec and proprietary model weights, using artificial intelligence techniques ...
General Introduction Paper2Code is an open source project that aims to address the lack of code implementations for machine learning papers. It automatically transforms scientific papers into runnable code repositories using PaperCoder, a multi-agent large language model (LLM) system. The system uses a three-phase flow of planning, analysis and code generation...