AI Personal Learning
and practical guidance
Total 908 articles

Tag: AI open source projects (page 30)

AnyText: generate and edit multilingual text in images, with highly controllable generation of multi-line Chinese text

Comprehensive Introduction: AnyText is a multilingual visual text generation and editing tool built on a diffusion model. It generates natural, high-quality multilingual text in images and supports flexible text editing. It was developed by a team of researchers and was selected as a Spotlight paper at the ICLR 2024 conference...

AIGCPanel: an open source integrated system for cloning digital humans, with one-click deployment of a free digital human client

Comprehensive Introduction: AigcPanel is a one-stop AI digital human production system for all users, built with an Electron + Vue 3 + TypeScript stack and supporting one-click deployment on Windows. The system is designed around ease of use, so even users with little technical background can pick it up easily. Main features ...

AI Dev Gallery: a Windows toolset for local AI model development that integrates on-device models into Windows applications

Comprehensive Introduction: AI Dev Gallery is an AI development tool from Microsoft (currently in public preview) designed for Windows developers. It provides a comprehensive platform to help developers easily integrate AI features into their Windows applications. The most notable feature of the tool...

Edge TTS Worker: deploy Microsoft's speech synthesis API on Cloudflare, with an OpenAI-compatible format and a bundled web interface

General Introduction: Edge TTS Worker (depends on edge-tts) is a proxy service deployed on Cloudflare Workers that wraps the Microsoft Edge TTS service in an API compatible with the OpenAI format. With this project, users can, without Microsoft authentication, easily use...

BetterWhisperX: automatic speech recognition with speaker diarization and highly accurate word-level timestamps

Comprehensive Introduction: BetterWhisperX is an optimized version of WhisperX focused on providing efficient and accurate Automatic Speech Recognition (ASR). An improved offshoot of WhisperX, the project is maintained by Federico Torrielli, who is committed to keeping it continuously updated and improving its performance...

Gemini Balance: a Gemini model API proxy compatible with the OpenAI format that unlocks region restrictions and supports rotation across multiple API keys

Comprehensive Introduction: Gemini Balance is an OpenAI-compatible API proxy service built on the FastAPI framework, aiming to provide efficient management and optimization of multiple API keys. The project supports Gemini model calls, and its main features include rotation across multiple API keys, authentication and authorization, streaming responses, CORS support and...

AIaW: a full-featured, lightweight, cross-platform AI client with extensible plug-ins

Comprehensive Introduction: AIaW (AI as Workspace) is a next-generation AI client designed to be full-featured, lightweight and extensible. The platform supports a wide range of service providers, including OpenAI, Anthropic and Google, and is capable of parsing documents and videos, supporting multiple workspaces and plugin systems,...

AI2SRT: create short narrated videos or video summaries from long videos with one click using Gemini models

Comprehensive Introduction: AI2SRT is an open source project that uses the Gemini large model to generate short narrated videos and video summaries from long videos with one click, while also supporting transcription of audio and video into subtitles. The project aims to simplify the video content creation process and provide efficient subtitle generation and translation functions. Users can simply operate...

CogAgent: Zhipu's open source visual language model for automating graphical interface operations

Comprehensive Introduction: CogAgent is an open source visual language model developed by the Tsinghua University Data Mining Research Group (THUDM), aiming to automate cross-platform graphical user interface (GUI) operations. The model is based on CogVLM (GLM-4V-9B), supports bilingual interaction in English and Chinese, and is able to automate GUI operations through screenshots and natural...

DisPose: generate videos with precise control of human pose, for creating videos of dancing girls

General Introduction: DisPose is an open source artificial intelligence project focused on controllable character image animation. Developed by a team of researchers and open-sourced on GitHub, the project uses deep learning techniques to achieve precise control of character animation by decomposing skeletal pose information. The core of DisPose...

Smolagents: a lightweight open source project for rapidly developing and building AI agents

Comprehensive Introduction: Smolagents is a lightweight agent library developed by Hugging Face that focuses on simplifying the development of AI agent systems. The project is known for its clean design philosophy, with only about 1000 lines of core code, yet it provides powerful feature integration capabilities. Its most notable feature is its support for code execution...

Vision Parse: intelligent conversion of PDF documents to Markdown format using visual language models

Comprehensive Introduction: Vision Parse is a document processing tool that uses state-of-the-art Vision Language Models to intelligently convert PDF documents into high-quality Markdown content. The tool supports a wide range of leading visual language models, including o...
