Latest AI Resources

Total: 2976 articles
Midjourney: Create the images of your imagination|Introduction to the Midjourney Chinese website|Official website open for free testing

Midjourney Introduction: Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species. It provides an AI service that generates images from text descriptions, letting users create art in a variety of styles, from realistic to abstract...
1 year ago
061.5K
ChatTTS: A speech generation model that mimics a real person's speaking voice (ChatTTS one-click acceleration package)

General Introduction: ChatTTS is a generative speech model designed for conversational scenarios. It generates natural, expressive speech, supports multiple languages and multiple speakers, and is suited to interactive conversation. The model predicts and controls fine-grained prosodic features such as laughter, pauses and interjections, sup...
1 year ago
061.5K
PaddlePaddle PP-TableMagic: A Powerful Tool for Extracting Structured Information from Complex Tables

The goal of table recognition is to parse tables in images, accurately identify table structure and cell locations, and restore them to a structured table format (e.g., HTML). Even in today's information age, a large amount of important tabular data still exists in unstructured form (e.g., scanned documents with pictures of statistical tables...).
1 year ago
061.5K
Record Cafe (录咖): One-stop Audio/Video Processing Platform|Video Generation|AI Subtitles|Audio Extraction|Speech to Text

Comprehensive Introduction: Record Cafe (录咖) is a one-stop audio/video processing platform that provides AI video dialog, AI subtitles and AI speech-to-text services. Features include screen recording, video editing and GIF/audio conversion, along with cloud storage and sharing. The interface is intuitive and easy to use, and it also supports multi-screen recording and multi-language smart...
1 year ago
061.4K
YuE: A foundation model that turns lyrics into complete songs, supporting a wide range of musical styles

General Introduction: YuE is an open source foundation model for full-song generation that focuses on turning lyrics into complete songs. Unlike models that can only generate short snippets of non-vocal music, YuE can generate full songs several minutes long with both lead and backing vocals. The model addresses music generation in...
1 year ago
061.3K
OpenAOE: A Large Model Group Chat Framework for Chatting with Multiple Large Language Models Simultaneously

Comprehensive Introduction: OpenAOE is an open source large-model group chat framework that aims to fill a gap in the current market: chat frameworks in which multiple models respond in parallel. With OpenAOE, users can talk to multiple Large Language Models (LLMs) at the same time and get their outputs in parallel. The framework supports ...
1 year ago
061.3K
AI2SRT: Create short narrated videos or video summaries from long videos with one click using Gemini models

Comprehensive Introduction: AI2SRT is an open source project that uses Gemini large models to generate short narrated videos and video summaries from long videos with one click, and also supports transcribing audio and video into subtitles. The project aims to simplify video content creation and provide efficient subtitle generation and translation. Users can pass...
1 year ago
061.3K
GeekAI: Self-deployed, commercial-ready multi-functional AI assistant with full multi-model API integration and an operations backend

Comprehensive Introduction: GeekAI is a complete open source AI assistant solution built on large language model APIs. The project ships with an operations management backend, works out of the box, and integrates ChatGPT, Azure, ChatGLM, iFlytek Spark, Wenxin Yiyan (ERNIE Bot) and many other p...
1 year ago
061.2K
LazyLLM: SenseTime's open source low-code development tool for building multi-agent applications

Comprehensive Introduction: LazyLLM is an open source tool developed by the LazyAGI team that focuses on simplifying the development of multi-agent large-model applications. It helps developers quickly build complex AI applications through one-click deployment and a lightweight gateway mechanism, sparing them tedious engineering configuration...
1 year ago
061.2K
MMAudio: Generates synchronized sound effects and soundtracks for video footage; a video-to-audio tool built on multimodal joint training

General Introduction: MMAudio is an open-source project that aims to generate high-quality synchronized audio through multimodal joint training. Developed by Ho Kei Cheng et al. at the Chinese University of Hong Kong, the project's main function is to generate synchronized audio from video and/or text input. MM...
1 year ago
061.1K
Sana: Fast high-resolution image generation with an ultra-small 0.6B model that runs on a low-end laptop GPU

General Introduction: Sana is an efficient high-resolution image generation framework developed by NVIDIA Labs, capable of generating images at up to 4096 × 4096 resolution in a matter of seconds. Sana uses a linear diffusion transformer and a deep compression autoencoder to significantly...
1 year ago
061.1K
Hibiki: A real-time speech translation model for streaming translation that preserves the characteristics of the original voice

General Introduction: Hibiki is a high-fidelity real-time speech translation model developed by Kyutai Labs. Unlike traditional offline translation, Hibiki generates natural speech in the target language, along with a text translation, in real time while the user is speaking. The model...
1 year ago
061K
Ultravox: An audio multimodal large model for real-time end-to-end voice dialog; an open source implementation of GPT-4o voice interaction

Comprehensive Introduction: Ultravox is an innovative multimodal Large Language Model (LLM) designed for real-time speech processing. Unlike traditional speech recognition systems, Ultravox eliminates the need for a separate Automatic Speech Recognition (ASR) stage and is able to convert audio directly into a high-dimensional space in...
1 year ago
060.9K
Tough Tongue AI: Practice Interview and Workplace Communication Skills by Talking to an AI

General Introduction: Tough Tongue AI is an artificial intelligence platform designed for practicing tough conversations. Users can simulate a variety of complex conversational situations, such as job interviews, salary negotiations and sales presentations, by selecting preset scenarios or creating custom ones. The platform provides video and...
1 year ago
060.9K
Metaso AI Search (秘塔AI搜索): Ad-free, efficient academic search, with a research mode for in-depth knowledge mining

General Introduction: Metaso AI Search (秘塔AI搜索) is a technology company dedicated to improving productivity through artificial intelligence. The site provides an ad-free, efficient academic search service that aims to give users accurate, fast search results. Metaso AI Search has a self-developed large language model, MetaLLM, which can...
1 year ago
060.8K
WebShaper - Alibaba Tongyi's open source AI training data synthesis system

WebShaper is an AI training data synthesis system launched by Alibaba's Tongyi Lab. It uses formal modeling and an agent-based expansion mechanism to generate high-quality, scalable training data that helps AI agents improve at complex information retrieval. The system introduces the concept of "knowledge projection"...
8 months ago
060.8K
VideoRAG: A RAG framework for understanding ultra-long videos, with support for multimodal retrieval and knowledge graph construction

Comprehensive Introduction: VideoRAG is a retrieval-augmented generation framework designed for processing and understanding extremely long-context videos. The tool combines a graph-driven textual knowledge base with hierarchical multimodal context encoding to efficiently process on a single NVIDIA RTX 3090 GPU...
1 year ago
060.7K
PicWish (佐糖): Online image processing tools for one-click background removal, watermark removal, photo restoration and portrait editing

Comprehensive Introduction: PicWish (佐糖) is an AI image processing platform that provides a wide range of online photo editing tools and works on all platforms. Users can easily perform one-click background removal, watermark removal, blurry-photo sharpening, lossless enlargement, image cropping, image compression and black and white photo...
1 year ago
060.6K
AutoAgent: A framework for rapidly creating and deploying AI agents through natural language

General Introduction: AutoAgent is an open source AI agent framework developed by the Data Intelligence Lab at the University of Hong Kong (HKUDS) and hosted on GitHub. It lets users rapidly create and deploy customized AI agents by describing their requirements purely in natural language, without any programming base...
9 months ago
060.6K
LunaAI face swap: An open source take on Miaoya Camera; a complete enterprise-grade AI face swap mini program with front end and back end (paid compute service, supports secondary development)

Comprehensive Introduction: The LunaAI face swap mini program is a face swap application built with uniapp and the Vue framework. It uses PHP, MySQL, Nginx and Redis to let users perform face swaps through the mini program. Users can use this small...
1 year ago
060.5K
Hallo2: Audio-driven generation of portrait videos with synchronized lips and expressions (Windows one-click installation)

General Introduction: Hallo2 is an open-source project jointly developed by Fudan University and Baidu that aims to generate high-resolution portrait animations driven by audio. The project uses advanced Generative Adversarial Networks (GANs) and temporal alignment techniques to achieve 4K resolution and video generation up to 1 hour long...
1 year ago
060.4K
Flow (Laminar): A lightweight task engine for building agents that simplifies task management and keeps it flexible

Comprehensive Introduction: Flow is a lightweight task engine designed for building AI agents, emphasizing simplicity and flexibility. Unlike traditional node- and edge-based workflows, Flow uses a dynamic task queue that supports parallel execution, dynamic scheduling and intelligent dependency management. Its core concept is ...
1 year ago
060.4K
LTX Studio: An AI filmmaking platform with storyboard management tools that can keep faces consistent across multiple characters

General Introduction: LTX Studio is an innovative AI-driven video creation platform designed for creators, marketers, filmmakers and studios. It covers the full process, from story conception and storyboard generation through adding motion effects to post-production editing, helping users transform creative concepts into...
1 year ago
060.3K
VideoLingo: An open source tool for video transcription with word-level timestamped subtitles, subtitle translation and localized dubbing

General Description: VideoLingo is a one-stop video translation and localized dubbing tool designed to generate Netflix-grade, high-quality subtitles, doing away with stiff machine translation and multi-line subtitles, and to add high-quality voiceovers so that knowledge can be shared worldwide across language barriers. By...
1 year ago
060.3K
ModelBest (面壁智能): The World's Leading Lightweight, High-Performance On-Device Large Model

General Introduction: ModelBest is a company specializing in lightweight, high-performance large models, dedicated to bringing advanced AI to mainstream consumer electronics and the various end devices of daily life. Its MiniCPM series of on-device models is characterized by extremely efficient use of compute and memory...
1 year ago
060.2K