AI Personal Learning
and practical guidance
豆包Marscode1
Total 39 articles

Tags: ai voice cloning

Step-Audio:多模态语音交互框架,识别语音并使用克隆语音交流等功能-首席AI分享圈

Step-Audio: a multimodal voice interaction framework that recognizes speech and communicates using cloned speech, among other features

Comprehensive Introduction Step-Audio is an open source intelligent voice interaction framework designed to provide out-of-the-box speech understanding and generation capabilities for production environments. The framework supports multi-language dialog (e.g., Chinese, English, Japanese), emotional speech (e.g., happy, sad), regional dialects (e.g., Cantonese, Szechuan), and can...

Llasa 1~8B:高品质语音生成和克隆的开源文本转语音模型-首席AI分享圈

Llasa 1~8B: an open source text-to-speech model for high quality speech generation and cloning

General Introduction Llasa-3B is an open source text-to-speech (TTS) model developed by the Audio Lab of the Hong Kong University of Science and Technology (HKUST Audio). The model is based on the Llama 3.2B architecture, which has been carefully tuned to provide high-quality speech generation that not only supports multiple languages, but also enables emotional expression and personality...

Fish Agent:端到端AI语音克隆助手,实时语音对话助理,Fish Speech衍生项目-首席AI分享圈

Fish Agent: end-to-end AI voice cloning assistant, real-time voice conversation assistant, Fish Speech spin-off project

Comprehensive Introduction Fish Speech Derivative Project Fish Agent is a revolutionary end-to-end AI speech cloning system developed based on the V0.1 3B model architecture. As a fully end-to-end speech cloning processing system, its most important feature is that it is designed with an innovative semantic-free tagging architecture, which does not need to rely on Whisper...

ViiTor AI:音频/视频多语言翻译合成与语音克隆服务-首席AI分享圈

ViiTor AI: Audio/Video Multilingual Translation Synthesis and Speech Cloning Service

Comprehensive Introduction ViiTor AI is a powerful artificial intelligence platform focused on providing high-quality video translation, voice cloning, AI-generated avatar videos, and speech synthesis services. The platform supports multiple languages and is designed to help users easily realize multilingual content creation.ViiTor AI's video translation...

Amphion MaskGCT:零样本文本到语音克隆模型(本地一键部署包)-首席AI分享圈

Amphion MaskGCT: Zero-sample text-to-speech cloning model (local one-click deployment package)

Comprehensive Introduction MaskGCT (Masked Generative Codec Transformer) is a completely non-autoregressive Text-to-Speech (TTS) model jointly introduced by Funky Maru Technology and The Chinese University of Hong Kong. The model does not require explicit text-to-speech alignment information and adopts a two-stage generation approach, which first passes ...

趣丸千音:语音克隆并结合口型同步,一键翻译视频为多语言!-首席AI分享圈

Funky Maru Chiyo: Voice cloning and combined with mouth synchronization to translate videos into multiple languages with one click!

Comprehensive Introduction Funmaru Thousand Voices is a multilingual AI voice synthesis platform that provides realistic and natural voice generation solutions. Users can easily convert text content into professional-grade audio and support the creation of exclusive AI voices (voice clones) from zero samples to meet personalized needs. The platform also provides video translation features to help...

CosyVoice:阿里推出的3秒急速语音克隆开源项目,支持情感控制标签-首席AI分享圈

CosyVoice: 3-second rush voice cloning open source project launched by Ali with support for emotionally controlled tags

Comprehensive Introduction CosyVoice is a multilingual large-scale speech generation model that provides full-stack capabilities from inference, training to deployment. Developed by FunAudioLLM team, it aims to achieve high quality speech synthesis through advanced autoregressive transformers and ODE-based diffusion models.CosyVoice not only supports...

Coqui TTS(xTTS):文本到语音生成的深度学习工具包,支持多种语言和声音克隆功能-首席AI分享圈

Coqui TTS (xTTS): Deep Learning Toolkit for Text-to-Speech Generation with Multiple Language Support and Voice Cloning Capabilities

Comprehensive Introduction Coqui TTS is an open source advanced text-to-speech (TTS) generation toolkit based on deep learning techniques. It has been battle-tested in both research and production environments, and provides a rich set of features and models that support text-to-speech conversion in multiple languages.Coqui TTS not only supports pre-trained models...

en_USEnglish