Fish Agent: end-to-end AI voice cloning assistant, real-time voice conversation assistant, Fish Speech spin-off project

General Introduction

Fish Agent, a spin-off of the Fish Speech project, is an end-to-end AI voice cloning system built on the V0.1 3B model architecture. Its most distinctive feature is a semantic-token-free architecture: it does not rely on traditional semantic encoders/decoders such as Whisper, and it converts speech to speech directly. With latency as low as 150 ms, the system can accurately capture and generate environmental audio information, achieving near-real-time voice cloning. Fish Agent offers its pre-trained models for download and supports both local deployment and training and cloud service calls, giving developers and users flexible options. By integrating speech recognition and speech synthesis with precise timbre control, Fish Agent delivers a natural, fluid voice interaction experience.

Key characteristics are an end-to-end architecture, zero-shot voice cloning, a compact 3-billion-parameter model, multilingual support, and fast response. The training data comprises 700,000 hours of multilingual audio, and the model is continually pre-trained from Qwen-2.5-3B-Instruct. The model, named Fish Agent 3B, integrates ASR and TTS capabilities itself and needs no external models, so it processes speech truly end to end rather than through the traditional three-stage (ASR + LLM + TTS) pipeline.
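To make the contrast concrete, here is a minimal Python sketch of the two data flows. The stage functions (`asr`, `llm`, `tts`, `model`) and the `cascade`/`end_to_end` helpers are hypothetical stand-ins, not Fish Agent APIs: the cascade hands intermediate text between three separate models, while the end-to-end design maps input audio to output audio in a single model call.

```python
from typing import Callable, List

# Hypothetical aliases for illustration: real systems pass waveforms or audio tokens.
Audio = List[float]
Text = str


def cascade(asr: Callable[[Audio], Text],
            llm: Callable[[Text], Text],
            tts: Callable[[Text], Audio]) -> Callable[[Audio], Audio]:
    """Traditional three-stage pipeline: speech -> text -> reply text -> speech.

    Each stage waits for the previous one, so their latencies add up, and
    anything the transcript does not capture (tone, background sound) is
    unavailable to the later stages.
    """
    def run(audio: Audio) -> Audio:
        transcript = asr(audio)  # stage 1: speech recognition
        reply = llm(transcript)  # stage 2: text-only language model
        return tts(reply)        # stage 3: speech synthesis
    return run


def end_to_end(model: Callable[[Audio], Audio]) -> Callable[[Audio], Audio]:
    """Single-model design: one call maps input audio directly to output audio."""
    def run(audio: Audio) -> Audio:
        return model(audio)
    return run
```

In the cascade, the language model only ever sees `transcript`, which is why paralinguistic and environmental information has to be re-created by the TTS stage rather than carried through.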


Try it online: https://huggingface.co/spaces/fishaudio/fish-agent

Feature List

  • Ultra-low-latency voice cloning: 150 ms response time, with support for real-time voice conversion
  • Semantic-token-free architecture: an end-to-end speech processing design
  • Precise timbre control: fine-grained timbre adjustment via reference audio
  • Environmental audio processing: high-fidelity reproduction of background sound information
  • Open pre-trained models: support for local deployment and training
  • Cloud service API: convenient access through a hosted interface
  • Personalized training: support for training custom voice models
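A response-time claim such as 150 ms is usually measured as time to first audio: the delay between issuing a request and receiving the first audio chunk from a streaming output. The helper below is a generic, hypothetical sketch that works on any iterable of audio chunks; it is not part of any Fish Agent API.

```python
import time
from typing import Iterable, Optional, Tuple


def time_to_first_chunk(stream: Iterable[bytes]) -> Tuple[Optional[float], int]:
    """Consume `stream` and return (seconds until first chunk, number of chunks).

    The timer starts when this function is called. With a generator, the
    producer's work begins on the first iteration, so that setup time is
    included. Returns (None, 0) for an empty stream.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    count = 0
    for _chunk in stream:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    return first, count
```

Time to first audio is the figure to compare with a latency claim; total generation time for a long utterance is a separate measurement.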

 

Usage Guide

1. System requirements

  • Python 3.8 or higher
  • NVIDIA GPU (recommended)
  • 8GB or more of system memory
  • CUDA support (recommended)
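Before installing, a quick pre-flight check against these requirements can save time. This sketch uses only the Python standard library: it checks the interpreter version and whether `nvidia-smi` (installed with the NVIDIA driver) is on the PATH. The `check_requirements` helper is hypothetical, and finding `nvidia-smi` is only a rough sign of a usable NVIDIA GPU, not proof that CUDA is configured correctly.

```python
import shutil
import sys


def check_requirements(min_python=(3, 8)):
    """Return simple pass/fail checks for the listed requirements."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        # nvidia-smi ships with the NVIDIA driver; its presence suggests a GPU.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```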

2. Installation steps

  1. Environment preparation
# Create a virtual environment
python -m venv fish-agent-env
source fish-agent-env/bin/activate # Linux/Mac
# or
fish-agent-env\Scripts\activate # Windows
  2. Install Fish Agent
# Direct Installation
pip install fish-agent
# or install from source
git clone https://github.com/fishaudio/fish-agent
cd fish-agent
pip install -e .
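After installation, you can confirm that the package is visible to the active virtual environment. This sketch assumes the import name is `fish_agent`, as in the local deployment examples; `importlib.util.find_spec` locates a top-level module without importing it. The `is_installed` helper is hypothetical.

```python
import importlib.util
import sys


def is_installed(module_name: str = "fish_agent") -> bool:
    """Return True if `module_name` can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.prefix points at the virtual environment when one is active.
    print("Interpreter prefix:", sys.prefix)
    print("fish_agent importable:", is_installed())
```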

3. Usage workflow

3.1 Using the online service

You can now try our agent demo online for live English chat; the documentation also explains how to run English and Chinese chat locally.


The demo is an early alpha release: inference speed still needs optimization, and many bugs remain to be fixed. If you find a bug or want to fix one, we welcome issues and pull requests.

https://fish.audio/zh-CN/demo/live/

3.2 Local deployment

  1. Start the service
from fish_agent import VoiceAgent
# Initialize Fish Agent
agent = VoiceAgent()
# Start the local service
agent.start_server(port=7860)
  2. Voice cloning example
# Load reference audio
reference_audio = "path/to/reference.wav"
agent.load_reference(reference_audio)
# Generate cloned voice
text = "This is a test voice"
output_path = "output.wav"
agent.generate_speech(text, output_path)
  3. Real-time conversion settings
# Start real-time voice conversion
agent.start_realtime_conversion(
    input_device=0,  # Input device ID
    output_device=1,  # Output device ID
    reference_audio="path/to/reference.wav",
)
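Real-time conversion processes audio in short frames rather than whole files, and the frame length puts a floor under latency: a frame of N samples at sample rate R cannot be processed until N / R seconds of audio have arrived. The sketch below is a generic illustration of that framing, not Fish Agent's internal implementation; the helper names, the 16 kHz rate, and the 20 ms frame size are assumptions.

```python
from typing import Iterator, List


def frame_samples(sample_rate: int, frame_ms: float) -> int:
    """Number of samples in one frame of `frame_ms` milliseconds."""
    return int(sample_rate * frame_ms / 1000)


def frames(samples: List[float], frame_len: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size frames; a short final frame is zero-padded."""
    for start in range(0, len(samples), frame_len):
        chunk = samples[start:start + frame_len]
        if len(chunk) < frame_len:
            chunk = chunk + [0.0] * (frame_len - len(chunk))
        yield chunk
```

At 16 kHz, `frame_samples(16000, 20)` gives 320 samples, so buffering one frame alone adds 20 ms before any processing can start. Smaller frames lower that floor at the cost of more per-frame overhead.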

4. Advanced feature configuration

4.1 Tone parameter adjustment

  • Tone control parameters:
    • Pitch: -12 to 12
    • Speech speed: 0.5 to 2.0
    • Emotion intensity (emotion_intensity): 0 to 1.0
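A small validation layer can keep out-of-range values from reaching the model. This sketch encodes the three ranges listed above; the `VoiceParams` class, its field names, and its defaults are hypothetical, and how these values are actually passed to Fish Agent is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceParams:
    """Hypothetical container for the tone control ranges listed above.

    Defaults are assumptions: neutral pitch and speed, mid emotion intensity.
    """
    pitch: float = 0.0              # -12 to 12
    speed: float = 1.0              # 0.5 to 2.0
    emotion_intensity: float = 0.5  # 0 to 1.0

    def __post_init__(self) -> None:
        limits = {
            "pitch": (-12.0, 12.0),
            "speed": (0.5, 2.0),
            "emotion_intensity": (0.0, 1.0),
        }
        for name, (low, high) in limits.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
```

Validating at construction time means an invalid setting fails immediately with a clear message instead of producing odd audio later.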

4.2 Batch processing

# Batch Text Processing
texts = ["text1", "text2", "text3"]
agent.batch_process(texts, output_dir="outputs/")
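If you want explicit control over output file names, the same batch job can be sketched as a loop over `generate_speech`, assuming it accepts `(text, output_path)` as in the voice cloning example. The `batch_generate` helper, the `SpeechAgent` protocol, and the `item_000.wav` naming scheme are hypothetical; `batch_process` remains the call shown above.

```python
from pathlib import Path
from typing import List, Protocol


class SpeechAgent(Protocol):
    """Anything with a generate_speech(text, output_path) method."""

    def generate_speech(self, text: str, output_path: str) -> None: ...


def batch_generate(agent: SpeechAgent, texts: List[str], output_dir: str) -> List[str]:
    """Generate one WAV file per text and return the output paths in order."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(texts):
        path = str(out / f"item_{i:03d}.wav")
        agent.generate_speech(text, path)
        paths.append(path)
    return paths
```

Zero-padded indices keep the files in input order when sorted by name.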

4.3 API calls

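# API call example
import base64
import requests

# Read the reference audio and encode it as base64 text for the JSON payload
with open("path/to/reference.wav", "rb") as f:
    reference_audio_b64 = base64.b64encode(f.read()).decode("ascii")

url = "https://speech.fish.audio/api/v1/generate"
payload = {
    "text": "Text to be converted",
    "reference_audio": reference_audio_b64,
}
response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()  # Raise an error for 4xx/5xx responses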
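Because reference audio quality has a large effect on cloning results, it can be worth checking a file before passing it to `load_reference` or encoding it for an API request. This sketch uses Python's standard `wave` module, which reads uncompressed PCM WAV files only, to report duration, sample rate, and channel count. The `inspect_wav` and `looks_usable` helpers are hypothetical, and the 3-second minimum is an illustrative threshold, not a documented Fish Agent requirement.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    duration_s: float
    sample_rate: int
    channels: int


def inspect_wav(path: str) -> WavInfo:
    """Read basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return WavInfo(
            duration_s=wf.getnframes() / rate,
            sample_rate=rate,
            channels=wf.getnchannels(),
        )


def looks_usable(info: WavInfo, min_seconds: float = 3.0) -> bool:
    """Illustrative check only; the default threshold is an assumption."""
    return info.duration_s >= min_seconds
```

For example, `inspect_wav("path/to/reference.wav")` returns the file's properties, which you can log or compare against whatever limits suit your recordings.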

5. Usage notes

  • Reference audio quality strongly affects cloning results; use clear recordings without background noise
  • Limit the text in a single processing run to 200 words or fewer
  • Real-time conversion works best with a good-quality microphone
  • Commercial use requires explicit authorization
  • Update the model regularly for the best performance
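To follow the 200-word guideline with longer scripts, you can split the text before processing. This is a hypothetical helper, not a Fish Agent function: it packs whole sentences (ending in `.`, `!`, or `?`) greedily into chunks and falls back to word boundaries only when a single sentence exceeds the limit. It counts whitespace-separated words, so it suits English text better than unsegmented Chinese.

```python
import re
from typing import List


def split_text(text: str, max_words: int = 200) -> List[str]:
    """Split text into chunks of at most `max_words` words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A sentence longer than the limit is cut on word boundaries.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping sentences intact where possible avoids cutting prosody mid-phrase when each chunk is synthesized separately.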

6. Troubleshooting common problems

  1. Audio output issues
    • Check the audio output device settings
    • Verify the system volume configuration
    • Confirm that the audio format is supported
  2. Performance optimization
    • Verify that the GPU is properly enabled
    • Adjust batch parameters
    • Clear the cache regularly
  3. Installation issues
    • Verify Python version compatibility
    • Confirm the CUDA environment configuration
    • Consider using a conda environment
  4. API usage
    • Check the network connection
    • Confirm the API permission configuration
    • Verify the server response
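For intermittent network failures, retrying the API request with exponential backoff is a common pattern. This sketch is generic and hypothetical: `send` is any zero-argument function that performs the request, and the attempt count and delays are illustrative defaults, not values recommended by Fish Audio.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    send: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call `send`, retrying on `retry_on` exceptions.

    The wait doubles after each failure (base_delay, 2 * base_delay, ...);
    the last failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With `requests`, pass its own exception classes, because `requests.exceptions.ConnectionError` does not inherit from Python's built-in `ConnectionError`, for example `with_retries(lambda: requests.post(url, json=payload, timeout=60), retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout))`.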
May not be reproduced without permission: Chief AI Sharing Circle, "Fish Agent: end-to-end AI voice cloning assistant, real-time voice conversation assistant, Fish Speech spin-off project"
