General Introduction
Open-VoiceCanvas is an open source speech synthesis platform developed by the ItusiAI team. It supports more than 50 languages and can convert text to natural-sounding speech. It can also clone personalized voices from uploaded audio. The project integrates the OpenAI TTS, AWS Polly, and MiniMax speech services and offers a range of voice options with adjustable speech rate. The code is 100% open source and hosted on GitHub, where users can download and modify it for free. It also supports Google and GitHub login, and Stripe payments for unlocking advanced features. The tool suits developers, content creators, and everyday users.
Function List
- Supports text-to-speech conversion in more than 50 languages.
- Offers several voice services: OpenAI TTS (natural speech), AWS Polly (multi-language), and MiniMax (optimized for Chinese).
- Supports male and female voice selection with adjustable speech rate.
- Provides voice cloning: users can upload audio to create a personalized voice.
- Supports text file upload and audio file download, and handles long text with ease.
- Integrates Google and GitHub login, a multi-language interface, and dark/light themes.
- Offers subscriptions through Stripe, including a free trial, monthly/yearly plans, and pay-as-you-go billing.
How to Use
Open-VoiceCanvas is a powerful open source tool. Here is a detailed installation and usage guide to help you get started quickly.
Installation process
- Preparing the environment
Before you begin, make sure the following tools are installed on your computer:
- Git: for downloading the code.
- Node.js (18.x or above recommended): Runs the front-end and back-end.
- npm: Package management tool for Node.js.
Check whether they are installed:
git --version
node --version
npm --version
If any of them is missing, download and install it from its official website.
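If you prefer to check the Node.js requirement from a script, the sketch below compares the running version against the 18.x minimum recommended above. It is a hypothetical helper (the name meetsMinimumNode is ours, not part of the project) and only looks at the major version number.

```typescript
// Hypothetical helper, not part of Open-VoiceCanvas: checks the Node.js major version.
const MIN_NODE_MAJOR = 18; // this guide recommends Node.js 18.x or above

// Accepts strings like "18.17.0" or "v20.11.1"; returns true when the major version is high enough.
function meetsMinimumNode(version: string, minMajor: number = MIN_NODE_MAJOR): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isFinite(major) && major >= minMajor;
}

// Usage: check the runtime executing this script (process.versions.node has no leading "v").
const currentNode = process.versions.node;
console.log(
  meetsMinimumNode(currentNode)
    ? `Node.js ${currentNode} is OK`
    : `Node.js ${currentNode} is too old; install 18.x or above`,
);
```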
- Cloning the code
Open a terminal and enter the following command to download the project:
git clone https://github.com/ItusiAI/Open-VoiceCanvas.git
Go to the project directory:
cd Open-VoiceCanvas
- Installing dependencies
Run the following command to install the required libraries:
npm install
If downloads are slow (for example, from mainland China), use the npmmirror registry:
npm install --registry=https://registry.npmmirror.com
- Configuring environment variables
In the project root directory, create a .env file and add the following configuration (replace the placeholder values with your own keys):
# OpenAI
OPENAI_API_KEY="your_openai_api_key"
# AWS Polly
NEXT_PUBLIC_AWS_REGION="us-east-1"
NEXT_PUBLIC_AWS_ACCESS_KEY_ID="your_aws_access_key_id"
NEXT_PUBLIC_AWS_SECRET_ACCESS_KEY="your_aws_secret_access_key"
# MiniMax
MINIMAX_API_KEY="your_minimax_api_key"
MINIMAX_GROUP_ID="your_minimax_group_id"
# Database
DATABASE_URL="your_neon_db_url"
# Stripe
STRIPE_SECRET_KEY="your_stripe_secret_key"
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY="your_stripe_publishable_key"
STRIPE_WEBHOOK_SECRET="your_stripe_webhook_secret"
# NextAuth
NEXTAUTH_URL="http://localhost:3000"
NEXTAUTH_SECRET="your_nextauth_secret"
# OAuth
GITHUB_ID="your_github_client_id"
GITHUB_SECRET="your_github_client_secret"
GOOGLE_ID="your_google_client_id"
GOOGLE_SECRET="your_google_client_secret"
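Obtain these keys from the corresponding services: OpenAI, AWS, MiniMax, Neon (the PostgreSQL database host), Stripe, and the GitHub/Google OAuth developer consoles. NEXTAUTH_SECRET is not issued by a service; it is a random string you generate yourself, for example with openssl rand -base64 32.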
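A missing key usually surfaces later as a confusing runtime error. As a sketch (not code from the repository), a small check like the one below lists which of the variables above are unset or still hold a "your_..." placeholder. The REQUIRED_ENV_VARS list simply mirrors the .env template, and the database check assumes a Neon-style postgres:// or postgresql:// connection string.

```typescript
// Hypothetical pre-flight check, not part of Open-VoiceCanvas itself.
// The list mirrors the .env template in this guide.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_AWS_REGION",
  "NEXT_PUBLIC_AWS_ACCESS_KEY_ID",
  "NEXT_PUBLIC_AWS_SECRET_ACCESS_KEY",
  "MINIMAX_API_KEY",
  "MINIMAX_GROUP_ID",
  "DATABASE_URL",
  "STRIPE_SECRET_KEY",
  "NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "NEXTAUTH_URL",
  "NEXTAUTH_SECRET",
  "GITHUB_ID",
  "GITHUB_SECRET",
  "GOOGLE_ID",
  "GOOGLE_SECRET",
] as const;

type Env = Record<string, string | undefined>;

// Returns human-readable problems; an empty array means the configuration looks complete.
function checkEnv(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_ENV_VARS) {
    const value = env[name];
    // Treat empty values and untouched "your_..." placeholders as missing.
    if (!value || value.startsWith("your_")) {
      problems.push(`${name} is not set`);
    }
  }
  const dbUrl = env.DATABASE_URL;
  if (dbUrl && !dbUrl.startsWith("your_")) {
    try {
      const protocol = new URL(dbUrl).protocol;
      if (protocol !== "postgres:" && protocol !== "postgresql:") {
        problems.push(`DATABASE_URL should be a PostgreSQL URL, got ${protocol}`);
      }
    } catch {
      problems.push("DATABASE_URL is not a valid URL");
    }
  }
  return problems;
}

// Usage: check the variables visible to this process.
const envProblems = checkEnv(process.env);
console.log(envProblems.length === 0 ? "All variables set" : envProblems.join("\n"));
```

Next.js loads .env on its own, but a standalone script does not, so the file has to be loaded first (Node.js 20.6 and later provide an --env-file flag for this).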
- Running a database migration
Once DATABASE_URL points to your PostgreSQL database, run:
npx prisma migrate dev
This applies the project's Prisma migrations to initialize the PostgreSQL database.
- Starting the application
Enter the following command to start the development server:
npm run dev
Once it is running, open http://localhost:3000 in your browser to see the interface.
Main Functions
Text-to-speech
- Open the web page, log in and go to the main screen.
- Enter text in the "Text Input" box, e.g. "Hello, it's Wednesday".
- Select a language (more than 50 are supported, including Chinese, English, and Japanese).
- Choose a voice service: OpenAI TTS, AWS Polly, or MiniMax.
- Pick a voice (male or female, such as OpenAI's "nova" or AWS Polly's "Joanna").
- Adjust the speech rate (range 0.5-2.0; 1.0 is normal speed).
- Click "Generate"; the audio is ready to preview within a few seconds.
- Click "Download" to save as an MP3 file.
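For readers curious what a generation request to one of these services looks like, here is a sketch against OpenAI's public speech endpoint (POST https://api.openai.com/v1/audio/speech). It is not Open-VoiceCanvas's own code: the "tts-1" model choice and the helper names are assumptions, and the speed is clamped to the 0.5-2.0 range the interface exposes (the OpenAI API itself accepts a wider range).

```typescript
// Sketch of a text-to-speech request to OpenAI's speech endpoint.
// Not taken from the Open-VoiceCanvas source; helper names are illustrative.
const OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech";

interface SpeechRequest {
  model: string;
  input: string;
  voice: string;
  speed: number;
  response_format: "mp3";
}

// Clamp the speech rate to the 0.5-2.0 range offered in the UI.
function clampSpeed(speed: number): number {
  return Math.min(2.0, Math.max(0.5, speed));
}

// Builds the JSON body; "tts-1" is an assumed model choice.
function buildSpeechRequest(text: string, voice = "nova", speed = 1.0): SpeechRequest {
  return {
    model: "tts-1",
    input: text,
    voice,
    speed: clampSpeed(speed),
    response_format: "mp3",
  };
}

// Sends the request and returns the MP3 bytes (uses the global fetch available in Node.js 18+).
async function synthesize(text: string, apiKey: string, voice = "nova", speed = 1.0): Promise<Uint8Array> {
  const res = await fetch(OPENAI_SPEECH_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildSpeechRequest(text, voice, speed)),
  });
  if (!res.ok) {
    throw new Error(`OpenAI TTS request failed: ${res.status} ${await res.text()}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

For example, synthesize("Hello, it's Wednesday", apiKey, "nova", 1.2) would resolve to MP3 bytes that can be saved with fs.promises.writeFile, which corresponds to the "Download" step.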
Voice cloning
- Go to the "Sound Cloning" page.
- Click "Upload Audio" and select a clear 10-20 second audio clip (WAV or MP3 format).
- Enter a name for the voice, e.g. "My Voice".
- Click "Clone" and wait 1-2 minutes for processing to complete.
- Once cloning succeeds, the new voice appears in the voice list.
- Return to the text-to-speech page, select the cloned voice, and enter text to generate speech.
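Because cloning expects a 10-20 second clip, it can help to check the length before uploading. The sketch below reads the duration of a PCM WAV file from its header (data chunk size divided by byte rate). It is a hypothetical pre-check, not part of the project, and it does not handle MP3, whose duration requires walking the frame headers.

```typescript
// Hypothetical pre-upload check for voice-cloning clips (not part of Open-VoiceCanvas).
// Computes WAV duration as (data chunk size) / (byte rate from the fmt chunk).
const MIN_CLIP_SECONDS = 10;
const MAX_CLIP_SECONDS = 20;

function wavDurationSeconds(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset: number) =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);

  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }

  let byteRate = 0;
  let dataSize = -1;
  let offset = 12;
  // Walk the chunks: 4-byte id, 4-byte little-endian size, then the payload.
  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      // fmt payload: format (2), channels (2), sample rate (4), then byte rate (4).
      byteRate = view.getUint32(offset + 8 + 8, true);
    } else if (id === "data") {
      dataSize = size;
      break;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }

  if (byteRate === 0 || dataSize < 0) {
    throw new Error("Missing fmt or data chunk");
  }
  return dataSize / byteRate;
}

function isCloneClipLengthOk(seconds: number): boolean {
  return seconds >= MIN_CLIP_SECONDS && seconds <= MAX_CLIP_SECONDS;
}
```

In Node.js, fs.readFileSync("clip.wav") returns a Buffer, which is a Uint8Array, so it can be passed to wavDurationSeconds directly (the file name is only an example).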
Document processing
- Click "Upload Text File" on the main screen.
- Select a .txt file; its content is loaded into the input box automatically.
- Set the language, voice, and speech rate, then generate the audio.
- Long text is automatically segmented to ensure smooth generation.
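The guide does not say how long text is segmented. One common approach, shown here as an assumption rather than the project's actual algorithm, is to split at sentence boundaries and pack sentences into chunks below a size limit. The 4096-character default matches the input limit of OpenAI's speech endpoint; AWS Polly and MiniMax have their own limits.

```typescript
// Hypothetical segmentation helper; the project's real splitting logic may differ.
// Splits at sentence-ending punctuation (Western and CJK) and packs sentences into chunks of at most maxChars.
function splitIntoChunks(text: string, maxChars = 4096): string[] {
  // Each match is one sentence with its terminating punctuation and trailing whitespace.
  const sentences = text.match(/[^.!?。！？]+[.!?。！？]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  const flush = () => {
    if (current.trim()) chunks.push(current.trim());
    current = "";
  };

  for (const sentence of sentences) {
    if (sentence.length > maxChars) {
      // A single overlong sentence is hard-split so no chunk exceeds the limit.
      flush();
      for (let i = 0; i < sentence.length; i += maxChars) {
        const piece = sentence.slice(i, i + maxChars).trim();
        if (piece) chunks.push(piece);
      }
      continue;
    }
    if (current.length + sentence.length > maxChars) flush();
    current += sentence;
  }
  flush();
  return chunks;
}
```

Splitting at sentence ends rather than at a fixed character count keeps each audio segment ending at a natural pause, so the generated pieces join smoothly.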
Subscribe and Login
- Click "Sign in" in the upper right corner and choose Google or GitHub account authorization.
- After logging in, you can view your character quota and clone count.
- Click "Subscribe" and choose a free trial, a monthly plan, or a yearly plan.
- Enter your payment details via Stripe; additional features unlock once the subscription is complete.
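On the server side, Stripe typically reports completed payments to the application through a webhook, which is what the STRIPE_WEBHOOK_SECRET variable is for. The official stripe package verifies these requests with stripe.webhooks.constructEvent; the sketch below spells out the same check with Node's built-in crypto module so the mechanism is visible. It assumes Stripe's documented Stripe-Signature header format (t=timestamp,v1=signature), and the function name is hypothetical, not taken from the Open-VoiceCanvas source.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Sketch of Stripe webhook signature verification (hypothetical helper, not from the project).
// In practice, stripe.webhooks.constructEvent from the official stripe package does this.
const TOLERANCE_SECONDS = 300; // reject timestamps more than 5 minutes off to limit replays

function verifyStripeSignature(
  rawBody: string,
  signatureHeader: string,
  webhookSecret: string, // the STRIPE_WEBHOOK_SECRET value
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  // Header shape: "t=<unix time>,v1=<hex signature>[,v1=<hex signature>...]"
  let timestamp = "";
  const signatures: string[] = [];
  for (const part of signatureHeader.split(",")) {
    const [key, value] = part.split("=", 2);
    if (key === "t" && value) timestamp = value;
    else if (key === "v1" && value) signatures.push(value);
  }
  const ts = Number(timestamp);
  if (!timestamp || !Number.isFinite(ts) || signatures.length === 0) return false;
  if (Math.abs(nowSeconds - ts) > TOLERANCE_SECONDS) return false;

  // Stripe signs "<timestamp>.<raw request body>" with HMAC-SHA256 keyed by the webhook secret.
  const expected = createHmac("sha256", webhookSecret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  return signatures.some((sig) => {
    const sigBuf = Buffer.from(sig, "hex");
    // timingSafeEqual throws when lengths differ, so compare lengths first.
    return sigBuf.length === expected.length && timingSafeEqual(sigBuf, expected);
  });
}
```

The check must run on the raw request body exactly as Stripe sent it; in a Next.js route handler that usually means reading it with await request.text() before any JSON parsing.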
Notes
- Audio requirements: Audio used for cloning should be clear and free of background noise.
- Key security: Do not share the keys in your .env file.
- Network requirements: The first run requires downloading models, so keep your network connection stable.
- Technical support: If you run into problems, file an issue on GitHub.
With these steps, you can make full use of Open-VoiceCanvas. Because it is open source, developers can also customize it, for example by adding new voice services or adjusting the interface.
Application scenarios
- Content creation
Hosts and streamers can use it to generate multi-language narration and save recording time.
Scenario description: A YouTuber generates video commentary in Chinese and English and downloads the audio directly for editing.
- Educational support
Teachers convert textbooks to speech to create instructional audio.
Scenario description: An English teacher uploads a text and generates audio with American pronunciation for students' listening practice.
- Personalized applications
Developers clone their own voices to build unique voice assistants.
Scenario description: A programmer clones his voice and integrates it into a smart home system that announces the weather in his own voice.
- Recreational use
Users generate fun voice clips to share with friends.
Scenario description: Someone generates a "Happy Birthday" message in a friend's voice as a surprise gift.
FAQ
- What voice services are supported?
OpenAI TTS (natural speech), AWS Polly (multi-language), and MiniMax (optimized for Chinese).
- What does it take to clone a voice?
A clear 10-20 second clip in WAV or MP3 format, with as little background noise as possible.
- What is the difference between the free and paid versions?
The free version limits characters and clones; the paid version offers larger quotas and more voice options.
- How do I fix a startup failure?
Check that your Node.js version is 18.x or above, that the environment variables are configured correctly, and that all dependencies were installed completely.