Edge TTS Worker: Deploying Microsoft Speech Synthesis APIs with Cloudflare, OpenAI Compatible Format and Wrapped Web Interface

General Introduction

Edge TTS Worker is a proxy service deployed on Cloudflare Workers that wraps the Microsoft Edge TTS service in an API compatible with the OpenAI format. With this project, users can call Microsoft's high-quality speech synthesis service without going through Microsoft's own authentication. Edge TTS Worker supports multiple languages, including Chinese, English, Japanese, and Korean, and it runs entirely on the Cloudflare Workers free plan, so hosting costs nothing. The service also supports a custom API key for access control and can be deployed in a few minutes.
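
Because the endpoint follows the OpenAI speech format, a client call could look roughly like the sketch below. buildSpeechRequest is a hypothetical helper and not part of the project; the /v1/audio/speech path and the body fields are taken from the usage section of this article, and the base URL and key are placeholders.

```javascript
// Hypothetical helper: builds the URL and fetch options for one speech request.
function buildSpeechRequest(baseUrl, apiKey, text, voice) {
  const headers = { 'Content-Type': 'application/json' };
  // The key is optional because the Worker can be deployed without one.
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/v1/audio/speech`,
    options: {
      method: 'POST',
      headers,
      body: JSON.stringify({ model: 'tts-1', input: text, voice }),
    },
  };
}

// Usage (placeholders; needs a runtime with fetch, such as a browser or Node 18+):
// const { url, options } = buildSpeechRequest(
//   'https://your-worker.workers.dev', 'your-api-key', '你好，世界！', 'zh-CN-XiaoxiaoNeural');
// const audio = await (await fetch(url, options)).arrayBuffer();
```

Keeping the request construction separate from the network call makes it easy to inspect what will be sent before pointing it at a real deployment.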

Test API: https://tts.aishare.us.kg/ KEY: aisharenet


A simple web interface wrapped around the API

Both the API and the web interface are deployed entirely on Cloudflare.

The complete code was generated with ChatGPT and is attached at the end of this article. You can also build the interface with other AI programming tools, or use Trickle to generate an online speech generation service in one click.

Live demo: https://edgetts.aishare.us.kg/

 

Features

  • Provides an OpenAI-compatible API format
  • Bypasses access restrictions in mainland China and removes the Microsoft service authentication step
  • Supports multiple languages, including Chinese, English, Japanese, and Korean
  • Completely free, based on the Cloudflare Workers free plan
  • Supports a custom API key for access control
  • Rapid deployment, ready in minutes
  • Includes a test script for comparing different voices

 

Usage Guide

Installation process

  1. Create a Worker
    • Log in to the Cloudflare Dashboard
    • Go to Workers & Pages and click Create Worker
    • Give the Worker a name (e.g. edge-tts)
  2. Deploy the code
    • Remove the default code from the editor
    • Copy the contents of worker.js and paste them into the editor
    • Click Save and deploy
  3. Set the API key (optional)
    • Find Settings -> Variables on the Worker's settings page
    • Click Add variable, enter API_KEY as the name and the key you want as the value
    • Click Save and deploy
    • Note that the worker.js listing at the end of this article also defines API_KEY as a constant on its first line; change that value if you deploy the listing unchanged
  4. Configure a custom domain (optional)
    • Prerequisites: your domain is already hosted on Cloudflare and its DNS records are proxied through Cloudflare (proxy status shows the orange cloud)
    • Configuration steps:
      • Click the Settings tab on the Worker details page
      • Locate the Domain and Routing section and click the Add button
      • Select Custom Domain and enter the domain you want to use (e.g. tts.example.com)
      • Click Add Domain and wait for the certificate deployment to complete (usually within a few minutes)
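
The optional API key from step 3 is checked on every request. A minimal sketch of such a check is shown below, assuming the key may arrive either as "Authorization: Bearer <key>" or in an x-api-key header. isAuthorized is a hypothetical helper written for illustration, and the demo uses a Map with lowercase header names as a stand-in for the Fetch API Headers object.

```javascript
// Hypothetical helper: returns true when the request carries the expected key.
// `headers` is anything with a get(name) method, such as a Fetch API Headers object.
function isAuthorized(headers, expectedKey) {
  if (!expectedKey) return true; // no key configured: the endpoint is open
  const auth = headers.get('authorization') || '';
  const bearer = auth.startsWith('Bearer ') ? auth.slice(7) : null;
  const provided = bearer || headers.get('x-api-key');
  return provided === expectedKey;
}

// Stand-in for request.headers, using lowercase names.
const demo = new Map([['authorization', 'Bearer your-api-key']]);
console.log(isAuthorized(demo, 'your-api-key')); // true
```

Treating an empty expected key as "open" matches the idea that the key is optional, but it also means a deployment without a key can be used by anyone who knows the URL.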

Usage

  1. TTS (text-to-speech interface)
    • Chinese speech example:
      curl -X POST https://your-worker-address/v1/audio/speech \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer your-api-key" \
        -d '{
          "model": "tts-1",
          "input": "你好，世界！",
          "voice": "zh-CN-XiaoxiaoNeural",
          "response_format": "mp3",
          "speed": 1.0,
          "pitch": 1.0,
          "style": "general"
        }' --output chinese.mp3

    • English speech example:
      curl -X POST https://your-worker-address/v1/audio/speech \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer your-api-key" \
        -d '{
          "model": "tts-1",
          "input": "Hello, World!",
          "voice": "en-US-JennyNeural",
          "response_format": "mp3",
          "speed": 1.0,
          "pitch": 1.0,
          "style": "general"
        }' --output english.mp3
    
  2. Using the test script
    • Download the test script test_voices.sh
    • Make the script executable:
      chmod +x test_voices.sh
    • Run the script:
      ./test_voices.sh <Worker address> [API key]
    • Examples:
      # With an API key
      ./test_voices.sh https://your-worker.workers.dev your-api-key
      # Without an API key
      ./test_voices.sh https://your-worker.workers.dev
    • The script generates a test audio file for each supported voice; play them to pick the most suitable voice.
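
The contents of test_voices.sh are not shown in this article. As a rough JavaScript equivalent of what such a script does, the sketch below prepares one request per voice and writes one MP3 file per voice. planVoiceTests and runVoiceTests are hypothetical names, and the output file naming is an assumption.

```javascript
// Hypothetical sketch: one request body and one output file name per voice.
function planVoiceTests(voices, sampleText) {
  return voices.map((voice) => ({
    voice,
    file: `${voice}.mp3`,
    body: JSON.stringify({ model: 'tts-1', input: sampleText, voice }),
  }));
}

// Sends each planned request to the Worker and saves the audio (Node 18+).
async function runVoiceTests(baseUrl, apiKey, plan) {
  const { writeFile } = await import('node:fs/promises');
  for (const item of plan) {
    const headers = { 'Content-Type': 'application/json' };
    if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
    const res = await fetch(`${baseUrl}/v1/audio/speech`, {
      method: 'POST',
      headers,
      body: item.body,
    });
    if (!res.ok) throw new Error(`${item.voice}: HTTP ${res.status}`);
    await writeFile(item.file, Buffer.from(await res.arrayBuffer()));
  }
}

// Usage (placeholders):
// const plan = planVoiceTests(['zh-CN-XiaoxiaoNeural', 'zh-CN-YunxiNeural'], '你好，世界！');
// await runVoiceTests('https://your-worker.workers.dev', 'your-api-key', plan);
```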

API Parameter Description

  • model (string): model name (fixed value), e.g. tts-1
  • input (string): text to convert, e.g. "Hello, world!"
  • voice (string): name of the voice, e.g. zh-CN-XiaoxiaoNeural
  • response_format (string, optional): output format, default is mp3
  • speed (number, optional): speaking speed (0.5-2.0), default is 1.0
  • pitch (number, optional): pitch (0.5-2.0), default is 1.0
  • style (string, optional): emotional style, default is general

In the worker.js listing at the end of this article, response_format and style are accepted but not applied: the output is always MP3 and the style is always general.
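
The speed and pitch multipliers are turned into signed percentages relative to 1.0 before synthesis, so 1.5 becomes "+50%" and 0.5 becomes "-50%". The sketch below illustrates that mapping together with the range check. toSignedPercent is a hypothetical name, and rounding with Math.round is a choice made for this example.

```javascript
// Converts a multiplier such as 1.5 into a signed percentage string such as "+50%".
function toSignedPercent(value, min = 0.5, max = 2.0) {
  const n = Number(value);
  if (!Number.isFinite(n) || n < min || n > max) {
    throw new RangeError(`value must be between ${min} and ${max}`);
  }
  const percent = Math.round((n - 1.0) * 100);
  return percent >= 0 ? `+${percent}%` : `${percent}%`;
}

console.log(toSignedPercent(1.0)); // +0%
console.log(toSignedPercent(1.5)); // +50%
console.log(toSignedPercent(0.5)); // -50%
```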

List of supported voices

Make sure the input text is in the language of the chosen voice; for example, use Chinese text with a Chinese voice. Commonly used voices include:

  • zh-CN-XiaoxiaoNeural: Xiaoxiao - warm and lively
  • zh-CN-XiaoyiNeural: Xiaoyi - warm and friendly
  • zh-CN-YunxiNeural: Yunxi - male voice, steady
  • zh-CN-YunyangNeural: Yunyang - male voice, professional
  • zh-CN-XiaohanNeural: Xiaohan - natural and fluent
  • zh-CN-XiaomengNeural: Xiaomeng - sweet and vibrant
  • zh-CN-XiaochenNeural: Xiaochen - gentle and relaxed
  • And more...
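
The voice names above start with a locale such as zh-CN, which makes a simple pre-flight check possible. The sketch below is a heuristic written for illustration, not something the Worker does: voiceLocale and textSuitsVoice are hypothetical helpers, and only Chinese voices are checked, by looking for at least one Han character in the text.

```javascript
// Extracts the locale prefix (e.g. "zh-CN") from a voice name.
function voiceLocale(voice) {
  const match = /^([a-z]{2,3}-[A-Z]{2})-/.exec(voice);
  return match ? match[1] : null;
}

// Heuristic: a Chinese voice should be given text containing Han characters.
function textSuitsVoice(voice, text) {
  const locale = voiceLocale(voice);
  const hasHan = /\p{Script=Han}/u.test(text);
  if (locale && locale.startsWith('zh-')) return hasHan;
  return true; // no check implemented for other languages
}

console.log(voiceLocale('zh-CN-XiaoxiaoNeural')); // zh-CN
console.log(textSuitsVoice('zh-CN-XiaoxiaoNeural', 'Hello, world!')); // false
console.log(textSuitsVoice('zh-CN-XiaoxiaoNeural', '你好，世界！')); // true
```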

 

worker.js code

const API_KEY = 'aisharenet'; // Replace with your own API key
const TOKEN_REFRESH_BEFORE_EXPIRY = 3 * 60; // Refresh the token this many seconds before it expires
const DEFAULT_VOICE = 'zh-CN-XiaoxiaoNeural'; // Default voice
const DEFAULT_SPEED = 1.0; // Default speaking speed
const DEFAULT_PITCH = 1.0; // Default pitch
// Cached endpoint and token
let tokenInfo = {
  endpoint: null,
  token: null,
  expiredAt: null
};
// Handle incoming requests
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request));
});
async function handleRequest(request) {
  const url = new URL(request.url);
  // Serve the web UI at the root path
  if (url.pathname === '/') {
    return new Response(getWebUI(), {
      headers: { 'Content-Type': 'text/html; charset=utf-8' },
    });
  }
  // Handle TTS requests
  if (url.pathname === '/v1/audio/speech') {
    return handleTTSRequest(request);
  }
  // Return 404 for everything else
  return new Response('Not Found', { status: 404 });
}
// Handle a TTS request
async function handleTTSRequest(request) {
  if (request.method === 'OPTIONS') {
    return handleOptions(request);
  }
  // Validate the API key (Authorization: Bearer <key>, or x-api-key: <key>)
  const authHeader = request.headers.get('authorization');
  const apiKey = authHeader?.startsWith('Bearer ')
    ? authHeader.slice(7)
    : request.headers.get('x-api-key');
  if (API_KEY && apiKey !== API_KEY) {
    return createErrorResponse('Invalid API key. Use \'Authorization: Bearer your-api-key\' header', 'invalid_api_key', 401);
  }
  try {
    const requestBody = await request.json();
    const {
      model = "tts-1",
      input,
      voice = DEFAULT_VOICE,
      speed = DEFAULT_SPEED,
      pitch = DEFAULT_PITCH
    } = requestBody;
    if (typeof input !== 'string' || input.trim() === '') {
      throw new Error('input must be a non-empty string');
    }
    // Validate parameter ranges
    validateParameterRange('speed', speed, 0.5, 2.0);
    validateParameterRange('pitch', pitch, 0.5, 2.0);
    const rate = calculateRate(speed);
    const numPitch = calculatePitch(pitch);
    const audioResponse = await getVoice(
      input,
      voice,
      rate >= 0 ? `+${rate}%` : `${rate}%`,
      numPitch >= 0 ? `+${numPitch}%` : `${numPitch}%`,
      '+0%',
      'general', // Style is fixed to general
      'audio-24khz-48kbitrate-mono-mp3' // Audio format is fixed to MP3
    );
    return audioResponse;
  } catch (error) {
    return createErrorResponse(error.message, 'edge_tts_error', 500);
  }
}
// Handle OPTIONS (CORS preflight) requests
function handleOptions(request) {
  return new Response(null, {
    headers: {
      ...makeCORSHeaders(),
      'Access-Control-Allow-Methods': 'GET,HEAD,POST,OPTIONS',
      'Access-Control-Allow-Headers': request.headers.get('Access-Control-Request-Headers') || 'Authorization',
    },
  });
}
// Create a JSON error response
function createErrorResponse(message, code, status) {
  return new Response(
    JSON.stringify({
      error: { message, code }
    }),
    {
      status,
      headers: { 'Content-Type': 'application/json', ...makeCORSHeaders() },
    }
  );
}
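// Validate that a parameter is within its allowed range
function validateParameterRange(name, value, min, max) {
  const n = Number(value);
  if (!Number.isFinite(n) || n < min || n > max) {
    throw new Error(`${name} must be between ${min} and ${max}`);
  }
}
// Convert the speed multiplier to a percentage offset (1.0 -> 0, 1.5 -> 50)
function calculateRate(speed) {
  return Math.round((parseFloat(speed) - 1.0) * 100);
}
// Convert the pitch multiplier to a percentage offset (1.0 -> 0, 0.5 -> -50)
function calculatePitch(pitch) {
  return Math.round((parseFloat(pitch) - 1.0) * 100);
}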
// Build the CORS headers
function makeCORSHeaders() {
  return {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET,HEAD,POST,OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, Authorization, x-api-key',
    'Access-Control-Max-Age': '86400',
  };
}
// Synthesize speech for the whole input text
async function getVoice(text, voiceName, rate, pitch, volume, style, outputFormat) {
  try {
    // Split the text into lines, skip blank ones, and synthesize them in parallel
    const chunks = text.trim().split("\n").filter(line => line.trim() !== '');
    const audioChunks = await Promise.all(chunks.map(chunk => getAudioChunk(chunk, voiceName, rate, pitch, volume, style, outputFormat)));
    // Concatenate the audio chunks in their original order
    const contentType = `audio/${outputFormat.split('-').pop()}`;
    const concatenatedAudio = new Blob(audioChunks, { type: contentType });
    return new Response(concatenatedAudio, {
      headers: {
        'Content-Type': contentType,
        ...makeCORSHeaders(),
      },
    });
  } catch (error) {
    console.error("Speech synthesis failed:", error);
    return createErrorResponse(error.message, 'edge_tts_error', 500);
  }
}
// Synthesize a single audio chunk
async function getAudioChunk(text, voiceName, rate, pitch, volume, style, outputFormat) {
  const endpoint = await getEndpoint();
  const url = `https://${endpoint.r}.tts.speech.microsoft.com/cognitiveservices/v1`;
  const silence = extractSilenceDuration(text);
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Authorization": endpoint.t,
      "Content-Type": "application/ssml+xml",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0",
      "X-Microsoft-OutputFormat": outputFormat,
    },
    body: getSsml(text, voiceName, rate, pitch, volume, style, silence),
  });
  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`Edge TTS API error: ${response.status} ${errorText}`);
  }
  return response.blob();
}
// Extract the silence duration from a trailing "[n]" marker (0 if absent)
function extractSilenceDuration(text) {
  const match = text.match(/\[(\d+)\]\s*?$/);
  return match ? parseInt(match[1], 10) : 0;
}
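// Build the SSML document (speak > voice > mstts:express-as > prosody, plus an optional break)
function getSsml(text, voiceName, rate, pitch, volume, style, silence) {
  const breakTag = silence > 0 ? `<break time="${silence}ms" />` : '';
  return `<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" version="1.0" xml:lang="zh-CN">
  <voice name="${voiceName}">
    <mstts:express-as style="${style}">
      <prosody rate="${rate}" pitch="${pitch}" volume="${volume}">${text}</prosody>
    </mstts:express-as>
    ${breakTag}
  </voice>
</speak>`;
}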
// Get the endpoint and token, reusing the cached pair while it is still valid
async function getEndpoint() {
  const now = Date.now() / 1000;
  if (tokenInfo.token && tokenInfo.expiredAt && now < tokenInfo.expiredAt - TOKEN_REFRESH_BEFORE_EXPIRY) {
    return tokenInfo.endpoint;
  }
  // Request a new token
  const endpointUrl = "https://dev.microsofttranslator.com/apps/endpoint?api-version=1.0";
  const clientId = crypto.randomUUID().replace(/-/g, "");
  try {
    const response = await fetch(endpointUrl, {
      method: "POST",
      headers: {
        "Accept-Language": "zh-Hans",
        "X-ClientVersion": "4.0.530a 5fe1dc6c",
        "X-UserId": "0f04d16a175c411e",
        "X-HomeGeographicRegion": "zh-Hans-CN",
        "X-ClientTraceId": clientId,
        "X-MT-Signature": await sign(endpointUrl),
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0",
        "Content-Type": "application/json; charset=utf-8",
        "Content-Length": "0",
        "Accept-Encoding": "gzip",
      },
    });
    if (!response.ok) {
      throw new Error(`Failed to get endpoint: ${response.status}`);
    }
    const data = await response.json();
    // The JWT payload is base64url-encoded; convert it to plain base64 before decoding
    const jwt = data.t.split(".")[1].replace(/-/g, "+").replace(/_/g, "/");
    const decodedJwt = JSON.parse(atob(jwt));
    tokenInfo = {
      endpoint: data,
      token: data.t,
      expiredAt: decodedJwt.exp,
    };
    return data;
  } catch (error) {
    console.error("Failed to get endpoint:", error);
    // Fall back to the cached token, even if it has expired
    if (tokenInfo.token) {
      console.log("Using the expired cached token");
      return tokenInfo.endpoint;
    }
    throw error;
  }
}
// Build the X-MT-Signature header value
async function sign(urlStr) {
  const url = urlStr.split("://")[1];
  const encodedUrl = encodeURIComponent(url);
  const uuidStr = uuid();
  const formattedDate = dateFormat();
  const bytesToSign = `MSTranslatorAndroidApp${encodedUrl}${formattedDate}${uuidStr}`.toLowerCase();
  const decode = await base64ToBytes("oik6PdDdMnOXemTbwvMn9de/h9lFnfBaCWbGMMZqqoSaQaqUOqjVGm5NqsmjcBI1x+sS9ugjB55HEJWRiFXYFw==");
  const signData = await hmacSha256(decode, bytesToSign);
  const signBase64 = await bytesToBase64(signData);
  return `MSTranslatorAndroidApp::${signBase64}::${formattedDate}::${uuidStr}`;
}
// Format the current date as a UTC string ending in " GMT"
function dateFormat() {
  return (new Date()).toUTCString().replace(/GMT/, "").trim() + " GMT";
}
// Compute an HMAC SHA-256 signature
async function hmacSha256(key, data) {
  const cryptoKey = await crypto.subtle.importKey(
    "raw",
    key,
    { name: "HMAC", hash: { name: "SHA-256" } },
    false,
    ["sign"]
  );
  const signature = await crypto.subtle.sign("HMAC", cryptoKey, new TextEncoder().encode(data));
  return new Uint8Array(signature);
}
// Convert a Base64 string to a byte array
async function base64ToBytes(base64) {
  const binaryString = atob(base64);
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    bytes[i] = binaryString.charCodeAt(i);
  }
  return bytes;
}
// Convert a byte array to a Base64 string
async function bytesToBase64(bytes) {
  return btoa(String.fromCharCode(...bytes));
}
// Generate a UUID without dashes
function uuid() {
  return crypto.randomUUID().replace(/-/g, "");
}
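// Return the web UI page
// (the page's HTML markup is missing from this listing; only its title and loading message remain)
function getWebUI() {
  return `
Text-to-Speech Tool
Generating voice, please wait...
`;
}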
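
One behavior of the listing that the parameter table does not mention: the input is split on newlines, each line is synthesized separately, and a line may end with a marker such as [500] that is read as a pause length. The sketch below shows that parsing in isolation. planChunks is a hypothetical helper, the unit is assumed to be milliseconds, and unlike the listing it also strips the marker from the text to be spoken.

```javascript
// Hypothetical helper: splits input into lines and reads a trailing "[n]" pause marker.
function planChunks(text) {
  return text.trim().split('\n').map((line) => {
    const match = line.match(/\[(\d+)\]\s*$/);
    return {
      // Drop the marker so it is not read aloud (an assumption of this sketch).
      text: match ? line.slice(0, match.index).trimEnd() : line,
      silenceMs: match ? parseInt(match[1], 10) : 0,
    };
  });
}

const chunks = planChunks('First line.[500]\nSecond line.');
console.log(chunks.length);       // 2
console.log(chunks[0].silenceMs); // 500
console.log(chunks[1].silenceMs); // 0
```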

May not be reproduced without permission: Chief AI Sharing Circle » Edge TTS Worker: Deploying Microsoft Speech Synthesis APIs with Cloudflare, OpenAI Compatible Format and Wrapped Web Interface
