Edge TTS Worker: Deploying the Microsoft Speech Synthesis API on Cloudflare with an OpenAI-Compatible Format and a Wrapped Web Interface

Latest AI Resources | updated 7 months ago | AI Sharing Circle


General Introduction

Edge TTS Worker (dependency: edge-tts) is a proxy service deployed on Cloudflare Workers that wraps the Microsoft Edge TTS service in an API compatible with the OpenAI format. With this project, users can call Microsoft's high-quality speech synthesis service without going through Microsoft's authentication. Edge TTS Worker supports multiple languages, including Chinese, English, Japanese, and Korean, and is completely free because it runs on the Cloudflare Workers Free plan. The service also supports a custom API key for access control, and it can be deployed in a few minutes.

Test API: https://tts.aishare.us.kg/ KEY: aisharenet

A project that wraps the API in a simple web interface


Both the API and the web interface are deployed entirely on Cloudflare.

The complete code was generated by ChatGPT and is attached at the end of the article. You can also use other AI programming tools, or use Trickle (one-click generation of an online speech generation service).

Demo: https://edgetts.aishare.us.kg/

Feature List

Provides an OpenAI-compatible interface format
Bypasses mainland China access restrictions and removes the Microsoft service authentication step
Supports multiple languages, including Chinese, English, Japanese, and Korean
Completely free, based on the Cloudflare Workers Free plan
Supports a custom API key for security and access control
Deploys quickly, ready in minutes
Provides a test script for comparing different voices

Usage Guide

Installation process

Creating a Worker
- Log in to the Cloudflare Dashboard
- Go to Workers & Pages and click Create Worker
- Give the Worker a name (e.g. edge-tts)
Deploying the Code
- Remove the default code from the editor
- Copy the contents of worker.js and paste them into the editor
- Click Save and deploy
Setting the API Key (optional)
- Open Settings -> Variables on the Worker's settings page
- Click Add variable, then enter API_KEY as the name and the key you want as the value
- Click Save and deploy
Configure a custom domain name (optional)
- Prerequisites: Your domain name is already hosted on Cloudflare and the DNS records for the domain name have been proxied through Cloudflare (proxy status is orange cloud)
- Configuration Steps:
  - Click the Settings tab on the Worker details page.
  - Locate the Domain and Routing section and click the Add button.
  - Select Custom Domain and enter the domain name you want to use (e.g. tts.example.com)
  - Click Add Domain and wait for the certificate deployment to complete (usually within a few minutes)
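
To illustrate what the optional API key step protects, here is a minimal sketch of how a Worker might compare the key presented by a client with the configured one. The helper names parseBearer and isAuthorized are hypothetical and are not taken from worker.js; the sketch assumes the key arrives either as "Authorization: Bearer <key>" or as a raw x-api-key header, and that an empty configured key means the check is disabled.

```javascript
// Hypothetical helper: pull the key out of an "Authorization: Bearer <key>" header value.
function parseBearer(headerValue) {
    if (typeof headerValue !== 'string') return null;
    return headerValue.startsWith('Bearer ') ? headerValue.slice(7) : null;
}

// Hypothetical helper: decide whether a request may proceed.
// An empty configuredKey means no key was set, so every request is allowed.
function isAuthorized(configuredKey, authorizationHeader, xApiKeyHeader) {
    if (!configuredKey) return true;
    const presented = parseBearer(authorizationHeader) ?? xApiKeyHeader ?? null;
    return presented === configuredKey;
}

console.log(isAuthorized('my-key', 'Bearer my-key', null)); // true
console.log(isAuthorized('my-key', null, 'wrong'));         // false
console.log(isAuthorized('', null, null));                  // true
```

Leaving the key unset keeps the endpoint open to anyone who knows the address, which is why setting one is worth the extra step for a public deployment.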

Usage

TTS (text-to-speech interface)

Chinese speech example:

curl -X POST https://your-worker-address/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "tts-1",
"input": "你好，世界！",
"voice": "zh-CN-XiaoxiaoNeural",
"response_format": "mp3",
"speed": 1.0,
"pitch": 1.0,
"style":"general"
}' --output chinese.mp3

English speech example:

curl -X POST https://your-worker-address/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "tts-1",
"input": "Hello, World!",
"voice": "en-US-JennyNeural",
"response_format": "mp3",
"speed": 1.0,
"pitch": 1.0,
"style":"general"
}' --output english.mp3
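
The same request can be issued from JavaScript. The sketch below builds the URL and fetch() options that correspond to the curl commands above; buildSpeechRequest is a hypothetical helper name, and the Worker address in the usage comment is a placeholder rather than a real deployment.

```javascript
// Hypothetical helper: build the URL and fetch() options for a speech request.
function buildSpeechRequest(baseUrl, apiKey, { input, voice, speed = 1.0, pitch = 1.0 }) {
    const headers = { 'Content-Type': 'application/json' };
    // Only send the Authorization header when a key is configured
    if (apiKey) headers['Authorization'] = `Bearer ${apiKey}`;
    return {
        // Strip trailing slashes so the path is not doubled
        url: `${baseUrl.replace(/\/+$/, '')}/v1/audio/speech`,
        init: {
            method: 'POST',
            headers,
            body: JSON.stringify({
                model: 'tts-1',
                input,
                voice,
                response_format: 'mp3',
                speed,
                pitch,
                style: 'general',
            }),
        },
    };
}

// Usage (needs a deployed Worker; the address is a placeholder):
// const { url, init } = buildSpeechRequest('https://your-worker.workers.dev', 'your-api-key',
//     { input: 'Hello, World!', voice: 'en-US-JennyNeural' });
// const audio = await (await fetch(url, init)).arrayBuffer();
```

Separating request construction from the network call keeps the payload easy to inspect before anything is sent.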

Test Script Usage
- Download the test script test_voices.sh
- Make the script executable: chmod +x test_voices.sh
- Run the script: ./test_voices.sh <worker-url> [api-key]
- Examples:
  # With an API key
  ./test_voices.sh https://your-worker.workers.dev your-api-key
  # Without an API key
  ./test_voices.sh https://your-worker.workers.dev
- The script generates test audio files for each supported voice, which you can play to select the most suitable voice.

API Parameter Description

model (string): model name (fixed value), e.g. tts-1
input (string): text to convert, e.g. "你好，世界！"
voice (string): voice name, e.g. zh-CN-XiaoxiaoNeural
response_format (string, optional): output format, default mp3
speed (number, optional): speaking rate (0.5-2.0), default 1.0
pitch (number, optional): pitch (0.5-2.0), default 1.0
style (string, optional): speaking style (emotion), default general
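
The speed and pitch multipliers are not passed to the speech service directly. The worker.js listing at the end of this article converts each one into a signed percentage offset from 1.0. The sketch below shows that conversion together with the range check; validateRange and toSignedPercent are illustrative names, and rounding with Math.round is an assumption made here to avoid floating-point truncation.

```javascript
// Check that a numeric parameter lies inside the documented range.
function validateRange(name, value, min, max) {
    if (typeof value !== 'number' || Number.isNaN(value) || value < min || value > max) {
        throw new RangeError(`${name} must be between ${min} and ${max}`);
    }
}

// Convert a multiplier such as 1.5 into a signed percentage string such as "+50%".
function toSignedPercent(multiplier) {
    const percent = Math.round((multiplier - 1.0) * 100);
    return percent >= 0 ? `+${percent}%` : `${percent}%`;
}

validateRange('speed', 1.5, 0.5, 2.0);
console.log(toSignedPercent(1.5)); // "+50%"
console.log(toSignedPercent(0.5)); // "-50%"
console.log(toSignedPercent(1.0)); // "+0%"
```

With this mapping, the documented range of 0.5 to 2.0 corresponds to offsets from -50% to +100%.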

List of supported voices

Make sure the input text is in the language of the selected voice; for example, Chinese voices should be used with Chinese text. Commonly used voices include:

zh-CN-XiaoxiaoNeural: Xiaoxiao - Warm and Lively
zh-CN-XiaoyiNeural: Xiaoyi - Warm and Friendly
zh-CN-YunxiNeural: Yunxi - male voice, steady
zh-CN-YunyangNeural: Yunyang - male voice, professional
zh-CN-XiaohanNeural: Xiaohan - Natural Flow
zh-CN-XiaomengNeural: Xiaomeng - Sweet and Vibrant
zh-CN-XiaochenNeural: Xiaochen - Gentle and Easy
And more...
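
Because the text language has to match the voice, a client can derive the locale from the voice name before choosing what to send. The sketch below assumes voice names follow the <language>-<region>-<Name>Neural pattern seen in the list above, which may not hold for every voice the service offers. localeOf, sampleTextFor, and the SAMPLE_TEXT table are hypothetical.

```javascript
// Hypothetical helper: "zh-CN-XiaoxiaoNeural" -> "zh-CN" (first two dash-separated parts).
function localeOf(voice) {
    const parts = voice.split('-');
    if (parts.length < 3) throw new Error(`Unexpected voice name: ${voice}`);
    return `${parts[0]}-${parts[1]}`;
}

// Illustrative sample sentences keyed by locale; extend as needed.
const SAMPLE_TEXT = {
    'zh-CN': '你好，世界！',
    'en-US': 'Hello, World!',
};

// Pick a sample sentence in the language of the given voice.
function sampleTextFor(voice) {
    const locale = localeOf(voice);
    const text = SAMPLE_TEXT[locale];
    if (!text) throw new Error(`No sample text for ${locale}`);
    return text;
}

console.log(localeOf('zh-CN-YunxiNeural'));      // "zh-CN"
console.log(sampleTextFor('en-US-JennyNeural')); // "Hello, World!"
```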

worker.js code

const API_KEY = 'aisharenet'; // Replace with your actual API key
const TOKEN_REFRESH_BEFORE_EXPIRY = 3 * 60; // Refresh the token this many seconds before it expires
const DEFAULT_VOICE = 'zh-CN-XiaoxiaoNeural'; // Default voice
const DEFAULT_SPEED = 1.0; // Default speaking rate
const DEFAULT_PITCH = 1.0; // Default pitch

let tokenInfo = {
    endpoint: null,
    token: null,
    expiredAt: null
};

// Handle incoming requests
addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
    const url = new URL(request.url);

    // Return the Web UI for the root path
    if (url.pathname === '/') {
        return new Response(getWebUI(), {
            headers: { 'Content-Type': 'text/html' },
        });
    }

    // Handle TTS requests
    if (url.pathname === '/v1/audio/speech') {
        return handleTTSRequest(request);
    }

    // Return 404 for everything else
    return new Response('Not Found', { status: 404 });
}

// Handle TTS requests
async function handleTTSRequest(request) {
    if (request.method === 'OPTIONS') {
        return handleOptions(request);
    }

    // Validate the API key: accept "Authorization: Bearer <key>" or a raw "x-api-key" header
    const authHeader = request.headers.get('authorization');
    const apiKey = authHeader?.startsWith('Bearer ')
        ? authHeader.slice(7)
        : request.headers.get('x-api-key');

    if (API_KEY && apiKey !== API_KEY) {
        return createErrorResponse('Invalid API key. Use \'Authorization: Bearer your-api-key\' header', 'invalid_api_key', 401);
    }

    try {
        const requestBody = await request.json();
        const { 
            model = "tts-1",
            input,
            voice = DEFAULT_VOICE,
            speed = DEFAULT_SPEED,
            pitch = DEFAULT_PITCH
        } = requestBody;

        // Reject requests without text to synthesize
        if (typeof input !== 'string' || !input.trim()) {
            return createErrorResponse('input must be a non-empty string', 'invalid_request', 400);
        }

        // Validate parameter ranges
        validateParameterRange('speed', speed, 0.5, 2.0);
        validateParameterRange('pitch', pitch, 0.5, 2.0);

        const rate = calculateRate(speed);
        const numPitch = calculatePitch(pitch);
        const audioResponse = await getVoice(
            input,
            voice,
            rate >= 0 ? `+${rate}%` : `${rate}%`,
            numPitch >= 0 ? `+${numPitch}%` : `${numPitch}%`,
            '+0%', // Volume is left unchanged
            'general', // Style is fixed to "general"
            'audio-24khz-48kbitrate-mono-mp3' // Output format is fixed to MP3
        );

        return audioResponse;
    } catch (error) {
        return createErrorResponse(error.message, 'edge_tts_error', 500);
    }
}

// Handle OPTIONS (CORS preflight) requests
function handleOptions(request) {
    return new Response(null, {
        headers: {
            ...makeCORSHeaders(),
            'Access-Control-Allow-Methods': 'GET,HEAD,POST,OPTIONS',
            'Access-Control-Allow-Headers': request.headers.get('Access-Control-Request-Headers') || 'Authorization',
        },
    });
}

// Build a JSON error response
function createErrorResponse(message, code, status) {
    return new Response(
        JSON.stringify({
            error: { message, code }
        }),
        {
            status,
            headers: { 'Content-Type': 'application/json', ...makeCORSHeaders() },
        }
    );
}

// Validate that a parameter is within its allowed range
function validateParameterRange(name, value, min, max) {
    if (value < min || value > max) {
        throw new Error(`${name} must be between ${min} and ${max}`);
    }
}

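// Convert the speed multiplier to a percentage offset (1.0 -> 0, 1.5 -> 50)
function calculateRate(speed) {
    return Math.round((parseFloat(speed) - 1.0) * 100);
}

// Convert the pitch multiplier to a percentage offset (1.0 -> 0, 0.5 -> -50)
function calculatePitch(pitch) {
    return Math.round((parseFloat(pitch) - 1.0) * 100);
}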

// Build the CORS headers
function makeCORSHeaders() {
    return {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'GET,HEAD,POST,OPTIONS',
        'Access-Control-Allow-Headers': 'Content-Type, Authorization, x-api-key',
        'Access-Control-Max-Age': '86400',
    };
}

// Synthesize the full text and return it as one audio response
async function getVoice(text, voiceName, rate, pitch, volume, style, outputFormat) {
    try {
        // Split the text by line, skip blank lines, and synthesize the chunks in parallel
        const chunks = text.trim().split("\n").filter(chunk => chunk.trim());
        const audioChunks = await Promise.all(chunks.map(chunk => getAudioChunk(chunk, voiceName, rate, pitch, volume, style, outputFormat)));

        // Concatenate the audio chunks in their original order
        const concatenatedAudio = new Blob(audioChunks, { type: `audio/${outputFormat.split('-').pop()}` });
        return new Response(concatenatedAudio, {
            headers: {
                'Content-Type': `audio/${outputFormat.split('-').pop()}`,
                ...makeCORSHeaders(),
            },
        });
    } catch (error) {
        console.error("Speech synthesis failed:", error);
        return createErrorResponse(error.message, 'edge_tts_error', 500);
    }
}

// Fetch the audio for a single chunk of text
async function getAudioChunk(text, voiceName, rate, pitch, volume, style, outputFormat) {
    const endpoint = await getEndpoint();
    const url = `https://${endpoint.r}.tts.speech.microsoft.com/cognitiveservices/v1`;
    const slien = extractSilenceDuration(text);

    const response = await fetch(url, {
        method: "POST",
        headers: {
            "Authorization": endpoint.t,
            "Content-Type": "application/ssml+xml",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0",
            "X-Microsoft-OutputFormat": outputFormat,
        },
        body: getSsml(text, voiceName, rate, pitch, volume, style, slien),
    });

    if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`Edge TTS API error: ${response.status} ${errorText}`);
    }

    return response.blob();
}

// Extract the trailing silence duration, written as [digits] at the end of the text
function extractSilenceDuration(text) {
    const match = text.match(/\[(\d+)\]\s*?$/);
    return match && match.length === 2 ? parseInt(match[1]) : 0;
}

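// Build the SSML document: <speak>, <voice>, the mstts:express-as style wrapper, and <prosody>
function getSsml(text, voiceName, rate, pitch, volume, style, slien) {
    const slienStr = slien > 0 ? `<break time="${slien}ms" />` : '';
    return `<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" version="1.0" xml:lang="zh-CN"> 
                <voice name="${voiceName}"> 
                    <mstts:express-as style="${style}" styledegree="2.0" role="default"> 
                        <prosody rate="${rate}" pitch="${pitch}" volume="${volume}">${text}</prosody> 
                    </mstts:express-as> 
                    ${slienStr}
                </voice> 
            </speak>`;
}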

// Get the endpoint (region and token), reusing the cached one while it is still valid
async function getEndpoint() {
    const now = Date.now() / 1000;
    if (tokenInfo.token && tokenInfo.expiredAt && now < tokenInfo.expiredAt - TOKEN_REFRESH_BEFORE_EXPIRY) {
        return tokenInfo.endpoint;
    }

    // Request a new token
    const endpointUrl = "https://dev.microsofttranslator.com/apps/endpoint?api-version=1.0";
    const clientId = crypto.randomUUID().replace(/-/g, "");

    try {
        const response = await fetch(endpointUrl, {
            method: "POST",
            headers: {
                "Accept-Language": "zh-Hans",
                "X-ClientVersion": "4.0.530a 5fe1dc6c",
                "X-UserId": "0f04d16a175c411e",
                "X-HomeGeographicRegion": "zh-Hans-CN",
                "X-ClientTraceId": clientId,
                "X-MT-Signature": await sign(endpointUrl),
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0",
                "Content-Type": "application/json; charset=utf-8",
                "Content-Length": "0",
                "Accept-Encoding": "gzip",
            },
        });

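        if (!response.ok) {
            throw new Error(`Failed to get endpoint: ${response.status}`);
        }

        const data = await response.json();
        // The JWT payload is base64url-encoded; map it to standard base64 before decoding
        const jwt = data.t.split(".")[1].replace(/-/g, "+").replace(/_/g, "/");
        const decodedJwt = JSON.parse(atob(jwt));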

        tokenInfo = {
            endpoint: data,
            token: data.t,
            expiredAt: decodedJwt.exp,
        };

        return data;
    } catch (error) {
        console.error("Failed to get endpoint:", error);
        if (tokenInfo.token) {
            console.log("Falling back to the expired cached token");
            return tokenInfo.endpoint;
        }
        throw error;
    }
}

// Build the X-MT-Signature header value
async function sign(urlStr) {
    const url = urlStr.split("://")[1];
    const encodedUrl = encodeURIComponent(url);
    const uuidStr = uuid();
    const formattedDate = dateFormat();
    const bytesToSign = `MSTranslatorAndroidApp${encodedUrl}${formattedDate}${uuidStr}`.toLowerCase();
    const decode = await base64ToBytes("oik6PdDdMnOXemTbwvMn9de/h9lFnfBaCWbGMMZqqoSaQaqUOqjVGm5NqsmjcBI1x+sS9ugjB55HEJWRiFXYFw==");
    const signData = await hmacSha256(decode, bytesToSign);
    const signBase64 = await bytesToBase64(signData);
    return `MSTranslatorAndroidApp::${signBase64}::${formattedDate}::${uuidStr}`;
}

// Format the current time as a UTC date string
function dateFormat() {
    return (new Date()).toUTCString().replace(/GMT/, "").trim() + " GMT";
}

// Sign data with HMAC SHA-256
async function hmacSha256(key, data) {
    const cryptoKey = await crypto.subtle.importKey(
        "raw",
        key,
        { name: "HMAC", hash: { name: "SHA-256" } },
        false,
        ["sign"]
    );
    const signature = await crypto.subtle.sign("HMAC", cryptoKey, new TextEncoder().encode(data));
    return new Uint8Array(signature);
}

// Decode a Base64 string into a byte array
async function base64ToBytes(base64) {
    const binaryString = atob(base64);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
        bytes[i] = binaryString.charCodeAt(i);
    }
    return bytes;
}

// Encode a byte array as a Base64 string
async function bytesToBase64(bytes) {
    return btoa(String.fromCharCode.apply(null, bytes));
}

// Generate a UUID without dashes
function uuid() {
    return crypto.randomUUID().replace(/-/g, "");
}

// Return the Web UI markup
function getWebUI() {
    return `