AI Personal Learning
and practical guidance
TRAE

OpenAI.fm: an interactive demo tool showcasing the OpenAI speech APIs

General Introduction

openai-fm is an open source project hosted on GitHub dedicated to demonstrating the capabilities of the OpenAI Text-to-Speech (TTS) API. This project allows developers to visualize OpenAI's speech generation capabilities through an interactive web application. It was developed using the NextJS framework, combined with TailwindCSS and ShadcnUI to create a clean and modern interface. Users can enter text , select different voice and emotional style to generate high-quality voice output. The project code is completely open source , follow the MIT license , developers are encouraged to clone , modify and contribute to the code . openai-fm is suitable for developers to quickly understand and test the OpenAI speech API , especially suitable for application development scenarios that require speech functions .

OpenAI Releases New Generation of Audio Modeling APIs, Speech Interaction Technology Gets Major Upgrade-1

Demo address: https://www.openai.fm/


 

Function List

  • Text-to-Speech Conversion: Convert the input text into natural and smooth speech.
  • Multiple Voice Options: Provides multiple voice options to meet the needs of different scenarios.
  • Emotional style control: supports adjusting the emotional tone of the voice, such as friendly, serious, etc.
  • Real-time Interactive Presentation: Generate and playback voice in real time through a web interface.
  • Database Sharing Function: Supports connecting to PostgreSQL database to save and share the generated speech.
  • Open Source Support: Provides full source code, allowing developers to customize and extend functionality.

 

Using Help

Installation process

To use openai-fm, you first need to clone the project and configure the environment. Here are the detailed steps:

  1. Getting the API key
    Visit the OpenAI website and register or login to your account. In your account dashboard, navigate to the API Key Management page and click on "Create a new key" to generate and save your OPENAI_API_KEYThis key is used to call the OpenAI Speech API. This key is used to call OpenAI's speech API. note: the key needs to be kept secret to avoid disclosure.
  2. clone warehouse
    Open a terminal and run the following command to clone the openai-fm repository:

    git clone https://github.com/openai/openai-fm.git

Go to the project catalog:

cd openai-fm
  1. Setting environment variables
    You can set it up in two ways OPENAI_API_KEY::

    • global setting: Add the following to your system environment variables OPENAI_API_KEYThe
      • Linux/MacOS Example:
        export OPENAI_API_KEY=<你的API密钥>
        
      • Windows users can add environment variables in the system settings.
    • Setting within the project: Create the .env Documentation, reference .env.example, add the following:
      OPENAI_API_KEY=<你的API密钥>
      
  2. Installation of dependencies
    The project uses Node.js and npm to manage dependencies. Make sure you have Node.js installed (recommended version 16 or higher). Run it from the project root directory:

    npm install
    

    This will install the necessary dependencies such as NextJS, TailwindCSS, ShadcnUI and so on.

  3. (Optional) Configuration database
    If you need to use the sharing feature, you need to connect to the PostgreSQL database. You can find a link to the PostgreSQL database in the .env file to add database connection information, refer to the .env.example::

    POSTGRES_URL="postgresql://用户名:密码@主机:端口/数据库名"
    

    Ensure that the PostgreSQL service is running and that the appropriate databases are created. If you are not using the sharing feature, you can skip this step.

  4. Running Projects
    After the installation is complete, run the following command to start the development server:

    npm run dev
    

    Open your browser and visit http://localhost:3000You can see the interactive interface of openai-fm.

Main Functions

The core of openai-fm is the interactive text-to-speech demo. Below is the detailed operation flow:

  • input text
    Enter the text you want to convert to speech in the text box of the web interface. Multi-line text is supported, suitable for long dialogs or scripts. Example:

    你好!这是一个测试,展示如何将文本转为自然语音。
    
  • Selecting Voice and Emotion
    The interface provides drop-down menus that list the available voice options (e.g., male, female) and emotional styles (e.g., friendly, serious). These options are based on the data/voices.json cap (a poem) data/vibes.json File Configuration. After selecting it, click the "Generate" button and the system will call the OpenAI Speech API to generate the audio.
  • Playback and download
    The generated audio is automatically played on the page. You can also download the audio file, which is saved in WAV format by default and stored in the project directory in the output/ folder, with file names starting with openaifm_ Beginning and timestamped.
  • Share function
    If a PostgreSQL database is configured, the generated voice can be saved to the database and a sharing link can be generated. Clicking the "Share" button will return an accessible URL where other users can view and play your voice.

Developer Customization

openai-fm is an open source project , developers can modify the code as needed . For example:

  • Add new voice:: Editorial data/voices.json, adding new voice configurations.
  • Adjustment of the interface: Modify NextJS components (e.g. pages/index.js) or TailwindCSS styles.
  • Extended functionality: Add new API calls or integrate other services.

To contribute code, fork the repository, create a branch, and submit a pull request; read the project's contribution guidelines before committing to make sure the code is compliant. [](https://github.com/openai/openai-fm)[](https://github.com/fairy-root/ComfyUI-OpenAI-FM)

caveat

  • API fees: Use of the OpenAI Speech API incurs a fee, depending on usage. Please monitor your API quota in the OpenAI dashboard.
  • safety: If deployed to a public server, ensure that .env file is not made public to prevent API key leakage.
  • Community Support: If you encounter problems, you can submit an issue on GitHub and the community will help.

 

application scenario

  1. Developers test the Voice API
    Developers can use openai-fm to quickly test the effectiveness of the OpenAI Speech API, verify the performance of different speech and emotion styles, and optimize application integration solutions.
  2. Education and training content production
    Teachers or trainers can convert course scripts to speech to generate natural and smooth audio for online courses or instructional videos.
  3. Accessibility aids
    openai-fm generates voice readings for visually impaired users to help them access text information.
  4. Creative Content Creation
    Podcast producers or content creators can use openai-fm to generate personalized voices and quickly create demo samples.

 

QA

  1. Do I need to pay for openai-fm?
    The project itself is free, but using the OpenAI Speech API requires a valid API key and a fee based on usage. We recommend checking the official OpenAI website for pricing details.
  2. How do I add a new voice option?
    Edit the project directory under the data/voices.json file to add the new voice configuration. After restarting the server, the new voice appears in the drop-down menu.
  3. Do I have to use a database for the sharing function?
    Yes, the sharing feature requires PostgreSQL database support. If you don't configure the database, you can still generate and play speech normally.
  4. Is it possible to use openai-fm on mobile?
    The web interface of openai-fm supports responsive design and can be accessed in mobile browsers, provided that you have a stable internet connection.
May not be reproduced without permission:Chief AI Sharing Circle " OpenAI.fm: an interactive demo tool showcasing the OpenAI speech APIs
en_USEnglish