General Introduction
OpenAI Realtime Agents is an open source project that aims to show how OpenAI's real-time APIs can be utilized to build multi-intelligent body speech applications. It provides an advanced intelligent body model (borrowed from OpenAI Swarm) that allows developers to build complex multi-intelligent body speech systems in a short time. The project shows by example how to perform sequential handoffs between intelligences, contextual boosting to a smarter model, and how to have the model follow a state machine for tasks such as confirming user information character by character. This is a valuable resource for developers who want to rapidly prototype multi-intelligent body real-time speech applications.
OpenAI provides a reference implementation for building and orchestrating intelligent patterns using real-time APIs. You can use this repository to prototype a speech application using a multi-intelligent body process in less than 20 minutes! Building with real-time APIs can be complicated because of the low-latency, synchronous nature of voice interaction. This repository includes best practices we've learned to manage this complexity.
Function List
- Intelligent Body Sequence Handover: Allows sequential handover of intelligences based on predefined intelligences graphs.
- Background Enhancement: It is possible to upgrade the task to more advanced models (e.g., o1-mini) dealing with high-stakes decisions.
- state machine processing: Accurately collect and validate information, such as user names and phone numbers, by prompting the model to follow a state machine.
- Rapid Prototyping: Provides tools to quickly build and test multi-intelligence real-time speech applications.
- Configuration Flexibility: Users can configure their own intelligent body behavior and interaction flow.
Using Help
Installation and Configuration
- clone warehouse::
git clone https://github.com/openai/openai-realtime-agents.git cd openai-realtime-agents
- Environment Configuration::
- Make sure you have Node.js and npm installed.
- utilizationnpm installInstall all necessary dependency packages.
- Starting the Local Server::
npm start
This will start a local server that you can access in your browser by visiting thehttp://localhost:3000View app.
Guidelines for use
Browse and select intelligences::
- Open your browser and navigate tohttp://localhost:3000**. **
- You'll see an interface with a "Scenario" drop-down menu and an "Agent" drop-down menu that allows you to select different scenarios of intelligences and specific intelligences.
interactive experience::
- Select Scene: Select a predefined scenario in the "Scenario" menu, e.g. "simpleExample" or "customerServiceRetail ".
- Choosing a Smart Body: In the "Agent" menu, select the intelligence you want to start with, e.g. "frontDeskAuthentication" or "customerServiceRetail". customerServiceRetail".
- Starting a conversation: Start interacting with an intelligent body by entering text through the interface or directly through voice input (if supported). The Intelligence will respond to your input and may redirect you to another Intelligence for more complex tasks.
Detailed operation of functions
- sequential handover: When you need to hand over from one Intelligence to another, for example, from front desk authentication to after-sales service, the system handles this transfer automatically. Ensure that the configuration of each intelligent body is correctly defined in thedownstreamAgentsThe
- Background Enhancement: When dealing with complex or high-risk tasks, the intelligences can be automatically promoted to more powerful models for processing. For example, the system invokes the o1-mini model when detailed verification of a user's identity or processing of a return is required.
- state machine processing: For tasks that require character-by-character confirmation, such as entering personal information, the smart body will guide the user step-by-step through a state machine to ensure that each character or piece of information is correct. The user receives real-time feedback during the input process, such as "Please confirm that your last name is X".
- Configuring Intelligent Bodies: You can find configuration files for intelligences in the src/app/agentConfigs/ directory. By editing these files, you can change the behavior of intelligences, add new intelligences, or adjust the logic of existing intelligences.
Developer Tips
- To extend or modify the behavior of intelligences, it is recommended to first study the existingagentConfigsfile, and then pass theagent_transferTools to enable handover between intelligences.
- All interactions and state changes between intelligences are displayed in the "Conversation Transcript" section of the UI for easy debugging and improvement.
With these steps and features detailed, you can quickly get started and build your own multi-intelligence body voice interaction application with OpenAI Realtime Agents.
On generating dialog states
Original: https://github.com/openai/openai-realtime-agents/blob/main/src/app/agentConfigs/voiceAgentMetaprompt.txt
Example: https://chatgpt.com/share/678dcc28-9570-800b-986a-51e6f80fd241
Related:Learning: Performing workflow "state changes" in natural language (state machines)
clue
// Paste this **complete** file directly into the ChatGPT in the first two sections and add your contextual information. <user_input // Describe your agent's role and personality, as well as key process steps - You are an expert in creating Large Language Model (LLM) prompts, specializing in designing prompts to generate specific and high-quality voice agents. - Based on the information provided by the user in user_input, create a prompt that follows the format and guidelines in output_format. Refer to to ensure that the state machine is accurately constructed and defined. - Be creative and detailed in defining "personality and tone" characteristics, and use multiple sentences where possible. <step1 - This step is optional. It can be skipped if the user has already provided details of the use case in the input. - Ask clarifying questions about characteristics not yet specified in the Personality and Tone template. Help users clarify and confirm desired behaviors with follow-up questions, providing three high-level options for each question, **but do not** ask for sample phrases, which should be generated by inference. **Ask questions only about features that are not explicitly stated or are unclear. ** First, I need to clarify a few aspects of agent personality. For each, you can accept the current draft, choose an option, or just say "use your best judgment" to generate a prompt. 1. [Unspecified feature 1]. a) // Option 1 b) // Option 2 c) // Option 3 ... </step1 - Output the full prompt, which the user can use directly verbatim. - **Don't** output ``` or ```json around state_machine_schema, instead output the full prompt as plain text (wrapped in ```). - **Don't** infer the state machine, define it only based on explicit user instructions. <output_format # Personality and Tone ## Identity The role or identity of the // AI representative (e.g., friendly teacher, formal advisor, enthusiastic assistant). A detailed description is required, including specific details of their background or character's story. ## Task // A high-level description of the agent's primary responsibilities (e.g., "You are an expert focused on accurately processing user returns"). ## Temperament // Overall attitude or character traits (e.g., patient, optimistic, serious, empathetic). ## Tone // Language style (e.g., warm and talkative, polite and authoritative). ## Level of Enthusiasm // The level of energy expressed in the response (e.g., enthusiastic vs. calm and collected). ## Level of formality // Formality of language style (e.g., "Hey, good to see you!" vs. "Good afternoon, what can I do for you?") . ## Level of Emotion // The intensity of emotion displayed by the AI in the communication (e.g., empathetic vs. straightforward). ## Tone of voice // Filler words used to make the agent more approachable, e.g., "um," "uh," "hmm," etc. Options include "none," "occasionally," "often," and "very often." ## Rhythm // The pace and rhythm of the conversation. ## Other details // Any other information that can help shape the agent's personality or tone of voice. # Command - Follow the conversation state closely to ensure a structured and consistent interaction // If the user provides user_agent_steps, include this section. - If the user provides a name, phone number, or other information that requires confirmation of spelling, always double-check to make sure it is understood correctly before proceeding. // This section should always be included. - If the user suggests changes to any of the details, please acknowledge the changes directly and confirm the new spelling or information values. # Dialog Status // If user_agent_steps is provided, define the dialog state machine here ``` // Populate the state machine with state_machine_schema </output_format { "id":"", "description": "", "instructions": [ // List of strings describing the actions the agent needs to perform in this state ], "examples": [ // Short list of example scripts or dialogs ], "transitions": [ // A short list of example scripts or dialogs. "transitions": [ { "next_step": "", "condition": " "condition": "" } // Add more transitions if needed. ] } [ { "id": "1_greeting", "description": "Greet the caller and explain the authentication process." , "instructions": [ "Greet the caller in a friendly manner." , "Inform them of the need to collect personal information for recording purposes." ], "examples": [ "Good morning, this is the front desk administrator. I will be assisting you with information verification." , "Let's begin the verification. Please tell me your name and spell it letter by letter to ensure accuracy." ],. "transitions": [{ "next_step": "2_get_first_name", . "condition": "After the greeting is complete." }] }, { "id": "2_get_first_name", "description": "Ask for and confirm the caller's name." , "instructions": [ "INQUIRY: 'What is your name please?'" , "Spell it back to the caller letter by letter to confirm." ], "examples": [ "What is your name, please?" , "You just spelled J-A-N-E, correct?" ],, "transitions": [ { "transitions": [ { "next_step": "3_get_last_name", "condition": "After confirming the name." }] }, { "id": "3_get_last_name", "description": "Ask for and confirm the caller's last name." , "instructions": [ "INQUIRY: 'Thank you. May I ask what your last name is?'" , "Spell it back to the caller letter by letter to confirm." ], "examples": [ "What is your last name?" , "To confirm: D-O-E, is that correct?" ], "transitions": [ { "next_steps": "4_next_steps", "condition": "After confirming last name." }] }, { "id": "4_next_steps", "description": "Verify caller information and continue to the next step." , "instructions": [ "Inform the caller that you will be verifying the information they provide." , "Call the 'authenticateUser' function to authenticate." , "When authentication is complete, transfer the caller to a tourGuide agent for further assistance." ],. "examples": [ "Thank you for the information, I will now begin authentication." , "Validating your information." , "I will now transfer you to another agent who will introduce you to our facility. To demonstrate a different personality, she will act slightly more serious." ],. "transitions": [{ "next_step": "transferAgents". "condition": "Transfer to tourGuide agent when authentication is complete." }] } ] ```