General Introduction
OpenAI Realtime Agents is an open source project that shows how OpenAI's Realtime API can be used to build multi-agent voice applications. It provides an agentic pattern (inspired by OpenAI Swarm) that lets developers build complex multi-agent voice systems quickly. The project demonstrates by example how to perform sequential handoffs between agents, how to escalate context to a more capable model, and how to have the model follow a state machine for tasks such as confirming user information character by character. It is a valuable resource for developers who want to rapidly prototype multi-agent realtime voice applications.
OpenAI provides a reference implementation for building and orchestrating agentic patterns on top of the Realtime API. You can use this repository to prototype a voice application with a multi-agent flow in less than 20 minutes! Building with the Realtime API can be complicated because of the low-latency, synchronous nature of voice interaction. This repository includes best practices learned for managing that complexity.
Feature List
- Sequential agent handoffs: agents hand the conversation off to one another according to a predefined agent graph.
- Background escalation: high-stakes decisions can be escalated to a more capable model (e.g., o1-mini).
- State-machine prompting: accurately collect and validate information, such as user names and phone numbers, by prompting the model to follow a state machine.
- Rapid prototyping: provides the scaffolding to quickly build and test multi-agent realtime voice applications.
- Flexible configuration: users can configure their own agent behavior and interaction flows.
Using Help
Installation and Configuration
- Clone the repository:

  git clone https://github.com/openai/openai-realtime-agents.git
  cd openai-realtime-agents
- Configure the environment:
  - Make sure you have Node.js and npm installed.
  - Run npm install to install all required dependencies.
- Start the local server:

  npm start

  This starts a local server; open http://localhost:3000 in your browser to view the app.
Usage Guide
Browse and select agents:
- Open your browser and navigate to http://localhost:3000.
- You'll see an interface with a "Scenario" drop-down menu and an "Agent" drop-down menu that let you select different agent scenarios and specific agents.
Interacting with agents:
- Select a scenario: choose a predefined scenario from the "Scenario" menu, e.g. "simpleExample" or "customerServiceRetail".
- Select an agent: in the "Agent" menu, choose the agent you want to start with, e.g. "frontDeskAuthentication" or "customerServiceRetail".
- Start a conversation: interact with an agent by entering text through the interface or directly through voice input (if supported). The agent will respond to your input and may hand you off to another agent for more complex tasks.
Feature Details
- Sequential handoffs: when the conversation needs to move from one agent to another, for example from front-desk authentication to after-sales service, the system handles the transfer automatically. Ensure that each agent's handoff targets are correctly defined in its downstreamAgents configuration.
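As a rough illustration of how a handoff graph might be wired, the sketch below defines two agents where one lists the other in its downstreamAgents. Only the downstreamAgents field name comes from the text above; the other field names and agent contents are assumptions for this example, not the repository's exact schema.

```typescript
// Hypothetical agent shape; downstreamAgents is the handoff graph edge.
interface AgentConfig {
  name: string;
  publicDescription: string;
  instructions: string;
  downstreamAgents: AgentConfig[];
}

const afterSales: AgentConfig = {
  name: "afterSalesService",
  publicDescription: "Handles returns and repairs after authentication.",
  instructions: "Help the verified user with returns and repairs.",
  downstreamAgents: [], // terminal node in this small graph
};

const frontDesk: AgentConfig = {
  name: "frontDeskAuthentication",
  publicDescription: "Verifies the caller's identity.",
  instructions: "Confirm the user's name and phone number, then hand off.",
  downstreamAgents: [afterSales], // allowed handoff target
};

console.log(frontDesk.downstreamAgents.map((a) => a.name));
```

Defining handoffs as explicit graph edges keeps each agent's prompt small while still constraining where the conversation can go next.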
- Background escalation: when handling complex or high-stakes tasks, an agent can automatically escalate to a more powerful model. For example, the system invokes the o1-mini model when a user's identity needs detailed verification or a return needs to be processed.
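One way to picture escalation is as a routing decision per task: high-stakes intents go to the stronger model, everything else stays on the low-latency realtime model. Only o1-mini is named in the text; the intent names and the "realtime-default" placeholder below are illustrative assumptions.

```typescript
// Sketch: route high-stakes tasks to a more capable model.
// Intent names and the default model label are assumptions for illustration.
const HIGH_STAKES_INTENTS = new Set(["verify_identity", "process_return"]);

function modelForIntent(intent: string): string {
  return HIGH_STAKES_INTENTS.has(intent) ? "o1-mini" : "realtime-default";
}

console.log(modelForIntent("process_return"));
console.log(modelForIntent("small_talk"));
```

The point of the split is latency: routine turns stay fast, and only the few decisions that justify extra reasoning pay the cost of the bigger model.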
- State-machine prompting: for tasks that require character-by-character confirmation, such as entering personal information, the agent guides the user step by step through a state machine to ensure that each character or piece of information is correct. The user receives real-time feedback during input, such as "Please confirm that your last name is X".
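The character-by-character confirmation can be sketched as a tiny state machine: each state points at the next character to confirm, and each step emits a prompt plus the successor state. This is a minimal sketch of the idea, not the repository's actual prompt or code; all names here are invented for illustration.

```typescript
// Minimal state machine for confirming a collected value one character at a time.
type State =
  | { kind: "confirmChar"; index: number }
  | { kind: "done" };

function nextPrompt(value: string, state: State): { prompt: string; next: State } {
  if (state.kind === "done" || state.index >= value.length) {
    return { prompt: `Great, I have "${value}" recorded.`, next: { kind: "done" } };
  }
  const ch = value[state.index];
  return {
    prompt: `Please confirm character ${state.index + 1}: "${ch}".`,
    next: { kind: "confirmChar", index: state.index + 1 },
  };
}

// Walk the machine over a short example value.
let state: State = { kind: "confirmChar", index: 0 };
const prompts: string[] = [];
for (let i = 0; i < 3; i++) {
  const step = nextPrompt("Liu", state);
  prompts.push(step.prompt);
  state = step.next;
}
console.log(prompts.join("\n"));
```

Encoding the flow as explicit states is what lets the model give deterministic, per-character feedback instead of free-form acknowledgements.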
- Configuring agents: you can find agent configuration files in the src/app/agentConfigs/ directory. By editing these files, you can change agent behavior, add new agents, or adjust the logic of existing agents.
Developer Tips
- To extend or modify agent behavior, it is recommended to first study the existing agentConfigs files, then use the agent_transfer tool to enable handoffs between agents.
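A handoff via the agent_transfer tool can be thought of as a tool call that swaps which agent is active. Only the tool name agent_transfer comes from the text above; the argument shape, registry, and return string below are assumptions made for this sketch.

```typescript
// Illustrative handler for an agent_transfer tool call.
// The registry and argument shape are hypothetical.
const agents: Record<string, { name: string }> = {
  frontDeskAuthentication: { name: "frontDeskAuthentication" },
  customerServiceRetail: { name: "customerServiceRetail" },
};

let activeAgent = agents["frontDeskAuthentication"];

function handleAgentTransfer(args: { destination_agent: string }): string {
  const target = agents[args.destination_agent];
  if (!target) return `Unknown agent: ${args.destination_agent}`;
  activeAgent = target; // the conversation continues under the new agent
  return `Transferred to ${target.name}`;
}

console.log(handleAgentTransfer({ destination_agent: "customerServiceRetail" }));
```

Because the transfer is just a tool call, it shows up in the Conversation Transcript like any other tool invocation, which is what makes handoffs easy to trace while debugging.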
- All interactions and state changes between agents are displayed in the "Conversation Transcript" section of the UI for easy debugging and iteration.
With these steps and features covered, you can quickly get started and build your own multi-agent voice interaction application with OpenAI Realtime Agents.