General Introduction
TEN Agent is an open source framework for real-time multimodal AI agents. It integrates the OpenAI Realtime API with RTC and supports functions such as weather queries, web search, visual processing, and RAG (Retrieval-Augmented Generation). The framework aims to provide high-performance, low-latency audio and video interaction for complex AI application scenarios.
It is the second most mature real-time interactive multimodal agent seen so far, and its voice conversations are very smooth.
Function List
- Real-time multimodal interaction: Supports real-time processing and interaction of audio, video and text.
- OpenAI Realtime API Integration: Provides low-latency voice-to-voice dialog capabilities.
- RTC AI noise suppression: Suppresses noise with AI algorithms to improve audio quality.
- Weather query: Integrates a weather query function that provides real-time weather information.
- Web search: Retrieves information through web searches.
- Visual processing: Supports image recognition and processing.
- RAG: Answers questions from local documents using retrieval-augmented generation.
- Multi-language support: Supports extension development in multiple programming languages such as C++, Go, and Python.
- Cross-platform support: Compatible with Windows, Mac, Linux and mobile devices.
Usage Guide
Installation Process
- Preparing the environment:
  - Ensure that Docker and Docker Compose are installed.
  - Obtain the Agora App ID and App Certificate (the certificate applies only if certificates are enabled in the Agora console).
  - Obtain an OpenAI API key, as well as API keys for Deepgram ASR and FishAudio TTS.
- Configuring environment variables:
  - In the project root directory, run the following command to create the .env file:
    cp .env.example .env
  - Open the .env file and fill in the required API keys and configuration.
- Launching the containers:
  - In the project root directory, run the following command to start the containers:
    docker compose up
  - Or run the following command to start them in detached mode:
    docker compose up -d
- Building the agent:
  - Open a new terminal window, enter the container, and build the agent.
  - Once the build is complete, run the server on port 8080:
    make run-server
- Accessing the interface:
  - Open localhost:3000 in your browser to start using TEN Agent.
  - Open another tab and visit localhost:3001 to create, connect, and edit extensions using Graph Designer.
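Because the containers depend on the API keys entered in the .env file, it can help to confirm that none of them were left blank before running docker compose up. The following is a minimal sketch of such a check, not part of TEN Agent itself. The key names in REQUIRED_KEYS are assumptions made for illustration; the authoritative names are the ones listed in the project's .env.example.

```python
from pathlib import Path

# Hypothetical key names, used only for illustration.
# Replace them with the names that actually appear in .env.example.
REQUIRED_KEYS = [
    "AGORA_APP_ID",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "FISH_AUDIO_API_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found; run: cp .env.example .env")
    else:
        missing = missing_keys(env_path.read_text())
        print("Missing or empty keys:", ", ".join(missing) if missing else "none")
```

Run from the project root, the script reads .env and lists any assumed key that is missing or empty. It only inspects the file and does not contact any of the services.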
Function Operation Guide
- Real-time multimodal interaction:
  - Hold low-latency voice-to-voice conversations through the integrated OpenAI Realtime API.
  - Use the RTC AI noise suppression function to keep audio clear and stable.
- Weather query:
  - Enter the name of the city you want to check in the interface to get real-time weather information.
- Web search:
  - Enter keywords in the search box and the system will search the web for relevant information.
- Visual processing:
  - Upload image files and the system will automatically perform image recognition and processing.
- RAG:
  - Enter a question and the system will answer it from local documents using retrieval-augmented generation.
- Multi-language support:
  - Develop extensions in C++, Go, Python, and other programming languages.
- Cross-platform support:
  - TEN Agent is compatible with Windows, Mac, Linux, and mobile devices, so it can be used seamlessly across platforms.
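The RAG item above describes answering questions from local documents. The idea can be sketched as: score each local document against the question, then pass the best match to the model as context. This is a sketch of the general technique and not TEN Agent's implementation. It substitutes a simple bag-of-words cosine similarity for the embedding search a real system would likely use, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: dict[str, str], top_k: int = 1) -> list[str]:
    """Return the names of the top_k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda name: cosine(q, Counter(tokenize(documents[name]))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, documents: dict[str, str], top_k: int = 1) -> str:
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(documents[name] for name in retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hypothetical local documents for demonstration.
    docs = {
        "setup.txt": "Run docker compose up to start the containers.",
        "ports.txt": "The web interface listens on port 3000.",
    }
    print(build_prompt("Which port does the web interface use?", docs))
```

The resulting prompt would then be sent to a language model, which grounds its answer in the retrieved text rather than in its training data alone.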