## General Introduction
Vexa is an open source, real-time meeting transcription and knowledge management platform that provides efficient meeting recording and intelligent knowledge extraction for enterprises and individuals. Through API-driven meeting bots it automatically joins meetings on Google Meet, Zoom, and other platforms, transcribes speech to text in real time, and supports 99 languages. Vexa is built on a highly scalable microservices architecture, making it suitable for handling large numbers of concurrent transcription tasks. It emphasizes enterprise-grade data security and offers on-premises deployment to meet compliance requirements. Currently in closed beta and available for free through the official website, Vexa aims to be an enterprise-grade alternative to recall.ai, combining high performance with rich functionality.
## Function List
- Real-time meeting transcription: Automatically join Google Meet, Zoom, and Microsoft Teams meetings and transcribe speech to text in real time.
- Multi-language support: Transcription in 99 languages for global teams.
- Meeting bots: Control bots that join meetings via an API, simplifying operations.
- Knowledge extraction: Using RAG (Retrieval-Augmented Generation), key information is extracted from transcripts to build a searchable knowledge base.
- Enterprise security: Supports on-premises deployment, protecting data privacy and meeting compliance needs.
- High scalability: A microservices architecture supports massively concurrent transcription tasks.
- Direct streaming: Capture audio directly from web pages or mobile apps (in development).
- Open source contributions: Developers can participate in development and extend functionality through GitHub.
## Usage Guide
### Installation and Deployment
Vexa is an open source project suited to local deployment by users or organizations with technical skills. The detailed installation process:
- Clone the repository
Open a terminal and run the following commands to clone the Vexa repository:
```bash
git clone https://github.com/Vexa-ai/vexa.git
cd vexa
```
- Initialize submodules
Vexa uses Git submodules to manage dependencies (such as services/vexa-bot and services/WhisperLive). Run:
```bash
make submodules
```
- Configure environment variables
Create and edit the environment configuration file:
```bash
make env
```
Set parameters in the .env file, such as ADMIN_API_TOKEN (the administrator API key), and adjust the Whisper model path or database configuration as needed.
- Download the Whisper model
Vexa uses the Whisper model for speech transcription. Run:
```bash
make download-model
```
The model is stored in the ./hub directory and mounted into the WhisperLive container.
- Build the meeting bot image
Build the Docker image for the Vexa bot:
```bash
docker build -t vexa-bot:latest -f services/vexa-bot/core/Dockerfile ./services/vexa-bot/core
```
- Start the services
Use Docker Compose to build and run the services:
```bash
docker compose build
docker compose up -d
```
Once the services start, the API gateway runs at http://localhost:8056 and the management interface at http://localhost:8057.
### Core Features
#### Real-time meeting transcription
Vexa's core feature is real-time transcription of meeting audio via a meeting bot. The steps:
- Request an API key
Visit https://api.dev.vexa.ai/pricing to request a closed-beta API key. After the request is approved, you receive an X-API-Key.
- Send a bot to join a meeting
Use an API request to have the bot join a meeting. For example, to join a Google Meet:
```bash
curl -X POST https://gateway.dev.vexa.ai/bots \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_CLIENT_API_KEY" \
  -d '{"native_meeting_id": "xxx-xxxx-xxx", "platform": "google_meet"}'
```
On success, the response is JSON containing the meeting_id and the bot's status.
- Retrieve transcription data
Use the meeting ID to fetch transcripts:
```bash
curl -H "X-API-Key: YOUR_CLIENT_API_KEY" \
  https://gateway.dev.vexa.ai/transcripts/google_meet/xxx-xxxx-xxx
```
Example response:
```json
{
  "data": {
    "meeting_id": "meet_abc123",
    "transcripts": [
      {"time": "00:01:15", "speaker": "John Smith", "text": "Let's discuss the quarterly results."},
      {"time": "00:01:23", "speaker": "Sarah Johnson", "text": "The Q3 revenue exceeded our projections by 15%."}
    ]
  }
}
```
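The two API calls above can also be scripted. Below is a minimal Python sketch using only the standard library: it builds the bot-join request from the first curl example and flattens a transcript response like the one shown above into readable lines. The gateway URL, endpoint, and header names come from the curl examples; `build_join_request` and `format_transcripts` are illustrative helper names, and the live network call is left commented out because it requires a valid key.

```python
import json
import urllib.request

GATEWAY = "https://gateway.dev.vexa.ai"

def build_join_request(api_key: str, meeting_id: str,
                       platform: str = "google_meet") -> urllib.request.Request:
    """Build the POST request that asks a Vexa bot to join a meeting."""
    payload = json.dumps({
        "native_meeting_id": meeting_id,
        "platform": platform,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{GATEWAY}/bots",
        data=payload,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def format_transcripts(raw: str) -> list[str]:
    """Flatten a transcript response into '[time] speaker: text' lines."""
    data = json.loads(raw)["data"]
    return [f"[{t['time']}] {t['speaker']}: {t['text']}"
            for t in data["transcripts"]]

# Sending the join request needs a valid key and network access:
# with urllib.request.urlopen(build_join_request("YOUR_CLIENT_API_KEY",
#                                                "xxx-xxxx-xxx")) as resp:
#     print(json.load(resp))

# Formatting a response shaped like the example above:
sample = json.dumps({"data": {"meeting_id": "meet_abc123", "transcripts": [
    {"time": "00:01:15", "speaker": "John Smith",
     "text": "Let's discuss the quarterly results."},
]}})
for line in format_transcripts(sample):
    print(line)
```

In a production client you would add error handling for HTTP failures and for bots that are still joining when transcripts are first requested.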
#### Multi-language support
Vexa supports real-time transcription in 99 languages. To set the language:
- Specify the language in the .env file, e.g. LANGUAGE=en.
- Add a language parameter to the API request:
```bash
curl -X POST -H "Content-Type: application/octet-stream" \
  -d '{"language": "es"}' \
  http://localhost:8033/
```
The system will automatically transcribe the meeting in the specified language.
#### Knowledge extraction
Vexa's RAG features extract key information from transcripts to build a structured knowledge base:
- View the knowledge base
Generated knowledge entries are accessible through the management interface (http://localhost:8057) or the API.
- Search information
Search the knowledge base by keyword; RAG returns the relevant minutes and context.
- Export data
Export knowledge entries via the API in JSON or CSV format for analysis or archiving.
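Once entries have been fetched as JSON, the CSV conversion can also be reproduced client side. A sketch using the standard library's csv module; the entry fields (`topic`, `summary`) are hypothetical placeholders, since the source does not document the export schema.

```python
import csv
import io

def entries_to_csv(entries: list[dict]) -> str:
    """Serialize a list of flat knowledge-base entries to CSV text."""
    if not entries:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(entries[0]))
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()

# Hypothetical entries as they might come back from an export API:
entries = [
    {"topic": "Q3 revenue", "summary": "Exceeded projections by 15%."},
    {"topic": "Next steps", "summary": "Schedule a follow-up review."},
]
print(entries_to_csv(entries))
```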
#### Direct streaming (in development)
Vexa plans to support capturing audio directly from web pages or mobile apps. Users will upload audio streams via an SDK or API, and the system will transcribe them in real time. This feature is expected to launch in 2025.
### Other Features
- Enterprise security: Local deployment keeps data isolated, and the management interface is protected with an X-Admin-API-Key. Enterprises can configure access rights to meet compliance needs.
- High scalability: The microservices architecture distributes tasks automatically; the system can handle thousands of concurrent transcriptions without manual intervention.
- Community contributions: See CONTRIBUTING.md at https://github.com/Vexa-ai/vexa. Developers can discuss tasks or submit code via Discord (https://discord.gg/Ga9duGkVz9).
## Notes
- Hardware requirements: A server with an NVIDIA GPU is recommended, with at least 16 GB of RAM and a 4-core CPU.
- Updates and maintenance: Periodically run git pull and docker compose up --build to get the latest features.
- Closed beta: API access requires a key, and test slots are limited.
- Development progress: Speaker identification is in development; Microsoft Teams and Zoom bots are expected to launch in April and May 2025, respectively.
## Use Cases
- Multinational enterprise meetings
Multinational teams use Vexa to transcribe multilingual meetings, translate them into English in real time, extract decision points, and generate a searchable knowledge base for global collaboration.
- Project management
Development teams record technical meetings; Vexa extracts task assignments and timelines, generating automated reports and reducing manual note-taking.
- Customer support optimization
Customer service teams transcribe customer calls, extract common problems and solutions, and build a knowledge base to improve response speed and consistency.
- Academic research records
Researchers record interviews or workshops; Vexa transcribes and analyzes the content, generating structured data to assist with writing papers.
## FAQ
- What platforms does Vexa support?
Google Meet is currently supported; Microsoft Teams and Zoom bots are expected to launch in 2025.
- How do I request a test key?
Visit https://api.dev.vexa.ai/pricing and submit a request for a free test X-API-Key.
- What resources are required for local deployment?
A server with an NVIDIA GPU is recommended, with at least 16 GB of RAM and a 4-core CPU.
- Does Vexa support real-time translation?
Transcription in 99 languages is supported now; real-time translation is scheduled to launch in 2025.
- How can I participate in development?
Join the Discord (https://discord.gg/Ga9duGkVz9), read CONTRIBUTING.md, and submit a pull request.