General Introduction
Deepgram is a company focused on speech recognition and natural language processing, offering Speech-to-Text and Text-to-Speech APIs. The platform uses AI models to help developers integrate speech transcription and understanding into their applications and services. Deepgram's solutions are used in fields such as medical transcription, automated customer service, and podcast transcription, with the aim of making human-computer interaction more efficient.
Feature List
- Speech-to-Text (STT): Provides high-accuracy, low-latency speech-to-text services that support multiple languages and accents.
- Text-to-Speech (TTS): Generates natural, fluent speech output for real-time AI and high-throughput applications.
- Audio Intelligence: Provides audio analysis and comprehension capabilities to help organizations analyze audio data at scale.
- Voice Agent API: A unified speech API that supports natural human-machine dialog for a variety of automation scenarios.
Usage Guide
Installation and Use
- Register an account: Visit Deepgram's official website and sign up for a new account.
- Get the API key: After logging in to your account, get the API key in the console.
- Integrate the API:
- Speech-to-text (STT):
Python
import requests

# Pre-recorded audio endpoint; replace YOUR_API_KEY with the key from the console.
url = "https://api.deepgram.com/v1/listen"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json",
}
# The audio is referenced by URL rather than uploaded.
data = {"url": "https://path.to/your/audio/file.wav"}

response = requests.post(url, headers=headers, json=data)
print(response.json())
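- Text-to-speech (TTS):
Python
import requests

url = "https://api.deepgram.com/v1/speak"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json",
}
# The "voice" value is as given in the original example; confirm the
# supported voice/model names in the official TTS documentation.
data = {"text": "Hello, this is a test.", "voice": "en_us_male"}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # avoid writing an error body to the audio file
with open("output.wav", "wb") as f:
    f.write(response.content)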
- Real-time speech processing: Real-time speech recognition over a WebSocket connection.
Python
import json

import websocket  # provided by the websocket-client package


def on_message(ws, message):
    # Each incoming message is a JSON document; print it as a dict.
    print(json.loads(message))


ws = websocket.WebSocketApp(
    "wss://api.deepgram.com/v1/listen",
    header={"Authorization": "Token YOUR_API_KEY"},
    on_message=on_message,
)
ws.run_forever()
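In a real-time session, interim results are usually followed by a final result for the same stretch of audio, so an application often wants to keep only the final ones. The following is a minimal sketch of such a filter. It assumes each message is a JSON string carrying an `is_final` flag and a `channel.alternatives[0].transcript` field; that shape and the helper name `collect_final_transcripts` are assumptions for illustration and should be checked against the official streaming documentation.

```python
import json


def collect_final_transcripts(messages):
    """Return the transcripts of messages marked final, in arrival order."""
    finals = []
    for raw in messages:
        msg = json.loads(raw)
        # Assumed shape: {"is_final": bool,
        #                 "channel": {"alternatives": [{"transcript": str}]}}
        alternatives = msg.get("channel", {}).get("alternatives", [])
        if not msg.get("is_final") or not alternatives:
            continue  # skip interim results and non-transcript messages
        text = alternatives[0].get("transcript", "").strip()
        if text:
            finals.append(text)
    return finals


# Hand-written sample messages, not captured from a live connection.
sample = [
    json.dumps({"is_final": False,
                "channel": {"alternatives": [{"transcript": "hello wor"}]}}),
    json.dumps({"is_final": True,
                "channel": {"alternatives": [{"transcript": "hello world"}]}}),
    json.dumps({"type": "Metadata"}),
]
print(collect_final_transcripts(sample))
```

The same logic could be placed inside an `on_message` callback, appending to a list as messages arrive instead of processing them in a batch.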
Speech-to-Text User Guide
- Integrate the API: Integrate Deepgram's Speech-to-Text API into your application. You can refer to the sample code in the official documentation.
- Upload audio files: Upload the audio files to be transcribed via the API. Multiple audio formats are supported.
- Get transcription results: The API returns the transcribed text, which you can further process and display in your application.
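The upload and result steps above can be sketched with the standard library alone. This example prepares a request that carries raw audio bytes in the body and reads the transcript out of a response dictionary. The response path `results.channels[0].alternatives[0].transcript` and the helper names (`build_upload_request`, `extract_transcript`, `transcribe`) are assumptions for illustration; verify the response layout against the official API reference before relying on it.

```python
import json
import urllib.request

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_upload_request(audio_bytes, api_key, content_type="audio/wav"):
    """Prepare (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        LISTEN_URL,
        data=audio_bytes,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )


def extract_transcript(payload):
    """Return the first transcript in a response dict, or '' if absent."""
    try:
        # Assumed response layout; see the lead-in above.
        return payload["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return ""


def transcribe(request):
    """Send a prepared request and return its transcript (needs network)."""
    with urllib.request.urlopen(request) as resp:
        return extract_transcript(json.load(resp))


# Offline demonstration with placeholder bytes and a hand-written response.
request = build_upload_request(b"RIFF....", "YOUR_API_KEY")
sample_response = {
    "results": {"channels": [{"alternatives": [{"transcript": "hello world"}]}]}
}
print(request.get_method(), request.full_url)
print(extract_transcript(sample_response))
```

Keeping request construction separate from sending makes the headers easy to inspect and lets the parsing be exercised without an API key.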
Text-to-Speech User Guide
- Integrate the API: Integrate Deepgram's Text-to-Speech API into your application.
- Input text: Send the text to be converted to speech via the API.
- Get the voice output: The API returns the generated speech audio, which you can play or store in your application.
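For longer speech output, it can be preferable to write the audio to disk as it arrives rather than holding the whole body in memory. The sketch below shows one way to store the returned audio from an iterable of byte chunks. The helper name `save_audio` and the output file name are hypothetical; with the `requests` library, the chunks could come from `response.iter_content(chunk_size=8192)` on a request made with `stream=True`.

```python
import os
import tempfile


def save_audio(chunks, path):
    """Write an iterable of byte chunks to `path`; return bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


# Stand-in for a streamed HTTP body, so the example runs without network.
fake_chunks = [b"abc", b"", b"defg"]
out_path = os.path.join(tempfile.gettempdir(), "deepgram_tts_demo.bin")
written = save_audio(fake_chunks, out_path)
print(written)
```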
Audio Intelligence User Guide
- Integrate the API: Integrate Deepgram's Audio Intelligence API into your application.
- Upload audio files: Upload the audio files to be analyzed via the API.
- Get analysis results: The API returns audio analysis results, including sentiment analysis, keyword extraction, and other information.
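One plausible way to request analysis is to enable features through query parameters on the transcription endpoint. The sketch below builds such a URL. It assumes that analysis features are switched on with boolean query parameters and that `sentiment` and `topics` are valid parameter names; both points, along with the helper name `build_analysis_url`, are assumptions to confirm in the official Audio Intelligence documentation.

```python
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_analysis_url(features):
    """Append one `name=true` query parameter per requested feature."""
    # Sorting keeps the generated URL stable regardless of input order.
    query = urlencode({name: "true" for name in sorted(features)})
    return f"{LISTEN_URL}?{query}" if query else LISTEN_URL


# Feature names here are assumed, not taken from the original text.
url = build_analysis_url({"sentiment", "topics"})
print(url)
```

The resulting URL can be used in place of the plain endpoint in a transcription request, with the same `Authorization` header.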
Voice Agent API User Guide
- Integrate the API: Integrate Deepgram's Voice Agent API into your application.
- Configure the dialog model: Choose and configure a dialog model that fits the application scenario.
- Implement human-machine dialog: Use the API to enable natural, fluent human-machine dialog and improve the user experience.
Sign up and get a $200 credit to call the full range of APIs.