
AssemblyAI: High-precision Speech-to-Text and Audio Intelligence Analysis Platform

General Introduction

AssemblyAI is a platform focused on speech AI technology, providing developers and enterprises with efficient speech-to-text and audio analysis tools. Its core highlight is the Universal series of models, especially the newly released Universal-2, AssemblyAI's most advanced speech-to-text model to date. Universal-2 builds on Universal-1 with more than 12.5 million hours of multilingual audio training data, allowing it to capture the complexity of real conversations and deliver highly accurate transcripts. Compared to Universal-1, Universal-2 improves proper-noun recognition (e.g., names, brands) by 24%, mixed alphanumeric content (e.g., phone numbers, email addresses) by 21%, and text formatting (e.g., punctuation, capitalization) by 15%, significantly closing the "last mile" accuracy gap left by earlier models. AssemblyAI opens these capabilities to global users through easy-to-use APIs, and companies such as Spotify and Fireflies already use it to build intelligent speech products for meeting transcription, content analysis, and more.


Function List

  • Speech-to-Text: Converts audio files or live audio streams into high-accuracy text, supporting multiple languages and audio formats.
  • Speaker Detection: Automatically identifies the different speakers in an audio recording, useful for multi-person conversations.
  • Sentiment Analysis: Detects whether segments of speech are positive, negative, or neutral, to enrich the user experience.
  • Real-time Transcription: Provides low-latency, real-time speech-to-text, suitable for voice agents or live captioning.
  • Audio Intelligence Models: Advanced features such as content moderation, topic detection, and keyword search.
  • LeMUR Framework: Processes transcribed text with large language models, supporting summary generation, Q&A, and more.
  • Subtitle Generation: Exports subtitle files in SRT or VTT format for video content creation.
  • PII Redaction: Automatically detects and redacts sensitive information in audio, such as names or phone numbers.

 

Using Help

AssemblyAI is a cloud-based API service that requires no local installation to access its powerful features. Here's a detailed guide to help you get started and dig deeper into its capabilities.

Registering and Getting API Keys

  1. Visit the official website: Open https://www.assemblyai.com/ in your browser.
  2. Register for an account: Click "Sign Up" in the upper right corner and enter your email address and password to complete registration. You will then be taken to the dashboard.
  3. Get the key: Find the "API Key" area in the dashboard and click "Copy". This key is the sole credential for calling the API, so keep it in a safe place (a sketch of loading it from an environment variable follows below).
  4. Free trial: New users receive free credits and do not need to add a payment method right away.
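
Rather than hard-coding the key, you can keep it in an environment variable and load it at runtime. A minimal sketch (the variable name ASSEMBLYAI_API_KEY is a convention chosen here, not a requirement):

    import os
    import assemblyai as aai

    # Read the key from the environment instead of embedding it in source code
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]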

Core Function Operation

The core of AssemblyAI is its API. The examples below use the Python SDK with the Universal family of models; SDKs for other languages (e.g. Java, Node.js) are covered in the documentation on the official website.

Speech-to-Text (Universal-2)

  • Preparation: Make sure you have an audio file (e.g. sample.mp3) or a URL pointing to one.
  • Installing the SDK: Run in the terminal:
    pip install assemblyai
  • Code example:
    import assemblyai as aai

    aai.settings.api_key = "your API key"  # replace with your key
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe("sample.mp3")
    print(transcript.text)  # prints text such as "It's a beautiful day."
  • Universal-2 strengths: Universal-2 is used by default. Compared with Universal-1 it recognizes proper nouns (e.g., "Zhang Wei") and formatted numbers (e.g., "March 6, 2025") more accurately, and a typical file is processed in a few seconds. A sketch of transcribing from a URL with basic error handling follows below.
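
The transcriber also accepts a public URL, and the returned transcript carries a status you can check before reading the text. A minimal sketch, using a placeholder URL:

    import assemblyai as aai

    aai.settings.api_key = "your API key"
    transcriber = aai.Transcriber()

    # A publicly reachable URL works the same way as a local file path
    transcript = transcriber.transcribe("https://example.com/audio/sample.mp3")

    if transcript.status == aai.TranscriptStatus.error:
        print("Transcription failed:", transcript.error)
    else:
        print(transcript.text)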

Real-time Transcription

  • Applicable Scenarios: Live streaming, teleconferencing, and other real-time needs.
  • Code example:
    import assemblyai as aai

    aai.settings.api_key = "your API key"

    def on_data(transcript: aai.RealtimeTranscript):
        # Called with every partial/final transcript; print the live text
        if transcript.text:
            print(transcript.text)

    def on_error(error: aai.RealtimeError):
        print("Error:", error)

    transcriber = aai.RealtimeTranscriber(
        sample_rate=16_000,
        on_data=on_data,
        on_error=on_error,
    )
    transcriber.connect()

    # Stream microphone audio (requires: pip install "assemblyai[extras]")
    microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
    transcriber.stream(microphone_stream)
    transcriber.close()

  • Workflow: Run the script and speak into the microphone; the text is displayed in real time, and the service's low latency keeps results fast and accurate.

Speaker Detection

  • Enabling method:
    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = transcriber.transcribe("sample.mp3", config=config)
    for utterance in transcript.utterances:
        print(f"Speaker {utterance.speaker}: {utterance.text}")

  • Example results:
    Speaker A: Hello, what time is the meeting today?
    Speaker B: Two o'clock in the afternoon.

  • Note: Universal-2 performs more consistently in multi-person conversations and reduces speaker confusion. A sketch that post-processes the utterance list follows below.
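
Because each utterance exposes its speaker label and text, the results can be post-processed with plain Python. A small sketch that tallies how many words each speaker contributed, using only the fields shown above:

    from collections import defaultdict

    # Count words spoken by each detected speaker
    words_per_speaker = defaultdict(int)
    for utterance in transcript.utterances:
        words_per_speaker[utterance.speaker] += len(utterance.text.split())

    for speaker, count in sorted(words_per_speaker.items()):
        print(f"Speaker {speaker}: {count} words")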

Sentiment Analysis

  • Enabling method:
    config = aai.TranscriptionConfig(sentiment_analysis=True)
    transcript = transcriber.transcribe("sample.mp3", config=config)
    for result in transcript.sentiment_analysis:
        print(f"Text: {result.text}, Sentiment: {result.sentiment}")

  • Example results:
    Text: I really like this product, Sentiment: POSITIVE
    Text: Service is a bit slow, Sentiment: NEGATIVE
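
Each sentiment result also carries timing information and a confidence score. The field names below follow the API's sentiment_analysis_results payload and should be checked against your SDK version:

    for result in transcript.sentiment_analysis:
        # start/end are millisecond offsets into the audio, confidence is 0-1
        print(f"[{result.start}-{result.end} ms] {result.sentiment} "
              f"(confidence {result.confidence:.2f}): {result.text}")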
    

Subtitle Generation

  • Code:
    transcript = transcriber.transcribe("sample.mp3")
    with open("captions.srt", "w") as f:
        f.write(transcript.export_subtitles_srt())

  • Result: A captions.srt file is generated that can be imported directly into video editing software. A sketch of exporting WebVTT instead follows below.
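
The same transcript can also be exported as WebVTT, which HTML5 video players accept natively. A minimal sketch using the SDK's VTT export:

    # Write WebVTT captions alongside the SRT output
    with open("captions.vtt", "w") as f:
        f.write(transcript.export_subtitles_vtt())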

Features: LeMUR Framework

  • Function introduction: LeMUR feeds transcription results into large language models, for example to generate summaries or answer questions about the audio (a Q&A sketch follows the procedure below).
  • Procedure:
    1. Transcribe the audio:
      transcript = transcriber.transcribe("sample.mp3")

    2. Generate a summary from the transcript:
      result = transcript.lemur.summarize()
      print(result.response)

    3. Sample output: "Progress on the project was discussed at the meeting and it is scheduled to be completed next week."
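
LeMUR can also answer questions about the transcript obtained above. A sketch of the SDK's question interface; the exact parameter and response fields may differ between SDK versions, and the question text is invented for illustration:

    # Ask LeMUR a question about the transcribed meeting
    questions = [
        aai.LemurQuestion(question="What deadline was agreed on in the meeting?"),
    ]
    result = transcript.lemur.question(questions)
    for answer in result.response:
        print(answer.question, "->", answer.answer)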

Notes

  • Supported formats: Compatible with 33 audio/video formats, including MP3 and WAV.
  • Language settings: 99+ languages are supported; specify Chinese, for example, with language_code="zh" (see the sketch below).
  • Billing: Charged per audio hour; see the official website for pricing.
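
For example, to transcribe Mandarin audio you would pass the language code through the transcription config. A minimal sketch (chinese_sample.mp3 is a placeholder file name):

    config = aai.TranscriptionConfig(language_code="zh")
    transcript = aai.Transcriber().transcribe("chinese_sample.mp3", config=config)
    print(transcript.text)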

By following the steps above, you can fully utilize the powerful features of Universal-2 to build efficient voice applications.
