MedASR - Google's open source medical speech recognition model

Latest AI Resources4mos agorelease AI Sharing Circle

31.6K 00

What is MedASR?

MedASR is a 105 million parameter medical speech recognition model open-sourced by Google, fine-tuned on 5000 hours of desensitized clinical corpus, optimized for drug, dosage, and anatomical terminology, with a built-in 6-gram medical language model, and a word error rate of only 4.6% on the private radiology dataset RAD-DICT, which is about 60% lower than Whisper v3 Large. The model adopts the Conformer architecture, which can be fine-tuned by a single consumer-grade GPU, supports 16kHz mono input, and provides one-click download of Hugging Face, online deployment of Vertex AI, and local fine-tuning notebook, which follows the Google Health AI compliance terms, and the output needs to be manually reviewed, making it a good choice for the current healthcare scenario. It is the preferred ASR solution for current medical scenarios, taking into account both accuracy and ease of use.

MedASR's functional features

Medical-specific lightweight models: 105 million parameter Conformer architecture, fine-tunable on a single consumer GPU, 16kHz mono input, streaming/batch inference latency below 300ms.
Precise medical vocabulary recognition: Built-in 6-gram medical language model, fine-tuned on 5,000 hours of desensitized clinical speech (radiology, internal medicine, family doctor), with significant improvement in drug name, dose, and anatomical term recognition accuracy.
Leading recognition accuracy: The private radiology dataset RAD-DICT has a word error rate of only 4.61 TP3T, which is about 601 TP3T lower compared to Whisper v3 Large, and is firmly at the forefront of healthcare ASR.
Zero Threshold Open Source Experience: Weighted hosting Hugging Face, 5 lines of code local inference; official Colab notebook, one-click audition effect, no need to configure complex environment.
One-click deployment in the cloud: Highly available online services are released directly through Vertex AI Model Garden, with automatic elastic scaling to meet the hospital's high concurrency and low latency needs.
Privatization fine-tuning support: Open source comes with fine-tuning notebook, hospitals can use their own data to continue training, the entire offline operation, to protect patient privacy and data security.
Compliance security framework: Follow the Google Health AI Developer Foundations protocol, which explicitly prohibits direct clinical decision-making and requires output to be reviewed by a professional to reduce medical risk.

MedASR's core strengths

Extreme Lightweight: 105 million parameter Conformer, fine-tuning can be done on a single consumer GPU with inference latency below 300ms.
Data Deep Dive: Based on 5,000 hours of desensitized medical speech special training, covering real-life scenarios in multiple departments such as radiology, internal medicine, and family doctors.
leading precision: The word error rate on the private radiology test set RAD-DICT is only 4.61 TP3T, a reduction of about 601 TP3T compared to Whisper v3 Large, which is among the highest in the industry.
lexical specialization: Built-in 6-gram medical language model, drug name, dosage, and anatomical terminology recognition accuracy is significantly improved.
Input Friendly: Supports 16kHz mono waveforms, streaming and batch inference with one-click switching, without complex pre- and post-processing.

What is the official website for MedASR

Project website:: https://developers.google.com/health-ai-developer-foundations/medasr
GitHub repository:: https://github.com/google-health/medasr
HuggingFace Model Library:: https://huggingface.co/google/medasr

Who MedASR is for

Hospital Information Section: Need to quickly go live with a high-accuracy voice entry system that reduces the burden of physician keyboarding and improves the timeliness of medical record completion.
clinician: Practitioners in radiology, internal medicine, and family physicians dictate examination reports, prescriptions, and medical records in pursuit of a low rate of typos.
Healthcare AI Startup Teams: Lack of self-research ASR capability, and want to secondary develop products based on open-source models for vertical scenarios such as image reports, surgery records, and so on.
Remote Consultation Platform: The need to transcribe doctor-patient conversations into structured text in real time for subsequent QA, search and big data analysis.
Medical education researchers: Using high-quality medical speech transcription results to build knowledge graphs, train downstream NLP models, or conduct speech data mining research.