MedASR - Google's open source medical speech recognition model


What is MedASR?

MedASR is a 105-million-parameter medical speech recognition model open-sourced by Google. It is fine-tuned on 5,000 hours of de-identified clinical speech, is optimized for drug names, dosages, and anatomical terminology, and comes with a built-in 6-gram medical language model. On RAD-DICT, a private radiology dataset, it reaches a word error rate (WER) of only 4.6%, about 60% lower than Whisper v3 Large. The model uses the Conformer architecture, can be fine-tuned on a single consumer-grade GPU, and accepts 16 kHz mono audio. Weights can be downloaded from Hugging Face, the model can be deployed online through Vertex AI, and a notebook is provided for local fine-tuning. Use is governed by the Google Health AI Developer Foundations terms, and all output must be reviewed by a professional. Balancing accuracy and ease of use, it is a strong choice of ASR solution for current medical scenarios.
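
WER, the metric behind the 4.6% figure, is conventionally the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, plus the relative-reduction arithmetic behind comparisons such as "about 60% lower". The function names and example sentences are hypothetical and are not taken from MedASR or its evaluation code; real benchmarks also normalize punctuation and number formats before scoring, which this sketch does not do (it only lowercases).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer (hypothetical helper)."""
    return (baseline_wer - new_wer) / baseline_wer


# Splitting one drug/anatomy term into two words costs a substitution and an insertion.
print(word_error_rate("no acute intracranial hemorrhage",
                      "no acute intra cranial hemorrhage"))  # 0.5
```

With made-up numbers, `relative_reduction(0.10, 0.04)` is about 0.6: a drop from 10% to 4% WER is a 60% relative reduction even though the absolute difference is only 6 percentage points, which is the sense in which "60% lower" comparisons are usually meant.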


Key features of MedASR

  • Lightweight medical-specific model: A 105-million-parameter Conformer architecture that can be fine-tuned on a single consumer GPU, accepts 16 kHz mono input, and runs streaming or batch inference with latency below 300 ms.
  • Precise medical vocabulary recognition: A built-in 6-gram medical language model and fine-tuning on 5,000 hours of de-identified clinical speech (radiology, internal medicine, family medicine) significantly improve recognition of drug names, doses, and anatomical terms.
  • Leading recognition accuracy: On the private radiology dataset RAD-DICT, the WER is only 4.6%, about 60% lower than Whisper v3 Large, placing it at the forefront of healthcare ASR.
  • Low-barrier open-source experience: Weights are hosted on Hugging Face and local inference takes about 5 lines of code; an official Colab notebook lets you try the model with one click, without configuring a complex environment.
  • One-click cloud deployment: Highly available online endpoints can be launched directly from Vertex AI Model Garden, with automatic scaling to meet hospitals' high-concurrency, low-latency needs.
  • Private fine-tuning support: The open-source release includes a fine-tuning notebook, so hospitals can continue training on their own data entirely offline, protecting patient privacy and data security.
  • Compliance and safety framework: Use follows the Google Health AI Developer Foundations terms, which explicitly prohibit direct clinical decision-making and require output to be reviewed by a professional to reduce medical risk.
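
The article does not say how MedASR's 6-gram language model is combined with the acoustic model (for example, fused during beam search or applied as a separate rescoring pass). To illustrate why an n-gram LM helps with drug names and doses, here is a generic sketch of a word-level n-gram model with add-k smoothing used to rescore an N-best list of candidate transcripts. Everything in it is an assumption for illustration: the class `NGramLM`, the `rescore` helper, the toy corpus, the acoustic scores, and the choice of n=3 (instead of 6) to keep the example small.

```python
import math
from collections import Counter


class NGramLM:
    """Hypothetical word-level n-gram LM with add-k smoothing (illustration only)."""

    def __init__(self, sentences, n=3, k=0.1):
        self.n = n
        self.k = k
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of the (n-1)-word contexts
        self.vocab = {"</s>"}
        for sentence in sentences:
            words = self._pad(sentence)
            self.vocab.update(words[self.n - 1:])
            for i in range(self.n - 1, len(words)):
                context = tuple(words[i - self.n + 1:i])
                self.ngrams[context + (words[i],)] += 1
                self.contexts[context] += 1

    def _pad(self, sentence):
        # Start symbols give the first word a full context; </s> scores the ending.
        return ["<s>"] * (self.n - 1) + sentence.lower().split() + ["</s>"]

    def log_prob(self, sentence):
        """Sum of smoothed natural-log probabilities of every word in the sentence."""
        words = self._pad(sentence)
        v = len(self.vocab)
        total = 0.0
        for i in range(self.n - 1, len(words)):
            context = tuple(words[i - self.n + 1:i])
            count = self.ngrams[context + (words[i],)]
            total += math.log((count + self.k) / (self.contexts[context] + self.k * v))
        return total


def rescore(candidates, lm, lm_weight=0.5):
    """Return the text maximizing acoustic score + lm_weight * LM log-probability.

    candidates: list of (text, acoustic_log_score) pairs.
    """
    return max(candidates, key=lambda c: c[1] + lm_weight * lm.log_prob(c[0]))[0]


# Toy usage with a tiny hypothetical corpus of medication orders.
corpus = [
    "start metoprolol 25 mg twice daily",
    "continue metoprolol 50 mg daily",
    "metoprolol 25 mg orally twice daily",
]
lm = NGramLM(corpus, n=3)
candidates = [
    ("start meta pro lol 25 mg twice daily", -4.0),  # slightly better acoustic score
    ("start metoprolol 25 mg twice daily", -4.2),
]
print(rescore(candidates, lm))  # start metoprolol 25 mg twice daily
```

With `lm_weight=0.0` the acoustically preferred but fragmented candidate wins; once the LM is weighted in, the in-domain drug name wins because its word sequence was seen in the corpus. A production system would tune the weight on held-out data and train on far more text.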

MedASR's core strengths

  • Lightweight: A 105-million-parameter Conformer that can be fine-tuned on a single consumer GPU, with inference latency below 300 ms.
  • Focused training data: Trained specifically on 5,000 hours of de-identified medical speech covering real-world scenarios across departments such as radiology, internal medicine, and family medicine.
  • Leading accuracy: The WER on the private radiology test set RAD-DICT is only 4.6%, about 60% lower than Whisper v3 Large, which is among the best in the industry.
  • Lexical specialization: A built-in 6-gram medical language model significantly improves recognition of drug names, dosages, and anatomical terms.
  • Input-friendly: Accepts 16 kHz mono waveforms and switches between streaming and batch inference with one click, without complex pre- or post-processing.
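
The article states that MedASR expects 16 kHz mono input and supports streaming as well as batch inference, but it does not describe the preprocessing interface. A minimal pure-Python sketch of the usual steps follows: downmix interleaved multi-channel samples to mono, resample to 16 kHz, and, for streaming, cut the signal into fixed-length chunks. The helper names and the chunk length are hypothetical choices, not MedASR APIs. The linear-interpolation resampler has no anti-aliasing filter, so in practice a library resampler such as `torchaudio.functional.resample` or `librosa.resample` is the safer option.

```python
def to_mono(interleaved, channels):
    """Average interleaved multi-channel samples (L, R, L, R, ...) into one channel."""
    if channels < 1 or len(interleaved) % channels:
        raise ValueError("sample count must be a multiple of the channel count")
    return [
        sum(interleaved[i:i + channels]) / channels
        for i in range(0, len(interleaved), channels)
    ]


def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample by linear interpolation (no anti-aliasing filter; sketch only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for j in range(n_out):
        pos = j * src_rate / dst_rate  # fractional position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


def chunks(samples, rate=16_000, chunk_ms=1_000):
    """Yield fixed-length chunks, as a hypothetical streaming client might send them."""
    size = rate * chunk_ms // 1000
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


# One second of hypothetical silent stereo audio recorded at 48 kHz.
stereo_48k = [0.0, 0.0] * 48_000
mono_16k = resample_linear(to_mono(stereo_48k, channels=2), src_rate=48_000)
print(len(mono_16k))  # 16000
print([len(c) for c in chunks(mono_16k, chunk_ms=250)])  # [4000, 4000, 4000, 4000]
```

For batch inference the whole `mono_16k` list would be passed at once; for streaming, each chunk is sent as it becomes available. Smaller chunks lower latency at the cost of less context per request.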

Official MedASR links

  • Project website: https://developers.google.com/health-ai-developer-foundations/medasr
  • GitHub repository: https://github.com/google-health/medasr
  • Hugging Face model page: https://huggingface.co/google/medasr

Who MedASR is for

  • Hospital IT departments: Need to quickly deploy a high-accuracy voice entry system that reduces physicians' typing burden and helps medical records get completed sooner.
  • Clinicians: Radiologists, internists, and family physicians who dictate examination reports, prescriptions, and medical records and want a low transcription error rate.
  • Healthcare AI startups: Teams without in-house ASR capability that want to build products on an open-source model for vertical scenarios such as imaging reports and surgical records.
  • Remote consultation platforms: Need to transcribe doctor-patient conversations into structured text in real time for downstream QA, search, and big-data analysis.
  • Medical education researchers: Use high-quality medical speech transcripts to build knowledge graphs, train downstream NLP models, or conduct speech-data mining research.