Towards an Accurate Domain-Specific ASR: Transcription for Pathology

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution - peer-review

Abstract

A known problem with Automatic Speech Recognition (ASR) systems is their difficulty recognising specialised medical terms. Biopsy-related terms are not only rare but also hard to pronounce correctly, for medical and non-medical speakers alike. Additionally, because the content is sensitive and privacy must be preserved, it is preferable to use ASR systems that run on a hospital’s internal infrastructure. This study evaluated state-of-the-art open-source ASR systems using anonymised audio recordings of biopsy examinations conducted by pathologists. We assessed the performance of models suitable for deployment within hospital infrastructure, including various sizes of OpenAI’s Whisper models, and compared them with Meta’s Wav2vec 2.0 model. We also investigated two approaches for adapting these models under limited data conditions: providing contextual input to Whisper and fine-tuning both Whisper and Wav2vec 2.0. Finally, we examined the models’ ability to recognise medical terminology used in pathology reports, focusing on two categories: anatomical and pathology terms. Our findings indicate that providing contextual information to Whisper models significantly improves both the overall average word error rate (WER) and the term error rate (TER), with reductions ranging from 17% to 48% compared to default and fine-tuned models. The best overall performance was demonstrated by the Whisper large-v2 model, which achieved an average WER of 0.06.
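For readers unfamiliar with the WER figure quoted in the abstract, the following is a minimal sketch of the standard word error rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names and the lowercase-and-split normalisation are assumptions made for illustration; the abstract does not state how the authors normalised text or how exactly they computed WER and TER.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = edit distance / number of reference words.

    Assumption: text is lowercased and split on whitespace; a real
    evaluation would usually also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a WER of 0.06 corresponds to roughly six word-level errors per hundred reference words. A term-level rate could be computed in the same way over a restricted vocabulary, but that would be an assumption about the paper's TER, which the abstract does not define.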

Original language: English
Title of host publication: Text, Speech, and Dialogue - 28th International Conference, TSD 2025, Proceedings
Editors: Kamil Ekštein, Miloslav Konopík, Ondrej Pražák, František Pártl
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 309-318
Number of pages: 10
ISBN (Print): 9783032025470
DOIs
Publication status: Published - 2026
Event: 28th International Conference on Text, Speech, and Dialogue, TSD 2025 - Erlangen, Germany
Duration: 25 Aug 2025 - 28 Aug 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16029 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Text, Speech, and Dialogue, TSD 2025
Country/Territory: Germany
City: Erlangen
Period: 25/08/25 - 28/08/25

Keywords

  • Automatic Speech Recognition
  • Medical Terminology
  • Pathology Reports
  • Wav2vec
  • Whisper
