Synthesising Cross-Speaker Data for Low-Resource Pathological Speech Recognition with PEFT

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Dysarthric speech recognition is essential for enhancing communication and accessibility for individuals with speech impairments, yet its development is hindered by a scarcity of robust, speaker-specific datasets. This study explores low-resource dysarthric speech recognition through cross-speaker transfer using synthetic data and parameter-efficient fine-tuning (PEFT). We integrate SpeechT5 text-to-speech (TTS) synthesis with x-vector speaker embeddings to generate speaker-specific dysarthric speech, enabling model adaptation while preserving pathological speech characteristics such as prosodic irregularities. Experiments on the TORGO dataset show that mixed cross-synthetic data with LoRA fine-tuning achieves a WER of 0.17, representing a 71.7% improvement over the standard model (0.60 WER) without fine-tuning the TTS model. However, cross-dataset generalisation remains challenging, yielding higher WERs on MINDS-14 (4.69) and AMI (0.96–3.83) datasets. Whilst synthetic data enhances in-domain recognition, further research is needed to improve cross-dataset generalisation and speaker adaptation, particularly for low-resource pathological speech settings.
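
The arithmetic behind the reported figures can be checked with a minimal sketch. It assumes WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and that the 71.7% figure is a relative reduction from the 0.60 baseline. The function names `word_error_rate` and `relative_improvement` are illustrative, not taken from the paper, and the paper's actual scoring setup (text normalisation, toolkit) is not stated in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Figures quoted in the abstract: 0.60 baseline WER, 0.17 after LoRA fine-tuning.
print(f"{relative_improvement(0.60, 0.17):.1%}")

# Insertions are counted against the reference length, so WER can exceed 1.
print(word_error_rate("yes", "yes it is"))
```

Under this definition a hypothesis much longer than its reference produces a WER above 1, which is one way values such as the 4.69 reported for MINDS-14 can arise.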

Original language: English
Title of host publication: Text, Speech, and Dialogue - 28th International Conference, TSD 2025, Proceedings
Editors: Kamil Ekštein, Miloslav Konopík, Ondrej Pražák, František Pártl
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 182-193
Number of pages: 12
ISBN (Print): 9783032025470
Publication status: Published - 2026
Event: 28th International Conference on Text, Speech, and Dialogue, TSD 2025 - Erlangen, Germany
Duration: 25 Aug 2025 to 28 Aug 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16029 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Text, Speech, and Dialogue, TSD 2025
Country/Territory: Germany
City: Erlangen
Period: 25/08/25 to 28/08/25

Keywords

  • Cross-Speaker Transfer
  • Dysarthric Speech Recognition
  • Parameter-Efficient Fine-Tuning
  • Synthetic Data Generation
