Exploring composite dataset biases for heart sound classification

Davoud Shariat Panah, Andrew Hines, Susan McKeever

Research output: Contribution to journal, Conference article, peer-review

5 Citations (Scopus)

Abstract

In the last few years, the automatic classification of heart sounds has been widely studied as a screening method for heart disease. Some of these studies have achieved high accuracies in heart abnormality prediction. However, for such models to assist clinicians in the detection of heart abnormalities, it is of critical importance that they are generalisable, working on unseen real-world data. Despite the importance of generalisability, the presence of bias in the leading heart sound datasets used in these studies has remained unexplored. In this paper, we explore the presence of potential bias in heart sound datasets. Using a small set of spectral features for heart sound representation, we demonstrate experimentally that it is possible to detect sub-datasets of PhysioNet, the leading dataset of the field, with 98% accuracy. We also show that sensors which have been used to capture recordings of each dataset are likely the main cause of the bias in these datasets. Lack of awareness of this bias works against generalised models for heart sound diagnostics. Our findings call for further research on the bias issue in heart sound datasets and its impact on the generalisability of heart abnormality prediction models.
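The dataset-detection experiment described in the abstract can be illustrated with a minimal sketch: extract a few spectral features per recording, then train a classifier to predict which source a recording came from rather than whether the heart is abnormal. Everything below is an assumption made for illustration and is not the authors' pipeline: the recordings are synthetic noise, the two "sensors" are simulated as low-pass filters with hypothetical cutoffs, the features are spectral centroid and bandwidth, the sample rate is an arbitrary choice, and a nearest-centroid classifier stands in for whatever model the paper used.

```python
import numpy as np


def spectral_features(signal, sample_rate):
    """Return [spectral centroid, spectral bandwidth] in Hz for one recording."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitude.sum() + 1e-12  # guard against division by zero
    centroid = (freqs * magnitude).sum() / total
    bandwidth = np.sqrt((((freqs - centroid) ** 2) * magnitude).sum() / total)
    return np.array([centroid, bandwidth])


def synthetic_recording(rng, sample_rate, cutoff_hz, seconds=1.0):
    """White noise with all content above cutoff_hz removed.

    Hypothetical stand-in for a sensor with a limited frequency response.
    """
    n = int(sample_rate * seconds)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=n)


def fit_centroids(features, labels):
    """Compute the mean feature vector of each source label."""
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each recording to the source with the nearest mean feature vector."""
    distances = np.linalg.norm(
        features[:, None, :] - centroids[None, :, :], axis=2
    )
    return classes[distances.argmin(axis=1)]


rng = np.random.default_rng(0)
sample_rate = 2000  # Hz, assumed for this sketch
sensor_cutoffs = {0: 200.0, 1: 400.0}  # hypothetical sensor bandwidths in Hz

X, y = [], []
for source_label, cutoff in sensor_cutoffs.items():
    for _ in range(40):
        recording = synthetic_recording(rng, sample_rate, cutoff)
        X.append(spectral_features(recording, sample_rate))
        y.append(source_label)
X, y = np.array(X), np.array(y)

# Alternate recordings between the training and held-out sets.
train_mask = np.arange(len(y)) % 2 == 0
classes, centroids = fit_centroids(X[train_mask], y[train_mask])
accuracy = (predict(X[~train_mask], classes, centroids) == y[~train_mask]).mean()
print(f"source-detection accuracy on held-out recordings: {accuracy:.2f}")
```

If a classifier of this kind separates the sources well on real recordings, the source itself is recoverable from the audio, which suggests that an abnormality model trained on the pooled data could be exploiting source-specific cues instead of cardiac ones.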

Original language: English
Pages (from-to): 145-156
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 2771
Publication status: Published - 2020
Event: 28th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2020 - Dublin, Ireland
Duration: 7 Dec 2020 to 8 Dec 2020

Keywords

  • Bias
  • Heart Sound
  • Machine Learning
  • PhysioNet Dataset
