Abstract
Convolutional Neural Networks (CNNs) have enabled significant improvements across a number of applications in computer vision such as object detection, face recognition and image classification. An audio signal can be visually represented as a spectrogram that captures the time-varying frequency content of the signal. This paper describes how a CNN can be applied to the spectrogram of an audio signal to distinguish pathological from healthy speech. We propose a CNN structure and implement it using Keras to test the approach. A classification accuracy of over 95% is obtained in experiments on two public pathological speech datasets.
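To illustrate the spectrogram representation the abstract refers to, here is a minimal sketch assuming NumPy and a synthetic two-tone signal in place of real speech. The function name `spectrogram`, the frame length, the hop size, and the sample rate are illustrative assumptions, not the settings used in the paper, and the CNN itself is not shown.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a short-time Fourier transform.

    frame_len and hop are illustrative defaults (assumptions), not the
    paper's parameters.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT of each frame gives its frequency content.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; the small offset avoids log(0).
    # Transpose so rows are frequency bins and columns are time frames.
    return 20.0 * np.log10(magnitude + 1e-10).T


# Synthetic stand-in for an audio recording (assumption): one second at
# 8 kHz, 440 Hz in the first half and 1000 Hz in the second half.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.where(
    t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1000 * t)
)

spec = spectrogram(signal)

# A 2-D CNN typically expects a trailing channel axis, like a grayscale image.
cnn_input = spec[..., np.newaxis]
```

The result is a 2-D array that can be treated like a single-channel image: the dominant frequency bin shifts between the early and late columns as the tone changes, which is the kind of time-varying pattern a CNN can learn from.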
Original language | English |
---|---|
Publication status | Published - 1 Jan 2019 |
Externally published | Yes |
Event | IMVIP 2019: Irish Machine Vision & Image Processing - Technological University Dublin, Dublin, Ireland; Duration: 28 Aug 2019 → 30 Aug 2019 |
Conference
Conference | IMVIP 2019: Irish Machine Vision & Image Processing |
---|---|
Country/Territory | Ireland |
City | Dublin |
Period | 28/08/19 → 30/08/19 |
Keywords
- Convolutional Neural Networks
- CNNs
- computer vision
- object detection
- face recognition
- image classification
- spectrogram
- audio signal
- pathological speech
- healthy speech
- Keras
- classification accuracy