Speech intelligibility from image processing

Andrew Hines, Naomi Harte

Research output: Contribution to journal › Article › peer-review

Abstract

Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility and threshold levels. The development of computational models of the auditory periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. The responses of the model used in this study have previously been shown to be consistent with a wide range of physiological data from both normal and impaired ears for stimulus presentation levels spanning the dynamic range of hearing. The model output can be assessed by examining the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated in this study using two types of neurogram: temporal fine structure (TFS) and average discharge rate, or temporal envelope. A new systematic way of assessing phonemic degradation is proposed using the outputs of an auditory nerve model for a range of SNHLs. The mean structural similarity index (MSSIM) is an objective measure originally developed to assess perceptual image quality. The measure is adapted here to quantify the phonemic degradation in neurograms derived from impaired auditory nerve outputs. A full evaluation of the choice of parameters for the metric is presented using a large amount of natural human speech. The metric's boundedness and the results for TFS neurograms indicate that it is superior to the standard point-to-point metrics of relative mean absolute error and relative mean squared error. MSSIM is also promising as an indicative score of intelligibility, with results similar to those of the standard speech intelligibility index metric.
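
The comparison between MSSIM and the point-to-point metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform 3x3 window, the constants k1 and k2 (taken from the commonly used image-quality defaults), the normalisation used for the relative errors, and the synthetic arrays standing in for neurograms are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def relative_mae(ref, deg):
    """Mean absolute error, normalised by the mean absolute reference (assumed form)."""
    return np.mean(np.abs(ref - deg)) / np.mean(np.abs(ref))


def relative_mse(ref, deg):
    """Mean squared error, normalised by the mean squared reference (assumed form)."""
    return np.mean((ref - deg) ** 2) / np.mean(ref ** 2)


def mssim(ref, deg, win=(3, 3), k1=0.01, k2=0.03):
    """Mean structural similarity over all sliding windows.

    Uses a uniform (unweighted) window, which is a simplification;
    the window size and constants here are illustrative choices.
    """
    ref = np.asarray(ref, dtype=float)
    deg = np.asarray(deg, dtype=float)
    dynamic_range = ref.max() - ref.min()
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2

    # Shape (rows - win_r + 1, cols - win_c + 1, win_r, win_c)
    r = sliding_window_view(ref, win)
    d = sliding_window_view(deg, win)

    # Local means, variances and covariance for every window position
    mu_r = r.mean(axis=(-2, -1))
    mu_d = d.mean(axis=(-2, -1))
    var_r = (r * r).mean(axis=(-2, -1)) - mu_r ** 2
    var_d = (d * d).mean(axis=(-2, -1)) - mu_d ** 2
    cov = (r * d).mean(axis=(-2, -1)) - mu_r * mu_d

    ssim_map = ((2 * mu_r * mu_d + c1) * (2 * cov + c2)) / (
        (mu_r ** 2 + mu_d ** 2 + c1) * (var_r + var_d + c2)
    )
    return float(ssim_map.mean())


# Synthetic stand-ins for neurograms (rows: frequency bands, columns: time bins).
rng = np.random.default_rng(0)
ref = rng.random((30, 200))
deg = 0.5 * ref + 0.2 * rng.random((30, 200))  # attenuated and noisy copy

print("RMAE :", relative_mae(ref, deg))
print("RMSE :", relative_mse(ref, deg))
print("MSSIM:", mssim(ref, deg))
```

In this sketch an identical pair of inputs scores 1 under MSSIM and 0 under both relative errors. The MSSIM value cannot exceed 1, whereas the relative errors have no fixed upper limit, which is the boundedness property the abstract refers to.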

Original language: English
Pages (from-to): 736-752
Number of pages: 17
Journal: Speech Communication
Volume: 52
Issue number: 9
Publication status: Published - Sep 2010
Externally published: Yes

Keywords

  • Auditory periphery model
  • Hearing aids
  • MSSIM
  • Sensorineural hearing loss
  • Speech intelligibility
  • Structural similarity
