Abstract
Precise relative delay estimation is important for speech demixing in Assisted Living (AL) scenarios, where large inter-microphone spacing can introduce phase-wraparound ambiguity. This survey examines state-of-the-art methods for relative delay estimation and time-frequency (TF) masking, focusing on methods that use relative delays to compute accurate speech demixing masks. These methods demonstrate that TF masks can be estimated robustly from spatial, spectral, and auditory features. The survey provides a critical analysis of relative delay estimation and speech demixing techniques in AL environments, with the aim of contributing to effective and robust solutions for speech separation in challenging AL listening conditions. Its findings highlight the importance of accounting for phase wraparound and the potential benefits of incorporating auditory features into TF masking algorithms. It also considers the Internet-of-Auditory-Things (IoAudiT) as a promising future technology for AL and as a framework for future research. The findings and recommendations are intended for researchers and practitioners in speech processing, audio signal processing, and assistive technology who apply speech demixing in real-world AL scenarios.
| Original language | English |
|---|---|
| Pages (from-to) | 193626-193666 |
| Number of pages | 41 |
| Journal | IEEE Access |
| Volume | 13 |
| Publication status | Published - 2025 |
Keywords
- Assisted living
- deep learning (DL)
- hearing aid (HA)
- inter-aural intensity differences (IIDs)
- inter-aural time differences (ITDs)
- machine learning (ML)
- source separation (SS)
- spatial covariance matrix (SCM)
- steered-response power (SRP)