Big-Delay Estimation for Speech Separation in Assisted Living Environments

Research output: Contribution to journal > Article > peer-review

Abstract

Phase wraparound caused by large inter-sensor spacings makes the DUET and AdRess source separation algorithms, both known for their low computational complexity and effective speech demixing, unsuitable for hearing-assisted living applications, where such microphone configurations are needed. In anechoic scenarios, DUET is limited to relative delays of up to 7 samples at a sampling rate of (Formula presented.) kHz, while AdRess is restricted to instantaneous mixing problems. This paper aims to improve the performance of DUET-type time–frequency (TF) masks when the microphones are placed far apart, where large relative delays cause significant phase wraparound. We evaluate a large-relative-delay estimation method, the Elevatogram, under these conditions. We then present extensions of DUET and AdRess, termed Elevato-DUET and Elevato-AdRess, which remain effective for relative delays of up to 200 samples. Elevato-AdRess outperforms Elevato-DUET on the objective separation quality metrics BSS_Eval and PEASS, and also achieves higher intelligibility scores, as measured by Perceptual Evaluation of Speech Quality (PESQ) Mean Opinion Scores (MOS). These findings suggest that the Elevatogram-based algorithms overcome the phase wraparound limitations of DUET and AdRess in assistive hearing scenarios with large inter-microphone spacing, improving both separation quality and intelligibility, with Elevato-AdRess performing best overall.
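The phase wraparound problem described in the abstract can be illustrated with a minimal sketch. It is not the Elevatogram or the authors' implementation; it only shows why a naive phase-based delay estimate, of the kind DUET-type methods rely on, fails once the product of frequency and delay exceeds π. The function name `phase_delay_estimate` and the chosen frequency and delay values are assumptions made for illustration.

```python
import numpy as np

def phase_delay_estimate(true_delay, omega):
    """Estimate a relative delay (in samples) from the inter-channel phase.

    Assumes an idealised anechoic model in which the second channel is a
    pure delayed copy of the first, so the channel ratio at normalised
    frequency omega (rad/sample) is exp(-j * omega * true_delay).
    """
    ratio = np.exp(-1j * omega * true_delay)
    # np.angle returns the principal value in (-pi, pi], so any phase
    # outside that interval is wrapped before the division by omega.
    return -np.angle(ratio) / omega

omega = 0.9 * np.pi  # a high-frequency bin, chosen for illustration

# Small delay: |omega * delay| < pi, so the phase is unambiguous.
est_small = phase_delay_estimate(0.5, omega)

# Large delay: |omega * delay| > pi, so the phase wraps and the
# estimate no longer matches the true delay of 5 samples.
est_large = phase_delay_estimate(5.0, omega)

print(f"true 0.5 -> estimated {est_small:.3f}")
print(f"true 5.0 -> estimated {est_large:.3f}")
```

In this sketch the small delay is recovered, while the 5-sample delay is aliased to a much smaller value, which is the failure mode that large microphone spacings produce.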

Original language: English
Article number: 184
Journal: Future Internet
Volume: 17
Issue number: 4
Publication status: Published - Apr 2025

Keywords

  • assisted living (AL)
  • binary mask
  • interaural intensity difference (IID)
  • interaural phase difference (IPD)
  • relative delay estimation
  • relative transfer function (RTF)
  • remote microphone (RM)
  • single source point (SSP)
  • source separation (SS)
  • time–frequency (TF)
  • windowed-disjoint orthogonal (WDO)
