TY - GEN
T1 - APHONIC
T2 - 13th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2017
AU - de Fréin, Ruairí
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/11/20
Y1 - 2017/11/20
N2 - We propose a single-channel, adaptive threshold selection technique for binary mask construction, namely APHONIC (AdaPtive tHreshOlding for NoIse Cancellation), for smart mobile environments. Using this mask, we introduce two noise cancellation techniques that perform robustly in the presence of real-world interfering signals that are typically encountered by mobile users: a violin busker, a subway, and the sounds of a busy city square. We demonstrate that when the power of the time-frequency components of the voice of a mobile user does not significantly overlap with the components of the interference signal, the threshold learning and noise cancellation techniques significantly improve the Signal-to-Interference Ratio (SIR) and the Signal-to-Distortion Ratio (SDR) of the recovered voice. When a mobile user's speech is mixed with music or with the sounds of a city square or subway station, the speech energy is captured by a few large-magnitude coefficients, and APHONIC improves the SIR by more than 20 dB and the SDR by up to 5 dB. The robustness of the threshold selection step and the noise cancellation algorithms is evaluated using environments typically experienced by mobile phone users. Listening tests indicate that the interference signal is no longer audible in the denoised signals. We outline how this approach could be used in many mobile voice-driven applications.
AB - We propose a single-channel, adaptive threshold selection technique for binary mask construction, namely APHONIC (AdaPtive tHreshOlding for NoIse Cancellation), for smart mobile environments. Using this mask, we introduce two noise cancellation techniques that perform robustly in the presence of real-world interfering signals that are typically encountered by mobile users: a violin busker, a subway, and the sounds of a busy city square. We demonstrate that when the power of the time-frequency components of the voice of a mobile user does not significantly overlap with the components of the interference signal, the threshold learning and noise cancellation techniques significantly improve the Signal-to-Interference Ratio (SIR) and the Signal-to-Distortion Ratio (SDR) of the recovered voice. When a mobile user's speech is mixed with music or with the sounds of a city square or subway station, the speech energy is captured by a few large-magnitude coefficients, and APHONIC improves the SIR by more than 20 dB and the SDR by up to 5 dB. The robustness of the threshold selection step and the noise cancellation algorithms is evaluated using environments typically experienced by mobile phone users. Listening tests indicate that the interference signal is no longer audible in the denoised signals. We outline how this approach could be used in many mobile voice-driven applications.
KW - Blind Source Separation
KW - Human Computer Interaction
KW - Mobile Computing
KW - Mobile Voice-driven Applications
KW - Noise Cancellation
UR - http://www.scopus.com/inward/record.url?scp=85041392352&partnerID=8YFLogxK
U2 - 10.1109/WiMOB.2017.8115847
DO - 10.1109/WiMOB.2017.8115847
M3 - Conference contribution
AN - SCOPUS:85041392352
T3 - International Conference on Wireless and Mobile Computing, Networking and Communications
SP - 285
EP - 292
BT - 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2017
PB - IEEE Computer Society
Y2 - 9 October 2017 through 11 October 2017
ER -