Abstract
The ARX-LF model interprets voiced speech as an LF derivative glottal pulse exciting an all-pole vocal tract filter, with an additional exogenous residual signal. It fully parameterizes the voice and has been shown to be useful for voice modification. Because time-domain methods for determining the ARX-LF parameters from speech are very sensitive to the time placement of the analysis frame and are not robust to phase distortion (e.g. from recording equipment), a magnitude-only spectral approach to ARX-LF parameterization was recently developed. This paper describes extensions to this frequency-domain approach that yield continuous, robust ARX-LF parameters for voiced speech segments. A listening test with 50 participants compared synthetic speech produced by this method with that of a time-domain ARX-LF parameterization approach under real and simulated recording conditions; the frequency-domain approach was generally preferred.
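To make the model structure concrete, the sketch below synthesizes speech from the ARX part of the model, assuming the commonly used difference equation s[n] + Σ_{k=1..p} a_k s[n-k] = b0·u[n] + e[n], where u is the glottal excitation, a_1..a_p are the all-pole vocal tract coefficients and e is the exogenous residual. The function name `arx_synthesize` and the gain `b0` are illustrative assumptions, not the authors' implementation. Generating the LF derivative glottal pulse itself, and estimating the parameters from a magnitude spectrum, are not shown; the usage example substitutes a plain impulse train for u.

```python
import numpy as np


def arx_synthesize(a, u, e, b0=1.0):
    """Synthesize s[n] from the (assumed) ARX difference equation
        s[n] + sum_{k=1}^{p} a[k-1] * s[n-k] = b0 * u[n] + e[n].

    a  : all-pole vocal tract coefficients a_1..a_p (a_0 = 1 is implicit)
    u  : glottal excitation; in ARX-LF this would be an LF derivative pulse train
    e  : exogenous residual signal, same length as u
    b0 : excitation gain (hypothetical parameter name)
    """
    a = np.asarray(a, dtype=float)
    u = np.asarray(u, dtype=float)
    e = np.asarray(e, dtype=float)
    if u.shape != e.shape:
        raise ValueError("u and e must have the same length")
    p = len(a)
    s = np.zeros_like(u)
    for n in range(len(u)):
        # exogenous input: scaled glottal excitation plus residual
        acc = b0 * u[n] + e[n]
        # autoregressive part: subtract the weighted past outputs
        for k in range(1, min(p, n) + 1):
            acc -= a[k - 1] * s[n - k]
        s[n] = acc
    return s


# Usage with a stand-in excitation (an impulse train, NOT an LF pulse):
fs = 16000
u = np.zeros(800)
u[::160] = 1.0                          # 100 Hz pulse train
e = np.zeros_like(u)                    # no residual in this toy example
r, theta = 0.97, 2 * np.pi * 500 / fs   # a single resonance near 500 Hz
s = arx_synthesize([-2 * r * np.cos(theta), r ** 2], u, e)
```

The explicit loop mirrors the difference equation term by term; in practice the same filtering is usually done with a library IIR routine, which is equivalent when the denominator coefficients are [1, a_1, ..., a_p].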
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 57-60 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2011 |
Event | 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, Florence, Italy |
Duration | 27 Aug 2011 to 31 Aug 2011 |
Keywords
- ARX-LF model
- Speech synthesis
- Voice coding