Abstract
For many applications in human-computer interaction, it is desirable to predict between-speaker silences (gaps) and within-speaker silences (pauses) independently of automatic speech recognition (ASR). In this study, we focus on a dataset of six dyadic task-based interactions and aim at automatic discrimination of gaps and pauses based on f0, energy and glottal parameters derived from the speech just preceding the silence. Initial manual annotation reveals strong discriminative power of intonation tune types. In a subsequent automatic analysis using descriptive statistics of parameter contours, as well as a modelling of such contours using principal component analysis, we are able to predict pauses and gaps speaker-independently at an accuracy of 70% compared to a 56% baseline.
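The abstract's contour-modelling step can be illustrated with a minimal sketch: fixed-length prosodic contours (e.g. f0 over the pre-silence region) are projected onto their leading principal components, and the resulting low-dimensional scores are fed to a simple classifier. The synthetic data, contour length, number of components, and nearest-centroid classifier below are all illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic f0 contours (20 samples each), two hypothetical tune types:
# falling tunes (stand-in for pre-gap speech) vs. level tunes (pre-pause).
t = np.linspace(0.0, 1.0, 20)
falling = 200.0 - 50.0 * t + rng.normal(0.0, 5.0, size=(20, 20))
level = 200.0 + rng.normal(0.0, 5.0, size=(20, 20))
X = np.vstack([falling, level])          # (40, 20) contour matrix
y = np.array([0] * 20 + [1] * 20)        # 0 = gap-like, 1 = pause-like

# PCA via SVD of the mean-centred contours; keep 2 components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                   # (40, 2) PCA scores per contour

# Nearest-centroid classification in PCA score space.
centroids = np.stack([scores[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(
    np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2),
    axis=1,
)
accuracy = float((pred == y).mean())
print(f"accuracy: {accuracy:.2f}")
```

With clearly separated tune shapes as above, the first principal component largely captures the falling-vs-level distinction, so even this trivial classifier separates the two classes well; real conversational data is of course far noisier.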
| Original language | English |
|---|---|
| Pages (from-to) | 333-337 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Publication status | Published - 2014 |
| Event | 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore. Duration: 14 Sep 2014 → 18 Sep 2014 |
Keywords
- Glottal source
- Prosody
- Speech timing
- Spoken interaction
- Turn-taking