Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactions

John Kane, Irena Yanushevskaya, Céline De Looze, Brian Vaughan, Ailbhe Ní Chasaide

Research output: Contribution to journal › Conference article › peer-review

Abstract

For many applications in human-computer interaction, it is desirable to predict between-speaker silences (gaps) and within-speaker silences (pauses) independently of automatic speech recognition (ASR). In this study, we focus on a dataset of 6 dyadic task-based interactions and aim at automatic discrimination of gaps and pauses based on f0, energy and glottal parameters derived from the speech just preceding the silence. Initial manual annotation reveals strong discriminative power of intonation tune types. In a subsequent automatic analysis using descriptive statistics of parameter contours, as well as a modelling of such contours using principal component analysis, we are able to predict pauses and gaps speaker-independently at an accuracy of 70% compared to a 56% baseline.
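The contour-modelling step the abstract describes — summarising each pre-silence prosodic contour by its principal-component scores and classifying gaps versus pauses from those scores — can be sketched as follows. The synthetic f0 contours, the choice of 3 components, and the nearest-centroid classifier are all illustrative assumptions, not the paper's actual data or pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 length-20 f0 contours (Hz) per class, purely for
# illustration. "Pause" contours slope down gently; "gap" contours fall
# more steeply, mimicking a stronger final fall before a turn yield.
t = np.linspace(0.0, 1.0, 20)
pauses = 200.0 - 20.0 * t + rng.normal(0.0, 5.0, (40, 20))
gaps = 200.0 - 60.0 * t + rng.normal(0.0, 5.0, (40, 20))
X = np.vstack([pauses, gaps])
y = np.array([0] * 40 + [1] * 40)  # 0 = pause, 1 = gap

# PCA via SVD on mean-centred contours: each contour is reduced to its
# scores on the first 3 principal components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:3].T  # shape (80, 3)

# Nearest-centroid classification in PC space (a simple stand-in for
# whatever classifier the full paper used; the abstract does not say).
c0 = scores[y == 0].mean(axis=0)
c1 = scores[y == 1].mean(axis=0)
pred = (np.linalg.norm(scores - c1, axis=1)
        < np.linalg.norm(scores - c0, axis=1)).astype(int)
accuracy = (pred == y).mean()
```

Because the two synthetic classes differ mainly in slope, the first components capture that contrast and the toy classifier separates them easily; with real conversational data the classes overlap far more, which is why the reported accuracy (70% against a 56% baseline) is well below ceiling.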

Original language: English
Pages (from-to): 333-337
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2014
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 14 Sep 2014 - 18 Sep 2014

Keywords

  • Glottal source
  • Prosody
  • Speech timing
  • Spoken interaction
  • Turn-taking
