Automatic Alignment of Long Syllables in a Cappella Beijing Opera

Georgi Dzhambazov, Yile Yang, Rafael Caro Repetto, Xavier Serra

Research output: Contribution to conference › Paper › peer-review

Abstract

In this study we propose a modification of a standard text-to-speech alignment approach for aligning lyrics to singing voice. We model phoneme durations with a duration-explicit hidden Markov model (DHMM) phonetic recognizer based on MFCCs. The phoneme durations are set empirically in a probabilistic way, based on prior knowledge about the lyrics structure and metric principles specific to the Beijing opera music tradition. Phoneme models are GMMs trained directly on a small corpus of annotated singing voice. The alignment is evaluated on a cappella material from Beijing opera, which is characterized by particularly long syllable durations. Results show that incorporating music-specific knowledge yields very high alignment accuracy, significantly outperforming a baseline HMM-based approach.
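The idea of duration-explicit alignment can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the duration distributions or the decoder, so the sketch assumes a Gaussian prior on each phoneme's duration (in frames) and a plain semi-Markov Viterbi pass over the known phoneme sequence. The names `align_with_durations`, `log_obs`, `dur_mean`, `dur_std` and `max_dur` are hypothetical. In practice the per-frame log-likelihoods would come from the GMM phoneme models evaluated on MFCC frames.

```python
import numpy as np

def align_with_durations(log_obs, dur_mean, dur_std, max_dur=None):
    """Duration-explicit (semi-Markov) Viterbi forced alignment.

    log_obs:  (T, N) array; log-likelihood of frame t under the n-th
              phoneme of the known lyrics sequence.
    dur_mean, dur_std: length-N expected durations and spreads, in frames
              (assumed Gaussian prior; an illustration only).
    Returns one (start_frame, end_frame) pair per phoneme, end exclusive.
    """
    log_obs = np.asarray(log_obs, dtype=float)
    T, N = log_obs.shape
    max_dur = T if max_dur is None else max_dur

    # cum[t, n] = sum of log_obs[:t, n]; a segment score is a difference.
    cum = np.vstack([np.zeros((1, N)), np.cumsum(log_obs, axis=0)])

    # delta[n, t]: best score with the first n phonemes covering frames 0..t-1.
    delta = np.full((N + 1, T + 1), -np.inf)
    back = np.zeros((N + 1, T + 1), dtype=int)
    delta[0, 0] = 0.0

    for n in range(N):
        for t in range(1, T + 1):
            d = np.arange(1, min(max_dur, t) + 1)  # candidate durations
            # Gaussian log-prior on duration, up to an additive constant.
            log_dur = -0.5 * ((d - dur_mean[n]) / dur_std[n]) ** 2
            scores = delta[n, t - d] + log_dur + (cum[t, n] - cum[t - d, n])
            best = int(np.argmax(scores))
            delta[n + 1, t] = scores[best]
            back[n + 1, t] = d[best]

    # Backtrack from the state where all N phonemes cover all T frames.
    bounds, t = [], T
    for n in range(N, 0, -1):
        d = int(back[n, t])
        bounds.append((t - d, t))
        t -= d
    return bounds[::-1]

# With uninformative acoustics, the duration prior alone places the boundary.
flat = np.zeros((10, 2))
print(align_with_durations(flat, dur_mean=[7, 3], dur_std=[1, 1]))
# prints [(0, 7), (7, 10)]
```

Unlike a standard HMM, whose self-transitions imply geometrically decaying durations, the explicit duration term lets a long expected duration be encoded directly, which is why this kind of model suits very long sung syllables.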
Original language: English
Publication status: Published - 2016
Externally published: Yes
Event: 6th International Workshop on Folk Music Analysis - Dublin, Ireland
Duration: 15 Jun 2016 → 17 Jun 2016

Conference

Conference: 6th International Workshop on Folk Music Analysis
Country/Territory: Ireland
City: Dublin
Period: 15/06/16 → 17/06/16

Keywords

  • text-to-speech alignment
  • lyrics and singing voice
  • duration-explicit hidden Markov model
  • phoneme durations
  • Beijing opera
  • GMMs
  • alignment accuracy
