Interpretable timbre synthesis using variational autoencoders regularized on timbre descriptors

Anastasia Natsiou, Luca Longo, Seán O’Leary

Research output: Contribution to journal › Conference article › peer-review

Abstract

Controllable timbre synthesis has been a subject of research for several decades, and deep neural networks have been the most successful in this area. Deep generative models such as Variational Autoencoders (VAEs) have the ability to generate a high-level representation of audio while providing a structured latent space. Despite their advantages, the interpretability of these latent spaces in terms of human perception is often limited. To address this limitation and enhance the control over timbre generation, we propose a regularized VAE-based latent space that incorporates timbre descriptors. Moreover, we suggest a more concise representation of sound by utilizing its harmonic content, in order to minimize the dimensionality of the latent space.
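The abstract describes the method only at a high level: a standard VAE objective (reconstruction plus KL divergence) augmented with a regularization term that ties latent dimensions to timbre descriptors computed from the harmonic representation. The sketch below is an illustrative assumption, not the authors' implementation; the function names, the choice of spectral centroid as the descriptor, and the `beta`/`gamma` weightings are all hypothetical.

```python
import numpy as np

def spectral_centroid(harmonic_amps, f0):
    """Toy timbre descriptor: amplitude-weighted mean frequency of the
    harmonic series (harmonic k sits at k * f0)."""
    freqs = f0 * np.arange(1, len(harmonic_amps) + 1)
    return np.sum(freqs * harmonic_amps) / (np.sum(harmonic_amps) + 1e-9)

def regularized_vae_loss(x, x_hat, mu, log_var, z, descriptor,
                         beta=1.0, gamma=1.0):
    """Combined objective for a descriptor-regularized VAE (sketch).

    x, x_hat   : target and reconstructed harmonic amplitudes, shape (B, H)
    mu, log_var: encoder outputs parameterizing q(z|x), shape (B, D)
    z          : sampled latent codes, shape (B, D)
    descriptor : per-example timbre descriptor values, shape (B,)
    """
    recon = np.mean((x - x_hat) ** 2)                      # reconstruction
    kl = -0.5 * np.mean(1 + log_var - mu**2 - np.exp(log_var))  # KL term
    # Regularization: encourage latent dimension 0 to track the
    # (pre-normalized) timbre descriptor, making that axis interpretable.
    reg = np.mean((z[:, 0] - descriptor) ** 2)
    return recon + beta * kl + gamma * reg
```

In this reading, sweeping latent dimension 0 at generation time would move the output along the chosen descriptor (e.g. brightness via spectral centroid), which is what makes the latent space interpretable in perceptual terms.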

Original language: English
Pages (from-to): 359-362
Number of pages: 4
Journal: Proceedings of the International Conference on Digital Audio Effects, DAFx
DOIs
Publication status: Published - 2023
Event: 26th International Conference on Digital Audio Effects, DAFx 2023 - Copenhagen, Denmark
Duration: 4 Sep 2023 - 7 Sep 2023

Keywords

  • Controllable timbre synthesis
  • Deep neural networks
  • Variational Autoencoders
  • Latent space
  • Timbre descriptors
  • Harmonic content

