An Exploration of the Latent Space of a Convolutional Variational Autoencoder for the Generation of Musical Instrument Tones

Anastasia Natsiou

Research output: Contribution to conference › Paper › peer-review

Abstract

Variational Autoencoders (VAEs) are among the most significant deep generative models for creating synthetic samples. In audio synthesis, VAEs have been widely used to generate natural and expressive sounds, such as music or speech. However, VAEs are often considered black boxes, and the attributes that contribute to the synthesis of a sound are not yet well understood. Existing research on how input data influence the formation of the latent space, and on how this latent space produces synthetic data, is still insufficient. In this manuscript, we investigate the interpretability of the latent space of VAEs and the impact of each attribute of this space on the generation of synthetic instrumental notes. The contribution of this research is to offer both the XAI and sound communities an approach for interpreting how the latent space generates new samples. The approach is based on sensitivity analysis, feature ablation analysis, and descriptive statistics.
Original language: English
Publication status: Published - 2023
Event: xAI2023 Conference
Duration: 1 Jan 2023 → …

Keywords

  • Variational Autoencoders
  • deep generative models
  • audio synthesis
  • natural sounds
  • expressive sounds
  • music
  • speech
  • black boxes
  • latent space
  • synthetic data
  • interpretability
  • sensitivity analysis
  • feature ablation
  • descriptive statistics
