TY - GEN
T1 - An Exploration of the Latent Space of a Convolutional Variational Autoencoder for the Generation of Musical Instrument Tones
AU - Natsiou, Anastasia
AU - O’Leary, Seán
AU - Longo, Luca
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Variational Autoencoders (VAEs) are among the most significant deep generative models for the creation of synthetic samples. In the field of audio synthesis, VAEs have been widely used to generate natural and expressive sounds, such as music or speech. However, VAEs are often considered black boxes, and the attributes that contribute to the synthesis of a sound are not yet understood. Existing research on how input data influence the formation of the latent space, and on how this latent space produces synthetic data, is still insufficient. In this manuscript, we investigate the interpretability of the latent space of VAEs and the impact of each attribute of this space on the generation of synthetic instrumental notes. The contribution of this research is to offer both the XAI and sound communities an approach for interpreting how the latent space generates new samples, based on sensitivity and feature ablation analyses and on descriptive statistics.
AB - Variational Autoencoders (VAEs) are among the most significant deep generative models for the creation of synthetic samples. In the field of audio synthesis, VAEs have been widely used to generate natural and expressive sounds, such as music or speech. However, VAEs are often considered black boxes, and the attributes that contribute to the synthesis of a sound are not yet understood. Existing research on how input data influence the formation of the latent space, and on how this latent space produces synthetic data, is still insufficient. In this manuscript, we investigate the interpretability of the latent space of VAEs and the impact of each attribute of this space on the generation of synthetic instrumental notes. The contribution of this research is to offer both the XAI and sound communities an approach for interpreting how the latent space generates new samples, based on sensitivity and feature ablation analyses and on descriptive statistics.
KW - Audio Representations
KW - Audio Synthesis
KW - Explainable Artificial Intelligence (XAI)
KW - Latent Feature Importance
KW - Variational Autoencoders (VAE)
UR - http://www.scopus.com/inward/record.url?scp=85175964521&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-44070-0_24
DO - 10.1007/978-3-031-44070-0_24
M3 - Conference contribution
AN - SCOPUS:85175964521
SN - 9783031440694
T3 - Communications in Computer and Information Science
SP - 470
EP - 486
BT - Explainable Artificial Intelligence - 1st World Conference, xAI 2023, Proceedings
A2 - Longo, Luca
PB - Springer Science and Business Media Deutschland GmbH
T2 - 1st World Conference on eXplainable Artificial Intelligence, xAI 2023
Y2 - 26 July 2023 through 28 July 2023
ER -