TY - JOUR
T1 - Synthesising tabular data using Wasserstein conditional GANs with gradient penalty (WCGAN-GP)
AU - Walia, Manhar
AU - Tierney, Brendan
AU - McKeever, Susan
N1 - Publisher Copyright:
© 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2020
Y1 - 2020
N2 - Deep learning methods based on Generative Adversarial Networks (GANs) have seen remarkable success in synthesising images and text. This study investigates the use of GANs for generating mixed tabular datasets. We apply a Wasserstein Conditional Generative Adversarial Network with Gradient Penalty (WCGAN-GP) to the task of generating synthetic tabular data that is indistinguishable from the real data, without incurring information leakage. The performance of WCGAN-GP is compared against both the ground truth datasets and SMOTE using three labelled real-world datasets from different domains. Our results show that the synthetic data produced by WCGAN-GP preserves the distributions and relationships of the real data, outperforming the SMOTE approach on both class preservation and data protection metrics. Our work is a contribution towards the automated synthesis of mixed tabular data.
AB - Deep learning methods based on Generative Adversarial Networks (GANs) have seen remarkable success in synthesising images and text. This study investigates the use of GANs for generating mixed tabular datasets. We apply a Wasserstein Conditional Generative Adversarial Network with Gradient Penalty (WCGAN-GP) to the task of generating synthetic tabular data that is indistinguishable from the real data, without incurring information leakage. The performance of WCGAN-GP is compared against both the ground truth datasets and SMOTE using three labelled real-world datasets from different domains. Our results show that the synthetic data produced by WCGAN-GP preserves the distributions and relationships of the real data, outperforming the SMOTE approach on both class preservation and data protection metrics. Our work is a contribution towards the automated synthesis of mixed tabular data.
KW - Euclidean Distance
KW - GAN
KW - Generative Adversarial Network
KW - Synthetic Data
KW - Tabular Data Generation
KW - WCGAN-GP
UR - http://www.scopus.com/inward/record.url?scp=85099336478&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85099336478
SN - 1613-0073
VL - 2771
SP - 325
EP - 336
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 28th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2020
Y2 - 7 December 2020 through 8 December 2020
ER -