Synthesising tabular data using Wasserstein conditional GANs with gradient penalty (WCGAN-GP)

Research output: Contribution to journal > Conference article > peer-review

Abstract

Deep learning methods based on Generative Adversarial Networks (GANs) have seen remarkable success in synthesising images and text. This study investigates the use of GANs for generating mixed tabular datasets. We apply a Wasserstein Conditional Generative Adversarial Network with Gradient Penalty (WCGAN-GP) to the task of generating synthetic tabular data that is indistinguishable from the real data, without incurring information leakage. The performance of WCGAN-GP is compared against both the ground-truth datasets and SMOTE using three labelled real-world datasets from different domains. Our results show that the synthetic data produced by WCGAN-GP preserves the distributions and relationships of the real data, outperforming the SMOTE approach on both class preservation and data protection metrics. Our work is a contribution towards the automated synthesis of mixed tabular data.

Original language: English
Pages (from-to): 325-336
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 2771
Publication status: Published - 2020
Event: 28th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2020 - Dublin, Ireland
Duration: 7 Dec 2020 - 8 Dec 2020

Keywords

  • Euclidean Distance
  • GAN
  • Generative Adversarial Network
  • Synthetic Data
  • Tabular Data Generation
  • WCGAN-GP
