Synthesising Tabular Datasets Using Wasserstein Conditional GANS with Gradient Penalty (WCGAN-GP)

Susan McKeever, Manhar Singh Walia

Research output: Contribution to conference › Paper › peer-review

Abstract

Deep learning methods based on Generative Adversarial Networks (GANs) have seen remarkable success in the synthesis of images and text. This study investigates the use of GANs for generating mixed tabular datasets. We apply a Wasserstein Conditional Generative Adversarial Network with Gradient Penalty (WCGAN-GP) to the task of generating synthetic tabular data that is indistinguishable from the real data, without incurring information leakage. The performance of WCGAN-GP is compared against both the ground truth datasets and SMOTE using three labelled real-world datasets from different domains. Our results show that the synthetic data produced by WCGAN-GP preserves the distributions and relationships of the real data, outperforming the SMOTE approach on both class preservation and data protection metrics. Our work is a contribution towards the automated synthesis of mixed tabular data.
Original language: English
Publication status: Published - 2020
Event: AICS 2020 - Dublin, Ireland
Duration: 1 Jan 2020 - 31 Dec 2020

Conference

Conference: AICS 2020
Country/Territory: Ireland
City: Dublin
Period: 1/01/20 - 31/12/20
Other: 28th Irish Conference on Artificial Intelligence and Cognitive Science

Keywords

  • Deep learning
  • Generative Adversarial Networks
  • data synthesis
  • tabular data
  • Wasserstein Conditional Generative Adversarial Network
  • synthetic data
  • information leakage
  • SMOTE
  • class preservation
  • data protection
