
Synthetic Blood Data Generation for Fair Machine Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Synthetic data generation is a promising solution to privacy concerns in healthcare analytics, particularly when working with sensitive patient information. This study evaluates the performance and fairness of generative models in creating realistic synthetic datasets derived from blood test and demographic data. Three distinct medical datasets are used: anonymised clinical data from a public hospital in Ireland, a public UCI medical dataset, and a Korean National Health Insurance dataset from 2002. Seven generative models - CTGAN, TVAE, Gaussian Copula, CopulaGAN, MedGAN, DiscGAN, and FASTML - are applied to generate synthetic records. The quality of these records is assessed using statistical similarity metrics, including the Kolmogorov-Smirnov test, Wasserstein distance, and Jensen-Shannon divergence, complemented by visual distribution plots. Fairness is evaluated by examining the representation of sensitive attributes such as gender and age group in both real and synthetic datasets. Results indicate that while some models maintain statistical distributions effectively, they may introduce or amplify demographic imbalances. The analysis further highlights how model performance varies across datasets of different sizes and complexities. This work provides a comprehensive evaluation of bias-aware synthetic data generation techniques and reflects on their broader implications for human-centred AI in healthcare. The findings aim to support the responsible adoption of synthetic data in privacy-sensitive environments without compromising fairness, inclusivity, or clinical reliability.
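
The evaluation named in the abstract (Kolmogorov-Smirnov test, Wasserstein distance, Jensen-Shannon divergence, and a comparison of sensitive-attribute representation) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `similarity_metrics` and `representation_gap`, the histogram bin count, and the simulated values are all assumptions made for illustration, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp, wasserstein_distance


def similarity_metrics(real, synthetic, bins=20):
    """Compare one numeric column of real vs. synthetic data (hypothetical helper)."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # Two-sample Kolmogorov-Smirnov test and 1-D Wasserstein distance.
    ks_stat, ks_pvalue = ks_2samp(real, synthetic)
    w_dist = wasserstein_distance(real, synthetic)

    # Histogram both samples on shared bin edges (bin count is an assumption).
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synthetic, bins=edges)

    # SciPy returns the Jensen-Shannon *distance*; squaring gives the divergence.
    js_div = float(jensenshannon(p, q, base=2) ** 2)

    return {
        "ks_stat": float(ks_stat),
        "ks_pvalue": float(ks_pvalue),
        "wasserstein": float(w_dist),
        "js_divergence": js_div,
    }


def representation_gap(real_groups, synthetic_groups):
    """Share of each group in synthetic data minus its share in real data."""
    real_groups = np.asarray(real_groups)
    synthetic_groups = np.asarray(synthetic_groups)
    categories = np.union1d(real_groups, synthetic_groups)
    return {
        str(c): float(np.mean(synthetic_groups == c) - np.mean(real_groups == c))
        for c in categories
    }


if __name__ == "__main__":
    # Simulated placeholder values, not data from the study.
    rng = np.random.default_rng(0)
    real_col = rng.normal(14.0, 1.5, 1000)
    synth_col = rng.normal(14.3, 1.8, 1000)
    print(similarity_metrics(real_col, synth_col))
    print(representation_gap(["F", "F", "M", "M"], ["F", "M", "M", "M"]))
```

In this sketch a positive gap means a group is over-represented in the synthetic data relative to the real data, which is one simple way to surface the demographic imbalances the abstract mentions.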

Original language: English
Title of host publication: HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
Publisher: Association for Computing Machinery (ACM)
Pages: 86-92
Number of pages: 7
ISBN (Electronic): 9798400721533
DOIs
Publication status: Published - 16 Feb 2026
Event: 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026 - Kildare, Ireland
Duration: 21 Jan 2026 to 22 Jan 2026

Publication series

Name: HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice

Conference

Conference: 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026
Country/Territory: Ireland
City: Kildare
Period: 21/01/26 to 22/01/26

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Fairness
  • Generative models
  • Healthcare AI
  • Human centred AI
  • Synthetic data
