Abstract
Synthetic data generation is a promising solution to privacy concerns in healthcare analytics, particularly when working with sensitive patient information. This study evaluates the performance and fairness of generative models in creating realistic synthetic datasets derived from blood test and demographic data. Three medical datasets are used: anonymised clinical data from a public hospital in Ireland, a public UCI medical dataset, and a Korean National Health Insurance dataset from 2002. Seven generative models (CTGAN, TVAE, Gaussian Copula, CopulaGAN, MedGAN, DiscGAN, and FASTML) are applied to generate synthetic records. The quality of these records is assessed using statistical similarity metrics, including the Kolmogorov-Smirnov test, Wasserstein distance, and Jensen-Shannon divergence, complemented by visual distribution plots. Fairness is evaluated by comparing the representation of sensitive attributes, such as gender and age group, in the real and synthetic datasets. Results indicate that some models preserve statistical distributions well yet may still introduce or amplify demographic imbalances. The analysis further highlights how model performance varies across datasets of different sizes and complexities. This work provides a comprehensive evaluation of bias-aware synthetic data generation techniques and reflects on their broader implications for human-centred AI in healthcare. The findings aim to support the responsible adoption of synthetic data in privacy-sensitive environments without compromising fairness, inclusivity, or clinical reliability.
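The evaluation described in the abstract (statistical similarity per feature, plus representation of sensitive attributes) can be illustrated with a minimal sketch. This is not the authors' code: the function names `similarity_metrics` and `representation_gap`, the bin count, and the toy data are assumptions made for illustration, and the paper's exact metric settings are not given in this record. The sketch uses SciPy's `ks_2samp`, `wasserstein_distance`, and `jensenshannon`.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp, wasserstein_distance


def similarity_metrics(real, synthetic, bins=20):
    """Compare one numeric column of real vs. synthetic data (hypothetical helper)."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    ks = ks_2samp(real, synthetic)               # two-sample Kolmogorov-Smirnov test
    wd = wasserstein_distance(real, synthetic)   # 1-D Wasserstein (earth mover's) distance

    # Histogram both samples on shared bin edges so the divergence compares
    # like with like. The bin count is an assumption, not a value from the paper.
    edges = np.histogram_bin_edges(np.concatenate([real, synthetic]), bins=bins)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synthetic, bins=edges)

    # SciPy normalises p and q and returns the Jensen-Shannon *distance*,
    # i.e. the square root of the divergence, so square it.
    jsd = jensenshannon(p, q, base=2) ** 2

    return {
        "ks_statistic": float(ks.statistic),
        "ks_pvalue": float(ks.pvalue),
        "wasserstein": float(wd),
        "js_divergence": float(jsd),
    }


def representation_gap(real_groups, synthetic_groups):
    """Share of each group in the synthetic data minus its share in the real data."""
    real_groups = np.asarray(real_groups)
    synthetic_groups = np.asarray(synthetic_groups)
    gaps = {}
    for group in np.union1d(real_groups, synthetic_groups):
        real_share = np.mean(real_groups == group)
        synthetic_share = np.mean(synthetic_groups == group)
        gaps[str(group)] = float(synthetic_share - real_share)
    return gaps


if __name__ == "__main__":
    # Toy data only: random draws standing in for a blood-test feature.
    rng = np.random.default_rng(0)
    real_values = rng.normal(loc=14.0, scale=1.5, size=1000)
    synthetic_values = rng.normal(loc=14.2, scale=1.7, size=1000)
    print(similarity_metrics(real_values, synthetic_values))
    print(representation_gap(["F", "F", "M", "M"], ["F", "M", "M", "M"]))
```

A positive gap means a group is over-represented in the synthetic data relative to the real data, and a negative gap means it is under-represented. A model can score well on the similarity metrics for every feature and still show a non-zero gap, which is the tension the abstract describes.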
| Original language | English |
|---|---|
| Title of host publication | HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 86-92 |
| Number of pages | 7 |
| ISBN (Electronic) | 9798400721533 |
| Publication status | Published - 16 Feb 2026 |
| Event | 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026 - Kildare, Ireland Duration: 21 Jan 2026 → 22 Jan 2026 |
Publication series
| Name | HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice |
|---|---|
Conference
| Conference | 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026 |
|---|---|
| Country/Territory | Ireland |
| City | Kildare |
| Period | 21/01/26 → 22/01/26 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 3 Good Health and Well-being
Keywords
- Fairness
- Generative models
- Healthcare AI
- Human centred AI
- Synthetic data
Fingerprint
Dive into the research topics of 'Synthetic Blood Data Generation for Fair Machine Learning'. Together they form a unique fingerprint.