Data Preprocessing Methods for Automating MLOps Pipelines: A Comparative Study

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Data preprocessing tools are increasingly used in MLOps pipelines to develop models automatically. However, the fairness and reliability of these automated processes remain under-researched, which risks performance degradation or bias. This study addresses that gap by evaluating automated data preprocessing methods, designed for integration into a TensorFlow Extended (TFX) pipeline, against a baseline approach. Each method was assessed on classification metrics and on subgroup fairness to identify potential bias, and significance tests were used to compare each automated method with the baseline. The results indicate that around half of the automated methods performed comparably to the baseline model, while the others performed much worse; more importantly, none of the automated methods significantly outperformed the baseline. These results show that not all automated preprocessing methods can be used without manual validation.
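
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract names neither the significance test nor the fairness metric, so both choices here are assumptions: a paired sign-flip permutation test on per-fold scores, and the largest accuracy difference between subgroups. The function names and the numbers in the usage section are hypothetical and do not come from the paper.

```python
import random
from statistics import mean


def paired_permutation_test(baseline, candidate, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences.

    Assumption: the paper does not say which significance test it used;
    this is one common choice for paired per-fold scores.
    """
    if len(baseline) != len(candidate):
        raise ValueError("score lists must have equal length")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def subgroup_accuracy_gap(y_true, y_pred, groups):
    """Return (largest accuracy difference between subgroups, per-group accuracy).

    Assumption: accuracy gap is a stand-in for whatever subgroup fairness
    measure the paper actually reports.
    """
    correct, total = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (t == p)
    acc = {g: correct[g] / total[g] for g in total}
    return max(acc.values()) - min(acc.values()), acc


# Usage with made-up, purely illustrative scores (not results from the paper).
baseline_f1 = [0.80, 0.82, 0.81, 0.79, 0.80]
automated_f1 = [0.78, 0.83, 0.80, 0.77, 0.79]
p_value = paired_permutation_test(baseline_f1, automated_f1)

gap, per_group = subgroup_accuracy_gap(
    y_true=[1, 0, 1, 1],
    y_pred=[1, 0, 0, 0],
    groups=["a", "a", "b", "b"],
)
print(f"p-value: {p_value:.3f}, subgroup accuracy gap: {gap:.2f}")
```

A large p-value would be consistent with a method that performs comparably to the baseline, while a large subgroup gap would flag potential bias even when the overall scores look similar.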

Original language: English
Title of host publication: HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
Publisher: Association for Computing Machinery (ACM)
Pages: 40-45
Number of pages: 6
ISBN (Electronic): 9798400721533
DOIs
Publication status: Published - 16 Feb 2026
Event: 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026 - Kildare, Ireland
Duration: 21 Jan 2026 to 22 Jan 2026

Publication series

Name: HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice

Conference

Conference: 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026
Country/Territory: Ireland
City: Kildare
Period: 21/01/26 to 22/01/26

Keywords

  • Data Preprocessing
  • Fairness in Machine Learning
  • MLOps Pipelines
  • Outlier Detection and Imputation
  • TensorFlow Data Validation (TFDV)
  • TensorFlow Extended (TFX)
