TY - GEN
T1 - Data Preprocessing Methods for Automating MLOps Pipelines
T2 - 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026
AU - Ayinavilli, Surya Teja Gowd
AU - Quille, Keith
AU - Singh, Tarry
N1 - Publisher Copyright: © 2026 Copyright held by the owner/author(s).
PY - 2026/2/16
Y1 - 2026/2/16
N2 - Data preprocessing tools are increasingly used in MLOps pipelines to develop models automatically. However, the fairness and reliability of these automated processes are under-researched, which risks performance degradation or bias. This thesis addresses that gap by evaluating automated data preprocessing methods, compared against a baseline approach and designed for integration into a TensorFlow Extended (TFX) pipeline. Each method was assessed on classification measures and on subgroup fairness to identify potential bias. Significance tests were used to compare the performance of each automated method against the baseline. The results indicate that around half of the automated methods performed comparably to the baseline model, while the others performed much worse; more importantly, none of the automated methods significantly outperformed the baseline. These results show that not all automated preprocessing methods can be used without manual validation.
KW - Data Preprocessing
KW - Fairness in Machine Learning
KW - MLOps Pipelines
KW - Outlier Detection and Imputation
KW - TensorFlow Data Validation (TFDV)
KW - TensorFlow Extended (TFX)
UR - https://www.scopus.com/pages/publications/105031784006
U2 - 10.1145/3777490.3777501
DO - 10.1145/3777490.3777501
M3 - Conference contribution
AN - SCOPUS:105031784006
T3 - HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
SP - 40
EP - 45
BT - HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
PB - Association for Computing Machinery (ACM)
Y2 - 21 January 2026 through 22 January 2026
ER -