TY - GEN
T1 - Explainability in Action: A Metric-Driven Assessment of Five XAI Methods for Healthcare Tabular Models
AU - Qureshi, M. Atif
AU - Noor, Abdul Aziz
AU - Manzoor, Awais
AU - Qureshi, Muhammad Deedahwar Mazhar
AU - Younus, Arjumand
AU - Rashwan, Wael
PY - 2025/5/21
Y1 - 2025/5/21
N2 - As explainable AI (XAI) becomes increasingly important in healthcare machine learning (ML) applications, there is a growing need for reproducible frameworks that quantitatively assess the quality of explanations. In this study, we conduct a comparative evaluation of five widely used XAI methods (LIME, SHAP, Anchors, EBM, and TABNET) on multiple healthcare tabular datasets using six well-established metrics: fidelity, simplicity, consistency, robustness, precision, and coverage. While the metrics are derived from existing literature, we formalize them mathematically and implement them in open-source code to support standardized benchmarking. Our experiments confirm that SHAP (with TreeSHAP) achieves perfect fidelity in approximating probability outputs for tree-based models, consistent with its theoretical design. LIME offers simpler explanations but sacrifices fidelity. EBM and TABNET demonstrate strong robustness to input perturbations, while Anchors produces precise rule-based explanations with limited data coverage. These results offer practical guidance for selecting XAI methods based on application priorities such as fidelity, robustness, or simplicity. Our open-source framework enables reproducible, quantitative evaluation of XAI techniques in clinical ML workflows. Although evaluated in a clinical context, the proposed framework and metrics are applicable to other domains involving tabular data. The source code is available at https://github.com/matifq/XAI_Tab_Health.
UR - https://doi.org/10.1101/2025.05.20.25327976
U2 - 10.1101/2025.05.20.25327976
DO - 10.1101/2025.05.20.25327976
M3 - Other contribution
ER -