TY - JOUR
T1 - An investigation of microbial groundwater contamination seasonality and extreme weather event interruptions using “big data”, time-series analyses, and unsupervised machine learning
AU - Petculescu, Ioan
AU - Hynds, Paul
AU - Brown, R. Stephen
AU - McDermott, Kevin
AU - Majury, Anna
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/3/1
Y1 - 2025/3/1
AB - Temporal studies of groundwater potability have historically focused on E. coli detection rates, with non-E. coli coliforms (NEC) and microbial concentrations remaining understudied by comparison. Additionally, “big data” (i.e., large, diverse datasets that grow over time) have yet to be employed for assessing the effects of high return-period extreme weather events on groundwater quality. The current investigation employed ≈1.1 million Ontarian private well samples collected between 2010 and 2021, seeking to address these knowledge gaps via applying time-series decomposition, interrupted time-series analysis (ITSA), and unsupervised machine learning to five microbial contamination parameters: E. coli and NEC concentrations (CFU/100 mL) and detection rates (%), and the calculated NEC:E. coli ratio. Time-series decompositions revealed E. coli concentrations and the NEC:E. coli ratio as complementary metrics, with concurrent interpretation of their seasonal signals indicating that localized contamination mechanisms dominate during winter months. ITSA findings highlighted the importance of hydrogeological time lags: for example, a significant E. coli detection rate increase (2.4% vs 1.8%, p = 0.02) was identified 12 weeks after the May 2017 flood event. Unsupervised machine learning spatially classified annual contamination cycles across Ontarian subregions (n = 27), with the highest inter-cluster variability identified among E. coli detection rates and the lowest among NEC detection rates and the NEC:E. coli ratio. Given the spatiotemporal consistency identified for NEC and the NEC:E. coli ratio, associated interpretations and recommendations are likely transferable across large, heterogeneous regions. The presented study may serve as a methodological blueprint for future temporal investigations employing “big” groundwater quality data.
KW - E. coli
KW - Interrupted time-series
KW - Machine learning
KW - Private wells
KW - Seasonal decomposition
KW - Total coliforms
UR - https://www.scopus.com/pages/publications/85217089365
U2 - 10.1016/j.envpol.2025.125790
DO - 10.1016/j.envpol.2025.125790
M3 - Article
C2 - 39922413
AN - SCOPUS:85217089365
SN - 0269-7491
VL - 368
JO - Environmental Pollution
JF - Environmental Pollution
M1 - 125790
ER -