A graph-based machine learning framework for river water quality management under data limitations
- Authors
- Choi, Sueryun; Ullah, Zahid; Son, Moon
- Issue Date
- 2026-01
- Publisher
- Academic Press
- Citation
- Journal of Environmental Management, v.398
- Abstract
- Accurate prediction of riverine water quality is often hindered by sparse sampling and limited streamflow data, a common outcome of resource-constrained watershed monitoring. To address this, we propose a three-module machine-learning framework comprising prediction (graph neural networks or recurrent networks), interpretation (explainable AI), and management (counterfactual analysis), and apply it to chromaticity prediction in the Hantan River Basin, Republic of Korea. The dataset includes 1667 monthly observations from 59 monitoring sites (December 2021–October 2024) covering 37 hydro-environmental variables. Performance was assessed using independent training, validation, and test sets. Graph-based models outperformed the recurrent baseline, with the enhanced Graph Sample-and-Aggregate model achieving a test R² of 0.82, demonstrating that representing pollution-source characteristics and transport pathways improves prediction. Interpretability analyses revealed management-relevant insights: PGExplainer highlighted strong upstream influences from the SC sub-watershed, identifying it as the primary intervention region. Feature attribution distinguished long-term influences (e.g., TOC near major WWTPs) from short-term episodic drivers associated with facility-specific effluent spikes. Counterfactual analyses quantified the reductions in effluent chromaticity and proximal indicators required to achieve downstream targets at site HT Y4. Counterfactual success rates, defined as the proportion of model-generated cases meeting the target, were 26% and 40% for chromaticity targets of 14 and 15 color units (CU), respectively. Given these outcomes and considering that 14–15 CU is generally acceptable for basin-scale management, a downstream target of 14–15 CU was proposed as feasible and practical. Overall, the framework serves as a cost-effective and interpretable decision-support tool for watershed management under data-limited monitoring conditions.
- Keywords
- NEURAL-NETWORKS; Water quality prediction; Sparse sampling data; Graph neural network (GNN); Explainable artificial intelligence (XAI); Counterfactual analysis; River basin management
- ISSN
- 0301-4797
- URI
- https://pubs.kist.re.kr/handle/201004/154126
- DOI
- 10.1016/j.jenvman.2026.128575
- Appears in Collections:
- KIST Article > 2026
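The abstract defines the counterfactual success rate as the proportion of model-generated cases meeting the target. A minimal Python sketch of that ratio follows. The function name, the at-or-below comparison used to decide whether a case meets the target, and the sample values are all assumptions made for illustration; they are not the authors' implementation or data.

```python
from typing import Sequence


def counterfactual_success_rate(predicted_cu: Sequence[float], target_cu: float) -> float:
    """Return the fraction of counterfactual cases that meet the chromaticity target."""
    if len(predicted_cu) == 0:
        raise ValueError("need at least one counterfactual case")
    # Assumption: a case meets the target when its predicted
    # downstream chromaticity is at or below the target value.
    hits = sum(1 for value in predicted_cu if value <= target_cu)
    return hits / len(predicted_cu)


# Hypothetical predicted chromaticity values (CU) for five
# counterfactual cases; these are NOT results from the paper.
example_predictions = [13.2, 14.6, 15.4, 14.9, 16.1]

rate_14 = counterfactual_success_rate(example_predictions, 14.0)
rate_15 = counterfactual_success_rate(example_predictions, 15.0)
print(f"target 14 CU: {rate_14:.0%}")
print(f"target 15 CU: {rate_15:.0%}")
```

Under this definition, a looser target can only keep or raise the success rate, which matches the direction of the reported figures for 14 CU and 15 CU.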