Coastal water quality management is a public health concern, as coastal water
of poor quality can harbor pathogens that are dangerous to human health.
Tourism-oriented countries need to actively monitor the condition of coastal
water at popular tourist sites during the summer season. In this study, routine
monitoring data of Escherichia coli and enterococci across 15 public beaches
in the city of Rijeka, Croatia, were used to build machine learning models for
predicting their levels based on environmental parameters as well as to
investigate their relationships with environmental stressors. Gradient Boosting
(CatBoost, XGBoost), Random Forests, Support Vector Regression and Artificial
Neural Networks were trained with measurements from all sampling sites and used
to predict E. coli and enterococci values based on environmental features.
Evaluation of model stability and generalizability with 10-fold
cross-validation showed that the CatBoost algorithm performed best, with R²
values of 0.71 and 0.68 for predicting E. coli and enterococci, respectively,
outperforming the other evaluated ML algorithms (XGBoost, Random Forests,
Support Vector Regression and Artificial Neural Networks). We also used the
SHapley Additive exPlanations (SHAP) technique to identify
and interpret which features have the most predictive power. The results show
that salinity measured at the site is the most important feature for
forecasting both E. coli and enterococci levels. Finally, the spatial and
temporal accuracy of both ML models was examined at sites with the lowest
coastal water quality.
The spatial E. coli and enterococci models achieved strong R² values of 0.85
and 0.83, while the temporal models achieved R² values of 0.74 and 0.67. The
temporal models also achieved moderate R² values of 0.44 and 0.46 at a site
with high coastal water quality.
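
As an illustration of the evaluation protocol described above, the following is a minimal sketch of 10-fold cross-validated R² scoring for a gradient boosting regressor. It is not the study's pipeline: scikit-learn's GradientBoostingRegressor stands in for CatBoost, the data are synthetic placeholders rather than the Rijeka measurements, and the function name cross_validated_r2, the two features, and the assumed salinity relationship are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score


def cross_validated_r2(X, y, n_splits=10, seed=0):
    """Return the per-fold R^2 scores from shuffled k-fold cross-validation."""
    # Stand-in model (assumption): scikit-learn gradient boosting,
    # not the CatBoost configuration used in the study.
    model = GradientBoostingRegressor(random_state=seed)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="r2")


# Synthetic placeholder data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_samples = 300
salinity = rng.uniform(20.0, 38.0, n_samples)
temperature = rng.uniform(18.0, 28.0, n_samples)
X = np.column_stack([salinity, temperature])

# Made-up relationship: lower salinity gives a higher log-scale bacterial level.
y = 5.0 - 0.1 * salinity + 0.02 * temperature + rng.normal(0.0, 0.2, n_samples)

scores = cross_validated_r2(X, y)
print(f"mean R^2 over {len(scores)} folds: {scores.mean():.2f}")
```

Reporting the mean of the per-fold scores, ideally together with their spread, gives a rough sense of how stable a model is across different train and test splits.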