M.S. Applied Data Science - Capstone Chronicles 2025


Figure 7. Combined Model Performance Heatmap

4.4.4.1 Test design, i.e., training and validation datasets.

To ensure reliable evaluation, the cleaned dataset was split into 70% training and 30% hold-out test sets, stratified by exceedance outcome to maintain class balance. Within the training set, we implemented 5-fold cross-validation for hyperparameter tuning and model selection, with each fold preserving the original exceedance ratio to prevent sampling bias.

To complement SARIMAX's linear structure, we leverage Facebook's Prophet model for its flexible handling of trend changepoints and seasonality:

- Configuration enables yearly seasonality and disables weekly/daily effects, matching the typical monthly sampling scheme.
- Training fits an additive growth model with automatic detection of trend shifts.
- Prediction extends the series 12 months beyond the training end, providing both median forecasts and upper/lower uncertainty bands.
- Performance is assessed using the same MAE and RMSE metrics, allowing direct comparison with the SARIMAX approach.

Models were assessed on the test set using ROC-AUC, precision, recall, and F₁ score, with results summarized.

4.5 Hyperparameter Tuning

While our core results use SARIMAX and Prophet, the framework is designed to accommodate machine-learning regressors (e.g., random forests, XGBoost, MLP) for residual modeling or exceedance risk classification. These methods would be integrated in a two-stage pipeline (first forecasting numerical concentrations, then classifying MCL exceedances) pending further data enrichment and cross-validation.

Properly tuned hyperparameters maximize generalization performance and prevent overfitting. We conducted a grid search over penalty strengths (C = [0.01, 0.1, 1, 10]) for L1-regularized logistic regression, and over tree depths (max_depth = [5, 10, 15]), number of trees (n_estimators = [100, 300, 500]), and minimum samples per leaf (min_samples_leaf = [1, 5, 10]) for random forests, using 5-fold cross-validation on the 70% training split. The optimal logistic model used C = 1.0, yielding an average CV ROC-AUC of 0.84; the best random forest used 300 trees, max_depth = 10, and min_samples_leaf = 5, achieving a CV ROC-AUC of 0.87.

5 Results and Findings

To demonstrate our modeling pipeline, we present the SARIMAX forecast for manganese as a representative analyte. We trained a SARIMAX(0,0,1)(1,1,1) model on monthly median manganese concentrations from January 2020 through June 2025 and projected the next ten years (July 2025–June 2035).

Table 3. Test-Set Forecast Error Comparison

Model     MAE (µg/L)   RMSE (µg/L)
SARIMAX   66.0         90.5
Prophet   72.4         95.2

SARIMAX achieved the lowest mean absolute error (MAE = 66.0 µg/L) and root-mean-squared error (RMSE = 90.5 µg/L), reflecting its ability to capture both trend and seasonality with a parsimonious linear structure. Prophet, while more flexible around trend changepoints, showed slightly higher error (MAE = 72.4, RMSE = 95.2). Both models maintained 95%
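The stratified 70/30 split and exceedance-preserving 5-fold cross-validation described in §4.4.4.1 can be sketched with scikit-learn as follows. The data here are synthetic stand-ins (random features, a ~10% exceedance rate), not the study's dataset; only the splitting mechanics mirror the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in for the cleaned dataset: 1,000 samples with a ~10%
# exceedance rate. Feature values and rate are illustrative only.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 6))
y = (rng.random(1000) < 0.10).astype(int)  # 1 = MCL exceedance

# 70/30 hold-out split, stratified on the exceedance label so both
# splits keep the original class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# 5-fold CV within the training set; StratifiedKFold keeps the
# exceedance ratio roughly constant across folds, preventing
# the sampling bias noted in the text.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, va_idx) in enumerate(cv.split(X_train, y_train)):
    print(f"fold {fold}: validation exceedance rate = {y_train[va_idx].mean():.3f}")
```

Stratifying both the hold-out split and each fold is what keeps the rare exceedance class represented everywhere models are scored.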
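The grid search in §4.5 can be sketched with scikit-learn's GridSearchCV over the exact grids stated in the text. The training data below are synthetic placeholders, so the selected C and scores will not reproduce the paper's values (C = 1.0, CV ROC-AUC 0.84/0.87); only the search setup matches.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative training data standing in for the 70% training split.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 5))
y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=300) > 0.8).astype(int)

# L1-regularized logistic regression over the stated penalty grid,
# scored by ROC-AUC with 5-fold CV as in the text.
logit_search = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="roc_auc",
    cv=5,
)
logit_search.fit(X_train, y_train)
print("best C:", logit_search.best_params_["C"],
      "CV ROC-AUC:", round(logit_search.best_score_, 3))

# Random forest over the stated depth/size/leaf grids. Fitting uses the
# identical call, rf_search.fit(X_train, y_train); it is omitted here
# only because the 27-point grid is slow for a quick sketch.
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={
        "max_depth": [5, 10, 15],
        "n_estimators": [100, 300, 500],
        "min_samples_leaf": [1, 5, 10],
    },
    scoring="roc_auc",
    cv=5,
)
```

Scoring by ROC-AUC rather than accuracy is the natural choice here, since exceedances are the minority class.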
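The SARIMAX(0,0,1)(1,1,1) specification and ten-year projection in §5 can be sketched with statsmodels. The monthly series below is synthetic (a seasonal signal plus noise standing in for median manganese concentrations), so the fitted coefficients and forecasts are illustrative only; the model orders, training window, and 120-month horizon follow the text.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly median manganese series (µg/L), Jan 2020-Jun 2025;
# values are illustrative, not the study's measurements.
idx = pd.date_range("2020-01-01", "2025-06-01", freq="MS")
rng = np.random.default_rng(1)
seasonal = 30 * np.sin(2 * np.pi * idx.month / 12)
series = pd.Series(150 + seasonal + rng.normal(scale=10, size=len(idx)), index=idx)

# SARIMAX(0,0,1)(1,1,1)_12: a non-seasonal MA(1) term plus seasonal
# AR, differencing, and MA terms at the 12-month period, matching
# the specification in the text.
model = SARIMAX(series, order=(0, 0, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)

# Project the next ten years (July 2025-June 2035) with 95% intervals,
# the same uncertainty level reported for both models.
forecast = fit.get_forecast(steps=120)
mean_fc = forecast.predicted_mean
conf_int = forecast.conf_int(alpha=0.05)
print(mean_fc.head(3))
```

The seasonal differencing term (the middle "1" in the seasonal order) is what lets this parsimonious model track the annual cycle without extra regressors.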
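The MAE and RMSE used to compare forecasts in Table 3 can be computed directly. The series and forecasts below are toy values chosen for illustration, not the study's data, so the resulting errors differ from the table.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error: average magnitude of forecast misses."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(a - f)))

def rmse(actual, forecast):
    """Root-mean-squared error: penalizes large misses more heavily."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

# Toy monthly manganese observations (µg/L) and two competing forecasts.
actual = np.array([120.0, 95.0, 210.0, 150.0, 80.0, 175.0])
sarimax_fc = np.array([110.0, 100.0, 190.0, 160.0, 90.0, 165.0])
prophet_fc = np.array([100.0, 120.0, 170.0, 175.0, 105.0, 150.0])

for name, fc in [("SARIMAX", sarimax_fc), ("Prophet", prophet_fc)]:
    print(f"{name}: MAE = {mae(actual, fc):.1f}, RMSE = {rmse(actual, fc):.1f}")
```

Because RMSE squares the residuals before averaging, it always meets or exceeds MAE; a large RMSE-MAE gap (as in Table 3) signals occasional large misses rather than uniformly moderate error.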

