M.S. Applied Data Science - Capstone Chronicles 2025
Figure 6 Model Performance Comparison
Normalized performance: SARIMA vs. ARIMA (AIC scale) and Random Forest vs. XGBoost (ROC AUC scale). AIC for ARIMA and SARIMA; ROC AUC for Random Forest and XGBoost.

SARIMA outperformed its non-seasonal counterpart by approximately 116 AIC points (529.7 versus 645.3), a gap consistent with strong annual cycles in contaminant levels and one that underscores the value of modeling seasonality explicitly. Likewise, Random Forest achieved the highest ROC AUC (0.97), slightly above XGBoost (0.94) and L1-regularized logistic regression (0.90), suggesting that nonlinear interactions among system features carry predictive power for exceedance events. Random Forest is therefore our preferred exceedance-risk detector, while logistic regression remains useful for its transparent feature coefficients.

Taken together, these results guided our choice of SARIMA for concentration forecasting, which captures trend and seasonality parsimoniously, and Random Forest for exceedance-risk classification, balancing predictive accuracy with robustness across California’s diverse water systems.

Table 2 Forecasting and Exceedance-Risk Model Performance Summary (AIC and ROC AUC)

Model                        Task                             Metric    Value
ARIMA (p,d,q) = (0,0,1)      Concentration Forecasting        AIC       645.3
SARIMA (0,0,1)(1,1,1,12)     Concentration Forecasting        AIC       529.7
Logistic Regression (L1)     Exceedance Risk Classification   ROC AUC   0.90
Random Forest                Exceedance Risk Classification   ROC AUC   0.97
XGBoost                      Exceedance Risk Classification   ROC AUC   0.94
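For readers less familiar with the two metrics in Table 2, the following is a minimal, dependency-free sketch of how each can be computed and compared. It is not the pipeline used in this study. The function names, the toy labels, and the toy scores are hypothetical and chosen only for illustration; the two AIC values are the ones reported in the table.

```python
from math import exp


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, AIC = 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def relative_likelihood(aic_candidate: float, aic_best: float) -> float:
    """exp((AIC_best - AIC_candidate) / 2): plausibility of the candidate
    model relative to the lowest-AIC model."""
    return exp((aic_best - aic_candidate) / 2)


def roc_auc(labels, scores) -> float:
    """ROC AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# AIC values as reported in Table 2.
AIC_ARIMA, AIC_SARIMA = 645.3, 529.7
delta = AIC_ARIMA - AIC_SARIMA
print(f"delta AIC = {delta:.1f}")  # 115.6
print(f"relative likelihood of ARIMA = {relative_likelihood(AIC_ARIMA, AIC_SARIMA):.1e}")

# Hypothetical toy data (1 = exceedance, 0 = no exceedance); not study data.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.90]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy ROC AUC = {toy_auc:.3f}")  # 0.889
```

The pairwise definition of ROC AUC is quadratic in the number of samples and is shown here only because it makes the metric's meaning explicit; for data of realistic size a library routine would be the more practical choice.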