M.S. Applied Data Science - Capstone Chronicles 2025


5.3 Hypothesis Testing and Actionable Insights

To guard against overfitting beyond cross-validation, we reserved a completely unseen 10% hold-back split before any modeling and evaluated the final models only at the end. Performance on this hold-back closely tracked the test set: the random forest achieved ROC-AUC = 0.87 and F₁ = 0.72, supporting genuine generalizability.

We then tested our working hypotheses about system size, geography, and method. Using logistic-regression coefficients for standardized population served and service connections, we conducted Wald z-tests and found that larger systems had significantly lower odds of exceedance (OR = 0.88 per additional 1,000 people served; p < .001). Spatially, Local Indicators of Spatial Association (LISA) highlighted county clusters where exceedance rates deviated from the statewide baseline; Kern and Siskiyou, for example, exhibited rates ~2.5× the state average (p < .01).

Temporal and analyte-specific patterns also emerged: persistent seasonal iron spikes in Amador suggest targeted treatment upgrades during specific months, while a declining fluoride trend in Alameda warrants investigation of dosing operations around 2023, when the downward drift begins. Finally, pre-/post-2024 hypothesis tests in high-risk counties (e.g., Tulare) showed significant arsenic reductions (p < .05), consistent with the rollout of stricter treatment regulations in early 2024. Together, these results validate the models, quantify drivers of risk, and pinpoint where focused monitoring and interventions can yield the greatest public-health payoff.

6 Discussion

Our statewide time-series analysis of water quality in California, encompassing all 58 counties, reveals both common and distinct contaminant dynamics across regions. While many counties share a low, stable baseline for arsenic and fluoride, reflecting uniform regulatory
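The Wald z-test on a logistic-regression coefficient reduces to a simple calculation; a minimal sketch follows. The coefficient and standard error here are hypothetical illustrations, chosen only to reproduce an odds ratio near the reported 0.88, not values from the study:

```python
import math
from scipy.stats import norm

def wald_test(beta, se):
    """Wald z-test for a logistic-regression coefficient.

    Returns (odds ratio, z statistic, two-sided p-value)."""
    z = beta / se
    p = 2 * norm.sf(abs(z))          # two-sided tail probability
    return math.exp(beta), z, p      # exp(beta) converts log-odds to an odds ratio

# Hypothetical coefficient for standardized population served
# (per additional 1,000 people) and its standard error
beta, se = -0.128, 0.025
or_, z, p = wald_test(beta, se)
print(f"OR = {or_:.2f}, z = {z:.2f}, p = {p:.2g}")
```

A coefficient of −0.128 corresponds to exp(−0.128) ≈ 0.88, i.e., roughly 12% lower odds of exceedance per unit of the predictor, matching the direction of the reported effect.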

contaminant will exceed its regulatory limit and then, if so, forecast the magnitude. (3) AUC helps identify which counties and analytes yield reliable exceedance predictions, and where data scarcity leads to poor classification. (4) Choosing appropriate metrics (ROC-AUC, precision, recall, F₁) ensures balanced evaluation of detection accuracy and false-alarm rates.

5.1 Evaluation of Results

We repeated the same approach for each of California’s 58 counties. While the SARIMAX hyperparameters remained fixed for comparability, performance varied with data density: counties with ≥ 50 monthly observations (e.g., Los Angeles, San Diego) achieved MAE/RMSE comparable to Alameda’s, but small counties (≤ 20 observations) saw 30–50% higher errors due to over-smoothing and wide confidence intervals. On the 30% hold-out test set, we computed ROC-AUC to measure discrimination ability, precision to quantify the proportion of true exceedances among positive predictions, recall (the true positive rate) to capture sensitivity to actual exceedances, and F₁ to balance the two. The random forest achieved ROC-AUC = 0.88, precision = 0.76, recall = 0.71, and F₁ = 0.73, surpassing our acceptability threshold of 0.70 on all metrics.

5.2 Iterative Improvement

Iteration mattered: as we moved from initial defaults to grid-search batch 1 and then batch 2, we saw clear, measurable gains. ROC-AUC rose from 0.83 → 0.86 → 0.88, while F₁ improved from 0.68 → 0.71 → 0.73, confirming that targeted hyperparameter adjustments consistently enhanced predictive power.
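The four hold-out metrics above can be computed with scikit-learn; a minimal sketch follows. The labels, scores, and 0.5 decision threshold below are toy values for illustration, not the report's data:

```python
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

# Toy hold-out labels (1 = exceedance) and model scores
y_true  = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.9, 0.7]

# Threshold the scores to get hard predictions for precision/recall/F1;
# ROC-AUC is threshold-free and uses the raw scores directly.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print("ROC-AUC  :", roc_auc_score(y_true, y_score))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```

Note that only the thresholded metrics depend on the 0.5 cutoff; in practice the operating threshold can be tuned to trade precision against recall before reporting F₁.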

