M.S. Applied Data Science - Capstone Chronicles 2025
categories had notably lower SHAP values. This suggests that, although medication indicators contributed to predictions, their influence was secondary to lifestyle-related variables in determining risk.

5.4 Interpretation of Model Results

The comparative modeling results provide insight into the relative importance of lifestyle versus medication-related variables in predicting metabolic syndrome. Across both groups, the top-performing models (Random Forest, XGBoost, and MLP) consistently demonstrated high and balanced weighted precision, recall, and F1-scores, suggesting robust predictive capacity for both positive and negative classes. Notably, the minimal drop in performance from Group A (with medication variables) to Group B (without medication variables) challenges our initial hypothesis that including medication data would markedly improve predictive accuracy.

However, the SHAP analyses offer a more nuanced perspective. In Group A, medication-related indicators appeared among the top predictors in both Random Forest and XGBoost, though they did not dominate the rankings. The most influential medication variable (med_class_None) ranked first in the Group A Random Forest model, while in XGBoost it appeared sixth. In contrast, dietary intake variables (avg_kcal, avg_fat, avg_sugar, avg_fiber) and physically_active consistently held top positions across all models and groups, underscoring the central role of lifestyle factors in the prediction task.

This pattern suggests that while medication use carries some predictive signal, potentially acting as a proxy for clinical diagnosis or disease severity, its contribution may be less
direct than that of lifestyle measures. The fact that high performance was maintained without medication variables implies that lifestyle and behavioral data alone can capture much of the risk profile. Nevertheless, the modest gains observed in some Group A models indicate that medication information still adds complementary predictive value, particularly for complex decision boundaries in non-linear models.

These findings directly address the research question. The hypothesis was that models including medication variables (Group A) would outperform those excluding them (Group B). While the top-performing Group A models achieved slightly higher metrics in some cases, the differences were modest, indicating that lifestyle and behavioral indicators alone retained substantial predictive power. This suggests that, although medication data provide an additional predictive signal, they are not strictly necessary for high-performing classification in this context.

6 Discussion

The results of this study underscore the importance of lifestyle and behavioral indicators as strong predictors of metabolic syndrome, with or without the inclusion of medication-related variables. While Group A models (lifestyle + medication features) achieved slightly higher performance in several cases, most notably with Random Forest and XGBoost, the narrow performance gap between Group A and Group B suggests that lifestyle and behavioral variables alone can capture much of the risk profile. These results fully support the first part of our hypothesis, that lifestyle and behavioral