M.S. Applied Data Science - Capstone Chronicles 2025


leading the list, followed by physically_active. Medication class indicators (med_class_None, med_class_Other, med_class_MetSyn-related) remained important but generally ranked below key lifestyle variables.

Figure 8 SHAP feature importance for the Group A XGBoost model.

Figure 9 SHAP feature importance for the Group B Random Forest model.

Figure 10 SHAP feature importance for the Group B XGBoost model.

Without medication data, the Random Forest model (Figure 9) emphasized physically_active, avg_kcal, avg_fiber, avg_fat, and avg_sugar as the top predictors. Socioeconomic status (income_level) maintained moderate influence, while reduced-sugar intake, lifestyle effort, and eating-out frequency also contributed. The likely_dieting feature remained near the lower end of the ranking, and eats_out_rarely contributed negligibly. The XGBoost model for Group B (Figure 10) also placed avg_kcal, avg_fat, avg_sugar, and avg_fiber at the top, followed by physically_active. Income level and lifestyle effort were mid-ranked, while reduced-calorie and reduced-sugar indicators appeared lower in importance.
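For readers who want to see how rankings of this kind are typically produced, the sketch below orders features by mean absolute SHAP value, the convention commonly used for global SHAP importance plots, and lists the features that two models share in their top k. It is an illustration only: the text does not state how the rankings were computed, the helper names rank_features and shared_top_k are hypothetical, and the SHAP values are made-up numbers, not results from this study.

```python
def rank_features(shap_rows, feature_names):
    """Order features by mean absolute SHAP value, largest first.

    shap_rows holds one list of per-feature SHAP values per sample.
    """
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance, key=importance.get, reverse=True)


def shared_top_k(ranking_a, ranking_b, k):
    """Features in the top k of both rankings, kept in ranking_a order."""
    top_b = set(ranking_b[:k])
    return [name for name in ranking_a[:k] if name in top_b]


# Hypothetical SHAP values for three samples (NOT taken from the study).
features = ["avg_kcal", "physically_active", "income_level", "likely_dieting"]
rf_shap = [
    [0.30, -0.40, 0.10, 0.01],
    [-0.20, 0.50, -0.05, 0.02],
    [0.25, -0.45, 0.15, -0.01],
]
xgb_shap = [
    [0.50, -0.20, 0.10, 0.02],
    [-0.60, 0.30, -0.05, 0.01],
    [0.40, -0.25, 0.12, -0.03],
]

rf_rank = rank_features(rf_shap, features)
xgb_rank = rank_features(xgb_shap, features)
common = shared_top_k(rf_rank, xgb_rank, k=2)

print("Random Forest ranking:", rf_rank)
print("XGBoost ranking:", xgb_rank)
print("Shared top-2 features:", common)
```

Taking the absolute value before averaging matters because a feature that pushes predictions strongly in both directions would otherwise average toward zero and appear unimportant.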

Overall, dietary intake variables (avg_kcal, avg_fat, avg_sugar, avg_fiber) and physically_active consistently ranked among the top predictors in both groups, highlighting the central role of lifestyle factors in model decisions. In Group A, where medication data were included, the highest-ranked medication-related variable (med_class_None) appeared first in the Random Forest model and sixth in XGBoost, while other medication

