M.S. Applied Data Science - Capstone Chronicles 2025
leading the list, followed by physically_active. Medication class indicators (med_class_None, med_class_Other, med_class_MetSyn-related) remained important but generally ranked below key lifestyle variables.
Figure 8 SHAP feature importance for the Group A XGBoost model.
Figure 9 SHAP feature importance for the Group B Random Forest model.
Figure 10 SHAP feature importance for the Group B XGBoost model.
Without medication data, the Random Forest model (Figure 9) emphasized physically_active, avg_kcal, avg_fiber, avg_fat, and avg_sugar as the top predictors. Socioeconomic status (income_level) maintained moderate influence, while reduced-sugar intake, lifestyle effort, and eating-out frequency also contributed. The likely_dieting feature remained near the lower end of the ranking, and eats_out_rarely contributed negligibly. The XGBoost model for Group B (Figure 10) also placed avg_kcal, avg_fat, avg_sugar, and avg_fiber at the top, followed by physically_active. Income level and lifestyle effort were mid-ranked, while reduced-calorie and reduced-sugar indicators appeared lower in importance.
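As an illustration of how rankings like those in Figures 9 and 10 can be derived, the sketch below orders features by their mean absolute SHAP value, a common global-importance summary. It assumes the per-sample SHAP values are already available as a samples-by-features matrix; the source does not state the exact aggregation used, so the function names (rank_by_mean_abs_shap, rank_positions) and the toy values are hypothetical and are not results from the study.

```python
from statistics import fmean


def rank_by_mean_abs_shap(shap_values, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    n_features = len(feature_names)
    # Each row holds one sample's per-feature SHAP values.
    if any(len(row) != n_features for row in shap_values):
        raise ValueError("each row must have one SHAP value per feature")
    importance = {
        name: fmean(abs(row[j]) for row in shap_values)
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


def rank_positions(ranking):
    """Map each feature name to its 1-based position in a ranking."""
    return {name: i for i, (name, _) in enumerate(ranking, start=1)}


# Toy, made-up SHAP values (3 samples x 3 features); NOT data from the study.
features = ["avg_kcal", "physically_active", "likely_dieting"]
toy_shap = [
    [0.30, -0.10, 0.01],
    [-0.20, 0.15, -0.02],
    [0.25, -0.05, 0.00],
]

ranking = rank_by_mean_abs_shap(toy_shap, features)
positions = rank_positions(ranking)
```

Computing rank_positions separately for each model's SHAP matrix makes it straightforward to compare where a given feature falls in the Random Forest ranking versus the XGBoost ranking.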
Overall, dietary intake variables (avg_kcal, avg_fat, avg_sugar, avg_fiber) and physically_active consistently ranked among the top predictors in both groups, highlighting the central role of lifestyle factors in model decisions. In Group A, where medication data were included, the highest-ranked medication-related variable (med_class_None) appeared first in the Random Forest model and sixth in XGBoost, while other medication