M.S. Applied Data Science - Capstone Chronicles 2025
Table 5 Model performance metrics for Group B.

Model                Accuracy  Precision  Recall  F1-score  ROC-AUC
Logistic Regression  0.57      0.60       0.57    0.58      0.60
Random Forest        0.90      0.90       0.90    0.90      0.97
XGBoost              0.89      0.89       0.89    0.89      0.96
SVM                  0.73      0.73       0.73    0.73      0.77
MLP                  0.90      0.89       0.90    0.90      0.90
5.3 Feature Importance

Feature importance was evaluated for the top two performing models in each group, Random Forest and XGBoost, using SHAP values to quantify the contribution of each predictor to model output (Figures 7–10). SHAP values provide a consistent, model-agnostic measure of feature influence by estimating the change in predicted probability when a given feature shifts from its baseline value. Positive SHAP values indicate a higher likelihood of metabolic syndrome, while negative values indicate a lower likelihood.

For the Random Forest model (Figure 7), the most influential features included med_class_None, physically_active, and dietary measures such as avg_kcal, avg_fiber, and avg_fat. Medication indicators such as med_class_MetSyn-related and med_class_Other also ranked highly, suggesting that medication status played a substantial role in prediction. Socioeconomic status (income_level) and sugar-related variables (avg_sugar, reduced_sugar) were mid-ranked, while features such as likely_dieting and eats_out_rarely contributed minimally.

Figure 7 SHAP feature importance for the Group A Random Forest model.
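To make the idea of a SHAP value concrete, the following is a minimal sketch that computes exact Shapley attributions by brute force for a single instance, relative to a baseline. It is not the pipeline used in this study, which the text does not detail. The scoring function `toy_risk`, its coefficients, and the baseline and instance values are all hypothetical and chosen only for illustration; the feature names are borrowed from the discussion above.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are held at their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for toy examples.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {
                    f: x[f] if (f in subset or f == name) else baseline[f]
                    for f in names
                }
                without_f = {
                    f: x[f] if f in subset else baseline[f] for f in names
                }
                total += weight * (predict(with_f) - predict(without_f))
        phi[name] = total
    return phi


def toy_risk(row):
    # Hypothetical risk score (NOT the models in this paper): calories
    # raise the score, physical activity lowers it, and the two interact.
    score = 0.2 + 0.0001 * row["avg_kcal"] - 0.15 * row["physically_active"]
    score += 0.00005 * row["avg_kcal"] * (1 - row["physically_active"])
    return score


# Hypothetical baseline and instance; avg_fiber is deliberately unused
# by toy_risk, so its attribution should be zero.
baseline = {"avg_kcal": 2000, "physically_active": 1, "avg_fiber": 15}
x = {"avg_kcal": 3000, "physically_active": 0, "avg_fiber": 15}

phi = shapley_values(toy_risk, x, baseline)
for feature, value in phi.items():
    print(f"{feature}: {value:+.3f}")
```

In this toy case both avg_kcal and physically_active receive positive attributions, meaning each pushes the score above the baseline, and the attributions sum to the difference between the prediction for the instance and the prediction for the baseline. Tree-based explainers in SHAP libraries compute comparable quantities far more efficiently for models such as Random Forest and XGBoost.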
In the XGBoost model (Figure 8), dietary variables dominated the top ranks, with avg_kcal, avg_sugar, avg_fat, and avg_fiber