M.S. Applied Data Science - Capstone Chronicles 2025


Table 5 Model performance metrics for Group B.

Model                 Accuracy   Precision   Recall   F1-score   ROC-AUC
Logistic Regression   0.57       0.60        0.57     0.58       0.60
Random Forest         0.90       0.90        0.90     0.90       0.97
XGBoost               0.89       0.89        0.89     0.89       0.96
SVM                   0.73       0.73        0.73     0.73       0.77
MLP                   0.90       0.89        0.90     0.90       0.90

5.3 Feature Importance

Feature importance was evaluated for the top two performing models in each group, Random Forest and XGBoost, using SHAP values to quantify the contribution of each predictor to model output (Figures 7–10). SHAP values provide a consistent, model-agnostic measure of feature influence by estimating the change in predicted probability when a given feature shifts from its baseline value. Positive SHAP values indicate a higher likelihood of metabolic syndrome, while negative values indicate a lower likelihood. For the Random Forest model (Figure 7), the most influential features included med_class_None, physically_active, and dietary measures such as avg_kcal, avg_fiber, and avg_fat. Medication indicators such as med_class_MetSyn-related and med_class_Other also ranked highly, suggesting that medication status played a substantial role in prediction. Socioeconomic status (income_level) and sugar-related variables (avg_sugar, reduced_sugar) were mid-ranked, while features such as likely_dieting and eats_out_rarely contributed minimally.

Figure 7 SHAP feature importance for the Group A Random Forest model.
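To make the idea of a per-feature contribution relative to a baseline concrete, the following is a minimal sketch, not the pipeline used in this study. It computes exact Shapley values for a hypothetical three-feature logistic model by averaging each feature's marginal effect over every feature ordering, then ranks features by mean absolute contribution, which is the kind of summary a SHAP importance plot reports. The feature names are borrowed from the text, but the coefficients, the sample rows, the single all-zero baseline, and the helper names (predict, shapley_values, global_importance) are all assumptions made for illustration. SHAP implementations for tree models typically average over a background dataset rather than a single baseline row.

```python
from itertools import permutations
from math import exp, factorial


def predict(x):
    """Hypothetical toy classifier (coefficients are illustrative, not fitted)."""
    z = (-1.0
         + 0.8 * x["avg_sugar"]
         + 0.5 * x["avg_fat"]
         - 1.2 * x["physically_active"])
    return 1.0 / (1.0 + exp(-z))  # predicted probability


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    For every ordering of the features, switch them one at a time from the
    baseline value to the observed value and credit each feature with the
    resulting change in the prediction. Averaging over all orderings gives
    the Shapley value. Cost grows factorially, so this is for tiny examples.
    """
    names = list(x)
    phi = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]
            updated = model(current)
            phi[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in phi.items()}


def global_importance(model, rows, baseline):
    """Rank features by mean absolute Shapley value across rows."""
    totals = {name: 0.0 for name in baseline}
    for row in rows:
        for name, value in shapley_values(model, row, baseline).items():
            totals[name] += abs(value)
    means = [(name, total / len(rows)) for name, total in totals.items()]
    return sorted(means, key=lambda pair: pair[1], reverse=True)


# Illustrative inputs only: standardized values with an all-zero baseline.
baseline = {"avg_sugar": 0.0, "avg_fat": 0.0, "physically_active": 0.0}
rows = [
    {"avg_sugar": 1.5, "avg_fat": 0.5, "physically_active": 1.0},
    {"avg_sugar": -0.5, "avg_fat": 1.0, "physically_active": 0.0},
]

for name, score in global_importance(predict, rows, baseline):
    print(f"{name}: {score:.3f}")
```

In this sketch, the contributions for a row sum to the difference between that row's predicted probability and the baseline prediction, and a feature that pushes the probability up receives a positive value, which matches the sign convention described above.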

In the XGBoost model (Figure 8), dietary variables dominated the top ranks, with avg_kcal, avg_sugar, avg_fat, and avg_fiber
