M.S. Applied Data Science - Capstone Chronicles 2025


Table 4
Cross-Validation Performance Metrics for Each Model

Model                  F1-score (CV)   Rank
Random forest          0.9215          1
Decision tree          0.8986          2
MLPClassifier          0.8810          3
XGBoost                0.8781          4
Logistic regression    0.6894          5

Note. F1-scores represent the weighted average across all classes during 5-fold cross-validation.
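The weighted-average F1-scores in Table 4 can be reproduced along these lines with scikit-learn's 5-fold cross-validation. This is a sketch only: the synthetic dataset, class count, and default hyperparameters below are placeholders, not the study's actual data or tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the study's feature matrix and labels.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=42)

model = RandomForestClassifier(random_state=42)

# 5-fold CV scored with the weighted-average F1 reported in Table 4:
# each fold's F1 is averaged over classes, weighted by class support.
scores = cross_val_score(model, X, y, cv=5, scoring="f1_weighted")
print(f"Mean weighted F1 across 5 folds: {scores.mean():.4f}")
```

Swapping in `DecisionTreeClassifier`, `MLPClassifier`, `XGBClassifier`, or `LogisticRegression` for the model object would yield the remaining rows of the ranking.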

Table 5
Test Set Performance Metrics for Each Model

Model                  Accuracy   Precision   Recall   F1-score
Random forest          0.9324     0.9302      0.9324   0.9308
Decision tree          0.9106     0.9122      0.9106   0.9113
MLPClassifier          0.8799     0.8951      0.8799   0.8855
XGBoost                0.8759     0.8790      0.8759   0.8772
Logistic regression    0.6562     0.7711      0.6562   0.6931

Note. All metrics represent weighted averages across all classes. Evaluating precision revealed systematic differences in the five machine learning approaches' ability to minimize false positive predictions. The tree-based ensemble method emerged as particularly effective at correctly classifying positive instances. Table 3 illustrates that random forest achieved the highest precision score (0.9308), with decision tree following closely (0.9137), demonstrating these models' superior ability to minimize false positive classifications. MLP and XGBoost showed comparable precision (0.8930 and 0.8794, respectively), positioning them as moderately effective classifiers regarding
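The weighted test-set metrics reported in Table 5 can be computed as sketched below. The train/test split, synthetic data, and untuned random forest are illustrative assumptions, not the study's pipeline; only the scoring scheme (weighted averaging across classes) mirrors the tables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the study's feature matrix and labels.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Precision, recall, and F1 averaged across classes, weighted by support,
# matching the "weighted averages" noted under Table 5.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted")
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy {accuracy:.4f}  Precision {precision:.4f}  "
      f"Recall {recall:.4f}  F1 {f1:.4f}")
```

Because weighted recall averages per-class recall by class support, it coincides with accuracy, which is why the Accuracy and Recall columns in Table 5 are identical for every model.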

