M.S. Applied Data Science - Capstone Chronicles 2025


Table 4
Cross-Validation Performance Metrics for Each Model

Model                  F1-score (CV)   Rank
Random forest          0.9215          1
Decision tree          0.8986          2
MLPClassifier          0.8810          3
XGBoost                0.8781          4
Logistic regression    0.6894          5

Note. F1-scores represent the weighted average across all classes during 5-fold cross-validation.
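The weighted-average F1-scores in Table 4 can be reproduced along these lines with scikit-learn's 5-fold cross-validation. This is a sketch only: the synthetic dataset, class count, and default hyperparameters below are placeholders, not the study's actual data or tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the study's feature matrix and labels.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=42)

model = RandomForestClassifier(random_state=42)

# 5-fold CV scored with the weighted-average F1 reported in Table 4:
# each fold's F1 is averaged over classes, weighted by class support.
scores = cross_val_score(model, X, y, cv=5, scoring="f1_weighted")
print(f"Mean weighted F1 across 5 folds: {scores.mean():.4f}")
```

Swapping in `DecisionTreeClassifier`, `MLPClassifier`, `XGBClassifier`, or `LogisticRegression` for the model object would yield the remaining rows of the ranking.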

Table 5
Test Set Performance Metrics for Each Model

Model                  Accuracy   Precision   Recall   F1-score
Random forest          0.9324     0.9302      0.9324   0.9308
Decision tree          0.9106     0.9122      0.9106   0.9113
MLPClassifier          0.8799     0.8951      0.8799   0.8855
XGBoost                0.8759     0.8790      0.8759   0.8772
Logistic regression    0.6562     0.7711      0.6562   0.6931

Note. All metrics represent weighted averages across all classes. Evaluating precision revealed systematic differences in the five machine learning approaches' ability to minimize false positive predictions. The tree-based ensemble method emerged as particularly effective at correctly classifying positive instances. Table 3 illustrates that random forest achieved the highest precision score (0.9308), with decision tree following closely (0.9137), demonstrating these models' superior ability to minimize false positive classifications. MLP and XGBoost showed comparable precision (0.8930 and 0.8794, respectively), positioning them as moderately effective classifiers regarding
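The weighted test-set metrics reported in Table 5 can be computed as sketched below. The train/test split, synthetic data, and untuned random forest are illustrative assumptions, not the study's pipeline; only the scoring scheme (weighted averaging across classes) mirrors the tables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the study's feature matrix and labels.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Precision, recall, and F1 averaged across classes, weighted by support,
# matching the "weighted averages" noted under Table 5.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted")
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy {accuracy:.4f}  Precision {precision:.4f}  "
      f"Recall {recall:.4f}  F1 {f1:.4f}")
```

Because weighted recall averages per-class recall by class support, it coincides with accuracy, which is why the Accuracy and Recall columns in Table 5 are identical for every model.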

