M.S. Applied Data Science - Capstone Chronicles 2025


Table 6 Class-Specific Performance Metrics for the Top Ranked Model (Random Forest)

Class          Precision   Recall   F1-score   Support
Class I        0.9368      0.9144   0.9255     1,671
Class II       0.9450      0.9671   0.9560     5,655
Class III      0.7566      0.6296   0.6873     548
Weighted Avg   0.9302      0.9324   0.9308     7,874

Note. Support indicates the number of test samples in each class.
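The Weighted Avg row of Table 6 can be checked by hand. The sketch below assumes that row is the support-weighted mean of the per-class values (the convention used by scikit-learn's classification_report, which the source does not explicitly name); the variable and function names are illustrative only.

```python
# Per-class values copied from Table 6: (precision, recall, F1-score, support).
per_class = {
    "Class I": (0.9368, 0.9144, 0.9255, 1671),
    "Class II": (0.9450, 0.9671, 0.9560, 5655),
    "Class III": (0.7566, 0.6296, 0.6873, 548),
}

# Weighted Avg row as reported in Table 6.
reported = {"precision": 0.9302, "recall": 0.9324, "f1": 0.9308}

total_support = sum(row[3] for row in per_class.values())


def weighted_average(column):
    """Support-weighted mean of one metric column (assumed convention)."""
    return sum(row[column] * row[3] for row in per_class.values()) / total_support


metric_names = ["precision", "recall", "f1"]
weighted = {name: weighted_average(i) for i, name in enumerate(metric_names)}

for name, value in weighted.items():
    # Small differences are expected because the table values are rounded.
    print(f"{name}: computed {value:.4f}, reported {reported[name]:.4f}")
```

Because Class II accounts for 5,655 of the 7,874 test samples, the weighted averages stay above 0.93 even though Class III recall is only 0.6296.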

Figure 18 Receiver Operating Characteristic Curves for All Classes

An auxiliary analysis was conducted to examine how the number of trees and the method of feature selection affect model performance in a random forest classifier. Out-of-bag (OOB) error estimation was employed as a measure of generalization error. Three configurations were explored: max_features = "sqrt", max_features = "log2", and max_features = None (i.e., using all features). For each setting, models were trained using tree counts ranging from 15 to 150, in increments of 5. The trends illustrated in Figure 19 show that increasing the number of trees consistently reduced OOB error across all configurations, although diminishing returns were observed beyond approximately 100 trees. Among the three settings, the square root strategy (max_features = "sqrt") resulted in the lowest OOB error, indicating the best generalization performance of the three. Conversely, using all available features (max_features = None) produced the highest OOB error, potentially because considering every feature at each split makes the individual trees more correlated, which limits the variance reduction gained from averaging and can lead to overfitting.
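The OOB comparison described here can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the data are a synthetic stand-in generated with make_classification, the random_state values are arbitrary, and the use of warm_start to grow each forest incrementally is a convenience choice rather than something the source confirms.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic three-class stand-in data (assumption): the study's dataset
# is not reproduced here.
X, y = make_classification(
    n_samples=1000,
    n_features=25,
    n_informative=8,
    n_classes=3,
    random_state=0,
)

# The three max_features settings compared in the text.
settings = {"sqrt": "sqrt", "log2": "log2", "None": None}
tree_counts = list(range(15, 151, 5))  # 15, 20, ..., 150

oob_error = {}
for label, max_features in settings.items():
    # warm_start=True keeps the trees already grown and only adds new ones
    # each time n_estimators is raised.
    forest = RandomForestClassifier(
        max_features=max_features,
        oob_score=True,
        warm_start=True,
        random_state=0,
    )
    errors = []
    for n_trees in tree_counts:
        forest.set_params(n_estimators=n_trees)
        forest.fit(X, y)
        # OOB error is one minus the OOB accuracy estimate.
        errors.append(1.0 - forest.oob_score_)
    oob_error[label] = errors

for label, errors in oob_error.items():
    print(f"max_features={label}: OOB error at {tree_counts[-1]} trees = {errors[-1]:.4f}")
```

Plotting each list in oob_error against tree_counts gives curves of the same kind as those described for Figure 19; the values obtained on synthetic data will not match the study's results.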
