M.S. Applied Data Science - Capstone Chronicles 2025
Table 6 Class-Specific Performance Metrics for the Top Ranked Model (Random Forest)
Class          Precision   Recall   F1-score   Support
Class I        0.9368      0.9144   0.9255     1,671
Class II       0.9450      0.9671   0.9560     5,655
Class III      0.7566      0.6296   0.6873       548
Weighted Avg   0.9302      0.9324   0.9308     7,874
Note. Support indicates the number of test samples in each class.
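The Weighted Avg row can be checked by hand: each class metric is weighted by its support and divided by the total number of test samples. The following is a minimal sketch of that arithmetic using only the values reported in Table 6; the names `per_class` and `weighted_average` are illustrative and do not come from the original analysis. Because the inputs are already rounded to four decimals, the recomputed values should be expected to match the reported row only to within rounding.

```python
# Per-class metrics copied from Table 6 (values as reported, rounded to 4 decimals).
per_class = {
    "Class I":   {"precision": 0.9368, "recall": 0.9144, "f1": 0.9255, "support": 1671},
    "Class II":  {"precision": 0.9450, "recall": 0.9671, "f1": 0.9560, "support": 5655},
    "Class III": {"precision": 0.7566, "recall": 0.6296, "f1": 0.6873, "support": 548},
}


def weighted_average(rows, metric):
    """Support-weighted mean of one metric across all classes."""
    total_support = sum(row["support"] for row in rows.values())
    weighted_sum = sum(row[metric] * row["support"] for row in rows.values())
    return weighted_sum / total_support


total_support = sum(row["support"] for row in per_class.values())
weighted = {m: weighted_average(per_class, m) for m in ("precision", "recall", "f1")}

print("total support:", total_support)
for metric, value in weighted.items():
    print(f"weighted {metric}: {value:.4f}")
```

Class II contributes roughly 72% of the test samples, so the weighted averages sit close to its scores even though Class III performs markedly worse.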
Figure 18 Receiver Operating Characteristic Curves for All Classes
An auxiliary analysis was conducted to examine how the number of trees and the method of feature selection affect model performance in a random forest classifier. Out-of-bag (OOB) error estimation was employed as a measure of generalization error. Three configurations were explored: max_features = "sqrt", max_features = "log2", and max_features = None (i.e., using all features). For each setting, models were trained using tree counts ranging from 15 to 150, in increments of 5. The trends illustrated in Figure 19 show that increasing the number of trees consistently reduced OOB error across all configurations, although diminishing returns were observed beyond approximately 100 trees. Among the three settings, the square root strategy (max_features = "sqrt") resulted in the lowest OOB error, indicating the best generalization performance. Conversely, using all available features (max_features = None) produced the highest OOB error, potentially because considering every feature at each split makes the trees more correlated, which weakens the variance reduction gained from averaging and can encourage overfitting.
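A sweep of this kind can be sketched with scikit-learn, as shown here. This is an illustration under stated assumptions rather than the code used in the study: the synthetic dataset from `make_classification`, the function name `oob_error_curves`, and the random seed are all hypothetical, and only the three max_features settings and the 15 to 150 tree range (step 5) come from the text. Setting `warm_start=True` lets each forest grow incrementally instead of being retrained from scratch at every tree count.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic three-class stand-in for the study's data (assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, n_classes=3, random_state=0
)


def oob_error_curves(X, y, tree_counts, settings=("sqrt", "log2", None), seed=0):
    """Return {setting: [(n_trees, oob_error), ...]} for each max_features setting."""
    curves = {}
    for max_features in settings:
        forest = RandomForestClassifier(
            warm_start=True,      # reuse already-fitted trees when n_estimators grows
            oob_score=True,       # compute accuracy on out-of-bag samples
            max_features=max_features,
            random_state=seed,
        )
        errors = []
        for n_trees in tree_counts:
            forest.set_params(n_estimators=n_trees)
            forest.fit(X, y)
            # OOB error is one minus the OOB accuracy.
            errors.append((n_trees, 1.0 - forest.oob_score_))
        curves[str(max_features)] = errors
    return curves


tree_counts = range(15, 151, 5)  # 15, 20, ..., 150
curves = oob_error_curves(X, y, tree_counts)

for setting, errors in curves.items():
    n_trees, err = errors[-1]
    print(f"max_features={setting}: OOB error at {n_trees} trees = {err:.4f}")
```

On synthetic data the ranking of the three settings need not match the ranking reported for Figure 19; the sketch only shows how such curves are produced.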