M.S. Applied Data Science - Capstone Chronicles 2025
precision. Logistic regression, while underperforming relative to other models, achieved its highest comparative metric in precision (0.7711), suggesting that when this model predicts positive classes, it maintains reasonable reliability despite its overall limitations. The precision metric reveals that all models are stronger at avoiding false positives than on the other evaluation dimensions.

The evaluation of recall revealed important insights into each model's ability to identify true positive cases within the dataset. The hierarchical methods demonstrated remarkable sensitivity across the classification task. Table 3 presents evidence that random forest demonstrated exceptional recall (0.9331), identical to its accuracy score, indicating balanced classification across positive instances. Decision tree maintained strong recall (0.9124), while MLP and XGBoost showed nearly identical recall values (0.8772 and 0.8760, respectively). Logistic regression exhibited its poorest comparative performance in recall (0.6562), indicating substantial difficulty in identifying true positive cases. This pronounced deficiency in recall relative to precision (0.7711) reveals that logistic regression's primary weakness lies in its tendency toward false negative classifications: it fails to identify a significant proportion of positive instances in the dataset.

The F1-score analysis provided a comprehensive evaluation of balanced performance across the precision and recall dimensions, confirming the overall ranking of model effectiveness. The integration of both metrics revealed consistent patterns across algorithms. Table 3 shows that random forest achieved the highest F1-score
(0.9314), confirming its superior balanced performance in both precision and recall. Decision tree remained competitive with an F1-score of 0.9129, while MLP and XGBoost produced moderately effective F1-scores of 0.8830 and 0.8774, respectively. Logistic regression's F1-score (0.6931) reflects its compromised balance between precision and recall, with the value predictably falling between its precision and recall scores but closer to the lower recall value. The F1-score pattern across models reinforces the finding that tree-based methods, particularly random forest, provide optimal classification performance for this dataset and problem domain.

5.2 Model Performance Across Classification Classes

The analysis of model performance for Class I revealed consistent and robust classification capabilities across most algorithms, with particularly strong results from the tree-based methods. All models except logistic regression achieved precision, recall, and F1-scores exceeding 0.90, indicating effective discrimination of Class I instances. Random forest exhibited marginally superior performance with near-perfect metric values, followed closely by decision tree and XGBoost with comparable effectiveness. Figure 16 demonstrates this pattern clearly, showing balanced performance across metrics for the top four models while highlighting logistic regression's notably inferior performance (approximately 0.65-0.75 across metrics). This substantial performance gap suggests that Class I instances likely feature complex, non-linear decision boundaries that linear models struggle to capture effectively, while hierarchical approaches excel at