M.S. Applied Data Science - Capstone Chronicles 2025


Figure 13 plots the training loss (lower continuous line) and validation loss (upper dashed line). A substantial gap separates training and validation performance across all evaluated training-data fractions: the training loss consistently sits well below the validation loss, by roughly 0.15 to 0.30 log-loss units. This persistent differential is compelling evidence of a model that has acquired representations specific to the training set at the expense of generalizability (James et al., 2021). Further analysis shows that while the training loss declines steadily as the training-data fraction increases, the validation loss exhibits no commensurate improvement. This pattern suggests that additional training data enables the model to further optimize its performance on known examples without a proportional gain in predictive capacity on novel data. Such behavior is characteristic of overfitting, wherein a model's complexity permits it to encode training-data idiosyncrasies rather than underlying generalizable patterns (Hastie et al., 2020). In conclusion, the visualization provides substantive evidence that the random forest implementation has exceeded the optimal complexity for the task, compromising generalization performance despite favorable training metrics.
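The train/validation gap described above can be reproduced with a learning-curve loop: retrain the model on increasing fractions of the training set and compare log loss (and F1) on the training subset versus a held-out validation set. The sketch below uses a synthetic binary-classification dataset and default-ish hyperparameters, since the report's actual data and model settings are not shown here; it is an illustration of the diagnostic, not the project's implementation.

```python
# Hedged sketch: train/validation learning curve for a random forest.
# The dataset and hyperparameters are assumptions for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Evaluate at increasing fractions of the training set.
for frac in (0.2, 0.4, 0.6, 0.8, 1.0):
    n = int(frac * len(X_tr))
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr[:n], y_tr[:n])

    # Log loss on the seen training subset vs. the held-out validation set.
    train_ll = log_loss(y_tr[:n], model.predict_proba(X_tr[:n]))
    val_ll = log_loss(y_val, model.predict_proba(X_val))

    # F1 comparison mirrors the second plot discussed in the text.
    train_f1 = f1_score(y_tr[:n], model.predict(X_tr[:n]))
    val_f1 = f1_score(y_val, model.predict(X_val))

    print(f"frac={frac:.1f}  log loss: train={train_ll:.3f} val={val_ll:.3f}"
          f"  F1: train={train_f1:.3f} val={val_f1:.3f}")
```

A persistent gap between the training and validation columns at every fraction, with flat validation loss, is the overfitting signature the figure illustrates; validation curves that are still improving at frac=1.0 suggest more data would help.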

F1-Score vs. Training Size (Random Forest)

As the training size increases, the log loss plot shows that the training loss steadily decreases, indicating that the model benefits from additional data. Validation loss improves noticeably up to around 40% of the data, after which gains become more gradual. This trend suggests that early additions of data yield significant improvements, but further gains require substantially larger increases in training size. Similarly, the F1-score plot demonstrates that the model fits the training data well, as reflected in the high training F1-scores. The validation F1-score also rises with larger training sizes, showing that the model generalizes better as more data is added. However, the visible gap between training and validation scores indicates overfitting. Since both validation curves are still improving at the maximum training size used, collecting more data would likely continue to enhance the model's performance.

4.5.11 Final Model Training Methods

The final model selection process began by identifying the best-performing model based on cross-validation results. Among all candidates,

