M.S. Applied Data Science - Capstone Chronicles 2025
Integrating resampling, feature selection, and model training into one cohesive pipeline ensured consistent preprocessing during both training and eventual deployment. This comprehensive approach increases confidence that the model will generalize well to unseen data while maintaining fairness across all target classes.

5 Results and Findings

5.1 Model's Performance

The comparative analysis of classification models revealed considerable variability in performance across algorithms. Tree-based methods, particularly ensemble and decision tree models, demonstrated superior classification capabilities in terms of accuracy. Table 3 shows that random forest achieved the highest F1-score (0.9215), followed by decision tree (0.8986), a marginal difference of 0.0229. MLP and XGBoost exhibited moderate F1-scores of 0.8810 and 0.8781, respectively, while logistic regression underperformed with an F1-score of 0.6894. The tree-based models' superior performance suggests that the classification problem involves complex, non-linear decision boundaries that these models are well-suited to capture.
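The integrated flow described above can be sketched with scikit-learn's Pipeline, so that feature selection and classification are fitted together and applied identically at prediction time. This is an illustrative sketch, not the study's exact implementation: the synthetic data, the `SelectKBest` step, and the use of `class_weight="balanced"` as a stand-in for the resampling step (which would typically be an imbalanced-learn sampler such as SMOTE) are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic imbalanced data standing in for the study's dataset (assumption).
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Feature selection and model training in one pipeline, so the same
# preprocessing is applied during training and at prediction/deployment time.
# A resampling step (e.g. SMOTE via imblearn.pipeline.Pipeline) would come
# first; class_weight="balanced" is used here as a simple stand-in.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", RandomForestClassifier(n_estimators=200,
                                   class_weight="balanced",
                                   random_state=42)),
])
pipe.fit(X_train, y_train)
weighted_f1 = f1_score(y_test, pipe.predict(X_test), average="weighted")
print(round(weighted_f1, 4))
```

Because the selector and classifier live in one fitted object, calling `pipe.predict` on new data guarantees the same columns are selected as during training.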
Table 3
Optimal Hyperparameter Configurations for Each Model

Model                 Optimal Hyperparameters
Logistic regression   C = 0.1, penalty = L2
Decision tree         max_depth = None, min_samples_leaf = 1, min_samples_split = 2
Random forest         n_estimators = 200, max_depth = None, min_samples_leaf = 1, min_samples_split = 2
XGBoost               n_estimators = 200, learning_rate = 0.1, max_depth = 6, subsample = 0.8, colsample_bytree = 1.0
MLPClassifier         hidden_layer_sizes = (50, 50), activation = tanh, alpha = 0.001, learning_rate = constant

Note. Optimal configurations were determined using grid search with the F1-weighted score as the optimization metric.
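The tuning procedure in the note can be sketched with scikit-learn's `GridSearchCV` scored by weighted F1. This is a minimal illustration: Table 3 reports only the optimal values, so the candidate grid below (and the synthetic data) are assumptions, shown here for the logistic regression case.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption; the study used its own dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Grid search scored with weighted F1, as stated in the note to Table 3.
# The candidate values are illustrative; Table 3 lists only the optima.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0], "penalty": ["l2"]},
    scoring="f1_weighted",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```

The same pattern applies to the other models: swap in the estimator and a grid over the hyperparameters listed in Table 3, keeping `scoring="f1_weighted"` so all models are tuned against the same metric.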