ADS Capstone Chronicles Revised
parameter tuning is essential to optimize the model's generalization capability. Following parameter tuning for Gradient Boosting, we identified the best hyperparameters as {'learning_rate': 0.2, 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 10, 'n_estimators': 100}. This refinement yielded a testing accuracy of 88.28%. Because the improvement is not significant and the gap between training and testing performance persists, caution is warranted: the model still shows signs of overfitting.

Figure 7. Comparison of Model Performance Metrics

The SVM model demonstrates a moderate level of performance on both the training and testing sets, with an accuracy of 74.00% on the training set and 74.45% on the testing set. SVM shows consistency in its predictive ability, although the precision, recall, and F1-score metrics suggest slightly better performance for predicting a home team win (1) than an away team win (0). The AUC-ROC score indicates reasonable discrimination capability, with room for improvement. The Gaussian Naive Bayes model exhibits performance comparable to SVM, with an accuracy of 73.48% on the training set and 70.83% on the testing set. Although its precision, recall, and F1-score metrics are consistent, indicating balanced performance across both classes, its AUC-ROC score is lower than SVM's, suggesting weaker discriminatory ability.
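The tuning procedure described above can be sketched with a grid search, assuming scikit-learn's GradientBoostingClassifier was used. The grid below is centered on the best values reported in the text; the synthetic data and the variable names X_train/y_train are stand-ins for the project's actual feature matrix and labels.

```python
# Hedged sketch of the hyperparameter search; synthetic data stands in
# for the real game dataset, which is not reproduced here.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data for illustration only.
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grid centered on the best parameters reported in the text.
param_grid = {
    "learning_rate": [0.1, 0.2],
    "max_depth": [3, 5],
    "min_samples_leaf": [1, 2],
    "min_samples_split": [2, 10],
    "n_estimators": [100],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid, cv=3, scoring="accuracy",
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print(f"test accuracy: {search.best_estimator_.score(X_test, y_test):.4f}")
```

Cross-validated search of this kind selects the parameter combination with the best average validation accuracy, which is what makes comparing training and testing scores afterward a meaningful overfitting check.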
Due to the suboptimal accuracy, precision, and recall of SVM and Gaussian Naive Bayes, we exclude these two models from further consideration. Additionally, despite attempts to address overfitting through parameter tuning, both the Gradient Boosting and Random Forest models still exhibit signs of overfitting. We therefore conclude that logistic regression performs best among all
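The train-versus-test comparison used throughout this evaluation can be sketched as follows, assuming scikit-learn. The logistic regression model and synthetic data here are illustrative stand-ins, not the project's actual pipeline.

```python
# Hedged sketch of reporting accuracy, precision, recall, F1, and AUC-ROC
# on both splits, the comparison used above to flag overfitting.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the home/away win dataset (1 = home team win).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
for name, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(Xs)
    proba = model.predict_proba(Xs)[:, 1]  # probability of class 1
    print(name,
          f"acc={accuracy_score(ys, pred):.3f}",
          f"prec={precision_score(ys, pred):.3f}",
          f"rec={recall_score(ys, pred):.3f}",
          f"f1={f1_score(ys, pred):.3f}",
          f"auc={roc_auc_score(ys, proba):.3f}")
```

A large gap between training and testing scores signals overfitting, as seen with Gradient Boosting and Random Forest; near-equal scores, as in the SVM results, indicate consistent generalization.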