ADS Capstone Chronicles Revised
parameter tuning is essential to optimize the model's generalization capability. Following parameter tuning for Gradient Boosting, we identified the best hyperparameters as {'learning_rate': 0.2, 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 10, 'n_estimators': 100}. This refinement yielded a testing accuracy of 88.28%. Because the improvement is not significant and the gap between training and testing performance persists, caution is warranted: the model still shows signs of overfitting.

Figure 7. Comparison of Model Performance Metrics

The SVM model demonstrates a moderate level of performance on both the training and testing sets, with an accuracy of 74.00% on the training set and 74.45% on the testing set. SVM shows consistency in its predictive ability, although the precision, recall, and F1-score metrics suggest slightly better performance for predicting a home team win (1) than an away team win (0). The AUC-ROC score indicates reasonable discrimination capability, with room for improvement. The Gaussian Naive Bayes model exhibits performance comparable to SVM, with an accuracy of 73.48% on the training set and 70.83% on the testing set. Although its precision, recall, and F1-score metrics are consistent, indicating balanced performance across both classes, its AUC-ROC score is lower than SVM's, suggesting weaker discriminatory ability.
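The tuning procedure described above can be sketched with a grid search, assuming scikit-learn's GradientBoostingClassifier was used. The grid below is centered on the best values reported in the text; the synthetic data and the variable names X_train/y_train are stand-ins for the project's actual feature matrix and labels.

```python
# Hedged sketch of the hyperparameter search; synthetic data stands in
# for the real game dataset, which is not reproduced here.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data for illustration only.
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grid centered on the best parameters reported in the text.
param_grid = {
    "learning_rate": [0.1, 0.2],
    "max_depth": [3, 5],
    "min_samples_leaf": [1, 2],
    "min_samples_split": [2, 10],
    "n_estimators": [100],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid, cv=3, scoring="accuracy",
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print(f"test accuracy: {search.best_estimator_.score(X_test, y_test):.4f}")
```

Cross-validated search of this kind selects the parameter combination with the best average validation accuracy, which is what makes comparing training and testing scores afterward a meaningful overfitting check.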
Due to the suboptimal accuracy, precision, and recall of SVM and Gaussian Naive Bayes, we exclude these two models from further consideration. Additionally, despite attempts to address overfitting through parameter tuning, both the Gradient Boosting and Random Forest models still exhibit signs of overfitting. We therefore conclude that logistic regression performs best among all
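The train-versus-test comparison used throughout this evaluation can be sketched as follows, assuming scikit-learn. The logistic regression model and synthetic data here are illustrative stand-ins, not the project's actual pipeline.

```python
# Hedged sketch of reporting accuracy, precision, recall, F1, and AUC-ROC
# on both splits, the comparison used above to flag overfitting.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the home/away win dataset (1 = home team win).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
for name, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(Xs)
    proba = model.predict_proba(Xs)[:, 1]  # probability of class 1
    print(name,
          f"acc={accuracy_score(ys, pred):.3f}",
          f"prec={precision_score(ys, pred):.3f}",
          f"rec={recall_score(ys, pred):.3f}",
          f"f1={f1_score(ys, pred):.3f}",
          f"auc={roc_auc_score(ys, proba):.3f}")
```

A large gap between training and testing scores signals overfitting, as seen with Gradient Boosting and Random Forest; near-equal scores, as in the SVM results, indicate consistent generalization.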