5.1.1 Model Runtime. When comparing hyperparameter tuning methods, randomized search is generally considered more efficient because it samples only a limited number of parameter combinations, whereas grid search exhaustively evaluates every combination in the parameter grid. The XGBoost model used randomized search to sample values for several hyperparameters, such as maximum depth, learning rate, number of estimators, and subsample ratio. The K-NN and RF models, on the other hand, were tuned with grid search. Even with the more efficient tuning method, the XGBoost model still had a longer runtime, while the baseline logistic regression model had the fastest runtime, most likely due to its lack of tuned hyperparameters.

Figure 2
Model Runtime (seconds)
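The contrast between the two tuning strategies can be illustrated with scikit-learn's RandomizedSearchCV and GridSearchCV. The sketch below is illustrative only: the parameter grids, number of sampled candidates, scoring metric, and synthetic data are assumptions rather than the exact settings used in this study.

```python
# Illustrative sketch of randomized vs. grid search; the grids, n_iter,
# scoring metric, and synthetic data are assumptions, not the study's settings.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Randomized search fits only n_iter randomly sampled combinations.
xgb_search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_distributions={
        "max_depth": [3, 5, 7, 9],
        "learning_rate": [0.01, 0.05, 0.1, 0.3],
        "n_estimators": [100, 200, 400],
        "subsample": [0.6, 0.8, 1.0],
    },
    n_iter=20,              # fits 20 of the 144 possible combinations
    scoring="f1",
    random_state=42,
)
xgb_search.fit(X, y)

# Grid search exhaustively fits every combination in the grid.
knn_search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 10, 15]},
    scoring="f1",
)
knn_search.fit(X, y)

print(xgb_search.best_params_)
print(knn_search.best_params_)
```

Randomized search caps the number of model fits at n_iter regardless of how large the parameter grid grows, which is the efficiency advantage noted above.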
5.1.2 Validation Performance. As seen in Tables 3 and 4, the initial validation performance of the models was evaluated using classification metrics: accuracy, precision, recall, and F1 score. The models achieved similar accuracy in predicting PII labels; more variation appears in precision, recall, and F1.
Both K-NN models performed similarly despite using different hyperparameter tuning methods and identifying a different optimal number of neighbors: the initial K-NN found k = 10 to be best, while the grid search selected k = 5. The K-NN models achieved the highest precision in identifying PII labels during validation. The baseline logistic regression model had the best recall of all models except XGBoost, which was marginally better; XGBoost also achieved the best F1 score. The Presidio analyzer was the lowest-performing model, with precision and F1 scores far below those of the other models.

Table 3
Model Validation Performance Precision and Recall Metrics

Model                 Precision   Recall
Logistic regression   0.7405      0.8605
RF                    0.8859      0.7815
RF - grid search      0.7810      0.7810
K-NN                  0.9897      0.7827
K-NN - grid search    0.9150      0.7844
XGBoost               0.8118      0.8642
Presidio              0.0727      0.2582

Table 4
Model Validation Performance F1 and Accuracy Metrics

Model                 F1          Accuracy
Logistic regression   0.7960      0.8605
RF                    0.7821      0.8605
RF - grid search      0.7810      0.8605
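The metrics reported in Tables 3 and 4 correspond to standard scikit-learn scoring functions. The sketch below shows how they can be computed on a held-out validation split; the synthetic data, split, and logistic regression model are stand-ins for the study's actual pipeline.

```python
# Illustrative computation of the validation metrics in Tables 3 and 4;
# the data, split, and model here are stand-ins, not the study's pipeline.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_val)

print(f"Precision: {precision_score(y_val, y_pred):.4f}")
print(f"Recall:    {recall_score(y_val, y_pred):.4f}")
print(f"F1:        {f1_score(y_val, y_pred):.4f}")
print(f"Accuracy:  {accuracy_score(y_val, y_pred):.4f}")
```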
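Unlike the trained classifiers, the Presidio analyzer detects PII with rule-based and named-entity recognizers. A minimal usage sketch follows; the sample text and default recognizer set are illustrative, and the mapping of detections to the study's labels is not shown.

```python
# Minimal Presidio usage sketch; the sample text is illustrative, not the
# configuration used in this study.
# Requires the presidio-analyzer package and a spaCy model such as en_core_web_lg.
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
results = analyzer.analyze(
    text="Contact Jane Doe at jane.doe@example.com or 212-555-0199.",
    language="en",
)

for r in results:
    # Each result carries the detected entity type, character span, and confidence.
    print(r.entity_type, r.start, r.end, round(r.score, 2))
```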