ADS Capstone Chronicles Revised


Figure 6 Accuracy Score of Validation and Test Set

due to the class imbalance of the PII labels during the modeling stage, as the training dataset contained far more student_name labels than other labels.

Figure 5 F1 Score of Validation and Test Set

6 Discussion

Text classification tasks require an understanding of the context in which words are used. In detecting PII, one priority is to minimize both false negative and false positive detections; otherwise, sensitive information about an individual could be disseminated to the public. Several studies have used machine learning algorithms to automate the detection and removal of PII in contexts such as the medical, financial, and educational industries. In this study, the RF model appeared to perform the best in detecting PII. Supervised learning models were used in this study to keep the models understandable and interpretable, since sensitive data was used to train them. Given the limited timeframe of this project, more hyperparameter tuning and evaluation of the models used in this study could have been done to provide a better understanding of what

Lastly, the accuracy of each model is shown in Figure 6. This metric measures the proportion of correct predictions among all predictions made. The K-NN and XGBoost models were highly inaccurate on the test set, despite performing well on the validation set. In contrast, the logistic regression model appeared to perform the best, followed by the RF models. Taken together, the four classification metrics suggest that these models may have been overfitted to the training data, which would explain why they performed well on the validation set but not on the test set. Additionally, random variation may have contributed to the discrepancy between the validation and test sets. Overall, the baseline logistic regression model and the RF models performed the best, followed by the K-NN and XGBoost models.
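As a rough illustration of the accuracy metric and of the validation-to-test gap described above, the following minimal sketch computes accuracy and a macro-averaged F1 score on two small label sets. The label values ("O", "email") and all predictions are hypothetical toy data, not results from this study, and the helper functions accuracy and macro_f1 are written here for illustration rather than taken from the project code.

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare labels count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (assumption): "O" marks a non-PII token.
val_true = ["O", "O", "student_name", "student_name", "O", "email"]
val_pred = ["O", "O", "student_name", "student_name", "O", "email"]

test_true = ["O", "O", "student_name", "email", "O", "email"]
# A model biased toward the majority PII label predicts student_name for emails.
test_pred = ["O", "O", "student_name", "student_name", "O", "student_name"]

val_acc = accuracy(val_true, val_pred)
test_acc = accuracy(test_true, test_pred)
gap = val_acc - test_acc  # a large positive gap is one symptom of overfitting

test_macro_f1 = macro_f1(test_true, test_pred)

print(f"validation accuracy: {val_acc:.3f}")
print(f"test accuracy:       {test_acc:.3f}")
print(f"accuracy gap:        {gap:.3f}")
print(f"test macro F1:       {test_macro_f1:.3f}")
```

In this toy case the macro F1 on the test labels falls below the accuracy because the rare label is never predicted correctly, which is the kind of effect class imbalance can have on per-label metrics.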

