AAI_2025_Capstone_Chronicles_Combined

ResolveAI

After this, the text is tokenized using a custom tokenizer that includes an out-of-vocabulary token and appropriate filtering of punctuation and special characters. The tokenized sequences are then padded to the optimal maximum length, and the padded sequences are concatenated with additional numerical features that have been reshaped appropriately, resulting in a comprehensive feature set. The labels are then converted into a NumPy array for use during training.

The training data is split into training and test sets using an 80/20 ratio with stratification to ensure that the class distribution is maintained in both subsets. This stratified, random split is essential for a representative evaluation of the model's performance. Training is configured to run for up to 30 epochs with a small batch size of 32, allowing more granular updates to the model weights.

An early stopping mechanism monitors the validation loss and halts training if the loss does not improve for 10 consecutive epochs, thereby preventing overfitting. In parallel, the training process employs a model checkpointing strategy that saves the best-performing model based on validation loss, ensuring that only the model with the lowest observed validation loss is preserved. Together, early stopping and checkpointing make training both efficient and robust: the process terminates when improvements plateau, and the optimal model configuration is retained.

Model performance is evaluated using standard metrics such as accuracy, precision, and recall; these metrics offer transparency and insight into both the overall and class-specific effectiveness of the model. Additionally, the F1-score, derived from precision and recall, provides a balanced measure of performance on this imbalanced binary classification task.
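The report does not include the preprocessing code itself, so the following is a minimal sketch of the described steps, using plain Python and NumPy rather than the project's actual tokenizer; the token name `<OOV>`, the vocabulary scheme, and the maximum length of 6 are illustrative assumptions.

```python
import re
import numpy as np

OOV_TOKEN = "<OOV>"  # illustrative name for the out-of-vocabulary token
MAX_LEN = 6          # illustrative maximum sequence length

def clean(text):
    # Lowercase, filter punctuation/special characters, split on whitespace
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()

def build_vocab(texts):
    vocab = {OOV_TOKEN: 1}  # index 0 is reserved for padding
    for text in texts:
        for word in clean(text):
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def tokenize_and_pad(texts, vocab, max_len=MAX_LEN):
    seqs = []
    for text in texts:
        ids = [vocab.get(w, vocab[OOV_TOKEN]) for w in clean(text)]
        seqs.append((ids + [0] * max_len)[:max_len])  # pad or truncate
    return np.array(seqs)

texts = ["Server is down!", "Reset my password, please."]
vocab = build_vocab(texts)
padded = tokenize_and_pad(texts, vocab)

# Reshape the numerical features to 2-D and concatenate with the sequences
numeric = np.array([0.7, 0.2]).reshape(-1, 1)
features = np.hstack([padded, numeric])

labels = np.array([1, 0])  # labels converted into a NumPy array
```

In a Keras-based pipeline the same effect is typically achieved with `Tokenizer(oov_token=...)` and `pad_sequences`, but the logic is the same: unknown words map to the OOV index and every sequence is brought to a fixed length before the numeric columns are stacked on.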
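The stratified 80/20 split is commonly done with scikit-learn's `train_test_split(..., stratify=y)`; as a self-contained illustration of what stratification means, here is a hand-rolled equivalent (the function name and seed are assumptions, not the project's code).

```python
import numpy as np

def stratified_split(X, y, test_frac=0.2, seed=42):
    """Shuffle indices within each class, then move test_frac of each class
    into the test set, so both subsets keep the original class distribution."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    train_idx, test_idx = np.array(train_idx), np.array(test_idx)
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# 100 samples, imbalanced 70/30 between the two classes
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 70 + [1] * 30)
X_train, X_test, y_train, y_test = stratified_split(X, y)
```

With a 70/30 class imbalance, a plain random split could easily leave the test set with too few positives; stratification guarantees both subsets mirror the 70/30 ratio.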
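In Keras, this combination would typically be `EarlyStopping(monitor="val_loss", patience=10)` together with `ModelCheckpoint(save_best_only=True)`. The framework-free sketch below shows the underlying logic; the function name is hypothetical, the per-epoch losses stand in for a real training loop, and the "checkpoint" is just the best epoch index.

```python
def train_with_early_stopping(val_losses, patience=10):
    """Walk through per-epoch validation losses, keeping a checkpoint of the
    best epoch and stopping once the loss has not improved for `patience`
    consecutive epochs. Returns (best_epoch, best_loss, epochs_run)."""
    best_loss = float("inf")
    best_epoch = None  # stand-in for a saved model checkpoint
    epochs_without_improvement = 0
    epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss   # new best: overwrite the checkpoint
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break          # early stop: plateau reached
    return best_epoch, best_loss, epoch + 1

# Loss improves until epoch 2, then plateaus; with patience=3 the loop
# halts after epoch 5 and the epoch-2 checkpoint is the one kept.
best_epoch, best_loss, epochs_run = train_with_early_stopping(
    [0.9, 0.7, 0.5, 0.6, 0.6, 0.6], patience=3)
```

Because the checkpoint is only overwritten on a strict improvement, the model retained at the end is exactly the one with the lowest observed validation loss, even though training continued for several epochs past it.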
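The evaluation metrics named above have standard definitions; a small self-contained implementation for the binary case (the helper name and sample labels are illustrative) makes the relationship between them explicit.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

On an imbalanced task, accuracy alone can look good while the minority class is largely missed; F1, as the harmonic mean of precision and recall, drops sharply if either one is poor, which is why the report leans on it for the balanced view.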


