AAI_2025_Capstone_Chronicles_Combined


feature extraction with the dataset (LeCun et al., 2015). The architectural design choices balanced computational efficiency against the ability to capture complex non-linear patterns in the data.

The final optimized XGBoost model used a tree-based architecture with automated hyperparameter tuning. The optimal configuration included 200 estimators, a maximum tree depth of 5 to control model complexity and prevent overfitting, and a learning rate of 0.2 to balance training speed with convergence stability. Additionally, we applied class frequency weighting (Ko, 2012), calculated for each label as the ratio of negative to positive samples. This weighting factor scales the loss function so that the model treats errors on rare categories with proportionally higher severity than errors on common ones.

As a comparative baseline, the DNN was designed with a "funnel" architecture that compresses sparse input features into dense representations. It consisted of three hidden layers with 512, 256, and 128 nodes respectively, using ReLU activation functions. To mitigate overfitting on the sparse TF-IDF vectors, Batch Normalization and Dropout layers (with rates of 0.3 and 0.2) were inserted after each dense layer. The output layer used sigmoid activation to produce independent probabilities for each of the 36 classification categories.

The initial baseline models exhibited high precision but suffered from low recall on minority classes, a common issue when certain labels, such as "medical help," are rare. A three-phase optimization strategy was implemented to address this challenge: (1) hyperparameter tuning, (2) cost-sensitive learning through per-label class weighting, and (3) per-label threshold optimization.

5 Results

Three distinct modeling approaches were initially evaluated, with the XGBoost model emerging as the strongest performer at a micro-averaged F1-score of 0.676, demonstrating superior precision (0.790) though only moderate recall (0.590). The Neural Network with threshold tuning achieved competitive performance (micro F1 = 0.650) with more balanced precision-recall characteristics. The model loss for the neural network over time, shown in Figure 3, shows signs of overfitting between epochs 4 and 8, as the model became progressively worse at predicting new data.
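The per-label class frequency weighting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label matrix `Y`, the helper name, and the hand-off to XGBoost's `scale_pos_weight` parameter are assumptions for the example.

```python
import numpy as np

def per_label_pos_weights(Y):
    """Compute a negative-to-positive ratio for each label column.

    Y: binary label matrix of shape (n_samples, n_labels).
    Returns one weight per label; labels with fewer positives get
    larger weights, so errors on rare categories cost proportionally more.
    """
    Y = np.asarray(Y)
    pos = Y.sum(axis=0)              # positive samples per label
    neg = Y.shape[0] - pos           # negative samples per label
    return neg / np.maximum(pos, 1)  # guard against labels with no positives

# Toy example: 3 labels, the last one rare (1 positive in 6 samples).
Y = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 0],
])
weights = per_label_pos_weights(Y)  # [0.5, 1.0, 5.0]
# Each weight would be passed as scale_pos_weight to the binary
# XGBoost classifier trained for that label.
```

In this sketch the rare third label receives a weight of 5.0, so a false negative on it is penalized five times as heavily as one on a balanced label.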

Figure 3. Model loss for the neural network model
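The per-label threshold optimization used in the threshold-tuned neural network can be sketched as a grid search over candidate cutoffs, maximizing validation F1 independently for each label instead of applying a fixed 0.5 everywhere. The function names and toy data here are illustrative assumptions, not the original implementation.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 for a single label; returns 0.0 when there are no positives predicted or present."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def tune_thresholds(y_true, y_prob, grid=np.arange(0.05, 0.95, 0.05)):
    """For each label, pick the probability cutoff that maximizes F1
    on held-out data. Rare labels typically end up with cutoffs below 0.5,
    which recovers recall lost to class imbalance."""
    n_labels = y_true.shape[1]
    thresholds = np.empty(n_labels)
    for j in range(n_labels):
        scores = [f1_binary(y_true[:, j], (y_prob[:, j] >= t).astype(int))
                  for t in grid]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds

# Toy example: a rare label whose predicted probabilities never reach 0.5.
y_true = np.array([[1], [0], [1], [0], [0]])
y_prob = np.array([[0.40], [0.10], [0.35], [0.20], [0.15]])
best = tune_thresholds(y_true, y_prob)
# The tuned cutoff falls below 0.5, so both true positives are recovered.
```

With a fixed 0.5 cutoff this toy label would get zero recall; the tuned threshold trades a small amount of precision headroom for substantially better recall, which is the imbalance behavior described above.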

The Classifier Chains approach underperformed despite its theoretical capacity to capture label dependencies. Based on these comparative results

