AAI_2025_Capstone_Chronicles_Combined


generating contextualized embeddings that achieved state-of-the-art performance across numerous benchmarks, though Zhang et al. (2019) found that the performance gap between BERT and TF-IDF narrowed considerably with limited training data or prominent domain-specific vocabulary, suggesting both approaches merit consideration for specialized domains like disaster response.

Applications to disaster response demonstrate the practical challenges of multi-label classification in emergency contexts. Nguyen et al. (2016) showed that multi-task learning frameworks modeling relationships between information types and credibility improved performance compared to independent classifiers. Madichetty and Sridevi (2020) found that bidirectional LSTM models achieved the best overall performance for disaster tweet classification, though class imbalance significantly degraded performance for critical but rare categories like search and rescue unless explicit mitigation strategies were applied. Huang et al. (2021) demonstrated that ensemble methods combining multiple classifiers often outperformed single sophisticated models on imbalanced data, and their threshold optimization for probabilistic predictions provides methodological guidance for multi-label scenarios. Alam et al. (2018) developed the CrisisNLP framework, emphasizing that classification systems must process thousands of messages per minute; they found that simpler models with optimized preprocessing often proved more operationally viable than complex deep learning approaches requiring extensive computational resources.

Domain-specific fine-tuning has shown substantial benefits for specialized classification tasks. Manin and Goutte (2018) found that domain-specific fine-tuning of pre-trained models, combined with feature engineering to capture specialized vocabulary, yielded superior performance compared to generic language models. Kersten et al. (2020) demonstrated that even modest amounts of domain-specific pre-training on disaster-related data substantially improved BERT classification performance, particularly for categories involving domain-specific terminology and implicit information.

Building on this literature, our approach combines TF-IDF vectorization with XGBoost classification, prioritizing computational efficiency and interpretability over complex deep learning architectures. We specifically address class imbalance through cost-sensitive learning and threshold optimization, techniques shown effective in prior disaster classification work (Huang et al., 2021; Alam et al., 2018), while focusing on multi-label prediction to capture the multiple simultaneous needs present in disaster communications.

4 Methodology

To address the multi-label classification challenge of categorizing disaster response messages, three distinct machine learning architectures were evaluated: a Gradient Boosted Decision Tree (GBDT) ensemble, a probabilistic Classifier Chain, and a Deep Neural Network (DNN). The primary model selected for deployment was XGBoost (Extreme Gradient Boosting), based on its superior performance on high-dimensional sparse text data, interpretability through feature importance scores, and computational efficiency compared to deep learning alternatives (Chen & Guestrin, 2016). We also explored a Classifier Chain using Logistic Regression to model label dependencies (Read et al., 2011), and a Neural Network was trained to test the efficacy of deep learning on this task.
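The pipeline described above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the project's actual implementation: the toy messages and label set (search_and_rescue, water_food, medical) are hypothetical, and a Classifier Chain over logistic regression stands in for the full model suite (XGBoost could be substituted as the base estimator). It shows the three key steps discussed: TF-IDF vectorization, cost-sensitive multi-label training, and per-label threshold optimization on predicted probabilities.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multioutput import ClassifierChain

# Hypothetical disaster-message corpus (illustrative only, not project data).
texts = [
    "people trapped under rubble need rescue",
    "we need clean drinking water and food",
    "medical supplies required at the shelter",
    "flood water rising, families need evacuation",
    "send food and water to the camp",
    "injured people need a doctor urgently",
]
# Multi-label targets: columns = [search_and_rescue, water_food, medical].
Y = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# Step 1: TF-IDF vectorization (unigrams + bigrams).
vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)

# Step 2: Classifier Chain models label dependencies (Read et al., 2011);
# class_weight="balanced" is the cost-sensitive step for rare categories.
chain = ClassifierChain(LogisticRegression(class_weight="balanced", max_iter=1000))
chain.fit(X, Y)

# Step 3: per-label threshold optimization — pick the cutoff maximizing F1
# on held-out probabilities (training data reused here for brevity).
probs = chain.predict_proba(X)
thresholds = []
for j in range(Y.shape[1]):
    best_t, best_f1 = 0.5, -1.0
    for t in np.linspace(0.1, 0.9, 17):
        f1 = f1_score(Y[:, j], (probs[:, j] >= t).astype(int), zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    thresholds.append(best_t)

preds = (probs >= np.array(thresholds)).astype(int)
print(preds.shape)  # (6, 3): one binary need-vector per message
```

In a deployment like the one described, the thresholds would be tuned on a validation split rather than the training data, since rare labels such as search and rescue typically need lower cutoffs than the default 0.5 to achieve usable recall.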

