AAI_2025_Capstone_Chronicles_Combined


model can effectively detect critical, low-prevalence events.

Table 1

Category labels are ranked by imbalance ratio, showing severe imbalance for rare categories like "other_infrastructure" (20:1) and extreme scarcity for categories like "shops" (211:1), necessitating cost-sensitive learning approaches.

3 Literature Review

The challenge of automated disaster message classification has attracted significant attention as social media platforms have become primary sources of real-time disaster information, requiring sophisticated natural language processing systems to process high-volume, multilingual emergency communications (Imran et al., 2015). Traditional keyword-based filtering and rule-based systems proved inadequate for handling the linguistic diversity and contextual nuance of real-world emergency communications (Castillo, 2016). The introduction of supervised machine learning marked a significant advancement, with early approaches using Support Vector Machines, Naive Bayes, and Random Forests achieving reasonable performance when combined with engineered text features (Imran et al., 2015). More recent work has explored deep learning approaches, with Burel et al. (2017) demonstrating that neural network architectures combining semantic analysis with deep learning could capture contextual relationships missed by bag-of-words representations.

Multi-label classification, where messages can belong to multiple categories simultaneously, poses particular challenges for disaster response, where messages often express multiple concurrent needs. Read et al. (2011) developed the Classifier Chains algorithm, which models label correlations by transforming a multi-label problem into a chain of binary classification tasks, each conditioned on the labels that precede it in the chain. This approach consistently outperformed independent binary classifiers and Label Powerset approaches, which struggle with exponential class growth and poor generalization to unseen label combinations.

Traditional text representation relies on TF-IDF vectorization, which remains popular in disaster response applications due to its interpretability, computational efficiency, and effectiveness with limited training data. Devlin et al. (2019) introduced BERT, a transformer-based model
