AAI_2025_Capstone_Chronicles_Combined
feature representation's computational efficiency came at the cost of discarding contextual information that may be crucial for disambiguating semantically similar phrases with different underlying needs.

Continued development should pursue three primary directions that build on the optimized baseline. First, transitioning from TF-IDF to fine-tuned BERT embeddings is the most promising path for capturing semantic nuance. Fine-tuning DistilBERT would allow the model to distinguish contextually between phrases such as "stuck in building" and "stuck at shelter," with potential improvements of 8-15% in F1-scores for critical categories (Alam et al., 2018). Second, advanced resampling and data augmentation strategies, including SMOTE, back-translation, and few-shot learning, could improve F1-scores for the rarest categories by 10-20%, although lasting gains will ultimately require collecting additional labeled messages from diverse disaster contexts. Third, productionization infrastructure would enable real-world deployment through comprehensive ML pipelines that incorporate real-time data ingestion, threshold-tuned model inference, REST API integration with emergency management systems, and active learning frameworks for continuous retraining, supported by monitoring dashboards that track performance degradation.

The deployment of automated disaster message classification systems carries both substantial humanitarian promise and important ethical responsibilities. Reducing message triage time from minutes to seconds could accelerate life-saving interventions when response speed directly affects mortality rates. However, algorithmic bias, automation bias among operators, and performance degradation in novel contexts require proactive mitigation through diverse training data, mandatory human review protocols, and transparent communication of system limitations. Responsible deployment must prioritize equity across demographic groups so that populations with different linguistic patterns, literacy levels, or communication styles are not systematically disadvantaged. This requires ongoing collaboration with affected communities, domain experts, and disaster response organizations to validate that the technology saves lives rather than introducing new sources of harm. The success of this project should ultimately be measured not by F1-scores but by its contribution to faster, more equitable emergency response that reduces suffering and saves lives in the chaotic aftermath of natural disasters.

In conclusion, this project demonstrates that systematic multi-stage optimization can meaningfully improve disaster message classification efficiency, with the final XGBoost model representing a viable foundation for hybrid human-AI systems in disaster response. The optimization methodology developed here provides a replicable framework for addressing severe class imbalance in other multi-label classification domains beyond disaster response.

ACKNOWLEDGMENTS

The authors would like to thank Professor Anna Marbut for her invaluable guidance and feedback throughout this project.

Works Cited

Alam, F., Ofli, F., & Imran, M. (2018). CrisisMMD: Multimodal Twitter datasets from natural disasters. Proceedings of the International AAAI Conference on Web and Social Media, 12(1). https://doi.org/10.1609/icwsm.v12i1.14983