AAI_2025_Capstone_Chronicles_Combined
testing thresholds from 0.1 to 0.9 in 0.05 increments for each of the 34 categories. This delivered substantial performance gains: micro F1 improved from 0.625 to 0.682 (a 9.1% relative gain), with precision reaching 0.657 and recall 0.708. Optimal thresholds ranged widely, from 0.20 for high-frequency categories to 0.80 for rare categories, reflecting different base rates and prediction confidence distributions. As seen in Table 4, the thresholds for critical categories were: medical help (0.50, F1=0.505), search and rescue (0.75, F1=0.365), water (0.75, F1=0.701), food (0.60, F1=0.779), and shelter (0.80, F1=0.675). Alternative recall-focused thresholds, targeting a minimum recall of 70%, were also explored for critical categories. Medical help could achieve 71.4% recall at a threshold of 0.35 (F1=0.437, precision=0.315), and search and rescue could achieve 75.3% recall at a threshold of 0.20 (F1=0.091, precision=0.048), though the extremely low precision would create an operational burden from high false-positive rates.

Table 4
Optimal thresholds for XGBoost model categories

The final optimized XGBoost model, combining cost-sensitive learning, per-label threshold optimization, and category-specific hyperparameter tuning, achieved micro F1=0.682 on validation, representing 9.2% and 8.2% improvements over baseline, as seen in Table 5. Critical-category F1 scores show strong performance for basic needs: food (0.782), water (0.711), and shelter (0.675), with medical help reaching an acceptable level (0.513) for hybrid deployment. Search and rescue remains