AAI_2025_Capstone_Chronicles_Combined
(NHANES I Survival Model — SHAP Latest Documentation, 2018) was again applied to interpret
feature importance (see Figure 5).
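SHAP attributes a model's prediction to its input features using Shapley values: each feature receives its weighted average marginal contribution across all coalitions of the other features. The `shap` library computes or approximates these values with model-specific explainers, such as `TreeExplainer` for tree ensembles like XGBoost. The minimal Python sketch below instead brute-forces the Shapley definition for a tiny model to show what is being computed. It makes several assumptions that are not from the study. Features that are "absent" from a coalition take values from a single baseline vector; `shap` instead averages over a background dataset or uses tree-path statistics. The model `toy_risk`, its weights, and the inputs are hypothetical. Global importance is summarized as the mean absolute Shapley value per feature, which is a common way to rank features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values.

    Every coalition is enumerated, so the cost grows as 2**n (only feasible for tiny n).
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Present features come from x, absent ones from the (assumed) baseline vector.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Weighted marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def mean_abs_shap(f: Model, rows: Sequence[Sequence[float]], baseline: Sequence[float]) -> list[float]:
    """Global importance: mean absolute Shapley value of each feature over the given rows."""
    per_row = [shapley_values(f, r, baseline) for r in rows]
    return [sum(abs(p[i]) for p in per_row) / len(per_row) for i in range(len(baseline))]


def toy_risk(z: Sequence[float]) -> float:
    # Hypothetical risk score: two additive effects plus an interaction of features 0 and 2.
    return 0.5 * z[0] + 2.0 * z[1] + 1.0 * z[0] * z[2]


if __name__ == "__main__":
    phi = shapley_values(toy_risk, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
    print([round(p, 3) for p in phi])  # [1.0, 2.0, 0.5]
```

The interaction term's contribution is split equally between features 0 and 2. The three values sum to `toy_risk(x) - toy_risk(baseline)` = 3.5, which is the efficiency property that makes Shapley-based attributions additive.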
Results/Conclusion
We evaluated four machine learning models for predicting ICU mortality: XGBoost, CNN-LSTM,
PatchTSTForClassification (PatchTST, 2025), and ICUStaticFusionTimesFM (TimesFM, 2025). The
results revealed distinct strengths and trade-offs. XGBoost delivered the highest overall accuracy (0.86)
and precision, making it well suited to decision support systems that prioritize reliability and minimize
false alarms. However, its lower recall suggests a risk of missing critical mortality cases. In contrast, the
CNN-LSTM model achieved the highest recall (0.83), demonstrating strong sensitivity to high-risk
patients and offering value for early risk stratification, though its precision was notably lower. The
transformer-based models, PatchTST and ICUStaticFusionTimesFM, offered a more balanced profile, with
recall of 0.75 and 0.67 respectively, moderate F1 scores, and enhanced interpretability through temporal
modeling and multimodal fusion (see Figure 6).
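To make these metric trade-offs concrete, the minimal Python sketch below shows how accuracy, precision, recall, and F1 all follow from a single confusion matrix once a predicted mortality score is cut at a decision threshold. It assumes each model outputs a probability-like risk score, which this section does not specify, and it does not state which thresholds the study used. The cohort, scores, and thresholds are invented for illustration and are not the study's data. In practice, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in scikit-learn's `sklearn.metrics` compute the same quantities.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float  # also called sensitivity
    f1: float


def metrics_at_threshold(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> Metrics:
    """Binarize scores (score >= threshold -> predicted death) and compute the four metrics."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1  # false alarm
        elif y:
            fn += 1  # missed death
        else:
            tn += 1
    # Zero-division cases fall back to 0.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics((tp + tn) / len(y_true), precision, recall, f1)


# Hypothetical cohort: 10 ICU stays, 2 deaths (20% positive class), made-up risk scores.
Y_TRUE = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
SCORES = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.60, 0.90]

if __name__ == "__main__":
    for t in (0.8, 0.5):
        m = metrics_at_threshold(Y_TRUE, SCORES, t)
        print(f"t={t}: acc={m.accuracy:.2f} prec={m.precision:.2f} rec={m.recall:.2f} f1={m.f1:.2f}")
    # t=0.8: acc=0.90 prec=1.00 rec=0.50 f1=0.67
    # t=0.5: acc=0.80 prec=0.50 rec=1.00 f1=0.67
```

The two thresholds reproduce, in miniature, the kind of precision versus recall trade-off described above. The stricter cutoff gives perfect precision but misses one death. The looser cutoff catches both deaths at the cost of two false alarms. With only 20% positives, labeling every stay as a survivor (for example, `threshold=1.0`) would still score 0.80 accuracy with zero recall. This is why accuracy alone can be misleading on imbalanced data.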
These performance patterns have clear clinical implications. High-recall models such as CNN-LSTM
and the transformer variants are valuable for boosting ICU vigilance and flagging likely deaths, but they
may increase the burden of false alerts and unnecessary interventions. XGBoost's precision-focused
behavior reduces false positives, supporting more conservative decision-making, though at the cost of
sensitivity. The transformer models strike a middle ground between interpretability and recall, which
may enable more reliable and actionable predictions in real-world ICU settings. Across all models, the
observed trade-off between recall and precision reflects the challenge of working with imbalanced clinical
data. SHAP analysis (NHANES I Survival Model — SHAP Latest Documentation, 2018) (see Figure 5)