AAI_2025_Capstone_Chronicles_Combined


(NHANES I Survival Model — SHAP Latest Documentation, 2018) was again applied to interpret feature importance (see Figure 5).
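SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. The Python below is a minimal sketch of what those values mean. It is not the study's code. Libraries such as `shap` compute the values far more efficiently, for example with the Tree SHAP algorithm for tree ensembles such as XGBoost. The sketch computes exact Shapley values by enumerating every feature coalition. It assumes a hypothetical three-feature `toy_model` and treats a feature as "missing" by replacing it with a baseline value of zero. Both are illustrative choices, not details taken from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are set to their baseline value, which is one
    common way to define a 'missing' feature. The cost grows as 2**n model
    evaluations, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with features in `coalition` taken from x
        # and every other feature taken from the baseline.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: two linear terms plus an interaction between features 0 and 2.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])  # → [3.5, 6.0, 1.5]
```

The interaction term is split evenly between features 0 and 2, and the attributions sum to the model's output for `x` minus its output for the baseline (11 - 0). This additivity property is what makes SHAP summary plots readable as per-feature contributions.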

Results/Conclusion

Our evaluation of four machine learning models for predicting ICU mortality revealed distinct strengths and trade-offs. The four models were XGBoost, CNN-LSTM, PatchTSTForClassification (PatchTST, 2025), and ICUStaticFusionTimesFM (TimesFM, 2025). XGBoost delivered the highest overall accuracy (0.86) and precision, which makes it well suited to decision support systems that prioritize reliability and minimize false alarms. However, its lower recall suggests a risk of missing critical mortality cases. In contrast, the CNN-LSTM model achieved the highest recall (0.83). This shows strong sensitivity to high-risk patients and gives the model value for early risk stratification, although its precision was notably lower. The transformer-based models, PatchTST and ICUStaticFusionTimesFM, took a more balanced approach. They achieved recall of 0.75 and 0.67, respectively, and moderate F1 scores. They also offered enhanced interpretability through temporal modeling and multimodal fusion (see Figure 6).
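Accuracy alone can hide poor recall when deaths are rare. The minimal Python sketch below shows how accuracy, precision, recall, and F1 follow from the confusion matrix. It also shows how moving the decision threshold on a risk score trades precision for recall. The risk scores and labels are made up for illustration and are not the study's data. The helper names (`evaluate`, `Metrics`, `safe_div`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def safe_div(num, den):
    # Return 0.0 instead of raising when a metric is undefined (e.g. no positive predictions).
    return num / den if den else 0.0


def evaluate(labels, scores, threshold):
    """Binarize risk scores at `threshold` and compute standard classification metrics."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    precision = safe_div(tp, tp + fp)  # of flagged patients, how many died
    recall = safe_div(tp, tp + fn)     # of patients who died, how many were flagged
    return Metrics(
        accuracy=(tp + tn) / len(labels),
        precision=precision,
        recall=recall,
        f1=safe_div(2 * precision * recall, precision + recall),
    )


# Hypothetical, imbalanced cohort: 4 deaths (label 1) among 20 ICU stays.
labels = [1] * 4 + [0] * 16
scores = [0.9, 0.7, 0.45, 0.3,
          0.6, 0.4, 0.35, 0.2, 0.15, 0.1, 0.1, 0.05,
          0.05, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01, 0.01]

for threshold in (1.0, 0.5, 0.25):
    m = evaluate(labels, scores, threshold)
    print(f"t={threshold:<4} acc={m.accuracy:.2f} prec={m.precision:.2f} "
          f"rec={m.recall:.2f} f1={m.f1:.2f}")
```

With these invented scores, thresholds of 0.5 and 0.25 give identical accuracy (0.85). However, lowering the threshold raises recall from 0.50 to 1.00 while precision drops from 0.67 to 0.57. A threshold of 1.0 flags no one and still reaches 0.80 accuracy. This is why recall and F1 are worth reporting alongside accuracy for imbalanced mortality data.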

These performance patterns have clear clinical implications. Higher-recall models, such as CNN-LSTM and, to a lesser degree, the transformer variants, are valuable for boosting ICU vigilance and flagging likely deaths. However, they may increase the burden of false alerts and unnecessary interventions. XGBoost's precision-focused behavior reduces false positives and supports more conservative decision-making, though at the cost of sensitivity. The transformer models strike a middle ground between interpretability and recall, which may enable more reliable and actionable predictions in real-world ICU settings. Across all models, the observed trade-off between recall and precision reflects the challenge of working with imbalanced clinical data, in which deaths are the minority class. SHAP analysis (NHANES I Survival Model — SHAP Latest Documentation, 2018) (see Figure 5)

