AAI_2025_Capstone_Chronicles_Combined

LSTMs in mortality and length-of-stay prediction using datasets like MIMIC-III (Zhang et al., 2022).

Benchmark architectures such as Informer (Zhou et al., 2021), Autoformer (Wu et al., 2021), and

PatchTST (Nie et al., 2023) have introduced innovations in long-sequence forecasting, including attention

sparsity and seasonal-trend decomposition. Commercial efforts like Google's TimesFM (Das et al., 2023)

further illustrate the potential for scalable, zero-shot adaptation in dynamic clinical environments.

Building on these foundations, this project develops an interpretable and scalable pipeline for

ICU mortality prediction by integrating temporal dynamics of physiological signals with static patient

context.

Experimental Methods

In this section, we describe the experimental methods implemented for our three model families: XGBoost, CNN-LSTM, and Transformer.

XGBoost

We implemented a multi-stage preprocessing pipeline to ensure data quality, prevent data

leakage, and handle missing values. The process began with a data cleaning phase in which we loaded the

aggregated feature set and removed variables unavailable at prediction time, such as outcome-related and

post-discharge information. Exploratory data analysis showed that many features, particularly those from

specialized laboratory tests, had a high proportion of missing values. To avoid noise and unreliable

imputations, we applied a threshold-based filter to remove columns with more than 80 percent missing

data. For the remaining features, missing values were imputed with a K-Nearest Neighbors strategy using the five most similar patients; we chose this over simpler mean or median imputation because its context-aware estimates better preserved the underlying data structure.
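The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names, the `leak_cols` argument, and the `preprocess` helper are hypothetical, and it assumes pandas and scikit-learn's `KNNImputer`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

def preprocess(df: pd.DataFrame, leak_cols, missing_threshold=0.8, n_neighbors=5):
    """Drop leakage-prone columns, filter mostly-missing features, KNN-impute the rest."""
    # Remove variables unavailable at prediction time (outcome/post-discharge info).
    df = df.drop(columns=[c for c in leak_cols if c in df.columns])
    # Threshold-based filter: drop columns with more than 80% missing values.
    keep = df.columns[df.isna().mean() <= missing_threshold]
    df = df[keep]
    # Impute remaining gaps from the five most similar patients.
    imputer = KNNImputer(n_neighbors=n_neighbors)
    return pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)
```

Fitting the imputer on the training split only (and applying `transform` to the validation and test splits) keeps this step consistent with the goal of preventing data leakage.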
