AAI_2025_Capstone_Chronicles_Combined
LSTMs in mortality and length-of-stay prediction using datasets like MIMIC-III (Zhang et al., 2022).
Benchmark architectures such as Informer (Zhou et al., 2021), Autoformer (Wu et al., 2021), and
PatchTST (Nie et al., 2023) have introduced innovations in long-sequence forecasting, including attention
sparsity and seasonal-trend decomposition. Commercial efforts like Google’s TimesFM (Das et al., 2023)
further illustrate the potential for scalable, zero-shot adaptation in dynamic clinical environments.
Building on these foundations, this project develops an interpretable and scalable pipeline for
ICU mortality prediction by integrating temporal dynamics of physiological signals with static patient
context.
Experimental Methods
In this section, we provide an overview of the experimental methods we implemented for our
machine learning models based on XGBoost, CNN-LSTM, and Transformer architectures.
XGBoost
We implemented a multi-stage preprocessing pipeline to ensure data quality, prevent data
leakage, and handle missing values. The process began with a data cleaning phase in which we loaded the
aggregated feature set and removed variables unavailable at prediction time, such as outcome-related and
post-discharge information. Exploratory data analysis showed that many features, particularly those from
specialized laboratory tests, had a high proportion of missing values. To avoid noise and unreliable
imputations, we applied a threshold-based filter to remove columns with more than 80 percent missing
data. For the remaining features, missing values were imputed using a K-Nearest Neighbors strategy
based on the five most similar patients; we chose this over simpler methods because its context-aware
estimates better preserve the underlying structure of the data.
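The two-step filtering and imputation procedure described above can be sketched as follows. This is a minimal illustration using scikit-learn's KNNImputer on a synthetic feature table; the column names and data are hypothetical, not drawn from the actual MIMIC-III feature set.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Synthetic stand-in for the aggregated per-patient feature table.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "heart_rate_mean": rng.normal(85, 10, 100),
    "lactate_mean": rng.normal(2.0, 0.5, 100),
    "rare_lab_test": np.nan,  # specialty lab, mostly unmeasured
})
df.loc[:9, "rare_lab_test"] = rng.normal(1.0, 0.1, 10)   # 90% missing
df.loc[rng.choice(100, 10, replace=False), "lactate_mean"] = np.nan  # 10% missing

# Step 1: drop columns with more than 80% missing values.
keep = df.columns[df.isna().mean() <= 0.80]
df_filtered = df[keep]

# Step 2: impute remaining gaps from the 5 most similar patients
# (nearest neighbors in feature space, distances ignore missing entries).
imputer = KNNImputer(n_neighbors=5)
df_imputed = pd.DataFrame(
    imputer.fit_transform(df_filtered), columns=df_filtered.columns
)
```

Here `rare_lab_test` exceeds the 80 percent missingness threshold and is dropped, while the partially missing `lactate_mean` column is retained and filled from neighboring patients.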