AAI_2025_Capstone_Chronicles_Combined


Categorical features were numerically encoded, and one-hot encoding was applied to avoid false

ordinal relationships. We began with a baseline model and used systematic hyperparameter tuning to

optimize parameters controlling model complexity and regularization, including number of trees,

maximum tree depth, and learning rate. The dataset showed substantial class imbalance, with mortality

representing approximately 14 percent of cases. To address this, the data was split into 80 percent training

and 20 percent test sets using stratification to preserve class proportions, followed by the Synthetic

Minority Over-sampling Technique (SMOTE) on the training set to generate synthetic examples of the

minority class and reduce bias toward the majority class.
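
The split-then-oversample order described above can be illustrated with a minimal sketch. It uses only the Python standard library and toy data; the helper names `stratified_split` and `smote_like` are hypothetical, and the interpolation routine is a simplified stand-in for SMOTE rather than the implementation used in this project (in practice a library such as imbalanced-learn would likely be used).

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Toy data (assumed): 100 rows, 14 positive (about 14 percent), two features.
data_rng = random.Random(42)
labels = [1] * 14 + [0] * 86
features = [[data_rng.gauss(y, 1.0), data_rng.gauss(-y, 1.0)] for y in labels]

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_pos = [features[i] for i in train_idx if labels[i] == 1]
train_neg = [features[i] for i in train_idx if labels[i] == 0]

# Oversample the training minority class only; the test set is untouched.
new_pos = smote_like(train_pos, n_new=len(train_neg) - len(train_pos))
balanced_pos = train_pos + new_pos
```

Because the synthetic rows are generated only from training rows, the held-out test set keeps its original class proportions and contains no information derived from resampling.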

To build the XGBoost model, we used a systematic process combining sequential feature

engineering with hyperparameter optimization. The initial baseline model, evaluated using five-fold

cross-validation with synthetic oversampling applied within each fold to prevent data leakage, achieved a

ROC AUC of 0.8298 and served as the reference for subsequent experiments.
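
To show why oversampling is applied within each fold, the following sketch builds five stratified folds and rebalances only the training portion of each one. It is an illustration under stated assumptions: the labels are toy data, `stratified_folds` and `oversample` are hypothetical helpers, and simple duplication of minority indices stands in for SMOTE.

```python
import random

def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each index to one of n_splits folds, class by class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idxs = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % n_splits].append(i)
    return folds

def oversample(train_idx, labels, rng):
    """Stand-in for SMOTE: duplicate minority indices until classes balance."""
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train_idx + extra

# Toy labels (assumed): 15 positives out of 100 rows.
labels = [1] * 15 + [0] * 85
folds = stratified_folds(labels, n_splits=5)
rng = random.Random(1)

splits = []
for k, val_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    # Resample AFTER the split so validation rows never seed synthetic rows.
    train_bal = oversample(train_idx, labels, rng)
    splits.append((train_bal, val_idx))
    # A model would be fit on train_bal and scored on val_idx here.
```

If the resampling were done once before splitting, copies or interpolations of a validation row could end up in the training portion, which would inflate the cross-validated score.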

The first stage tested interaction features, designed to capture clinically meaningful ratios and

ranges. These aimed to make certain physiological relationships more explicit, such as combining

cardiovascular measures into composite indicators. However, their inclusion slightly reduced the ROC

AUC to 0.8273, suggesting that the base model already captured these relationships, so they were

removed.
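
As an illustration of what ratio and range features of this kind might look like, consider the sketch below. The specific features (a heart-rate to systolic-pressure ratio, a pulse pressure, and a heart-rate range) and all column names are assumptions chosen for the example; the features actually tested in this stage are not listed here.

```python
def add_interactions(row):
    """Return a copy of row with hypothetical ratio and range features added."""
    out = dict(row)
    sbp = row["sbp_mean"]
    # Ratio feature: heart rate relative to systolic blood pressure.
    out["shock_index"] = row["hr_mean"] / sbp if sbp else None
    # Difference feature: systolic minus diastolic pressure.
    out["pulse_pressure"] = row["sbp_mean"] - row["dbp_mean"]
    # Range feature: spread of heart rate over the observation window.
    out["hr_range"] = row["hr_max"] - row["hr_min"]
    return out

# Hypothetical patient record with illustrative values.
patient = {
    "hr_mean": 100.0, "hr_min": 80.0, "hr_max": 130.0,
    "sbp_mean": 80.0, "dbp_mean": 50.0,
}
enriched = add_interactions(patient)
```

Each derived column is a deterministic function of columns the model already sees, which is consistent with the observation that adding them did not improve performance.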

The second stage added time-series features from the first 24 hours of ICU data, summarizing

temporal patterns in patient measurements, including trends, variability, and observation counts. These

improved the ROC AUC to 0.8337 and were retained.
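
A minimal sketch of how such summaries could be computed for one measurement is shown below. The function name, the choice of a least-squares slope for the trend, the population standard deviation for variability, and the example values are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def summarize_series(times_h, values):
    """Summarize one measurement over the first 24 hours of a stay."""
    window = [(t, v) for t, v in zip(times_h, values) if 0 <= t <= 24]
    n = len(window)
    if n == 0:
        return {"count": 0, "mean": None, "std": None, "slope": None}
    ts = [t for t, _ in window]
    vs = [v for _, v in window]
    t_bar, v_bar = mean(ts), mean(vs)
    denom = sum((t - t_bar) ** 2 for t in ts)
    # Least-squares slope (units per hour); 0.0 if all timestamps coincide.
    slope = (
        sum((t - t_bar) * (v - v_bar) for t, v in window) / denom
        if denom else 0.0
    )
    return {"count": n, "mean": v_bar, "std": pstdev(vs), "slope": slope}

# Hypothetical heart-rate readings; the reading at hour 30 falls outside
# the 24-hour window and is ignored.
hours = [0, 6, 12, 18, 24, 30]
heart_rate = [80, 86, 92, 98, 104, 150]
summary = summarize_series(hours, heart_rate)
```

The observation count is kept as a feature in its own right, since how often a value is measured can itself carry information about the patient's condition.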

The third stage applied polynomial transformations to the most predictive features from the prior

iteration to better capture non-linear relationships with mortality risk. For some variables, extreme values

have a disproportionate impact, and squaring them emphasizes such effects. This raised the ROC AUC to

0.8358, the highest in the feature engineering phase.
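
The effect of squaring can be seen in a small sketch. The helper `add_squared_terms` and the column names `lactate` and `age` are hypothetical; the features actually selected for transformation are not named here.

```python
def add_squared_terms(rows, columns):
    """Append a '<name>_sq' column for each selected feature."""
    return [
        {**row, **{f"{c}_sq": row[c] ** 2 for c in columns}}
        for row in rows
    ]

# Hypothetical rows: the third has an extreme value for the chosen feature.
rows = [
    {"lactate": 1.0, "age": 60.0},
    {"lactate": 2.0, "age": 60.0},
    {"lactate": 8.0, "age": 60.0},
]
expanded = add_squared_terms(rows, columns=["lactate"])
```

In this toy example the gap between the second and third rows grows from 6 on the original scale to 60 on the squared scale, which is the sense in which squaring emphasizes extreme values.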
