AAI_2025_Capstone_Chronicles_Combined
activation) to detect localized patterns; the second layer expands to 128 filters with the same kernel size to capture more complex feature interactions. Max-pooling reduces sequence length while retaining salient features, and dropout layers (rate = 0.3) mitigate overfitting by randomly deactivating neurons during training. The CNN output feeds into a single LSTM layer with 64 units, where input, forget, and output gates regulate information flow, addressing vanishing gradient issues common in recurrent networks (Hochreiter & Schmidhuber, 1997). Static patient variables are processed through a parallel dense pathway with two fully connected layers (64 and 32 neurons, ReLU activation). Outputs from the temporal and static branches are concatenated into a multimodal representation, followed by a dense layer (32 neurons) and a sigmoid output layer for binary mortality prediction.
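The architecture described above can be sketched with the Keras functional API. This is a minimal illustration, not the authors' code: the input dimensions (time steps, number of vital-sign channels, number of static features) and the use of `padding="same"` are assumptions for demonstration.

```python
from tensorflow.keras import layers, Model

def build_model(time_steps=36, n_channels=8, n_static=10):
    # Temporal branch: two Conv1D layers (64 then 128 filters, kernel size 3),
    # max-pooling, dropout (0.3), then a single 64-unit LSTM layer
    ts_in = layers.Input(shape=(time_steps, n_channels), name="time_series")
    x = layers.Conv1D(64, kernel_size=3, activation="relu", padding="same")(ts_in)
    x = layers.Conv1D(128, kernel_size=3, activation="relu", padding="same")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Dropout(0.3)(x)
    x = layers.LSTM(64)(x)

    # Static branch: two fully connected layers (64 -> 32 neurons, ReLU)
    st_in = layers.Input(shape=(n_static,), name="static")
    s = layers.Dense(64, activation="relu")(st_in)
    s = layers.Dense(32, activation="relu")(s)

    # Fusion: concatenate both branches, dense(32), sigmoid output
    merged = layers.Concatenate()([x, s])
    merged = layers.Dense(32, activation="relu")(merged)
    out = layers.Dense(1, activation="sigmoid")(merged)
    return Model(inputs=[ts_in, st_in], outputs=out)

model = build_model()
```

The two-input functional model mirrors the paper's multimodal design: the temporal and static pathways are trained jointly, so gradients from the single sigmoid output update both branches.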

Training Procedure

To evaluate temporal resolution, datasets were aggregated at 6, 12, 24, 36, and 48 hours. Stratified five-fold cross-validation preserved mortality class proportions across folds (Kohavi, 1995), and results identified the 36-hour window as optimal, yielding higher recall than both shorter and longer horizons (Yeh et al., 2024). The final model was trained for up to 50 epochs with a batch size of 32, binary cross-entropy loss, and the Adam optimizer (Kingma & Ba, 2015). Performance metrics included accuracy, precision, recall, F1-score, and AUC-ROC, with recall emphasized to reduce false negatives (Chicco & Jurman, 2020). Early stopping (patience = 10) and model checkpointing were applied to prevent overfitting and preserve the best weights.
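The training configuration can be sketched as follows. The model and data here are tiny synthetic stand-ins (an assumed 36-hour window with 4 channels and a ~15% positive rate), chosen only to show the loss, optimizer, and callback setup; epochs are shortened for illustration.

```python
import os
import tempfile
import numpy as np
from tensorflow.keras import layers, models, callbacks, metrics

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 36, 4)).astype("float32")  # assumed 36 steps x 4 vitals
y = (rng.random(64) < 0.15).astype("float32")       # assumed ~15% mortality rate

# Stand-in model; the full CNN-LSTM would be used in practice
model = models.Sequential([
    layers.Input(shape=(36, 4)),
    layers.LSTM(8),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", metrics.AUC(), metrics.Precision(), metrics.Recall()],
)

# Early stopping (patience=10) and checkpointing of the best weights
ckpt_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")
cbs = [
    callbacks.EarlyStopping(monitor="val_loss", patience=10,
                            restore_best_weights=True),
    callbacks.ModelCheckpoint(ckpt_path, monitor="val_loss",
                              save_best_only=True, save_weights_only=True),
]
history = model.fit(X, y, validation_split=0.25, epochs=5,  # paper uses up to 50
                    batch_size=32, callbacks=cbs, verbose=0)
```

With `restore_best_weights=True`, training can run to the epoch cap while the deployed model still reflects the lowest validation loss observed.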

Model Optimization

Hyperparameter tuning explored variations in convolutional filter counts, kernel sizes, LSTM units, dropout rates, and batch sizes. Cross-validation results informed the selection of 64 and 128 filters in the first and second convolutional layers, respectively, a kernel size of 3, 64 LSTM units, a dropout rate of 0.3, and a batch size of 32. This configuration balanced recall performance, generalization capability, and computational efficiency.
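A grid over these hyperparameters can be enumerated with the standard library; the candidate values below are illustrative assumptions (the paper reports only the selected configuration), and each candidate would be scored via the stratified cross-validation described earlier.

```python
from itertools import product

# Assumed candidate values around the reported selections
grid = {
    "conv_filters": [(32, 64), (64, 128)],
    "kernel_size": [3, 5],
    "lstm_units": [32, 64],
    "dropout": [0.2, 0.3],
    "batch_size": [16, 32],
}

# Cartesian product: one dict per candidate configuration (2^5 = 32 here)
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

# The configuration selected by cross-validation in the paper
best = {"conv_filters": (64, 128), "kernel_size": 3,
        "lstm_units": 64, "dropout": 0.3, "batch_size": 32}
```

Exhaustive grids grow multiplicatively with each hyperparameter, which is why the search here is kept to a few values per dimension.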
