AAI_2025_Capstone_Chronicles_Combined
activation) to detect localized patterns; the second layer expands to 128 filters with the same kernel size to
capture more complex feature interactions. Max-pooling reduces sequence length while retaining salient
features, and dropout layers (rate = 0.3) mitigate overfitting by randomly deactivating neurons during
training. The CNN output feeds into a single LSTM layer with 64 units, where input, forget, and output
gates regulate information flow, addressing vanishing gradient issues common in recurrent networks
(Hochreiter & Schmidhuber, 1997). Static patient variables are processed through a parallel dense
pathway with two fully connected layers (64 and 32 neurons, ReLU activation). Outputs from the
temporal and static branches are concatenated into a multimodal representation, followed by a dense layer
(32 neurons) and a sigmoid output layer for binary mortality prediction.
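The architecture described above can be sketched with the Keras functional API. Layer widths, kernel size, and dropout rate follow the text; the input shapes, padding mode, and pooling size are illustrative assumptions, since the paper does not state them here.

```python
from tensorflow.keras import layers, Model

def build_model(n_timesteps=36, n_temporal_feats=10, n_static_feats=8):
    # n_timesteps and feature counts are placeholder assumptions.
    # Temporal branch: two Conv1D layers (64 then 128 filters, kernel size 3),
    # max-pooling, dropout (0.3), then a single 64-unit LSTM.
    temporal_in = layers.Input(shape=(n_timesteps, n_temporal_feats))
    x = layers.Conv1D(64, kernel_size=3, activation="relu", padding="same")(temporal_in)
    x = layers.Conv1D(128, kernel_size=3, activation="relu", padding="same")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Dropout(0.3)(x)
    x = layers.LSTM(64)(x)

    # Static branch: two fully connected layers (64 and 32 neurons, ReLU).
    static_in = layers.Input(shape=(n_static_feats,))
    s = layers.Dense(64, activation="relu")(static_in)
    s = layers.Dense(32, activation="relu")(s)

    # Fuse the branches, then a 32-neuron dense layer and sigmoid output
    # for binary mortality prediction.
    merged = layers.Concatenate()([x, s])
    merged = layers.Dense(32, activation="relu")(merged)
    out = layers.Dense(1, activation="sigmoid")(merged)
    return Model(inputs=[temporal_in, static_in], outputs=out)
```

The functional API is used here (rather than `Sequential`) because the model has two input branches that merge before the output head.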
Training Procedure
To evaluate the effect of temporal resolution, the time-series data were aggregated into windows of 6, 12, 24, 36, and 48 hours. Stratified
five-fold cross-validation preserved mortality class proportions across folds (Kohavi, 1995), and results
identified the 36-hour window as optimal, yielding higher recall than both shorter and longer horizons
(Yeh et al., 2024). The final model was trained for up to 50 epochs with a batch size of 32, binary
cross-entropy loss, and the Adam optimizer (Kingma & Ba, 2015). Performance metrics included accuracy,
precision, recall, F1-score, and AUC-ROC, with recall emphasized to reduce false negatives (Chicco &
Jurman, 2020). Early stopping (patience = 10) and model checkpointing were applied to prevent
overfitting and preserve the best weights.
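The stratified splitting step can be sketched as follows. The function name and random seed are illustrative; the five-fold setting and class-proportion guarantee match the procedure described above, with scikit-learn's `StratifiedKFold` assumed as the implementation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def stratified_folds(y, n_splits=5, seed=0):
    """Return (train_idx, val_idx) pairs that preserve the mortality
    class proportion in every fold (Kohavi, 1995)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Fold assignment depends only on the labels, so a dummy X suffices.
    X_dummy = np.zeros((len(y), 1))
    return list(skf.split(X_dummy, y))
```

Within each fold, the model would then be fit with the configuration stated above (up to 50 epochs, batch size 32, Adam, binary cross-entropy) alongside early stopping (patience = 10) and checkpointing of the best weights.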
Model Optimization
Hyperparameter tuning explored variations in convolutional filter counts, kernel sizes, LSTM units,
dropout rates, and batch sizes. Cross-validation results informed the selection of 64 and 128 filters in the
first and second convolutional layers, respectively, a kernel size of 3, 64 LSTM units, a dropout rate of
0.3, and a batch size of 32. This configuration balanced recall performance, generalization capability, and
computational efficiency.
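A grid search over the dimensions listed above can be sketched as below. The candidate values are hypothetical examples (the paper reports only the winning configuration), and `score_fn` stands in for the mean cross-validated recall used to rank configurations.

```python
from itertools import product

# Hypothetical search grid spanning the tuned dimensions; the values
# shown include the paper's selected configuration but are otherwise assumed.
GRID = {
    "filters": [(32, 64), (64, 128)],   # (first conv layer, second conv layer)
    "kernel_size": [3, 5],
    "lstm_units": [32, 64],
    "dropout": [0.2, 0.3],
    "batch_size": [32, 64],
}

def grid_configs(grid):
    """Enumerate every combination in the grid as a config dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def select_best(configs, score_fn):
    """Pick the configuration maximizing score_fn
    (e.g., mean cross-validated recall)."""
    return max(configs, key=score_fn)
```

With the values above this yields 2^5 = 32 candidate configurations, each scored by cross-validation before the final selection.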