AAI_2025_Capstone_Chronicles_Combined
stays. This dataset provides a valuable opportunity to explore how advanced machine learning models can
leverage complex patient data to generate accurate and timely risk assessments.
Historically, this problem has been addressed using clinical scoring systems such as the Simplified
Acute Physiology Score (SAPS-I) (Le Gall et al., 1984) and the Sequential Organ Failure Assessment
(SOFA) score (Vincent et al., 1996), both of which are present in our dataset. While useful, these scores
are often based on a limited set of variables and on traditional statistical models such as logistic regression. The
advent of large-scale electronic health record (EHR) databases has spurred a shift towards more
sophisticated machine learning approaches. Researchers have successfully applied a range of methods to
this problem, from tree-based ensembles to deep learning models capable of analyzing raw time-series
data, demonstrating the potential for data-driven models to improve upon traditional scoring systems. Our
project will explore a multi-modal approach, evaluating three machine learning architectures:
XGBoost for aggregated tabular data, and Convolutional Neural Networks (CNNs) and Transformers for
direct time-series analysis.
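To make the distinction between the two input styles concrete, the following minimal sketch shows one way a stay's time series could be collapsed into a fixed-length tabular row of the kind a tree-based model consumes. The data layout, the variable names (heart_rate, lactate), the readings, and the choice of summary statistics are assumptions made for illustration only; they are not taken from our dataset or pipeline.

```python
from statistics import mean


def aggregate_stay(series):
    """Collapse one stay's time series into fixed-length summary features.

    `series` maps a variable name to a list of (hour, value) readings.
    This layout is hypothetical and chosen only for the illustration.
    """
    features = {}
    for name, readings in series.items():
        # Order readings by time so that "last" is the most recent value.
        values = [value for _, value in sorted(readings)]
        if not values:
            # Variable never measured: keep it missing rather than imputing.
            for stat in ("mean", "min", "max", "last"):
                features[f"{name}_{stat}"] = None
            continue
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
        features[f"{name}_last"] = values[-1]
    return features


# Made-up readings for a single stay (not real patient data).
stay = {
    "heart_rate": [(0, 88.0), (2, 95.0), (1, 90.0)],
    "lactate": [],
}
row = aggregate_stay(stay)
print(row)
```

The time-series models, by contrast, would receive the ordered readings themselves rather than these summaries.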
XGBoost
XGBoost is a fast, regularized implementation of gradient boosting that builds an ensemble of
decision trees, each trained to correct the errors of the previous ones, resulting in high predictive accuracy
(Chen & Guestrin, 2016). It is well-suited for our tabular patient dataset, as it efficiently handles missing
values and provides built-in feature importance rankings. These capabilities allow us to both achieve
strong predictive performance and identify the clinical factors most associated with mortality risk.
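The ideas in this paragraph can be illustrated with a small from-scratch sketch. It is not XGBoost itself: it uses depth-one trees (stumps), plain squared-error residuals, and no regularization, and every function name and data value below is hypothetical. What it does share with the description above is that each new tree is fit to the errors left by the previous ones, that a missing value is routed to whichever side of a split reduces the error most (a simplified version of a learned default direction), and that counting how often each feature is chosen for a split gives a rough importance ranking.

```python
def fit_stump(X, residuals):
    """Pick the (feature, threshold, missing-direction) with lowest squared error."""
    best = None
    for j in range(len(X[0])):
        thresholds = sorted({row[j] for row in X if row[j] is not None})
        for t in thresholds:
            for missing_left in (True, False):
                left, right = [], []
                for row, r in zip(X, residuals):
                    v = row[j]
                    goes_left = missing_left if v is None else v <= t
                    (left if goes_left else right).append(r)
                if not left or not right:
                    continue
                lm = sum(left) / len(left)
                rm = sum(right) / len(right)
                sse = (sum((r - lm) ** 2 for r in left)
                       + sum((r - rm) ** 2 for r in right))
                if best is None or sse < best[0]:
                    best = (sse, j, t, missing_left, lm, rm)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]


def stump_predict(stump, row):
    j, t, missing_left, lm, rm = stump
    v = row[j]
    goes_left = missing_left if v is None else v <= t
    return lm if goes_left else rm


def fit_boosting(X, y, n_rounds=20, lr=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


def split_counts(model, n_features):
    """Importance as the number of splits that use each feature."""
    counts = [0] * n_features
    for j, *_ in model[2]:
        counts[j] += 1
    return counts


# Toy data (made up): feature 0 is informative, feature 1 is not; None = missing.
X = [[1.0, 5.0], [2.0, None], [None, 4.0], [8.0, 5.0], [9.0, None], [10.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

model = fit_boosting(X, y)
importance = split_counts(model, n_features=2)
print(importance, predict(model, [None, 4.0]), predict(model, [9.0, None]))
```

In this toy setting the split-count ranking favors feature 0, and rows with missing entries are still scored without any imputation step. The real library adds regularization, second-order gradient information, and deeper trees on top of this basic loop.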
XGBoost is well-suited for our project because it excels with structured, tabular data and can
natively handle missing values, simplifying preprocessing. It also provides feature importance rankings,