4.6.1 Selection of Model Training Techniques. A variety of multi-class classification models were trained and tuned to classify the three-level multiclass outcome. White-box models were prioritized for their interpretability, along with tree-based ensembles. Hyperparameter tuning was implemented using grid searches, with hyperparameter options specific to each classification algorithm; this was performed using the GridSearchCV class from Scikit-learn (Pedregosa et al., 2011). Five-fold cross-validation was conducted for model training, and Scikit-learn (Pedregosa et al., 2011) was the primary machine learning library. The final model was selected based on recall for the death outcome class: true positive predictions were prioritized to ensure detection of these serious and fatal health outcomes, at the risk of increasing the false positive rate for nonserious outcomes.
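As a minimal sketch of this tuning setup, the grid search below pairs GridSearchCV with five-fold cross-validation. The synthetic data, the candidate hyperparameter values, and the recall_macro scoring string are illustrative assumptions, not the study's exact configuration, which used algorithm-specific grids and judged models on recall for the death class.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: a synthetic three-class problem in place of the
# study's prepared features and three-level outcome.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, n_classes=3, random_state=42)

param_grid = {
    "max_depth": [5, 10, 20],        # illustrative candidate values
    "min_samples_leaf": [1, 5, 10],
}

grid_search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,                    # five-fold cross-validation, as in the text
    scoring="recall_macro",  # recall-oriented stand-in scorer
)
grid_search.fit(X, y)
print(grid_search.best_params_, grid_search.best_score_)
```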
4.6.1.1 Penalized Logistic Regression. Logistic regression models were generated using the LogisticRegression class from Scikit-learn (Pedregosa et al., 2011). Models with three different regularization techniques were fitted: lasso regression, ridge regression, and elastic net. Lasso regression uses the L1 regularization technique, ridge regression uses the L2 technique, and elastic net applies both L1 and L2 penalty terms (Nagpal, 2017). The L1 penalty adds the sum of the absolute values of the coefficients to the loss function; this drives the coefficients of less important features to zero and can work better in models with a large number of features (Nagpal, 2017). The L2 penalty adds the squared magnitude of the coefficients to the loss function, which helps prevent overfitting. All of the tested regression models yielded similar performance metrics for all three classes, with the most accurate cross-validation fold scoring around 70%.
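A minimal sketch of the three penalized variants, assuming Scikit-learn's saga solver (one solver that supports all three penalty types); the l1_ratio value is illustrative, and X and y carry over from the grid-search sketch above:

```python
from sklearn.linear_model import LogisticRegression

# Lasso (L1), ridge (L2), and elastic net (both penalties combined).
lasso = LogisticRegression(penalty="l1", solver="saga", max_iter=5000)
ridge = LogisticRegression(penalty="l2", solver="saga", max_iter=5000)
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5,  # illustrative L1/L2 mix
                          max_iter=5000)

for name, model in [("lasso", lasso), ("ridge", ridge), ("elastic net", enet)]:
    model.fit(X, y)                 # X, y from the sketch above
    print(name, model.score(X, y))  # training accuracy, for illustration
```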
4.6.1.2 Single Decision Tree. A single decision tree model was generated using the DecisionTreeClassifier class from Scikit-learn (Pedregosa et al., 2011). In decision trees, data features are split into branch-like nodes that branch into further nodes and eventually into outcome classes based on threshold values (Anuradha & Gupta, 2014). The hyperparameters tuned for the single tree were maximum tree depth (the number of tree levels) and minimum samples per leaf. The best-performing decision tree had a max_depth of 10 and a minimum of 5 samples per leaf (Figure 11).

Figure 11. Initial Decision Tree Splits [tree diagram; the initial splits use features including Report Source and Reporter Professional Title]
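A minimal sketch of the reported best configuration (max_depth of 10, minimum of 5 samples per leaf), scored with five-fold cross-validation; the recall_macro scorer and the variables X and y are carried over from the earlier sketches as assumptions:

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Best-performing configuration reported in the text.
tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=5,
                              random_state=42)
scores = cross_val_score(tree, X, y, cv=5, scoring="recall_macro")
print(scores.mean())
```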
4.6.1.3 Random Forest. A random forest model was generated using the RandomForestClassifier class from Scikit-learn (Pedregosa et al., 2011). Random forest classifiers are bagging ensemble learning methods composed of multiple decision tree classifiers, each generated from a random subset of the dataset (Shafi, 2023). The algorithm performs implicit feature selection and works well with high-dimensional data. Each sample is then processed by every tree, and the most frequent class prediction is saved as the result. This model was run with 5-fold cross-validation.
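A minimal sketch of the random forest under five-fold cross-validation; n_estimators is left at Scikit-learn's default of 100 here, since the study tuned that value (discussed below), and X, y, and the scorer carry over from the earlier sketches:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Random forest evaluated with 5-fold cross-validation, as in the text;
# n_estimators=100 is the library default, not the study's tuned value.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(forest, X, y, cv=5, scoring="recall_macro")
print(scores.mean())
```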
The hyperparameter tuned for the random forest was the number of estimators. The optimal