

4.6.1 Selection of Model Training Techniques. A variety of multi-class classification models were trained and tuned to classify the three-level outcome. White-box models were prioritized for their interpretability, alongside tree-based ensembles. Hyperparameter tuning was implemented using grid searches, with hyperparameter options specific to each classification algorithm; this was performed using the GridSearchCV class from Scikit-learn (Pedregosa et al., 2011), which served as the primary machine learning library. Five-fold cross validation was conducted during model training. The final model was selected based on recall for the death outcome class: true positive predictions were prioritized to ensure detection of these serious and fatal health outcomes, at the risk of increasing the false positive rate for nonserious outcomes.
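To make the tuning procedure concrete, a minimal sketch of such a grid search follows. The estimator, the parameter grid, and the training arrays (X_train, y_train) are illustrative assumptions rather than the study's actual configuration; recall is used as the selection metric per the criterion above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical parameter grid; the actual options were specific to
# each classification algorithm tested.
param_grid = {"C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid=param_grid,
    cv=5,                    # five-fold cross validation
    scoring="recall_macro",  # recall-oriented selection for the multiclass outcome
)
search.fit(X_train, y_train)  # X_train, y_train assumed to be defined upstream
print(search.best_params_, search.best_score_)
```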

4.6.1.1 Penalized Logistic Regression. Logistic regression models were generated using the LogisticRegression class from Scikit-learn (Pedregosa et al., 2011). Models with three different regularization techniques were fitted: lasso regression, ridge regression, and elastic net. Lasso regression utilizes the L1 regularization technique, ridge regression utilizes the L2 regularization technique, and elastic net applies both L1 and L2 penalty terms (Nagpal, 2017). The L1 penalty adds the sum of the absolute values of the coefficients to the loss function, which drives the coefficients of less important features to zero and can work well in models with a large number of features (Nagpal, 2017). The L2 penalty adds the squared magnitude of the coefficients to the loss function, which helps prevent overfitting. All of the tested regression models yielded similar performance metric results for all three classes, with the most accurate cross-validation fold scoring around 70%.
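A minimal sketch of how the three penalties can be configured is shown below; the choice of the 'saga' solver (which supports all three penalties) and the l1_ratio value are assumptions not specified in the text above.

```python
from sklearn.linear_model import LogisticRegression

# The 'saga' solver supports the L1, L2, and elastic-net penalties.
lasso = LogisticRegression(penalty="l1", solver="saga", max_iter=5000)
ridge = LogisticRegression(penalty="l2", solver="saga", max_iter=5000)
elastic_net = LogisticRegression(
    penalty="elasticnet", solver="saga",
    l1_ratio=0.5,  # illustrative mix of L1 and L2; not from the study
    max_iter=5000,
)

for name, model in [("lasso", lasso), ("ridge", ridge), ("elastic net", elastic_net)]:
    model.fit(X_train, y_train)  # X_train, y_train assumed to be defined upstream
    print(name, model.score(X_test, y_test))
```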

4.6.1.2 Single Decision Tree. A single decision tree model was generated using the DecisionTreeClassifier class from Scikit-learn (Pedregosa et al., 2011). In decision trees, data features are split into branch-like nodes, which branch into further nodes and eventually into outcome classes based on threshold values (Anuradha & Gupta, 2014). The tuned hyperparameters for the single tree were the maximum tree depth (number of tree levels) and the minimum number of samples per leaf. The best performing decision tree had a max_depth of 10 and a minimum of 5 samples per leaf (Figure 11).

Figure 11
Initial Decision Tree Splits
[Tree diagram omitted; the initial splits are on the Report Source and Reporter Professional Title features.]
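A sketch reproducing the best-performing configuration and plotting only the top of the tree, in the spirit of Figure 11; random_state, feature_names, and the training arrays are assumptions here.

```python
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Best-performing configuration reported above.
tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=5,
                              random_state=42)  # random_state is an assumption
tree.fit(X_train, y_train)  # X_train, y_train assumed to be defined upstream

# Draw only the first few levels of splits, as in Figure 11.
plot_tree(tree, max_depth=2, feature_names=feature_names, filled=True)
```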

4.6.1.3 Random Forest. A random forest model was generated using the RandomForestClassifier class from Scikit-learn (Pedregosa et al., 2011). Random forest classifiers are bagging ensemble learning methods, composed of multiple decision tree classifiers generated from random subsets of the dataset (Shafi, 2023). The algorithm performs implicit feature selection and works well with high-dimensional data. Each sample is processed by all of the trees, and the most frequent class prediction is saved as the result. This model was run with 5-fold cross validation. The tuned hyperparameter for the random forest was the number of estimators. The optimal
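A minimal sketch of this setup, tuning only the number of trees as described above; the candidate n_estimators values and random_state are illustrative assumptions, not the study's reported optimum.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Each tree is trained on a bootstrap sample of the data; predictions
# are the majority vote across trees. Only n_estimators is tuned here.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),  # random_state assumed
    param_grid={"n_estimators": [100, 200, 500]},       # illustrative candidates
    cv=5,                    # five-fold cross validation
    scoring="recall_macro",
)
search.fit(X_train, y_train)  # X_train, y_train assumed to be defined upstream
print(search.best_params_)
```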

