ADS Capstone Chronicles Revised


(KEGG) Drug (Kanehisa & Goto, 2000). The benchmark datasets were those of Pauwels et al. (2011), Mizutani et al. (2012), and Liu et al. (2012). Drugs and side effects were represented as an n × m binary-coded matrix. The recommendation systems were trained with 20 repetitions of 5-fold cross-validation, and average performance on the test folds was evaluated with area under the precision-recall curve (AUPR), area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, precision, accuracy, and F scores. Overall, the best performing model was the ensemble, but the small gain in performance over using the neighborhood-based or machine-based method alone was not worth the added complexity. Even though the ensemble performed best, it had low performance on all three benchmark datasets for AUPR (0.662 average), sensitivity (0.623 average), precision (0.614 average), and F scores (0.618 average). AUC, accuracy, and specificity were all greater than 0.90. The gap between these two groups of metrics suggests that training could benefit from correcting class imbalance, and that the models need more fine-tuning on the target class to be useful. The intended implications and use of the model would require performance above 0.600 for all metrics. The authors offer no insight into the interpretability of the model beyond the benchmark datasets.

3.4 Deep Neural Network

Wang et al. (2019) created a deep neural network that used “chemical, biological, and biomedical information of drugs” to predict adverse drug reactions, with the biomedical information drawn from the literature as Word2Vec word embeddings (p. 1). Data about 746 drugs was sourced from SIDER (Kuhn et al., 2016), PubChem (NCBI, n.d.), DrugBank (Wishart et al., 2018), and 2.3 million MEDLINE papers (National Library of Medicine, n.d.) about each drug in the dataset, including case studies, clinical trials, and observational studies, to assess the progress of surveillance in the literature from 2009 to 2012. The multilayer perceptron had 1325 nodes in its last layer, one for each known adverse drug reaction in the dataset. The performance of the neural network was compared to that of other models: probabilistic matrix factorization, a linear support vector classifier, and Gaussian Naive Bayes. All models were trained with five-fold cross-validation and with different sets of input features (chemical properties, biological properties, and word embeddings). The models were trained as identifiers and classifiers of adverse drug reactions. Model performance was assessed on the test folds using the area under the ROC curve (AUC) and mean average precision (MAP). The deep neural network with two hidden layers performed best as a classifier (AUC 0.844, MAP 0.721). The biological feature set from DrugBank contributed most to performance, with the literature word embeddings adding a slight improvement. The chemical features from PubChem were found to be noninformative for classifying outcomes.

3.5 Matrix Decomposition

Galeano et al. (2020) developed a matrix decomposition algorithm to predict the frequencies of drug side effects. They obtained side effect frequencies from SIDER (Kuhn et al., 2018) and defined five frequency classes: very rare (=1), rare (=2), infrequent (=3), frequent (=4), and very frequent (=5). A matrix, R, covering 759 drugs and 994 side effects, with 37,441 known associations, was generated. It is important to note that about 95% of associations were unobserved and were therefore coded as zero. The algorithm decomposed R into two matrices, W (number of drugs × number of latent features) and H (number of latent features × number of side effects), whose product, WH, served as the model of R.
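To make the decomposition of R into W and H concrete, the sketch below factorizes a small made-up drug by side effect matrix with a generic non-negative matrix factorization (Lee-Seung multiplicative updates). It is an illustration under stated assumptions, not Galeano et al.'s algorithm: the function name `factorize`, the number of latent features `k`, the iteration count, and the toy frequency values are all hypothetical. As in the description above, unobserved associations are coded as zero.

```python
import numpy as np

def factorize(R, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix R (drugs x side effects) as W @ H.

    Generic NMF with multiplicative updates; a stand-in for illustration,
    not the specific algorithm of Galeano et al. (2020).
    """
    rng = np.random.default_rng(seed)
    n_drugs, n_effects = R.shape
    # Random positive starting points for both factors.
    W = rng.random((n_drugs, k)) + eps      # drugs x latent features
    H = rng.random((k, n_effects)) + eps    # latent features x side effects
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative and do not
        # increase the squared reconstruction error ||R - W @ H||^2.
        H *= (W.T @ R) / (W.T @ W @ H + eps)
        W *= (R @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy matrix (invented values): rows are drugs, columns are side effects.
# Entries 1-5 are frequency classes; 0 marks an unobserved association.
R = np.array([
    [5, 0, 0, 1],
    [4, 0, 0, 0],
    [0, 3, 2, 0],
    [0, 0, 2, 0],
    [0, 4, 0, 0],
], dtype=float)

W, H = factorize(R, k=2)
R_hat = W @ H  # the low-rank model of R

print("W shape:", W.shape)   # (number of drugs, number of latent features)
print("H shape:", H.shape)   # (number of latent features, number of side effects)
print("reconstruction error:", np.linalg.norm(R - R_hat))
```

Entries of `R_hat` at positions where `R` is zero can be read as predicted frequency scores for associations that were not observed, which is the general idea behind using such a model for prediction.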
