ADS Capstone Chronicles Revised
Initial values of W and H were nonnegative. Frequency classes were then derived using a threshold density function for each class. Predictions between each class were statistically significant. The mean accuracy for each class ranged from 67.8% to 94% when the contiguous upper and lower classes were considered. The model performed best on the more frequent side effects and worst on the very rare class. They also evaluated the model against test sets containing post-marketing drug side effect data: the post-market SIDER dataset (Kuhn et al., 2018) and the post-market off-label drug side effects (OFFSIDES) database, which contains drug side effects that are found mainly through electronic health records (EHRs) but are not listed on FDA labels (Tatonetti, 2012). Both test sets contained only data on the presence or absence of a side effect, not its frequency. Statistical analysis showed that predicted scores for SIDER aligned with predictions in the held-out test set, while predictions for OFFSIDES were lower. They then examined the effect of signature drug components and side effects by bucketing drugs into anatomical classes, and summarized the statistically significant associations of anatomical drug categories with MedDRA side effect categories.

3.6 Deep Learning Network

Zhao et al. (2023) developed a two-step, multi-task deep learning network to classify outcomes of adverse events based on the seriousness of adverse drug reactions (ADRs) reported in FAERS. Step one classifies whether an adverse reaction is related to a serious clinical outcome (yes/no), and step two classifies the severity of the outcome into one of seven options: death, life-threatening, hospitalization, disability, congenital anomaly, required intervention, and other. Input features of the custom
benchmark dataset include one-hot encoded drug structure sequences (SMILES; Weininger, 1988), semantic features of ADRs listed in PubChem (NCBI, n.d.) and ADReCS (Cai et al., 2015), and 141,752 “known drug-ADR interactions” of which 58,429 “result in serious clinical outcomes” (Zhao et al., 2023, p. 2) from FAERS. The data were represented as two n x m binary matrices: Interaction x Seriousness Level and Interaction x Serious (Yes, No). The network was trained with 10 repetitions of 10-fold cross-validation on the custom benchmark dataset, and performance (AUC, AUPR) was evaluated on the test folds as well as on two independent test sets, SIDER (Kuhn et al., 2016) and OFFSIDES (Tatonetti et al., 2012), from which drugs overlapping with the benchmark training set were removed.

For binary classification, performance was above 90% for all performance metrics on all testing data. For multiclass classification of outcome seriousness, network performance declined. The class imbalance across the seven levels of seriousness is reflected in the AUPR scores for the test folds, with congenital anomaly and required intervention scoring the lowest at 0.529 and 0.431, respectively. The AUPR for SIDER and OFFSIDES was 0.674 and 0.663, respectively. Constructing a directed acyclic graph for each ADR and using a multi-head self-attention module increased AUPR by 3.6% for the multiclass classification. Model training could have benefited from class balancing or from choosing a different outcome variable with fewer levels.

3.7 FAERS Limitations

One of the primary limitations of FAERS data is that true population frequencies of ADRs cannot be inferred because of underreporting (Hazell & Shakir, 2006). Yue et al. (2024) used the national Medical Expenditure Panel Survey’s (MEPS)