ADS Capstone Chronicles Revised
4.5 Exploratory Data Analysis

The dataset used for machine learning objectives was queried from a local connection to pharma_db:

"""
SELECT d.med_product, d.manu_num, d.ndc9,
       a.serious_outcome, a.expedited, a.age, a.sex, a.year, a.weight,
       p.price
FROM adverse_events a
INNER JOIN patient_reactions r ON a.event_id = r.event_id
INNER JOIN patient_drugs d ON a.event_id = d.event_id
LEFT JOIN prices p ON p.ndc9 = d.ndc9
"""

The query resulted in a sample size of 83,307 at the time of this project. The data were split 80/10/10 (66,645/8,331/8,331) for training/validation/testing.

4.5.1 Outcome Variable. There are a few possible outcome variables in the FAERS data. One is ADR-specific outcome severity, recorded at five levels: recovered, recovering, recovered with sequelae, not recovered, and fatal. These values have been mapped to an ordinal scale of 1-5 (Yue et al., 2024). However, the levels are ambiguous: for example, different people might interpret recovering and not recovered as the same thing. Additionally, each person can have multiple ADR terms nested in a single report, each with a different outcome. For example, the same patient could have “chest pains” with an outcome of recovered, along with “fatigue” with an outcome of not recovered, which makes that outcome variable confusing to use.

Due to the ambiguity in the 5-level labeling, this analysis uses a 3-level outcome of death, serious, and nonserious. This outcome is more robust because it refers to the primary outcome across all possible ADR conditions and there is only one label per patient. A serious outcome comprises hospitalizations, disabling conditions, life-threatening conditions, and birth defects. The three-level categorical outcome will be assessed with accuracy, precision, recall, specificity, and F1. The class imbalance of the three levels is shown in Figure 7 (i.e., baseline classification rates for each level: serious 69.4%, death 27.1%, and nonserious 3.6%). Reducing the number of levels makes interpretation and model training more feasible. Class balancing was conducted prior to training for assessments of categorical outcomes (see Modeling).

Figure 7 Outcome Variable Distribution
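As an illustration of how accuracy, precision, recall, specificity, and F1 can be computed for a three-level outcome (death, serious, nonserious), the following is a minimal one-vs-rest sketch in plain Python. It is not the project's evaluation code: the function name per_class_metrics and the toy labels are hypothetical and are not drawn from the FAERS data. Specificity is computed by hand from the true-negative and false-positive counts of each class.

```python
from collections import Counter

# Outcome levels named in the text; the order here is arbitrary.
LABELS = ["death", "serious", "nonserious"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return overall accuracy and one-vs-rest metrics for each label.

    Hypothetical helper for illustration only.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count each (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    results = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(c for (t, p), c in pairs.items() if p == label and t != label)
        fn = sum(c for (t, p), c in pairs.items() if t == label and p != label)
        tn = n - tp - fp - fn

        # Guard against empty denominators by reporting 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        }

    accuracy = sum(pairs[(label, label)] for label in labels) / n
    return accuracy, results


# Toy labels (made up for this example, not real data).
y_true = ["serious", "serious", "serious", "death", "death", "nonserious"]
y_pred = ["serious", "serious", "death", "death", "serious", "nonserious"]

accuracy, metrics = per_class_metrics(y_true, y_pred)
for label in LABELS:
    print(label, metrics[label])
print("accuracy", accuracy)
```

With a majority class near 69%, a model that always predicts serious would already reach roughly that accuracy, which is one reason to report the per-class values alongside the overall figure.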
4.5.2 Numerical Input Features. The numerical input features are age (yr), weight (kg), drug prices (per unit), and number of manufacturers (manu). The distributions of these were examined with respect to the outcome variable levels (Figure 8a,b). Descriptive statistics of numerical variables were calculated with df.describe and multicollinearity examined with a