ADS Capstone Chronicles Revised
26
this subsample of the broader population, and mayinferthatthesepeoplehaveaccess to better healthcare resources compared to people taking the same drugs who might have had an ADR, but who did not seek medical attention, or who’s healthcare providerswerenotthoroughenoughtomake areportinFAERS.Additionally,thesystem can only make predictions on the list of drugs reported in FAERS. If an end-user inputsadrugintothepredicationapplication thatwasnotretainedasafeatureinthefinal model, the prediction will be slightly less accurate and only be based on age,weight, sex, drug price, and provider qualification. However, since all of these features carry the majority of feature importance, the potential outcome will be classified meaningfully. FAERS data also lacks context for underlying circumstances. For example, acetaminophen,alsoknownasTylenol,isin the top five reported drugsrelatedtodeath. At first glance, this may seem like a safe, over-the-counter drug, but it has the potential for liver damagefromoverdosing, and this can lead to death when combined with alcohol use (Ghosh et al., 2021). Another example is aromatase inhibitors which are used for cancer treatment in femalesandhormonebalancingandfertility in males. Currently, FAERS reports do not capture the situational context of pharmaceuticaldruguse,soanyconclusions made about drugs are based on speculation and the limited demographic information available. The amount of data intheFAERSdatabase is into the millions and requires high-load computational resources. This analysis utilized a robust subset of the data to build and test the pipeline, which may have containeddifferentstatisticalpropertiesthan the entire dataset. Additionally, the FAERS
database has a lack of required fields, so each record has its own pattern of missing data.Thus,thefull28millionsamplesizeis not usable. When filtering for quality data with complete records of individual difference data (age,sex,weight,outcome), the available sample drops to a million at most. It was further filtered toreportsfrom healthcare professionals only, andrestricted tothemostcurrentquarter-yearofdata.The pipelinecanbeadaptedtoretrievemoredata from FAERS by updating the URL date limits and other parameters in the API request. Training the classification model required undersampling the lowest frequency class, nonserious , which means that about a third ofthe death outcomesandatenthof serious outcomes were not usedformodeltraining. Thus, the models might not have incorporated as much useful information as was available.However,thisdidnotimpact the ensemble models and mainly affected the other models’ performance. ThehistoricaldocumentsAPIendpointonly contains data up until the year 2014. The historical records could be improved by using other sources of data on pharmaceutical drug news headlines like ProQuest database API (Clarivate, n.d.). This would make for an interesting dashboard feature. TheMedicaiddrugpricesAPIonlycontains informationondrugsthatareprescribedand can be reimbursed through their services, andonlyincludesthepriceperunit.Thus,it does not contain prices for all marketed drugs. Only 700 matching drugpriceswere foundintheFAERSdataset(lessthan10%). Missing price values were imputed using medianprice.Thus,thepricefeaturedidnot add as much variance to the model as expected.Addingadditionalsourcesofdrug
176
Made with FlippingBook - Online Brochure Maker