ADS Capstone Chronicles Revised


4.2.3 Redundant Data. OpenFDA identifies two sources of redundant reports. The first is that some reports are updated with new version numbers; the second is that manufacturers must submit exact copies of reports submitted by consumers. It was verified that only the newest version of each report is delivered from the API.

4.2.4 Filtering. The data was filtered for quality based on a few properties. First, the API was requested to return only reports submitted by healthcare personnel (doctors, pharmacists, and other healthcare professionals). Any reports made by consumers or lawyers were dropped on the assumption that their assessments of drug reactions would be medically inaccurate and/or inconsistent.

Second, records were filtered to select drugs denoted as the primary suspect of ADRs in the events table (Characterization=1), thereby excluding drugs that the drug characterization field marks as concomitant or interacting.

Third, when documenting the generic name of a drug, ADR reports can contain multiple associated national drug codes, RxNorm unique identifier codes, and unique ingredient codes in the form of lists, so this analysis retained only the first code listed for each drug to reduce dimensionality and simplify input features. The codes differ by strength, packaging size, and label text, but all refer to the same underlying active chemical in the drug. Thus, the additional codes in each list are not necessary.

4.3 Feature Engineering

Existing features in tables were engineered through transformations to better suit machine learning analysis. This was done through standardization and feature extraction
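The three filters described in Section 4.2.4 can be illustrated with a minimal sketch. It is not the authors' code: the nested field names (primarysource.qualification, patient.drug, drugcharacterization, openfda.product_ndc, openfda.rxcui, openfda.unii) are assumed to follow the openFDA drug event JSON layout, the helper names are hypothetical, and the example report contains made-up placeholder values. The paper applies the reporter filter in the API request itself; it is repeated locally here only to keep the sketch self-contained.

```python
# Assumed openFDA coding of reporter qualification:
# "1" physician, "2" pharmacist, "3" other health professional.
HEALTHCARE_QUALIFICATIONS = {"1", "2", "3"}
CODE_FIELDS = ("product_ndc", "rxcui", "unii")


def first_or_none(values):
    """Return the first element of a list, or None if it is empty or missing."""
    return values[0] if values else None


def filter_report(report):
    """Return simplified primary-suspect drug rows for one report.

    Returns an empty list when the reporter is not healthcare personnel.
    """
    source = report.get("primarysource") or {}
    if str(source.get("qualification")) not in HEALTHCARE_QUALIFICATIONS:
        return []

    rows = []
    for drug in (report.get("patient") or {}).get("drug", []):
        # Keep only drugs characterized as the primary suspect (value 1).
        if str(drug.get("drugcharacterization")) != "1":
            continue
        openfda = drug.get("openfda") or {}
        row = {"generic_name": first_or_none(openfda.get("generic_name"))}
        for field in CODE_FIELDS:
            # Retain only the first code in each list.
            row[field] = first_or_none(openfda.get(field))
        rows.append(row)
    return rows


# Placeholder report; every value below is illustrative, not real data.
example_report = {
    "primarysource": {"qualification": "1"},
    "patient": {
        "drug": [
            {
                "drugcharacterization": "1",
                "openfda": {
                    "generic_name": ["EXAMPLE DRUG"],
                    "product_ndc": ["0000-0001", "0000-0002"],
                    "rxcui": ["000001", "000002"],
                    "unii": ["AAAAAAAAAA"],
                },
            },
            {
                "drugcharacterization": "2",  # concomitant drug, dropped
                "openfda": {"generic_name": ["OTHER DRUG"]},
            },
        ]
    },
}

filtered = filter_report(example_report)
print(filtered)
```

Taking the first element of each list matches the simplification described above, but it assumes the ordering of codes within a list carries no meaning; if it does, a different selection rule would be needed.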

4.2.2.5 Patient Reactions Table. The patient reaction table was a nested variable derived from the events table and linked by event_id. This table contains the adverse event reaction terms based on the Medical Dictionary for Regulatory Activities (MedDRA) version 26.1 (International Council for Harmonization, n.d.) and patient outcomes. Of the individual ADR outcomes, 52.9% were labeled as “unknown” and 2.3% were missing. The following functions were used on this table:
-add_sequential_index
-process_label_text
-nan_info
-descriptive_stats
-plot_character_length

4.2.2.6 Prices Table. The prices dataframe was obtained from the Data.Medicaid API. It contains the average drug cost per unit of drug, national drug codes (NDCs), the effective price date (YYYYMMDD), and the drug type (generic or brand). There were no missing values in this dataframe. The following functions were used on this table:
-nan_info
-add_sequential_index

4.2.2.7 Manufacturer Table. The NDC API was used to obtain NDC codes, labeler names, and manufacturer names for all actively marketed products during the time period of the FAERS data (i.e., past three months), with the only missing data being 18% of manufacturer names. The following functions were used to process this table:
-add_sequential_index
-nan_info
-process_label_text
-plot_character_length
-examine_text_outliers
-clean_manufacturer_text
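The missing-value percentages reported for these tables come from the authors' nan_info helper, whose implementation is not shown in this section. The following is a hypothetical sketch of how such a per-column summary could be computed; the function names is_missing and missing_summary, the choice to count empty strings as missing, and the toy rows are all assumptions made for illustration.

```python
import math


def is_missing(value):
    """Treat None, NaN, and empty or whitespace-only strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def missing_summary(rows, columns):
    """Return {column: percent of rows in which the column is missing}."""
    total = len(rows)
    summary = {}
    for column in columns:
        n_missing = sum(1 for row in rows if is_missing(row.get(column)))
        summary[column] = round(100.0 * n_missing / total, 1) if total else 0.0
    return summary


# Toy rows with placeholder values, not data from the paper.
toy_rows = [
    {"ndc": "0000-0001", "labeler_name": "Labeler A", "manufacturer_name": "Maker A"},
    {"ndc": "0000-0002", "labeler_name": "Labeler B", "manufacturer_name": None},
    {"ndc": "0000-0003", "labeler_name": "Labeler C", "manufacturer_name": ""},
    {"ndc": "0000-0004", "labeler_name": "Labeler D", "manufacturer_name": "Maker D"},
]

summary = missing_summary(toy_rows, ["ndc", "labeler_name", "manufacturer_name"])
print(summary)
```

Note that a value such as “unknown” is a recorded label rather than a missing entry, so a summary of this kind would count it separately, consistent with the paper reporting the two percentages separately.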
