ADS Capstone Chronicles Revised
4.2.3 Redundant Data. OpenFDA specifies two sources of redundant reports. First, some reports are updated and reissued with new version numbers. Second, manufacturers must submit exact copies of reports submitted by consumers. It was verified that the API delivers only the newest version of each report.

4.2.4 Filtering. The data were filtered for quality based on three properties. First, the API was queried to return only reports submitted by healthcare personnel (doctors, pharmacists, and other healthcare professionals). Reports made by consumers or lawyers were dropped on the assumption that their assessments of drug reactions would be medically inaccurate or inconsistent. Second, records were filtered to keep only drugs denoted as the primary suspect of the ADR in the events table (Characterization = 1), thereby excluding concomitant and interacting drugs recorded in the drug characterization field. Third, when documenting the generic name of a drug, ADR reports can list multiple associated national drug codes (NDCs), RxNorm unique identifiers, and unique ingredient identifiers (UNIIs), so this analysis retained only the first code listed for each drug to reduce dimensionality and simplify the input features. The codes within each list differ by strength, package size, and label text, but all refer to the same underlying active chemical in the drug. Thus, the additional codes in each list are unnecessary.

4.3 Feature Engineering
Existing features in the tables were transformed to better suit machine learning analysis. This was done through standardizing and feature extraction
4.2.2.5 Patient Reactions Table. The patient reactions table was a nested variable derived from the events table and linked by event_id. This table contains the adverse event reaction terms, based on the Medical Dictionary for Regulatory Activities (MedDRA) version 26.1 (International Council for Harmonization, n.d.), and the patient outcomes. Of the individual ADR outcomes, 52.9% were labeled "unknown" and 2.3% were missing. The following functions were used on this table:
- add_sequential_index
- process_label_text
- nan_info
- descriptive_stats
- plot_character_length

4.2.2.6 Prices Table. The prices dataframe was obtained from the Data.Medicaid API. It contains the average drug cost per unit, the national drug codes (NDCs), the effective price date (YYYYMMDD), and the drug type (generic or brand). There were no missing values in this dataframe. The following functions were used on this table:
- nan_info
- add_sequential_index

4.2.2.7 Manufacturer Table. The NDC API was used to obtain the NDCs, labeler names, and manufacturer names for all actively marketed products during the time period of the FAERS data (i.e., the past three months). The only missing data were in the manufacturer name field (18% of records). The following functions were used to process this table:
- add_sequential_index
- nan_info
- process_label_text
- plot_character_length
- examine_text_outliers
- clean_manufacturer_text
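The helper functions listed for these tables are named but not defined in this section. Purely as a hypothetical illustration, two of them, `nan_info` and `add_sequential_index`, might look like the standard-library sketch below. It assumes each table is a list of record dicts rather than the dataframes the project used, and that `None`, NaN, and blank strings all count as missing; neither assumption comes from the original code.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Assumption: None, float NaN, and blank strings all count as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def nan_info(records: list[dict[str, Any]]) -> dict[str, float]:
    """Hypothetical nan_info: percentage of missing values per column."""
    if not records:
        return {}
    # A column absent from a record is treated as missing for that record.
    columns = sorted({key for row in records for key in row})
    report = {}
    for column in columns:
        missing = sum(is_missing(row.get(column)) for row in records)
        report[column] = round(100 * missing / len(records), 1)
    return report


def add_sequential_index(
    records: list[dict[str, Any]], name: str = "index", start: int = 0
) -> list[dict[str, Any]]:
    """Hypothetical add_sequential_index: copy each record and add a running integer id."""
    return [{**row, name: i} for i, row in enumerate(records, start=start)]
```

Counting blank strings as missing matters for free-text fields such as manufacturer name, where an empty string and a true null would otherwise be tallied differently; a percentage such as the 18% of missing manufacturer names is the kind of figure a `nan_info` of this shape reports.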
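The three filtering rules of Section 4.2.4 can be sketched in plain Python as follows. This is a minimal, hypothetical sketch rather than the project's code: it assumes each report is a dict parsed from openFDA drug event JSON, and it applies locally the healthcare-professional filter that the project requested from the API. The field names (`primarysource.qualification`, `patient.drug.drugcharacterization`, and the `openfda` lists `generic_name`, `product_ndc`, `rxcui`, `unii`) follow the openFDA drug event schema; the function names `filter_reports` and `first_or_none` are illustrative only.

```python
from typing import Any, Optional

# openFDA primarysource.qualification codes kept by the filter:
# 1 = physician, 2 = pharmacist, 3 = other health professional.
# Codes 4 (lawyer) and 5 (consumer or non-health professional) are dropped.
HEALTHCARE_QUALIFICATIONS = {"1", "2", "3"}

# openFDA patient.drug.drugcharacterization: 1 = suspect,
# 2 = concomitant, 3 = interacting. Only suspect drugs are kept.
PRIMARY_SUSPECT = "1"


def first_or_none(values: Optional[list[str]]) -> Optional[str]:
    """Return the first code in an openFDA list field, or None if empty or absent."""
    return values[0] if values else None


def filter_reports(reports: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten reports into one row per primary-suspect drug from healthcare reporters."""
    rows = []
    for report in reports:
        source = report.get("primarysource") or {}
        if str(source.get("qualification")) not in HEALTHCARE_QUALIFICATIONS:
            continue  # consumer, lawyer, or unknown reporter
        for drug in (report.get("patient") or {}).get("drug", []):
            if str(drug.get("drugcharacterization")) != PRIMARY_SUSPECT:
                continue  # concomitant or interacting drug
            openfda = drug.get("openfda") or {}
            rows.append(
                {
                    "safetyreportid": report.get("safetyreportid"),
                    # Keep only the first code of each list (third rule).
                    "generic_name": first_or_none(openfda.get("generic_name")),
                    "product_ndc": first_or_none(openfda.get("product_ndc")),
                    "rxcui": first_or_none(openfda.get("rxcui")),
                    "unii": first_or_none(openfda.get("unii")),
                }
            )
    return rows
```

When a list field is absent, the sketch stores `None` instead of falling back to another source, so drugs without openFDA annotations remain visible as gaps for the missing-value checks described above.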