ADS Capstone Chronicles Revised
17
Expedited    2 (Y/N)    Yes    89.4%
For the adverse events data, over 5000 ADR terms are listed, with “off-label use” being the most frequently reported. Of the currently marketed drugs in the NDC database, the adverse events data contains reports on 1400 unique ingredient identifiers from 1200 manufacturers. The top five drugs related to ADRs are methotrexate, Actemra, prednisone, sulfasalazine, and rituximab. Three of these drugs treat rheumatoid arthritis, and the other two treat inflammation and cancer.

4.6 Modeling

4.6.1 Input Feature Transformation Pipeline. After splitting the data 80/10/10, a feature transformation pipeline was applied to each of the three splits separately to prevent data leakage. The one exception was the drug name variable. It had to be one-hot encoded using the same set of drugs across the entire dataset so that every drug in the FAERS data is represented in all data splits. Numerical features were centered, scaled, and skew-corrected, and categorical features were one-hot encoded (c-1 dummies). The final list of input features is given in Table 4. The training set was undersampled so that each level of the outcome has the same frequency count (n = 2,691 per class).

Table 4
Input Features Descriptives
Feature    Description
Drug       (binary matrix)
Age        Age in years
Sex        Male/Female
Weight     Weight in kg
Price      Drug price at time of ADR
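The splitting and transformation steps in Section 4.6.1 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline. The field names (`age`, `weight`, `price`, `sex`, `drug`) are hypothetical. `log1p` stands in for the unspecified skew correction, and z-scoring stands in for centering and scaling. As described in the text, statistics are computed within each split, while the drug vocabulary is fixed once on the full dataset. Whether the drug matrix also dropped a reference level is not stated, so this sketch keeps one column per drug.

```python
import math
import random

# Hypothetical field names; the actual FAERS-derived column names are not given.
NUMERIC = ["age", "weight", "price"]
SEX_LEVELS = ["Female", "Male"]  # first level is the dropped reference (c-1 dummies)


def split_80_10_10(rows, seed=0):
    """Shuffle records and cut them into 80/10/10 partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    a, b = int(0.8 * n), int(0.9 * n)
    return rows[:a], rows[a:b], rows[b:]


def transform_split(rows, drug_vocab):
    """Encode one split using statistics computed from that split only."""
    # Skew correction via log1p is an assumed stand-in for the unspecified method.
    logged = {f: [math.log1p(r[f]) for r in rows] for f in NUMERIC}
    stats = {}
    for f, vals in logged.items():
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats[f] = (mean, sd)  # center and scale (z-score)
    encoded = []
    for i, r in enumerate(rows):
        vec = [(logged[f][i] - stats[f][0]) / stats[f][1] for f in NUMERIC]
        # Sex: c-1 dummy columns (no column for the reference level).
        vec += [1.0 if r["sex"] == lvl else 0.0 for lvl in SEX_LEVELS[1:]]
        # Drug: one indicator column per drug in the shared, full-dataset vocabulary.
        vec += [1.0 if r["drug"] == d else 0.0 for d in drug_vocab]
        encoded.append(vec)
    return encoded


# Toy records for illustration only.
drugs = ["methotrexate", "prednisone", "rituximab"]
rows = [
    {"age": 30 + i, "weight": 60.0 + i, "price": 10.0 * (i + 1),
     "sex": "Male" if i % 2 else "Female", "drug": drugs[i % 3]}
    for i in range(20)
]
drug_vocab = sorted({r["drug"] for r in rows})  # fixed once on the full dataset
train, val, test = split_80_10_10(rows)
X_train, X_val, X_test = (transform_split(s, drug_vocab) for s in (train, val, test))
```

Because `drug_vocab` is built from the whole dataset before splitting, every split produces the same seven columns. This holds even when a small split happens not to contain some drug.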
Country    81    USA    66.3%
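The training-set undersampling described in Section 4.6.1 (n = 2,691 per class) can be illustrated with a simple random undersampler. It keeps, for every outcome level, as many randomly chosen rows as the rarest level has. This is a sketch assuming plain random undersampling. The paper does not say which undersampling method was used, and the `outcome` key and its 0/1 labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, label_key="outcome", seed=0):
    """Random undersampling: keep n_min rows per class, where n_min is the
    size of the rarest outcome level."""
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[label_key]].append(r)
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):
        # Sample without replacement so no row is duplicated.
        balanced.extend(rng.sample(by_class[label], n_min))
    rng.shuffle(balanced)
    return balanced


# Toy training split: 10 rows of class 1, 3 rows of class 0 (labels are hypothetical).
train = [{"outcome": 1, "id": i} for i in range(10)] + [
    {"outcome": 0, "id": i} for i in range(10, 13)
]
balanced = undersample(train)
counts = Counter(r["outcome"] for r in balanced)  # equal counts per class
```

Only the training split is balanced this way in the text. The sketch therefore takes a single split as input and leaves the other splits untouched.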
Figure 10 Report Source by Outcome
4.5.4 Text Input Features. Text features include drug label information (ingredients, warnings, manufacturer list, drug names, and drug purpose), drug compounds, medicinal product, drug indication, and adverse event reaction descriptions (binary). The free text was cleaned and tokenized. The tokens were then assessed with a descriptive statistics function reporting the number of unique tokens, total tokens, lexical diversity, and most common tokens.

4.5.4.1 Text Token Insights. The National Drug Code (NDC) database contains just over 8000 currently marketed drugs with 105,000 formulations. The three most produced drug products are ibuprofen, gabapentin, and oxygen. The top manufacturers and distributors by number of marketed products are companies that produce supplements and homeopathic drugs. The Labels database has information on 47,000 active drug codes, covering 4800 unique ingredient identifiers and 7800 different marketed drug names. The most frequent drug types are over-the-counter drugs such as sunscreens, antiseptics, and fever reducers, since these can be manufactured by multiple companies.
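A descriptive statistics function of the kind described in Section 4.5.4 might look like the sketch below. The cleaning rule is an assumption: lowercase the text and keep runs of letters and digits. Lexical diversity is also assumed to be the type-token ratio (unique tokens divided by total tokens), since the paper does not define it. The label text is a toy string, not FAERS or NDC data.

```python
import re
from collections import Counter


def tokenize(text):
    """Assumed cleaning step: lowercase and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_stats(tokens, top_n=3):
    """Return unique/total token counts, lexical diversity, and top tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    unique = len(counts)
    return {
        "unique_tokens": unique,
        "total_tokens": total,
        # Lexical diversity taken as the type-token ratio (unique / total).
        "lexical_diversity": unique / total if total else 0.0,
        "most_common": counts.most_common(top_n),
    }


label_text = "Ibuprofen tablets; ibuprofen oral suspension. Gabapentin capsules."
stats = token_stats(tokenize(label_text))
```

For this toy string the function reports 7 total tokens and 6 unique tokens, with "ibuprofen" as the most common token.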