ADS Capstone Chronicles Revised

‭17‬

‭Expedited‬ ‭2 (Y/N)‬

‭Yes‬

‭89.4%‬

‭For‬ ‭the‬ ‭adverse‬ ‭events‬ ‭data,‬ ‭there‬ ‭are‬ ‭over‬ ‭5000‬ ‭ADR‬ ‭terms‬ ‭listed‬‭with‬‭“off‬‭label‬‭use”‬ ‭being‬ ‭the‬ ‭highest‬ ‭reported.‬ ‭Out‬ ‭of‬ ‭the‬ ‭current‬ ‭marketed‬ ‭drugs‬ ‭in‬ ‭the‬ ‭NDC‬ ‭database,‬ ‭the‬ ‭adverse‬ ‭events‬ ‭data‬ ‭contains‬ ‭reports‬ ‭on‬ ‭1400‬ ‭unique‬ ‭ingredient‬ ‭identifiers,‬ ‭from‬ ‭1200‬ ‭manufacturers,‬ ‭of‬ ‭which‬‭the‬‭top‬‭five‬‭drugs‬‭related‬‭to‬‭ADRs‬‭are‬ ‭methotrexate,‬ ‭actemra,‬ ‭prednisone,‬ ‭sulfasalazine,‬ ‭and‬ ‭rituximab.‬ ‭Three‬‭of‬‭these‬ ‭drugs‬ ‭treat‬ ‭rheumatoid‬ ‭arthritis,‬ ‭and‬ ‭the‬ ‭other two treat inflammation and cancer.‬ ‭4.6 Modeling‬ ‭4.6.1‬ ‭Input‬ ‭Feature‬ ‭Transformation‬ ‭Pipeline.‬ ‭After‬‭splitting‬‭the‬‭data‬‭80/10/10,‬‭a‬ ‭feature‬ ‭transformation‬ ‭pipeline‬ ‭was‬ ‭applied‬ ‭to‬ ‭the‬ ‭three‬ ‭splits‬‭separately‬‭to‬‭prevent‬‭data‬ ‭leakage,‬ ‭except‬ ‭for‬ ‭the‬ ‭drug‬ ‭name‬ ‭variable‬ ‭which‬ ‭needed‬ ‭to‬ ‭be‬ ‭one-hot‬ ‭encoded‬ ‭with‬ ‭the‬ ‭same‬ ‭drugs‬ ‭for‬ ‭the‬ ‭entire‬ ‭dataset.‬ ‭This‬ ‭allows‬ ‭for‬ ‭the‬ ‭representation‬ ‭of‬ ‭all‬‭drugs‬‭in‬ ‭the‬ ‭FAERS‬ ‭data‬ ‭to‬ ‭be‬ ‭present‬ ‭in‬ ‭all‬ ‭data‬ ‭splits.‬ ‭Numerical‬ ‭features‬ ‭were‬ ‭centered,‬ ‭scaled,‬ ‭and‬ ‭skew-corrected,‬ ‭and‬ ‭categorical‬ ‭features‬ ‭were‬ ‭one-hot‬ ‭encoded‬ ‭(c-1‬ ‭dummies).‬‭The‬‭final‬‭list‬‭of‬‭input‬‭features‬‭are‬ ‭in‬ ‭Table‬ ‭4.‬ ‭The‬ ‭training‬ ‭set‬ ‭was‬ ‭undersampled‬‭to‬‭achieve‬‭the‬‭same‬‭frequency‬ ‭counts‬ ‭for‬ ‭each‬ ‭level‬ ‭of‬ ‭the‬ ‭outcome‬ ‭(‬ ‭n‬ ‭=‬ ‭2,691 per Class).‬ ‭Table 4‬ ‭Input Features Descriptives‬ ‭Feature‬ ‭Description‬ ‭Drug‬ ‭(binary matrix)‬ ‭Age‬ ‭Age in years‬ ‭Sex‬ ‭Male/Female‬ ‭Weight‬ ‭Weight in kg‬ ‭Price‬ ‭Drug price at time of ADR‬

‭Country‬

‭81‬

‭USA‬

‭66.3%‬

‭Figure 10‬ ‭Report Source by Outcome‬

‭4.5.4‬ ‭Text‬ ‭Input‬ ‭features.‬ ‭Text‬ ‭features‬ ‭include‬ ‭drug‬ ‭label‬ ‭information‬ ‭(ingredients,‬ ‭warnings,‬ ‭manufacturer‬ ‭list,‬ ‭drug‬ ‭names,‬ ‭drug‬ ‭purpose),‬ ‭drug‬ ‭compounds,‬ ‭medicinal‬ ‭product,‬ ‭drug‬ ‭indication,‬ ‭and‬ ‭adverse‬ ‭event‬ ‭reaction‬ ‭descriptions‬ ‭(binary).‬ ‭The‬ ‭free‬‭text‬ ‭was‬‭cleaned‬‭and‬‭tokenized.‬‭The‬‭tokens‬‭were‬ ‭assessed‬ ‭with‬ ‭a‬ ‭descriptive‬ ‭statistics‬ ‭function‬ ‭for‬ ‭number‬ ‭of‬ ‭unique‬ ‭tokens,‬ ‭total‬ ‭tokens,‬ ‭lexical‬ ‭diversity,‬ ‭and‬ ‭most‬ ‭common‬ ‭tokens.‬ ‭4.5.4.1‬ ‭Text‬ ‭Token‬ ‭Insights.‬ ‭The‬ ‭National‬ ‭Drug‬ ‭code‬ ‭database‬‭contains‬‭just‬‭over‬‭8000‬ ‭currently‬ ‭marketed‬ ‭drugs‬ ‭with‬ ‭105,000‬ ‭formulations.‬ ‭The‬ ‭top‬ ‭three‬ ‭most‬ ‭produced‬ ‭drug‬‭products‬‭are‬‭ibuprofen,‬‭gabapentin,‬‭and‬ ‭oxygen.‬ ‭The‬ ‭top‬ ‭manufacturers‬ ‭and‬ ‭distributors‬ ‭based‬ ‭on‬ ‭number‬ ‭marketed‬ ‭products‬ ‭are‬ ‭companies‬ ‭that‬ ‭produce‬ ‭supplements‬ ‭and‬ ‭homeopathic‬ ‭drugs.‬ ‭The‬ ‭Labels‬ ‭database‬ ‭has‬ ‭information‬ ‭from‬ ‭47,000‬ ‭active‬ ‭drug‬ ‭codes‬ ‭with‬ ‭4800‬ ‭unique‬ ‭ingredient‬ ‭identifiers‬ ‭with‬ ‭7800‬ ‭different‬ ‭marketed‬ ‭drug‬ ‭names.‬ ‭The‬ ‭highest‬ ‭frequency‬ ‭drug‬ ‭types‬ ‭are‬ ‭over-the-counter‬ ‭drugs‬ ‭like‬ ‭sunscreen,‬ ‭antiseptics,‬ ‭and‬ ‭fever‬ ‭reducers,‬‭since‬‭they‬‭can‬‭be‬‭manufactured‬‭by‬ ‭multiple companies.‬

167

Made with FlippingBook - Online Brochure Maker