ADS Capstone Chronicles Revised


to generate binary reaction indicator columns.

4.3.1 Binary Reaction Indicator Columns. Binary reaction indicator columns were added to both the documents and events dataframes. Each unique value in the reactions field of the events dataframe was stored in a list. This list was filtered so that each reaction was at most 64 characters long (MySQL’s column name character limit). Additionally, only the 1000 most frequent reactions were kept in the list to aid in dimensionality reduction. Each of these values was then added as a new binary column to the documents and events dataframes, with a default value of 0. In the documents dataframe, any text that matched a unique drug from the labels dataframe was extracted and saved as a new dataframe row in a newly created drugs column. If the associated article contained text matching any of the unique reactions, the corresponding binary value was changed to 1.

4.3.2 National Drug Code Standardizing. National Drug Codes (NDCs) come in several related formats that vary in structure (Table 1). The first set of numbers refers to the manufacturer code, the middle set refers to the drug formulation code, and the last two numbers in versions 11 and 10 refer to the specific packaging version. Version 9 does not contain packaging information and is the most common version found across data sources. To match NDCs across data sources, string cleaning functions were used to convert version 11 into version 9 when needed, if both numbers were not available from the same source.
Version 10 was not used.

Table 1
National Drug Code Standardization

Version   String Format   Standardized
11        12345006789     Remove “89”
10        12345-067-89    Not used
9         12345-067       Replace “-” with “0”
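The standardization rules in Table 1 can be illustrated with a minimal sketch. The function name standardize_ndc and the exact digit patterns are assumptions taken from the example strings in Table 1, not the project's actual string cleaning functions. Real NDCs may use other segment lengths that this sketch does not handle.

```python
import re


def standardize_ndc(ndc: str) -> str:
    """Return the 9-digit matching form of an NDC, following Table 1.

    Hypothetical helper: the accepted patterns mirror only the example
    strings shown in Table 1.
    """
    code = ndc.strip()
    if re.fullmatch(r"\d{11}", code):
        # Version 11: remove the trailing two-digit packaging code.
        return code[:-2]
    if re.fullmatch(r"\d{5}-\d{3}", code):
        # Version 9: replace the hyphen with "0".
        return code.replace("-", "0")
    # Version 10 (and anything else) is not used, so it is rejected here.
    raise ValueError(f"unsupported NDC format: {ndc!r}")


print(standardize_ndc("12345006789"))
print(standardize_ndc("12345-067"))
```

Both example strings from Table 1 reduce to the same 9-digit key, which is what allows records from different sources to be joined on the NDC.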
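The indicator-column procedure described in Section 4.3.1 can be sketched in plain Python, with lists of dictionaries standing in for the dataframes. The function names, the "text" key, and the case-insensitive substring matching are assumptions made for illustration. The sketch also assumes that reactions longer than 64 characters are dropped rather than truncated, which the text does not specify.

```python
from collections import Counter

MAX_NAME_LEN = 64  # MySQL column name character limit cited in Section 4.3.1
TOP_N = 1000       # number of most frequent reactions retained


def select_reactions(event_reactions, top_n=TOP_N, max_len=MAX_NAME_LEN):
    """Return the top_n most frequent reaction terms that fit in a column name.

    event_reactions is an iterable of per-event reaction lists.
    Assumption: over-length terms are dropped, not truncated.
    """
    counts = Counter(
        term
        for reactions in event_reactions
        for term in reactions
        if len(term) <= max_len
    )
    return [term for term, _ in counts.most_common(top_n)]


def add_indicators(rows, reactions, text_key="text"):
    """Add one 0/1 column per reaction to each row.

    Each column defaults to 0 and is set to 1 when the reaction term
    appears in the row's text (case-insensitive substring match, assumed).
    """
    flagged = []
    for row in rows:
        text = row[text_key].lower()
        indicators = {term: int(term.lower() in text) for term in reactions}
        flagged.append({**row, **indicators})
    return flagged


# Toy data, invented for illustration only.
events = [["nausea", "headache"], ["nausea", "rash"], ["rash"], ["nausea"]]
docs = [{"doc_id": 1, "text": "Patients reported nausea and dizziness."}]
reactions = select_reactions(events, top_n=2)
print(add_indicators(docs, reactions))
```

Capping the list before creating columns keeps the table width bounded, at the cost of ignoring rare reactions.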

4.4 Database Creation

A new database called pharma_db was created using a local SQL connection (Figure 6). Table parameters were specified based on the datatype characteristics of each respective processed dataframe. Each data table has a unique primary index:
-doc_id
-event_id
-patient_reaction_id
-patient_drug_id
-label_id
-price_id
-manu_id

Additional indices were created for rxcui and ndc codes across the prices, patient drugs, and labels tables. Constraints were placed on the patient_drugs and patient_reactions tables so that any update made to the parent table, adverse_events, would also update the two child tables. VARCHAR limits for text data and BIGINT limits were determined based on the text lengths and NDC code lengths.
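The primary index, secondary index, and parent-child update constraint described above can be sketched as follows. This sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the local SQL connection, so it is not the pharma_db schema itself. The table and primary key names come from the text. The choice of event_id as the linking column, the column types, the reaction column, and the index name are assumptions.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    """Create a parent table and two child tables with cascading updates."""
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE adverse_events (
            event_id INTEGER PRIMARY KEY
        );
        CREATE TABLE patient_drugs (
            patient_drug_id INTEGER PRIMARY KEY,
            event_id INTEGER NOT NULL,          -- assumed linking column
            ndc BIGINT,                         -- assumed type
            FOREIGN KEY (event_id) REFERENCES adverse_events (event_id)
                ON UPDATE CASCADE
        );
        CREATE TABLE patient_reactions (
            patient_reaction_id INTEGER PRIMARY KEY,
            event_id INTEGER NOT NULL,          -- assumed linking column
            reaction VARCHAR(64),               -- assumed column and limit
            FOREIGN KEY (event_id) REFERENCES adverse_events (event_id)
                ON UPDATE CASCADE
        );
        -- Secondary index on the NDC code, as described for patient drugs.
        CREATE INDEX idx_patient_drugs_ndc ON patient_drugs (ndc);
        """
    )


conn = sqlite3.connect(":memory:")
build_schema(conn)

# Toy rows, invented for illustration only.
conn.execute("INSERT INTO adverse_events (event_id) VALUES (1)")
conn.execute(
    "INSERT INTO patient_drugs (patient_drug_id, event_id, ndc) "
    "VALUES (10, 1, 123450067)"
)
conn.execute(
    "INSERT INTO patient_reactions (patient_reaction_id, event_id, reaction) "
    "VALUES (20, 1, 'nausea')"
)

# Changing the parent key propagates to both child tables.
conn.execute("UPDATE adverse_events SET event_id = 2 WHERE event_id = 1")
drug_event_id = conn.execute(
    "SELECT event_id FROM patient_drugs WHERE patient_drug_id = 10"
).fetchone()[0]
reaction_event_id = conn.execute(
    "SELECT event_id FROM patient_reactions WHERE patient_reaction_id = 20"
).fetchone()[0]
print(drug_event_id, reaction_event_id)
```

Declaring the cascade in the schema keeps the child tables consistent with adverse_events without requiring application code to update each table separately.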
