ADS Capstone Chronicles Revised


allow 240 requests per minute with a maximum of 120,000 requests per day per key. The data is updated quarterly and stored in JavaScript Object Notation (JSON) format. The exact dates of the data updates have historically varied and will be monitored to keep the trigger system up to date. The FAERS database is accessed via the openFDA Drug Adverse Event API (>28 million records; FDA, 2023b). There are 42 fields (numerical, date, categorical, and free text), and some of these fields are nested lists of dictionaries that were expanded into additional dataframes. The API request contained parameters to return only records with no missing values for age, sex, and weight, and only reports from healthcare professionals, thereby returning a complete dataset for the latest three months of data (January 23, 2024 to April 23, 2024 at the time of this project).

The historical record API contains press releases and public announcements from the FDA and its predecessors (3 fields). The openFDA pharmaceutical drug label API contains drug marketing and label information (140 fields). The National Drug Code API contains information on active drug manufacturers and actively marketed drugs and is updated daily (FDA, n.d.a).

The Data.Medicaid API was used for national average drug acquisition costs and is updated weekly (12 variables; CMMS, 2024). RxNorm is a database system that contains standardized information on drug compounds and their respective classifications (NIH, n.d.). openFDA and Data.Medicaid use different versions of drug codes. Thus,
to link information between openFDA and Data.Medicaid, all drug codes for compounds reported in the FAERS data were requested from RxNorm’s ndcproperties API endpoint.

4.2 Preprocessing

The raw data from the API endpoints went through a series of cleaning steps depending on data type. First, the JSON outputs from the API requests were converted to dataframes. The dataframes were then examined for data structure, quality, and redundancy.

4.2.1 Functions. Multiple custom functions (“Functions.ipynb”) were created to aid in general preprocessing (Staggs & van der Wagt, n.d.). These include functions for natural language processing (NLP) pipelines, adding index columns, exploring and handling null fields, removing duplicates, standardizing age based on unit of measurement, imputing missing values, examining text token length outliers, and calculating summary statistics for model performance and input features. Each text variable was processed with different cleaning steps based on its underlying structure and inconsistencies.

4.2.1.1 Optimization Wrapper Function. A wrapper function was created to allow for parallel processing on all available CPU cores for all custom functions. This improved processing times for text cleaning and text transformation functions.

4.2.2 Cleaning and Data Quality by Table

4.2.2.1 Documents Table. The data extracted from the historical documents API released by the FDA contained no missing values. The text contents of the documents were scanned to find any drug names and adverse drug reactions. Drug names were matched against brand and generic names from the labels dataframe, and drug reactions were matched against all reactions
in the patient
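
The scanning step described for the documents table in 4.2.2.1 could be approached in several ways. The following is a minimal Python sketch of one of them, assuming whole-word, case-insensitive matching against fixed term lists. The helper names `build_matcher` and `find_terms`, the sample terms, and the sample sentence are hypothetical and are not taken from “Functions.ipynb”.

```python
import re


def build_matcher(terms):
    """Compile one case-insensitive pattern that matches any term as a whole word."""
    # Normalize and de-duplicate; put longer terms first so multi-word
    # names are preferred over shorter names that share a prefix.
    ordered = sorted(
        {t.strip().lower() for t in terms if t and t.strip()},
        key=len,
        reverse=True,
    )
    if not ordered:
        raise ValueError("at least one non-empty term is required")
    alternation = "|".join(re.escape(t) for t in ordered)
    # Lookarounds keep "advil" from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def find_terms(text, matcher):
    """Return the sorted, de-duplicated, lowercased terms found in one text."""
    return sorted({m.group(0).lower() for m in matcher.finditer(text)})


# Hypothetical stand-ins for the brand/generic names from the labels
# dataframe and for the reaction terms reported in the adverse event data.
drug_names = ["Tylenol", "acetaminophen", "Advil", "ibuprofen"]
reactions = ["nausea", "liver injury", "rash"]

drug_matcher = build_matcher(drug_names)
reaction_matcher = build_matcher(reactions)

# Hypothetical document text.
doc = "Acetaminophen (Tylenol) overdose can cause liver injury and nausea."
found_drugs = find_terms(doc, drug_matcher)
found_reactions = find_terms(doc, reaction_matcher)
```

Compiling a single pattern per term list means each document is scanned once per list rather than once per term, which matters when the label data contributes many names. Exact matching of this kind misses misspellings and abbreviations, so it should be read as a baseline rather than as the approach used in the project.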
