ADS Capstone Chronicles Revised
While training and assessing models, slight decreases in performance will be traded for increased interpretability, ensuring that the output of the model can be easily used and accessed by the general public. To increase accessibility, black-box models, which are models whose mechanisms for producing predictions and results are unclear, will not be used. Many of the models in the aforementioned literature (deep neural networks, recommender-based methods) are black-box models. These models limit patient and provider understanding of, and trust in, results, which is important in the medical field (Xu & Shuttleworth, 2024). This project prioritized transparency in order to deliver a useful product to the general public, unlike much of the literature, which seeks to generate the best-performing model. The models will also include up-to-date data from FAERS, which gives novel insights in real time, in contrast to much of the literature, which assessed model performance against datasets over a decade old. An Apache Airflow trigger will update the model on a quarterly basis, creating a living pipeline that stays current with the latest data releases from government APIs. This ensures that information pertaining to new drugs is continuously added and that the relevance of the tools (database, dashboard, application) is maintained for users (Figure 4).

Figure 4
Surveillance System Architecture
4.1 Data Preparation

The data used in this project is public, de-identified data and therefore does not require informed consent or privacy protections. The data preparation phase leveraged the following open-source frameworks: MySQL (Oracle Corporation, 2022), Jupyter Notebook (Project Jupyter, 2023) with Python version 3.9.18, openFDA API (FDA, n.d.a), and Data.Medicaid API (Centers for Medicare and Medicaid Services, 2024). All data preparation code is within the "DataProcessing.ipynb" notebook file.

4.1.1 Static Data Source. Two static files (version 3.3) were downloaded from ADReCS (Cai et al., 2015) and stored in our GitHub folder called ADReCS (Staggs & van der Wagt, n.d.). The first file contains the adverse drug reaction ontology (ADR_ontology_3.3.xlsx) and the second contains standardized information on drug compounds (Drug_information_v3.3.xlsx). These files were used to resolve drug names and terms when the FAERS data was ambiguous.

4.1.2 API Data Requests. API requests were developed based on each API's requirements. Execution was paused at random intervals between requests to avoid overwhelming host servers and to stay within API-specified rate limits. During testing and development, a small sample of data was pulled from each API to reduce computational load. Debugging statements are included in the code for each API request. Primary data was sourced from the FDA's API endpoints (adverse events, labels, manufacturers, documents; FDA, n.d.a). Data was requested with API keys which
4 Methodology

All code for this project is stored in a GitHub repository (Staggs & van der Wagt, n.d.).
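The randomized pausing between API requests described in Section 4.1.2 can be illustrated with a minimal sketch. This is not the code from the project's notebook; the function name `fetch_with_random_pauses`, its parameters, and the pause bounds are hypothetical and chosen only for illustration. The `fetch` argument stands in for whatever function issues a single API request.

```python
import itertools
import random
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_with_random_pauses(
    fetch: Callable[[T], R],
    request_params: Iterable[T],
    min_pause: float = 0.5,   # assumed lower bound in seconds, not from the source
    max_pause: float = 2.0,   # assumed upper bound in seconds, not from the source
    sample_size: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call `fetch` once per parameter set, pausing a random interval between calls.

    `sample_size` limits the number of requests, mirroring the small samples
    pulled during testing and development.
    """
    if sample_size is not None:
        request_params = itertools.islice(request_params, sample_size)

    results: List[R] = []
    for index, params in enumerate(request_params):
        if index > 0:
            # Pause before every request except the first.
            sleep(random.uniform(min_pause, max_pause))
        results.append(fetch(params))
    return results
```

Passing `sleep` as a parameter lets the pacing logic be exercised without real delays, and `sample_size` keeps development runs small. The appropriate pause bounds depend on each API's published rate limits.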