M.S. Applied Data Science - Capstone Chronicles 2025
Early Detection of High-Risk Product Recalls: A Comparative Study of Multiclass Classification Approaches

Lorena Dorado
Applied Data Science Master’s Program
Shiley Marcos School of Engineering / University of San Diego
ldorado@sandiego.edu

Parisa Kamizi
Applied Data Science Master’s Program
Shiley Marcos School of Engineering / University of San Diego
pkamizi@sandiego.edu

ABSTRACT
Timely identification and classification of product recalls are essential to safeguarding public health. This study explores the application of machine learning and natural language processing techniques to predict the severity of product recalls issued by the U.S. Food and Drug Administration (FDA). Using a dataset of over 95,000 FDA recall records, the study developed a multiclass classification system that categorizes recalls into Class I, II, or III based on structured features and textual recall descriptions. Feature engineering incorporated temporal patterns, categorical variables, and text-based features such as term frequency-inverse document frequency (TF-IDF) and word counts. Several classification models (random forest, XGBoost, decision tree, multilayer perceptron, and logistic regression) were evaluated using metrics such as precision, recall, and F1-score. The random forest model achieved the best overall performance, with an F1-score above 0.93. While the model effectively distinguished Class I and II recalls, Class III recalls proved harder to predict because their features overlap with those of the other classes. A Streamlit dashboard was deployed to demonstrate real-time classification capability. The findings highlight the potential for artificial intelligence-driven tools to enhance regulatory decision-making, improve recall timeliness, and strengthen consumer protection.

KEYWORDS
product recalls, recall classification, risk prediction, machine learning, natural language processing, FDA, public health, model evaluation, regulatory analytics, classification modeling

1 Introduction
Product recalls play a critical role in protecting public health and safety across industries, including food, pharmaceuticals, medical devices, and consumer goods. The process of identifying, classifying, and managing recalls is complex and involves regulatory bodies, manufacturers, and consumers. As recalls grow more frequent and more complex, demand is rising for efficient, proactive approaches to recall management and risk prediction.
The U.S. FDA categorizes recalls based on the severity of health risks posed by defective products:
Class I: Products that could cause serious adverse health consequences or death.
Class II: Products that might cause temporary or medically reversible adverse health consequences, with a remote probability of serious outcomes.