M.S. Applied Data Science - Capstone Chronicles 2025
time-related features like month_sin and month_cos consistently ranked high in importance, highlighting the relevance of both textual and temporal information. These insights support the interpretability of the model and offer valuable guidance for future data collection and refinement efforts.

6.3 Recommended Next Steps/Future Studies

A valuable direction for future research is improved modeling of the text-based variables, which were underutilized in the current study. Basic preprocessing and simple representations (such as bag-of-words or TF-IDF) may be enough for preliminary analyses, but these methods struggle to capture the deeper semantic and contextual nuances embedded in natural language. Future studies should consider transformer-based language models, specifically the Large Language Model Meta AI (LLaMA) architecture, which has shown substantial improvements in understanding and generating human-like text across diverse domains (Touvron et al., 2023). LLaMA models are trained on large-scale corpora and can produce high-quality embeddings that reflect word usage, context, syntax, and tone. Such embeddings would be particularly useful for capturing nuanced patterns in user-generated text, reviews, or descriptions that might hold predictive value in our dataset.

One significant limitation of this project is the time constraint inherent to its scope. Conducted over only seven weeks, the project faced practical limits on depth, refinement, and scalability. The results provide meaningful insights and demonstrate the feasibility of using machine learning for FDA recall classification, but the condensed timeline limited the ability to fully explore advanced modeling techniques, conduct extensive hyperparameter tuning, or incorporate additional external data sources. Because of the short timeframe, enhancements such as incorporating deep learning-based NLP models (e.g., transformers), expanding the dataset beyond FDA sources, and integrating real-time data pipelines had to be deprioritized. A longer-term project spanning several months would allow for more robust experimentation, iterative validation, and potentially more nuanced model deployment. Future iterations of this research, given more time and resources, could address these areas and further strengthen the model’s generalizability and practical application.

ACKNOWLEDGMENTS

Gratitude is extended to Dr. Ebrahim Tarshizi for his guidance and valuable feedback throughout the course of this research. Appreciation is also given to Dr. Argus Sun for providing foundational knowledge on FDA recall classification systems and regulatory frameworks. The authors would like to acknowledge the use of ChatGPT (OpenAI, 2025) and Claude (Anthropic, 2025) for grammar checking and language refinement.
References

An, J. (2024). Structural topic modeling for corporate social responsibility of food supply chain management: Evidence from FDA recalls on plant-based food products. Social Responsibility Journal, 20(6), 1089–1100.