M.S. Applied Data Science - Capstone Chronicles 2025
Figure 5 Proportion of Class I recalls by Year-Month
Note. Heatmap of the proportion of Class I recalls by year and month.
Descriptive statistics and textual patterns in the “reason for recall” field of the FDA dataset were examined to explore how textual content may correlate with recall severity. The first step was to calculate the word count of each recall reason as a measure of the length and verbosity of these descriptions. The mean word count was approximately 21.7 words, with a standard deviation of 17.5 and a maximum of 327 words, indicating substantial variation in the level of detail provided. This word count was subsequently used as a numerical feature in modeling recall severity. To enhance the feature set beyond raw text, we built a structured modeling dataset combining categorical dummies (e.g., product type, status,