ADS Capstone Chronicles Revised


3.5 Hypoglycemia event prediction from CGM using ensemble learning

Research and testing continue to advance with the goal of improving quality of life for all diabetic patients regardless of diagnosis: Type 1, Type 2, prediabetic, etc. In the research article "Hypoglycemia event prediction from continuous glucose monitor (CGM) using ensemble learning," CGM data are analyzed to predict hypoglycemia in Type 1 diabetes patients. Hypoglycemia occurs when a patient's blood glucose level drops significantly below the normal range. The study used real-world data from 225 diabetic patients and 11.5 million synthetic CGM records with the ensemble learning approach RUSBoost (Fleischer et al., 2022). The research team found that 9 of 10 hypoglycemic events could be correctly recognized by the predictive machine learning algorithm. Studies such as this one provide insight into how a food recommendation system like the one developed here can continue to evolve. Because the team aims to create a system that uses real-time data for personalized recommendations, the risk of a hypoglycemic event is a signal that could be greatly beneficial for training the recommendation system (Fleischer et al., 2022).

4 Methodology

The data sources for this data science project are finalized data frames of food data and patient records. The two data frames are analyzed separately throughout the exploratory data analysis because they are not comparable in nature. The food data provide an overview of different restaurants and the nutritional content of each food item, while the patient data represent simulated patient records that include general health data and recorded glucose values. With the appropriate packages, data frames were tested for missing values, outliers, and distributions. Based on the findings, the data frames were then adjusted and preprocessed accordingly.

4.1 Data Acquisition and Aggregation - FoodData

The food data were collected in two distinct datasets: menu-level data (restaurant food items) and individual food items (single foods from different brands). The datasets were analyzed separately because they serve different purposes and have unique characteristics:

● Menu-Level Data: Represents food items from restaurant menus. This dataset is critical for understanding the nutritional quality of menu offerings from various restaurants.
● Individual Food Items: Contains nutritional data for individual ingredients or packaged goods typically sold in grocery stores. This dataset provides insight into standalone food products.

The restaurant menu data are sourced from the Nutritionix API, which provides access to what is described as the largest verified nutritional information database (Nutritionix, n.d.). A free API key is available to users, limited to 200 calls per day. The acquisition process iterates through a list of specified restaurants, calls each restaurant URL with the Python requests library, and then extracts the first twenty menu items and their associated nutritional facts, such as carbohydrates, sugars, proteins, and fats. The restaurant meals and nutritional facts are stored in a dataframe for further preprocessing. Given the limit on API calls, the restaurant data are gathered incrementally over several days and appended to the dataframe.

The individual food data are sourced from the FoodData Central API, a comprehensive
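The incremental restaurant acquisition described in Section 4.1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The function `collect_menu_rows`, the stand-in `fake_fetch_menu`, and the field key names are hypothetical. The real HTTP request to Nutritionix is deliberately abstracted behind a `fetch_menu` callable because the endpoint details are not given in the text. A plain list of dictionaries stands in for the dataframe, and one API call per restaurant is assumed.

```python
from typing import Callable, Dict, Iterable, List, Tuple

DAILY_CALL_LIMIT = 200      # free-tier limit stated in the text
ITEMS_PER_RESTAURANT = 20   # "first twenty menu items"

# Nutrients named in the text; the exact key names are assumptions.
NUTRIENT_FIELDS = ("carbohydrates", "sugars", "proteins", "fats")


def collect_menu_rows(
    restaurants: Iterable[str],
    fetch_menu: Callable[[str], List[Dict]],
    existing_rows: List[Dict],
    calls_remaining: int = DAILY_CALL_LIMIT,
) -> Tuple[List[Dict], List[str]]:
    """Append menu rows until the restaurants or the call budget run out.

    Returns the extended row list and the restaurants left for a later day.
    """
    rows = list(existing_rows)   # copy so the caller's list is not mutated
    pending = list(restaurants)
    while pending and calls_remaining > 0:
        name = pending.pop(0)
        items = fetch_menu(name)  # assumed: one API call per restaurant
        calls_remaining -= 1
        for item in items[:ITEMS_PER_RESTAURANT]:
            row = {"restaurant": name, "item": item.get("name")}
            for field in NUTRIENT_FIELDS:
                row[field] = item.get(field)
            rows.append(row)
    return rows, pending


def fake_fetch_menu(restaurant: str) -> List[Dict]:
    """Hypothetical stand-in for the real request; returns 25 synthetic items."""
    return [
        {"name": f"{restaurant} item {i}", "carbohydrates": 10 + i,
         "sugars": i, "proteins": 5, "fats": 3}
        for i in range(25)
    ]


# Day 1: a budget of two calls covers two of the three restaurants.
day1_rows, leftover = collect_menu_rows(
    ["A", "B", "C"], fake_fetch_menu, [], calls_remaining=2
)
# Day 2: resume with the leftover restaurants and append to the earlier rows.
all_rows, leftover2 = collect_menu_rows(leftover, fake_fetch_menu, day1_rows)
```

In practice, `fetch_menu` would wrap a `requests.get` call that carries the API key, and the accumulated rows could be passed to `pandas.DataFrame(all_rows)` for preprocessing. Returning the unprocessed restaurants makes it straightforward to resume the next day without repeating calls.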
