ADS Capstone Chronicles Revised


database of nutritional information managed by the USDA. Using a free API key, which allows up to 1,000 calls per day, data were retrieved incrementally to ensure API usage limits were respected. The acquisition process involved iterating through a predefined list of diabetic-friendly foods and using the Python requests library to query the API. For each food name, the search function retrieved a list of matching items, and detailed nutritional information (calories, carbohydrates, fiber, sugars, fats, and proteins) was extracted using each food's unique identifier (FDC ID). The retrieved data were stored in a structured dataframe that included relevant fields such as food description, brand, and category. The dataset was compiled over several iterations to accommodate the API’s rate limits, ensuring that a robust collection of nutritional information was gathered. Finally, the data were exported as a CSV file for further preprocessing and integration into the recommendation system.

Ultimately, these two food datasets will contribute to creating a food recommendation system tailored for diabetic individuals. The system will assist users who are dining at a restaurant and want to complement their meal with individual food items, or those who need to select a snack prior to dining out. Analyzing the two datasets separately allows suitability scoring to be tailored to each one: scoring for menu items accounts for the variability in prepared meals, whereas scoring for individual food items focuses on single-ingredient products that can serve as dietary staples.

4.1.1 Exploratory Data Analysis - Food Data

Exploratory Data Analysis (EDA) used both graphical and nongraphical methods to examine the characteristics of the datasets. This step was crucial in understanding the data structure, identifying potential issues, and preparing an ideal dataset for model development.

4.1.1.1 Menu Food Data

The restaurant menu dataset contained 1,300 rows and 14 columns, providing comprehensive nutritional information for various menu items. The dataset included features such as calories, carbohydrates, sugars, fats, saturated fats, cholesterol, sodium, fiber, potassium, proteins, serving size, and units.

To gain insights into the dataset's structure and handle missing values effectively, histograms and boxplots were generated for key nutrients. As illustrated in Figures 2 and 3, the distributions of nutrients are predominantly right-skewed, indicating the presence of outliers and a concentration of values on the lower end. These visualizations informed the decision to impute missing values with the median, which is less sensitive than the mean to outliers and skewed data.

After the missing values were handled (see Section 4.1.2), additional visualizations were produced. To better understand the nutritional profiles of restaurant menu items, the contributions of carbohydrates, fats, and proteins were calculated as percentages of total calories. The percentages were computed using standard nutritional conversion factors: 4 calories per gram of carbohydrates and proteins, and 9 calories per gram of fats. These values were then aggregated for the top 10 restaurants by frequency in the dataset.
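The macronutrient calculation described above can be illustrated with a minimal pandas sketch. The column names (restaurant, calories, carbohydrates, fats, proteins), the function names, and the use of the mean as the aggregation are assumptions made for illustration; the text does not specify them, and this is not necessarily the implementation used in the project.

```python
import pandas as pd

# Standard conversion factors (kcal per gram), as stated in the text.
KCAL_PER_GRAM = {"carbohydrates": 4, "proteins": 4, "fats": 9}


def macro_percentages(menu: pd.DataFrame) -> pd.DataFrame:
    """Add <nutrient>_pct columns: percent of an item's calories from each macronutrient."""
    out = menu.copy()
    # Zero-calorie items become NaN instead of triggering a division by zero.
    calories = out["calories"].where(out["calories"] > 0)
    for nutrient, kcal in KCAL_PER_GRAM.items():
        out[f"{nutrient}_pct"] = out[nutrient] * kcal / calories * 100
    return out


def top_restaurant_profile(menu: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Average the percentages over the n restaurants with the most menu rows.

    Averaging is an assumption; the text only says the values were aggregated.
    """
    top = menu["restaurant"].value_counts().head(n).index
    pct_cols = [f"{nutrient}_pct" for nutrient in KCAL_PER_GRAM]
    scored = macro_percentages(menu)
    in_top = scored["restaurant"].isin(top)
    return scored[in_top].groupby("restaurant")[pct_cols].mean()


if __name__ == "__main__":
    # Toy rows for illustration only; these are not values from the dataset.
    demo = pd.DataFrame({
        "restaurant": ["A", "A", "B"],
        "calories": [400, 200, 100],
        "carbohydrates": [50, 25, 10],
        "fats": [10, 5, 2],
        "proteins": [27.5, 13.75, 10],
    })
    print(top_restaurant_profile(demo, n=2).round(1))
```

Because the percentages are taken against the reported calorie count rather than the sum of the three macronutrient contributions, they need not add up to exactly 100 for a given item; rounding in the source labels or calories from alcohol can account for the difference.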

