ADS Capstone Chronicles Revised
Figure 8 Average Macronutrient Composition By Food Categories
4.1.2 Data Quality - Food Data

Ensuring data quality is a crucial step in preparing datasets for analysis. The food data were examined for missing values, outliers, and other inconsistencies, and each issue was handled so that the data remained intact and suitable for downstream analysis.

4.1.2.1 Menu Food Data

The menu dataset contained missing values in several columns: carbohydrates, sugars, fats, saturated fats, cholesterol, sodium, fiber, potassium, and proteins. Sugars had the most missing values (100), while carbohydrates, fats, proteins, and sodium each had 32. As previously shown in Figures 2 and 3, the distributions of these features were right-skewed, so missing values were imputed with the median, which is less sensitive to extreme values than the mean.

Missing values in critical nutritional columns, such as carbohydrates, fats, and proteins, were imputed with their respective global medians, which are robust to outliers and consistent with the overall data distribution. For secondary columns, missing values were imputed with medians grouped by restaurant name, so that each gap was filled from contextually similar items. Any values still missing after the grouped step were filled with the global median to ensure completeness. Additionally, invalid entries, such as rows with zero or negative calorie values, were removed to maintain dataset reliability.

No significant outliers were observed in this dataset. Because the key nutrients are right-skewed, high values were not removed solely for being extreme; instead, statistical summaries were used to identify meaningful trends.
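The imputation and filtering steps described above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the authors' actual code: the column names (`restaurant`, `calories`, and the nutrient columns), the split into `CRITICAL` and `SECONDARY` lists, the order of the steps, and the toy data are all hypothetical, since the source does not give the dataset schema.

```python
import numpy as np
import pandas as pd

# Hypothetical column groupings; the source does not list which
# columns were treated as "critical" versus "secondary".
CRITICAL = ["carbohydrates", "fats", "proteins"]
SECONDARY = ["sugars", "fiber"]


def clean_menu(df: pd.DataFrame) -> pd.DataFrame:
    """Drop invalid calorie rows, then impute missing nutrient values."""
    # Remove rows with zero or negative calories (rows with missing
    # calories are kept, since the source does not mention dropping them).
    out = df.loc[~(df["calories"] <= 0)].copy()

    # Critical columns: fill with the global (whole-column) median.
    for col in CRITICAL:
        out[col] = out[col].fillna(out[col].median())

    # Secondary columns: fill with the per-restaurant median first,
    # then fall back to the global median for anything still missing
    # (for example, a restaurant whose values are all missing).
    for col in SECONDARY:
        group_median = out.groupby("restaurant")[col].transform("median")
        out[col] = out[col].fillna(group_median)
        out[col] = out[col].fillna(out[col].median())

    return out


# Toy data, invented purely for illustration.
menu = pd.DataFrame({
    "restaurant": ["A", "A", "A", "B", "B", "C"],
    "calories": [500, 300, 0, 450, 700, 200],
    "carbohydrates": [40.0, np.nan, 10.0, 55.0, 60.0, 20.0],
    "fats": [20.0, 10.0, 5.0, np.nan, 30.0, 8.0],
    "proteins": [25.0, 15.0, 2.0, 20.0, np.nan, 10.0],
    "sugars": [5.0, np.nan, 1.0, 12.0, 8.0, np.nan],
    "fiber": [900.0, 600.0, 100.0, np.nan, 1200.0, np.nan],
})

cleaned = clean_menu(menu)
```

In this sketch the zero-calorie row is dropped before any medians are computed, so invalid rows do not influence the imputed values; the source does not say which order was used, and the opposite order would give slightly different medians.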