ADS Capstone Chronicles Revised


Similar to the menu data correlations, the direct relationship between calories and the macronutrients (carbohydrates, fats, and proteins) results in a high correlation between these features. There is also a strong correlation between the score and carbohydrates, proteins, and fiber, which is expected because these features are used to calculate the score.

Figure 19 Correlation Matrix of Individual Foods Including Scores
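The link between calories and the macronutrients can be illustrated with a minimal sketch. It assumes the standard Atwater general factors (4 kcal per gram of carbohydrate or protein, 9 kcal per gram of fat), which are a common approximation and are not taken from this paper. The food values, function names, and variable names below are made up for illustration only.

```python
from math import sqrt

# Atwater general factors (kcal per gram). Assumption: a standard
# approximation, not a value reported in the paper.
KCAL_PER_GRAM = {"carbs": 4.0, "protein": 4.0, "fat": 9.0}


def estimate_calories(carbs, protein, fat):
    """Estimate calories from grams of each macronutrient."""
    return (KCAL_PER_GRAM["carbs"] * carbs
            + KCAL_PER_GRAM["protein"] * protein
            + KCAL_PER_GRAM["fat"] * fat)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical foods as (carbs, protein, fat) in grams.
foods = [(10, 2, 1), (30, 5, 3), (50, 8, 10), (20, 25, 15), (60, 10, 20)]
calories = [estimate_calories(*food) for food in foods]
carbs = [food[0] for food in foods]

# Because calories are built from the macronutrients, the two are
# positively correlated in this toy data.
r_carbs = pearson(carbs, calories)
print(f"carbs vs. calories: r = {r_carbs:.2f}")
```

Since calories are close to a weighted sum of the macronutrients, a high pairwise correlation in the matrix is a structural property of the data and not an independent finding.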

4.4 Modeling

Two approaches are taken to ensure that appropriate recommendations are made for a diabetic patient, covering both restaurant menu choices and individual food or ingredient choices. First, regression modeling is used to determine whether the nutritional content of menu items and individual foods can accurately predict the patient score described previously. Predicting this score is the initial step in determining whether items are suitable for consumption, based on the personalized health factors that make up the patient score. The regression models trained and tested are Linear Regression, Random Forest Regression, and XGBoost. Support Vector Regression was also applied, but only to the Individual Foods dataset. Secondly, classification methods are used to determine if

4.3.2 Patient Data

A new column, GlucoseRank, was engineered to categorize each patient's glucose value into a rank based on the expected value. The following rules are used to assign each patient a glucose rank:

● Low: Glucose Value <= 79
● High: Glucose Value >= 130

If a value falls between the low and high thresholds, the glucose value is considered normal. These thresholds assume that the glucose level is checked before a meal. Lastly, an ID column is added as a unique identifier for each patient, based on the original index of the dataframe.
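The ranking rules above can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the function name `glucose_rank`, the key `Glucose`, the label strings, and the sample values are all hypothetical, and the ID is taken from each record's position to mirror the use of the original dataframe index.

```python
# Thresholds from the rules above (pre-meal glucose values).
LOW_MAX = 79    # values at or below this are ranked "Low"
HIGH_MIN = 130  # values at or above this are ranked "High"


def glucose_rank(value):
    """Map a pre-meal glucose value to a Low / Normal / High rank."""
    if value <= LOW_MAX:
        return "Low"
    if value >= HIGH_MIN:
        return "High"
    return "Normal"


# Hypothetical patient records; "Glucose" is an assumed field name.
patients = [{"Glucose": 72}, {"Glucose": 105}, {"Glucose": 130}]

for idx, patient in enumerate(patients):
    patient["GlucoseRank"] = glucose_rank(patient["Glucose"])
    patient["ID"] = idx  # unique identifier based on original position

for patient in patients:
    print(patient)
```

With a pandas dataframe, the same function could be applied to the glucose column with `Series.apply`, and the ID column could be copied from the dataframe index.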
