ADS Capstone Chronicles Revised


Figure 6 PCA Scree Plot

Following this, a heatmap (see Figure 7) was used to highlight the six most prominent features in the dataset: total payment, number of users, number of providers, average number of providers per county, number of fee-for-service beneficiaries, and average number of users per provider. These features were identified as significant because of their high correlation with the principal components, which indicates their importance in explaining the variance in the dataset. Redundant features, such as columns containing the word ‘average,’ were then removed to minimize multicollinearity and to clarify feature importance. These ‘average’ columns were redundant because their values were derived from other columns in the dataset, as determined through research and domain knowledge of the dataset.
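To illustrate the two ideas in this paragraph, the sketch below drops a derived ‘average’ column and then computes the correlation between each remaining feature and each principal component (the values a loadings heatmap would display). It is a minimal example under stated assumptions, not the project's actual pipeline: the data are synthetic, and the column names (`total_payment`, `number_of_users`, `number_of_providers`, `average_users_per_provider`) are hypothetical stand-ins for the dataset's fields.

```python
import numpy as np

# Synthetic stand-in data; names and distributions are assumptions for illustration.
rng = np.random.default_rng(1)
n = 300
users = rng.uniform(100, 1000, size=n)
providers = rng.uniform(5, 50, size=n)
payment = 40.0 * users + rng.normal(0, 500, size=n)

columns = {
    "total_payment": payment,
    "number_of_users": users,
    "number_of_providers": providers,
    # Derived from two other columns, so it carries no independent information.
    "average_users_per_provider": users / providers,
}

# Drop columns whose name contains 'average' to reduce multicollinearity.
kept = {name: col for name, col in columns.items() if "average" not in name}
names = list(kept)
X = np.column_stack([kept[name] for name in names])

# Standardize, then eigendecompose the correlation matrix (PCA on scaled data).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]          # sort components by descending eigenvalue
eigvals = np.clip(eigvals[order], 0, None)
eigvecs = eigvecs[:, order]

# Loadings: correlation between each standardized feature and each component.
loadings = eigvecs * np.sqrt(eigvals)
scores = Z @ eigvecs                        # principal component scores

for name, row in zip(names, loadings):
    print(name, np.round(row, 3))
```

Each row of `loadings` corresponds to one feature and each column to one principal component; features with large absolute values in the leading columns are the ones a heatmap of this matrix would highlight.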

The scree plot displays the eigenvalues associated with each principal component. Examining Figure 6, it is possible to identify the “elbow” point, which indicates a balance between capturing sufficient variance and simplifying the data. In this case, three principal components were selected based on the scree plot.
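The quantities behind a scree plot can be sketched as follows. This is an illustrative example on synthetic data rather than the study's dataset: the helper `scree_values`, the eight-feature matrix, and the three underlying factors are all assumptions chosen so that an elbow appears after the third component.

```python
import numpy as np

def scree_values(X):
    """Return eigenvalues (descending) and explained-variance ratios for X."""
    # Standardize so the covariance of Z equals the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    ratios = eigvals / eigvals.sum()
    return eigvals, ratios

# Synthetic data: 8 observed features driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 8))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 8))

eigvals, ratios = scree_values(X)
cumulative = np.cumsum(ratios)

# These are the values a scree plot would show on its vertical axis.
for i, (ev, cum) in enumerate(zip(eigvals, cumulative), start=1):
    print(f"PC{i}: eigenvalue={ev:.3f}, cumulative variance={cum:.3f}")
```

In data built this way, the eigenvalues fall sharply after the third component, which is the kind of elbow that would motivate retaining three components. On real data the elbow is often less distinct, so the cutoff remains a judgment call.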

