ADS Capstone Chronicles Revised
Figure 3 Correlation Matrix of Numeric Features
4.2 Data Quality and Preparation
To ensure that the analysis rests on high-quality data, completeness, consistency, accuracy, and integrity must all be considered. During data cleaning and preprocessing, the dataset revealed several columns with more than 50% missing values (see Figure 4), such as ‘number_of_providers_change’ and ‘total_payment_change.’ Columns that appear predominantly yellow in Figure 4 were excluded because they exhibit high levels of missing data. This decision was based on comparing the kurtosis of the numeric features before and after imputing missing values. The comparison revealed that filling