ADS Capstone Chronicles Revised
missing values with zeros significantly altered the kurtosis of the data features; thus, columns with over 50% missing data were dropped. For the remaining columns, where less than 50% of the values were missing, the missing entries were imputed with the mode to avoid biasing the analysis.
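The drop-then-impute step described above can be sketched as follows. This is a minimal illustration, assuming the data sit in a pandas DataFrame; the function name `drop_and_impute`, the `threshold` parameter, and the example columns are hypothetical and do not come from the project code.

```python
import numpy as np
import pandas as pd


def drop_and_impute(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns whose missing share exceeds `threshold`, then mode-impute the rest."""
    # Share of missing values per column (between 0.0 and 1.0).
    missing_share = df.isna().mean()
    # Keep only columns at or below the threshold ("over 50%" is dropped).
    kept = df.loc[:, missing_share <= threshold].copy()
    for col in kept.columns:
        modes = kept[col].mode(dropna=True)
        # A column with no observed values has no mode; leave it untouched.
        if not modes.empty:
            kept[col] = kept[col].fillna(modes.iloc[0])
    return kept


# Hypothetical example data, for illustration only.
example = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],  # 75% missing, dropped
    "rating": [3.0, np.nan, 3.0, 5.0],                # 25% missing, imputed
    "state": ["CA", "CA", None, "NY"],                # 25% missing, imputed
})
cleaned = drop_and_impute(example)
```

When a column has several modes, this sketch takes the first one returned by `Series.mode`, which is sorted; the source does not say how ties were handled.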
Additionally, inconsistent data types were identified and corrected using a function that removes special characters such as percent signs, commas, and parentheses. Numeric columns that had been stored as strings were also converted to the appropriate data types to facilitate data manipulation.
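A cleaning function of the kind described above might look like the sketch below. It assumes pandas; the name `to_numeric_clean` and the sample values are hypothetical, and the project's actual function may differ.

```python
import pandas as pd


def to_numeric_clean(series: pd.Series) -> pd.Series:
    """Strip percent signs, commas, and parentheses, then convert to numbers."""
    # Work on string representations so the regex applies to every entry.
    stripped = series.astype(str).str.replace(r"[%,()]", "", regex=True).str.strip()
    # Entries that still cannot be parsed become NaN instead of raising an error.
    return pd.to_numeric(stripped, errors="coerce")


# Hypothetical raw values, for illustration only.
raw = pd.Series(["1,250", "45%", "(300)", "n/a"])
converted = to_numeric_clean(raw)
```

Note that removing parentheses turns an accounting-style negative such as "(300)" into a positive 300; whether that is appropriate depends on how the source data use parentheses, which the text does not specify.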
Figure 4 Missing Data Distribution
The distribution of the numeric data was visualized using boxplots (see Figure 5), which also allowed for the detection of outliers. Figure 5 displays the central tendencies and variability within the data, aiding in identifying data points that deviated from the expected range. Because many outliers arose from the nature of the data, all outliers were retained to ensure that the full range of the data was captured and analyzed.
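The points a boxplot draws as outliers can also be flagged numerically. The sketch below assumes the conventional 1.5 × IQR whisker rule, which the text does not state explicitly; the function name `flag_boxplot_outliers` and the sample values are hypothetical.

```python
import pandas as pd


def flag_boxplot_outliers(series: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values beyond the boxplot whisker limits."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return (series < lower) | (series > upper)


# Hypothetical values, for illustration only.
values = pd.Series([10, 12, 11, 13, 12, 95])
mask = flag_boxplot_outliers(values)

# Outliers are flagged for inspection but not removed from the data.
n_outliers = int(mask.sum())
```

Keeping the mask separate from the data matches the decision to retain all outliers: the flagged points remain available for analysis.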