ADS Capstone Chronicles Revised


missing values with zeros significantly altered the kurtosis of the data features; thus, columns with over 50% missing data were dropped. For the remaining columns, which had less than 50% missing values, the missing entries were imputed with the mode to avoid biasing the analysis.
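The drop-then-impute rule described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the table layout (a dict of column lists with `None` marking a missing value), the function name `drop_and_impute`, and the toy column names are all assumptions made for the example.

```python
from statistics import mode


def drop_and_impute(table, max_missing=0.5):
    """Drop columns whose missing share exceeds max_missing; mode-impute the rest.

    `table` maps a column name to a list of values, with None marking a gap.
    This layout is an assumption for illustration only.
    """
    cleaned = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if not present:
            continue  # nothing observed, so there is no mode to impute
        missing_share = 1 - len(present) / len(values)
        if missing_share > max_missing:
            continue  # over the threshold: drop the whole column
        fill = mode(present)  # most frequent observed value
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Hypothetical toy data, not the capstone dataset.
toy = {
    "col_a": [3, None, 3, 2],           # 25% missing: kept and imputed
    "col_b": [None, None, None, "x"],   # 75% missing: dropped
}
result = drop_and_impute(toy)
```

Mode imputation works for both categorical and numeric columns, which may be why it suits a mixed-type table; for strongly skewed numeric columns a median would be a common alternative.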

Additionally, inconsistent data types were identified and corrected using a function that removes special characters such as percent signs, commas, and parentheses. Numeric columns that had previously been stored as strings were also converted to the appropriate data types to facilitate data manipulation.
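A cleaning function of the kind described might look like the sketch below. The function name `to_number` and the exact character set are assumptions; the text only says "such as" percent signs, commas, and parentheses, so the real function may strip more.

```python
import re

# Characters named in the text: percent signs, commas, and parentheses.
_SPECIAL = re.compile(r"[%,()]")


def to_number(raw):
    """Strip special characters, then parse as float; return None if not numeric.

    Note: parentheses are simply removed here. In accounting-style data they
    can denote negative values, which this sketch does not handle.
    """
    if raw is None:
        return None
    stripped = _SPECIAL.sub("", str(raw)).strip()
    try:
        return float(stripped)
    except ValueError:
        return None  # leave non-numeric entries as missing


# Hypothetical string-typed column, for illustration only.
column = ["1,234", "45%", "(12.5)", "n/a"]
converted = [to_number(v) for v in column]
```

Returning `None` for unparseable entries keeps the conversion from failing on a single bad cell and leaves those cells to be handled by the missing-value step.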

Figure 4 Missing Data Distribution

The distribution of the numeric data was visualized using boxplots (see Figure 5), which also allowed for the detection of outliers. Figure 5 displays the central tendencies and variability within the data, aiding in identifying data points that deviated from the expected range. Because the nature of the data produces many outliers, all outliers were kept to ensure that the full range of the data is captured and analyzed.
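Boxplots conventionally mark as outliers the points that fall more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule numerically; it assumes the conventional 1.5 multiplier and an inclusive quantile method, neither of which is stated in the text, and the function name `iqr_outliers` is hypothetical. It only reports outliers and removes nothing, in line with the decision to keep them.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Expects at least two data points.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample, not taken from the capstone dataset.
sample = [1, 2, 3, 4, 5, 100]
flagged = iqr_outliers(sample)
```

Different plotting libraries compute quartiles slightly differently, so the points flagged here may not match a given boxplot exactly on small samples.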

