ADS Capstone Chronicles Revised
These initial EDA insights provide a foundational understanding of the dataset's characteristics and will guide the subsequent analysis aimed at optimizing operational efficiency and enhancing the customer experience in e-commerce.

4.2 Data Quality

To improve data quality, missing values were addressed first to ensure consistency and completeness. For numerical fields such as year, total purchases, amount spent, and age, missing values were imputed with either the most frequent value (mode) or the average value (mean). For categorical text fields such as product category, feedback, and customer segment, missing values were filled with the most common value to maintain uniformity. Rows with missing product brand information were removed to avoid introducing unreliable data.

To ensure format consistency, all date entries were standardized into a single format. Additionally, a unique random phone number was generated for each customer ID, so that every customer ID maps to one consistent, unique number. As a result, the dataset is now clean, consistent, and complete, providing a robust foundation for building predictive models and extracting insights.
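The cleaning steps described above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's actual code. The column names (`product_brand`, `amount_spent`, `customer_id`, and so on), the accepted input date formats, the ISO 8601 target format, the 10-digit phone range, and the choice of which numeric field receives the mean rather than the mode are all assumptions, since the text does not specify them.

```python
import random
from datetime import datetime
from statistics import mean, mode

# Hypothetical column names; the report does not list the actual schema.
MEAN_COLS = ["amount_spent"]  # assumed: mean for the continuous field
MODE_COLS = ["year", "total_purchases", "age",
             "product_category", "feedback", "customer_segment"]
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"]  # assumed input formats


def parse_date(text):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except (TypeError, ValueError):
            continue
    return None


def clean(rows, seed=0):
    """Apply the four cleaning steps to a list of record dictionaries."""
    # 1. Drop rows whose brand is missing instead of guessing a brand.
    rows = [dict(r) for r in rows if r.get("product_brand") is not None]

    # 2. Impute missing values with the column mean or mode.
    plan = [(c, mean) for c in MEAN_COLS] + [(c, mode) for c in MODE_COLS]
    for col, stat in plan:
        observed = [r[col] for r in rows if r.get(col) is not None]
        if not observed:
            continue  # nothing to impute from; leave the column untouched
        fill = stat(observed)
        for r in rows:
            if r.get(col) is None:
                r[col] = fill

    # 3. Standardize every date entry to one format.
    for r in rows:
        r["date"] = parse_date(r.get("date"))

    # 4. Assign one unique random 10-digit number per customer ID.
    #    Sampling without replacement guarantees uniqueness.
    ids = sorted({r["customer_id"] for r in rows})
    numbers = random.Random(seed).sample(range(10**9, 10**10), k=len(ids))
    phone_by_id = dict(zip(ids, numbers))
    for r in rows:
        r["phone"] = phone_by_id[r["customer_id"]]

    return rows
```

Dropping the brand-less rows before imputing means the fill values are computed only from rows that are kept; the text does not state which order was used, so this ordering is also an assumption. In a pandas workflow the same steps would typically map onto `DataFrame.dropna` and `DataFrame.fillna`.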