

by Jikadara (2024), found on Kaggle. The raw data were cleaned and preprocessed to ensure accuracy and consistency. This entailed addressing missing values in the dataset using several methods. Specifically, missing values in the ‘year’, ‘total_purchases’, ‘product_category’, ‘feedback’, ‘ratings’, ‘customer_segment’, and ‘income’ fields were filled with the mode (most frequent value). Missing values in ‘product_brand’ (which accounted for 0.09% of the data), ‘month’, ‘time’, and ‘date’ were handled by dropping the corresponding rows. Missing values in the ‘amount’ and ‘total_amount’ fields were filled with the mean. For the ‘phone’ column, unique random 10-digit phone numbers were generated to fill the missing entries. Special attention was given to aligning the data with the table schemas, including standardizing date formats and ensuring correct data types (see the first sketch below).

For data insertion, Python was used along with the pandas, SQLAlchemy, and PyMySQL libraries to automate the process. The cleaned data, stored in CSV files, were read into pandas DataFrames, converted into the appropriate formats, and inserted into the MySQL database using SQLAlchemy. Post-insertion, SQL queries were run to verify the accuracy and completeness of the data. This verification included listing all tables, retrieving sample data from each table, and checking foreign key relationships to ensure data integrity (see the second sketch below).

The data were then subjected to advanced data mining techniques, such as clustering and association rule mining, to identify patterns and relationships, and predictive analytics models were employed to forecast future trends and customer preferences (see the third sketch below). Finally, the insights gained from the analysis were integrated into key operational workflows, including inventory management, supply chain logistics, and personalized marketing campaigns. This integration enabled real-time adjustments based on emerging trends and market dynamics, leading to more competitive pricing strategies and improved profitability.
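The imputation strategy described above can be expressed as a short pandas sketch. The file name retail_data.csv and the exact column labels are assumptions inferred from the fields named in the text, and the random seed is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("retail_data.csv")  # hypothetical file name for the Kaggle export

# Mode (most frequent value) imputation for categorical/discrete fields
mode_cols = ["year", "total_purchases", "product_category",
             "feedback", "ratings", "customer_segment", "income"]
for col in mode_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Drop rows missing fields that are not sensibly imputed
# ('product_brand' alone accounted for only 0.09% of the data)
df = df.dropna(subset=["product_brand", "month", "time", "date"])

# Mean imputation for continuous monetary fields
for col in ["amount", "total_amount"]:
    df[col] = df[col].fillna(df[col].mean())

# Fill missing phone numbers with unique random 10-digit values
rng = np.random.default_rng(0)
missing = df["phone"].isna()
phones = set()
while len(phones) < missing.sum():
    phones.update(rng.integers(10**9, 10**10, size=missing.sum() - len(phones)))
df.loc[missing, "phone"] = list(phones)

# Align with the table schemas: standardize date format and data types
df["date"] = pd.to_datetime(df["date"])
df["year"] = df["year"].astype(int)
```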
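A minimal sketch of the loading and verification workflow follows. The connection string, table names, and CSV file names are placeholders, and the foreign-key check shown is one illustrative integrity query rather than the study's exact verification script.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string; real credentials, host, and database differ
engine = create_engine("mysql+pymysql://user:password@localhost:3306/ecommerce")

# Load each cleaned CSV into its corresponding MySQL table
for table, csv_file in {"customers": "customers.csv",
                        "products": "products.csv",
                        "transactions": "transactions.csv"}.items():
    pd.read_csv(csv_file).to_sql(table, engine, if_exists="append", index=False)

# Post-insertion checks: list tables, pull sample rows, verify foreign keys
with engine.connect() as conn:
    print(conn.execute(text("SHOW TABLES")).fetchall())
    print(pd.read_sql("SELECT * FROM transactions LIMIT 5", conn))
    orphans = conn.execute(text(
        "SELECT COUNT(*) FROM transactions t "
        "LEFT JOIN customers c ON t.customer_id = c.customer_id "
        "WHERE c.customer_id IS NULL"))
    print("Orphaned transactions:", orphans.scalar())
```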
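The text does not state which clustering or forecasting algorithms were applied, so the third sketch below shows only one common choice, k-means customer segmentation with scikit-learn, run on features aggregated from the cleaned DataFrame df of the first sketch (the customer_id column is an assumption). Association rule mining would typically be handled by a separate library such as mlxtend and is omitted here.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Aggregate hypothetical customer-level features from the cleaned DataFrame
features = df.groupby("customer_id").agg(
    total_spend=("total_amount", "sum"),
    purchases=("total_purchases", "max"),
    avg_rating=("ratings", "mean"),
).reset_index()

# Scale the features and segment customers into four clusters
X = StandardScaler().fit_transform(features[["total_spend", "purchases", "avg_rating"]])
features["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Inspect per-cluster averages to interpret the segments
print(features.groupby("cluster").mean(numeric_only=True))
```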

4.1 Data Acquisition and Aggregation

The data acquisition and aggregation process in this study involves collecting and consolidating information from various sources within the operational process. Azure SQL Database is used as the primary platform for storing structured data, chosen for its robust capabilities in managing large datasets, its scalability, and its integration with other data processing tools. As illustrated in Figure 1, the data stored in the Azure SQL Database encompass several critical domains: customer information, transaction details, product information, feedback, and logistics. Customer information includes personal details such as names, contact information, and demographic data, which are essential for analyzing customer behaviors and preferences. Transaction details cover data on individual sales, including transaction IDs, timestamps, purchased products, quantities, and payment methods; this information is crucial for tracking sales trends and customer purchasing patterns. Product information encompasses data on product categories, brands, specifications, and inventory levels, which supports managing the product catalog and analyzing product performance. Feedback data includes customer reviews and ratings, providing valuable insight into customer satisfaction and areas for improvement. Finally, logistics data captures information on shipping, delivery, and supply chain management, which is essential for optimizing operational efficiency and ensuring timely delivery of products. By aggregating data from these various sources, the study aims to provide a comprehensive view of the e-commerce ecosystem, enabling detailed analysis. One plausible relational layout for these domains is sketched below.
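The following SQLAlchemy Core sketch illustrates how the five domains could map to related tables; all table and column names are assumptions for illustration, since the actual schema is shown in Figure 1 rather than given in the text.

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        Numeric, DateTime, ForeignKey)

metadata = MetaData()

# Customer information: personal details and segment (columns assumed)
customers = Table("customers", metadata,
    Column("customer_id", Integer, primary_key=True),
    Column("name", String(100)),
    Column("phone", String(15)),
    Column("customer_segment", String(50)))

# Product information: category, brand, and stock level (columns assumed)
products = Table("products", metadata,
    Column("product_id", Integer, primary_key=True),
    Column("product_category", String(50)),
    Column("product_brand", String(50)),
    Column("inventory_level", Integer))

# Transaction details: one row per sale, keyed to customer and product
transactions = Table("transactions", metadata,
    Column("transaction_id", Integer, primary_key=True),
    Column("customer_id", ForeignKey("customers.customer_id")),
    Column("product_id", ForeignKey("products.product_id")),
    Column("date", DateTime),
    Column("amount", Numeric(10, 2)),
    Column("total_amount", Numeric(10, 2)))

# Feedback: reviews and ratings tied back to customers
feedback = Table("feedback", metadata,
    Column("feedback_id", Integer, primary_key=True),
    Column("customer_id", ForeignKey("customers.customer_id")),
    Column("ratings", Integer))
```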

