extraction phase. Another feature called ‘load_dt’ was added for data engineering purposes to record when the last pull occurred. This addition is useful for moving data around and identifying any issues with pulling data.

4.0.2 Features Extracted

The features extracted from the data include several vital metrics: the timestamp ‘time,’ the lowest price within the day ‘low,’ the highest price within the specified period ‘high,’ the opening price at the beginning of the period ‘open,’ the closing price at the end of the period ‘close,’ the trading volume during the period ‘volume,’ and the unique identifier for each cryptocurrency ‘product_id.’

4.0.3 Engineered Features

Several features were engineered to enhance the analysis. These include ‘price_change,’ created by subtracting the open value from the close value; ‘average_price,’ calculated by adding the high and low values and dividing the sum by two; and ‘volatility,’ calculated by subtracting low from high, dividing that difference by low, and multiplying the result by 100.
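These derivations are simple column arithmetic. A minimal sketch in pandas, assuming the raw pull is already loaded into a DataFrame named df with the extracted columns above:

    import pandas as pd

    def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
        # price_change: close minus open for the period
        df['price_change'] = df['close'] - df['open']
        # average_price: midpoint of the period's high and low
        df['average_price'] = (df['high'] + df['low']) / 2
        # volatility: the high-low range as a percentage of the low
        df['volatility'] = (df['high'] - df['low']) / df['low'] * 100
        return df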
4.0.4 Data Extraction Process

The extraction process was split into multiple functions that are almost identical, differing only in the date range they pull. During the API testing phase, it was discovered that the API could not handle large pulls: because the process involved looping over ten different cryptocurrencies and pulling three years' worth of data at once, the API timed out. To avoid a timeout, the pulls were split into six-month periods covering the three years from July 16th, 2021, to July 8th, 2024. This approach prevented the API from being overloaded. The functions were also designed to avoid duplicate values, ensuring each cryptocurrency had only one record per date to maintain uniqueness and data integrity.
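The extraction functions themselves are not reproduced here, but the six-month windowing idea can be sketched as follows. In this sketch, fetch_candles is a hypothetical wrapper around the exchange's candles endpoint, and the product list is abbreviated to two of the ten cryptocurrencies:

    from datetime import date, timedelta
    import pandas as pd

    PRODUCTS = ['BTC-USD', 'ETH-USD']  # abbreviated; the project looped over ten
    START, END = date(2021, 7, 16), date(2024, 7, 8)
    WINDOW = timedelta(days=182)  # roughly six months per request, to avoid API timeouts

    def pull_history(fetch_candles) -> pd.DataFrame:
        frames = []
        for product_id in PRODUCTS:
            window_start = START
            while window_start < END:
                # request one six-month window at a time instead of three years at once
                window_end = min(window_start + WINDOW, END)
                frames.append(fetch_candles(product_id, window_start, window_end))
                window_start = window_end
        combined = pd.concat(frames, ignore_index=True)
        # keep one record per cryptocurrency per date for uniqueness
        return combined.drop_duplicates(subset=['product_id', 'time'])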
4.0.5 Final Data Preparation
The final function was created to merge all the data into one usable data frame for analysis and modeling. Once pulled, the data was stored in the ‘RAW_Data’ folder under the name ‘train_historic_updated_717.csv.’ The pull was stopped on July 8th so that a testing sample one week ahead could be extracted and used as a validation set for forecasting future values after the model is trained and tested. This data is in the same folder, saved as ‘test_historic_updated_717.csv.’
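A minimal sketch of this final merge-and-save step, assuming the six-month pulls are held as a list of DataFrames and that ‘load_dt’ is stamped when the combined pull is assembled (the exact stamping point is an assumption):

    import pandas as pd
    from datetime import datetime, timezone

    def finalize_dataset(frames: list) -> pd.DataFrame:
        # combine the individual six-month pulls into one frame for analysis and modeling
        data = pd.concat(frames, ignore_index=True)
        # load_dt records when this pull was assembled (assumed to be a UTC timestamp)
        data['load_dt'] = datetime.now(timezone.utc)
        # assumes the RAW_Data folder already exists
        data.to_csv('RAW_Data/train_historic_updated_717.csv', index=False)
        return data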
4.1 Exploratory Data Analysis and Data Preparation (EDA)

Exploratory data analysis (EDA) is crucial in understanding and preparing the dataset for ‘The Guardians of the Crypto’ platform. It involves inspecting the data, cleaning it, preparing it for analysis, and conducting various analyses to uncover patterns, correlations, and insights vital for predictive modeling and informed decision-making in the cryptocurrency market.

4.1.1 Data Inspection

Data inspection is the initial step in the EDA process, in which the dataset is examined to become familiar with its structure and contents. This involves loading the data into a suitable environment and exploring its basic structure. The dataset has 10,764 rows and 12 columns. Columns such as ‘time,’ ‘low,’ ‘high,’ ‘open,’ ‘close,’ ‘volume,’ ‘price_change,’ ‘average_price,’ ‘volatility,’ ‘product_id,’ and ‘load_dt’ are inspected. This step helps to understand the general makeup of the data and identify any obvious issues or patterns that may need further investigation.
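In pandas, this initial inspection typically amounts to a few calls on the loaded frame; a brief sketch, where the shape shown in the comment is the figure reported above:

    import pandas as pd

    df = pd.read_csv('RAW_Data/train_historic_updated_717.csv')
    print(df.shape)         # expected (10764, 12) per the report
    print(df.dtypes)        # column types for time, low, high, open, close, ...
    print(df.head())        # first rows, to spot obvious issues
    print(df.isna().sum())  # missing-value counts per column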