ADS Capstone Chronicles Revised


multicollinearity, scaling, and evaluating feature importance, ensured the data was well-prepared for the subsequent modeling phase. This thorough preparation is crucial for achieving accurate and reliable forecasts.

6 Model Selection and Evaluation

Model selection and evaluation are crucial steps in developing a reliable forecasting model. A diverse set of models, known for their performance in time series forecasting and their ability to handle challenges such as multicollinearity and non-linear relationships, was selected for initial evaluation:

- Linear Regression
- Ridge Regression
- Lasso Regression
- Support Vector Regressor (SVR)
- Random Forest Regressor
- XGBoost Regressor
- Prophet
- ARIMA

6.1 Initial Model Selection

Each of these models has a proven track record in time series analysis. Ridge Regression, for instance, is renowned for its robustness in handling multicollinearity, often leading to better long-term predictions. XGBoost, a powerful ensemble learning method, enhances generalization and reduces overfitting.

6.1.1 Evaluation Metrics

Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R-squared, Adjusted R-squared, and Mean Absolute Percentage Error (MAPE) were chosen as evaluation metrics; together they provide a comprehensive picture of model performance. MSE and RMSE are sensitive to large errors, offering insight into the models' prediction accuracy. MAE gives a straightforward interpretation of average error magnitude, while R-squared indicates the proportion of variance explained by the model. Adjusted R-squared corrects for the number of predictors in the model, providing a more accurate measure of performance, especially in multiple regression contexts. MAPE, particularly useful for time series data, offers a percentage-based error measure that is easy to interpret relative to the magnitude of the predicted values.

6.1.2 Model Training and Evaluation

The models were initially trained and evaluated on the training set. While straightforward, Linear Regression exhibited an excessively high R-squared value, indicating potential overfitting. The Support Vector Regressor and Lasso Regression showed moderate performance but were outperformed by more advanced models. The Random Forest Regressor and XGBoost Regressor were among the top performers, with XGBoost showing strong predictive capabilities. However, XGBoost performed poorly on the test set for BTC-USD, highlighting its sensitivity to specific data characteristics.

6.1.3 Ridge Regression Performance

Ridge Regression emerged as a robust contender, with consistent performance across all cryptocurrencies. It handled multicollinearity resiliently and delivered stable predictions, an adaptability that is particularly valuable in time series data, where multicollinearity is often prevalent.
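The metrics described in Section 6.1.1 can be sketched as follows. This is a minimal illustration assuming scikit-learn and NumPy; the `y_true`/`y_pred` arrays are hypothetical stand-ins for actual prices and model forecasts, and Adjusted R-squared and MAPE are computed by hand from their standard formulas.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(y_true, y_pred, n_features):
    """Compute the six evaluation metrics used in this study."""
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    # Adjusted R-squared penalizes each additional predictor.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    # MAPE: average error as a percentage of the true magnitude.
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    return {"MSE": mse, "RMSE": rmse, "MAE": mae,
            "R2": r2, "Adj_R2": adj_r2, "MAPE": mape}

# Hypothetical true vs. predicted closing prices.
y_true = np.array([100.0, 110.0, 120.0, 130.0])
y_pred = np.array([102.0, 108.0, 121.0, 128.0])
metrics = evaluate(y_true, y_pred, n_features=2)
print(metrics)
```

Because Adjusted R-squared divides by `n - n_features - 1`, it is only meaningful when the sample size comfortably exceeds the number of predictors.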

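The comparison of candidate models described in Section 6.1.2 might be sketched like this. Synthetic data stands in for the actual engineered cryptocurrency features; XGBoost, Prophet, and ARIMA are noted but omitted so the sketch stays self-contained with scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic linear signal with noise, standing in for real features.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.5, -2.0, 0.7, 0.0, 3.0]) + rng.normal(scale=0.5, size=300)

# Chronological split: time series data must not be shuffled.
split = 240
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=0.1),
    "SVR": SVR(kernel="rbf"),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # XGBoost, Prophet, and ARIMA follow the same pattern but come from
    # their own libraries (xgboost, prophet, statsmodels).
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = np.sqrt(mean_squared_error(y_test, pred))

for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} RMSE = {rmse:.3f}")
```

Ranking models by a single held-out RMSE, as here, is only a first pass; the study's full evaluation also compares the other metrics across the training and test sets.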

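Ridge Regression's resilience to multicollinearity (Section 6.1.3) can be illustrated with two nearly identical synthetic features. This is an assumption-laden toy example, not the study's actual data: ordinary least squares may assign wildly offsetting weights to collinear columns, while the L2 penalty keeps the ridge coefficients small and stable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L2 penalty shrinks the near-collinear pair toward an even,
# bounded split of the true coefficient instead of offsetting extremes.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

The ridge coefficients sum to roughly the true slope of 3.0 while each stays near 1.5, which is why ridge predictions remain stable even when predictors are highly correlated.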