AAI_2025_Capstone_Chronicles_Combined
Cinema Analytics and Prediction System
based on quantiles. These categories represented different production scales and were
included as a feature to help the models learn patterns related to budget levels.
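The report does not show the binning step itself; a minimal sketch of quantile-based budget binning with pandas follows. The budget values, the number of bins, and the tier labels are all illustrative, not taken from the report:

```python
import pandas as pd

# Hypothetical budget values in USD; the real dataset and its column
# names are not shown in the report.
budgets = pd.Series([5e5, 2e6, 1.5e7, 6e7, 2.5e8])

# Quantile-based binning into four equally populated production-scale
# tiers; the report tuned its actual bin count separately.
budget_tier = pd.qcut(budgets, q=4,
                      labels=["low", "mid", "high", "blockbuster"])
```

Because `qcut` places bin edges at quantiles of the data, each tier holds roughly the same number of films regardless of how skewed the budgets are.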
The dataset was split into training, validation, and test sets using an 80:10:10 ratio. Both
models were trained using XGBoost’s training function, where the validation set was used to
monitor performance and stop training early if needed. For the regression model, we set the
loss function to absolute error, which penalizes each error in proportion to its size rather than its square. This choice was informed by the distribution of the revenue variable, which includes significant outliers such as blockbuster films; absolute error is far less sensitive to these extreme values than squared error.
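The split described above can be sketched in a few lines; the row count here is illustrative, since the report does not give the dataset size:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                        # illustrative sample count
idx = rng.permutation(n)

# 80:10:10 split into train / validation / test index sets.
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

# With XGBoost, the validation slice would then be supplied as an
# eval set so training can stop early (sketch only, not run here):
#   model = xgboost.XGBRegressor(objective="reg:absoluteerror", ...)
#   model.fit(X[train_idx], y[train_idx],
#             eval_set=[(X[val_idx], y[val_idx])])
```

Shuffling the indices before slicing prevents any ordering in the raw data (e.g. by release year) from leaking into one split.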
The training dynamics of the XGBoost regressor were monitored using Mean Absolute
Error (MAE) on the validation set across boosting iterations. The validation MAE plot (Figure 12) shows a sharp decline during the initial boosting rounds, indicating effective early learning. As training progressed, the MAE decreased more gradually and stabilized around iteration 30, suggesting the model had converged.
Figure 12: MAE over Iteration
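The plateau described above is exactly what early stopping detects. A toy re-implementation of the stopping rule on a synthetic MAE curve (this is an illustration of the idea, not XGBoost's internal code; the `patience` and `tol` values are assumptions):

```python
def early_stop_round(val_mae, patience=10, tol=1e-4):
    """Return the boosting round at which validation MAE last improved
    by more than `tol`, once `patience` rounds pass with no improvement."""
    best, best_round = float("inf"), 0
    for i, mae in enumerate(val_mae):
        if mae < best - tol:
            best, best_round = mae, i
        elif i - best_round >= patience:
            return best_round
    return best_round

# Synthetic curve: sharp early decline, then a plateau after round 30,
# mirroring the shape reported for Figure 12.
curve = [100 / (1 + r) for r in range(30)] + [3.2] * 30
```

On this curve the rule reports round 30 as the last real improvement, after which the remaining boosting rounds add no validation benefit.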
We improved model performance by combining manual feature engineering with
careful tuning of key model settings. We first adjusted choices like the cutoff used to remove
unrealistic budget values and the number of budget bins, based on exploratory analysis. For
both the regression and classification models, we used a grid search to find the best combination of key XGBoost hyperparameters, such as the number of trees, maximum tree depth, learning rate, and regularization strength.
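The grid search can be sketched with the standard library; the parameter grid and the scoring function below are illustrative stand-ins for the report's actual search space and validation run:

```python
from itertools import product

# Hypothetical search space over the settings named above.
grid = {
    "n_estimators": [100, 300],   # number of trees
    "max_depth": [4, 6, 8],       # tree depth
    "learning_rate": [0.05, 0.1],
    "reg_lambda": [1.0, 5.0],     # L2 regularization strength
}

def validation_mae(params):
    """Stand-in scoring function: a real run would train XGBoost with
    `params` and return its MAE on the validation split."""
    return (params["max_depth"] * params["learning_rate"]
            + params["reg_lambda"] / params["n_estimators"])

# Exhaustively evaluate every combination and keep the best one.
best_params = min(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=validation_mae,
)
```

In practice scikit-learn's `GridSearchCV` wraps this same exhaustive loop and adds cross-validation, but the manual version makes the mechanics explicit.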