AAI_2025_Capstone_Chronicles_Combined

Cinema Analytics and Prediction System


based on quantiles. These categories represented different production scales and were included as a feature to help the models learn patterns related to budget levels.
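A minimal sketch of this quantile-based binning using pandas; the column names, the number of bins, and the tier labels here are illustrative assumptions, not the project's actual values.

```python
import pandas as pd

# Hypothetical budget values in USD; column names are assumptions.
df = pd.DataFrame({"budget": [5e5, 2e6, 1.5e7, 6e7, 1.2e8, 2.5e8]})

# Quantile-based binning: each tier holds roughly the same number of films,
# so the categories track production scale rather than fixed dollar cutoffs.
df["budget_tier"] = pd.qcut(df["budget"], q=3, labels=["low", "mid", "high"])
```

Because `qcut` splits on quantiles rather than fixed thresholds, the tiers remain balanced even though budgets span several orders of magnitude.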

The dataset was split into training, validation, and test sets using an 80:10:10 ratio. Both models were trained using XGBoost's training function, with the validation set used to monitor performance and stop training early when needed. For the regression model, we set the loss function to absolute error, which measures the average magnitude of the errors directly. This choice was informed by the heavy-tailed distribution of the revenue variable, which includes significant outliers such as blockbuster films.
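The 80:10:10 split described above can be sketched as a shuffled index partition; this is a generic NumPy version under assumed names, not the project's exact code.

```python
import numpy as np

def split_indices(n_rows, seed=42):
    """Shuffle row indices and partition them 80:10:10 into
    train / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_train = int(n_rows * 0.8)
    n_val = int(n_rows * 0.1)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(1000)
```

Shuffling before slicing keeps the three sets disjoint while ensuring each is a random sample of the full dataset.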

The training dynamics of the XGBoost regressor were monitored using Mean Absolute Error (MAE) on the validation set across boosting iterations. The validation MAE plot (Figure 12) shows a sharp decline during the initial boosting rounds, indicating effective early learning. As training progressed, the MAE decreased more gradually and stabilized around iteration 30, suggesting the model had converged.

Figure 12: MAE over Iteration
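The early-stopping logic applied to this validation curve can be sketched in plain Python; the patience value and the sample MAE sequence below are illustrative assumptions (in the project, XGBoost performs this check internally during training).

```python
def early_stop_round(val_mae_per_round, patience=10):
    """Return the boosting round with the best validation MAE, scanning
    until MAE has failed to improve for `patience` consecutive rounds."""
    best_mae, best_round = float("inf"), 0
    for i, mae in enumerate(val_mae_per_round):
        if mae < best_mae:
            best_mae, best_round = mae, i  # new best: reset the counter
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds: stop
    return best_round
```

This mirrors the behavior in Figure 12: once the validation MAE plateaus, additional boosting rounds no longer help and training halts at the best round seen.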

We improved model performance by combining manual feature engineering with careful tuning of key model settings. We first adjusted choices such as the cutoff used to remove unrealistic budget values and the number of budget bins, based on exploratory analysis. For both the regression and classification models, we used a grid search approach to find the best combination of important XGBoost settings, such as the number of trees, tree depth, learning rate, and regularization controls.
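The grid search over these settings can be sketched as an exhaustive scan of parameter combinations; the candidate values below are assumptions, and the scoring function is a synthetic placeholder standing in for "train XGBoost and return validation MAE".

```python
from itertools import product

# Candidate values mirror the settings named above (trees, depth,
# learning rate, regularization); the exact grid is an assumption.
grid = {
    "n_estimators": [100, 300],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1],
    "reg_lambda": [0.5, 1.0],
}

def evaluate(params):
    # Placeholder score for illustration only: in practice this would
    # fit the model with `params` and return its validation MAE.
    return abs(params["max_depth"] - 6) + abs(params["learning_rate"] - 0.1)

# Enumerate every combination and keep the one with the lowest score.
best_params = min(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=evaluate,
)
```

Exhaustive grid search is tractable here because the grid is small; each added setting multiplies the number of model fits, so the candidate lists are kept short.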
