AAI_2025_Capstone_Chronicles_Combined

Cinema Analytics and Prediction System

A BERT + Logistic Regression model was trained for multi-label genre prediction. This

model performed well for dominant genres like Drama and Action but exhibited limitations for

underrepresented genres. To mitigate this, oversampling was applied during training to

improve minority class learning.
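The report does not say which oversampling scheme was used, so the following is only a minimal sketch of one common option: random duplication of training samples that carry rare labels. The function name `oversample_multilabel`, the `target_count` threshold, and the toy data are all hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def oversample_multilabel(samples, labels, target_count, seed=0):
    """Randomly duplicate samples until every label occurs at least
    `target_count` times. `labels` is a list of sets, one per sample.
    This is an assumed scheme, not necessarily the one used in the report."""
    rng = random.Random(seed)
    out_samples = list(samples)
    out_labels = [set(ls) for ls in labels]
    counts = Counter(g for ls in out_labels for g in ls)

    # Handle the rarest labels first, based on the original frequencies.
    for genre in sorted(counts, key=counts.get):
        # Only original samples are candidates for duplication.
        pool = [i for i, ls in enumerate(labels) if genre in ls]
        while counts[genre] < target_count:
            i = rng.choice(pool)
            out_samples.append(samples[i])
            out_labels.append(set(labels[i]))
            # A duplicated sample raises the count of every label it carries.
            counts.update(set(labels[i]))
    return out_samples, out_labels


# Hypothetical toy data: six plot summaries with their genre sets.
plots = ["p0", "p1", "p2", "p3", "p4", "p5"]
genres = [{"Drama"}, {"Drama", "Action"}, {"Drama"},
          {"Action"}, {"Drama", "Western"}, {"Drama"}]

new_plots, new_genres = oversample_multilabel(plots, genres, target_count=3)
new_counts = Counter(g for ls in new_genres for g in ls)
```

Because a movie can belong to several genres, duplicating a rare-genre sample also adds to the counts of any dominant genres it shares, so the resulting class balance is approximate. Oversampling of this kind should be applied to the training split only.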

For the LSTM + GloVe + Metadata model (Figure 11), the same text preprocessing

pipeline and GloVe-based embedding approach used in the recommendation model were

adapted here for classification:

Figure 11: LSTM + GloVe + Metadata model architecture

This hybrid model significantly improved performance on rare genres by incorporating

both textual and numerical context. Final architecture tuning included dropout regularization

and validation-based hyperparameter adjustments (e.g., LSTM size and learning rate).

Revenue prediction and classification

For this module, we employed two tree-based gradient boosting models: XGBRegressor

for predicting continuous revenue values and XGBClassifier for predicting the categorical

success of a movie. We applied a log transformation to budget and revenue to reduce skewness

but kept the original values for modeling, since the transformation did not improve model performance.
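To illustrate why a log transformation reduces skewness in monetary features, here is a small sketch. It assumes the `log(1 + x)` variant, which the report does not specify, and the budget figures are hypothetical.

```python
import math


def log_transform(value):
    """log(1 + x): defined at x = 0, unlike a plain logarithm.
    The choice of log1p is an assumption, not taken from the report."""
    return math.log1p(value)


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n


# Hypothetical budgets in dollars, spanning several orders of magnitude.
budgets = [1e5, 5e5, 2e6, 1e7, 5e7, 2.5e8]
log_budgets = [log_transform(b) for b in budgets]

raw_skew = skewness(budgets)      # strongly right-skewed
log_skew = skewness(log_budgets)  # close to symmetric

# math.expm1 inverts the transform, which matters if a model is
# trained on log-revenue and predictions must be reported in dollars.
recovered = math.expm1(log_budgets[0])
```

Tree-based models split on feature thresholds, so a monotonic transformation of an input such as budget leaves the possible splits essentially unchanged. That is consistent with the observation that the transformation did not improve performance, although transforming the regression target can still change the fit.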

However, we used the log-transformed budget values to group movies into budget categories
