AAI_2025_Capstone_Chronicles_Combined
Cinema Analytics and Prediction System
A BERT + Logistic Regression model was trained for multi-label genre prediction. This
model performed well on dominant genres such as Drama and Action but was limited on
underrepresented genres. To mitigate this, oversampling was applied during training to
improve learning of the minority classes.
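The oversampling idea can be illustrated with a minimal sketch. The text does not specify the oversampling method, so this example assumes simple random duplication of training samples that carry rare labels; the function name `oversample_multilabel`, the `min_count` parameter, and the toy plots are hypothetical and not taken from the project.

```python
import random
from collections import Counter


def oversample_multilabel(samples, min_count, seed=0):
    """Duplicate samples that carry rare labels until every label
    appears at least `min_count` times.

    samples: list of (features, set_of_labels) pairs.
    Assumption: plain random duplication, not the project's actual method.
    """
    rng = random.Random(seed)
    result = list(samples)
    counts = Counter(label for _, labels in result for label in labels)
    # Process the rarest labels first.
    for label in sorted(counts, key=counts.get):
        pool = [s for s in samples if label in s[1]]
        while counts[label] < min_count:
            picked = rng.choice(pool)
            result.append(picked)
            # A duplicated sample raises the count of all its labels.
            counts.update(picked[1])
    return result


# Toy training data (hypothetical): Drama dominates, Western is rare.
toy = [
    ("plot A", {"Drama"}),
    ("plot B", {"Drama", "Action"}),
    ("plot C", {"Drama"}),
    ("plot D", {"Action"}),
    ("plot E", {"Drama", "Western"}),
]
balanced = oversample_multilabel(toy, min_count=3)
label_counts = Counter(l for _, labels in balanced for l in labels)
print(label_counts)
```

In a multi-label setting, duplicating a rare-genre sample also duplicates any dominant genres attached to it, so the label distribution becomes less imbalanced rather than perfectly balanced. Oversampling of this kind would normally be applied to the training split only, so that validation and test metrics reflect the original distribution.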
For the LSTM + GloVe + Metadata model (Figure 11), the same text preprocessing
pipeline and GloVe-based embedding approach used in the recommendation model were
adapted for classification:
Figure 11: LSTM + GloVe + Metadata model architecture
This hybrid model significantly improved performance on rare genres by incorporating
both textual and numerical context. Final architecture tuning included dropout regularization
and validation-based hyperparameter adjustments (e.g., LSTM size and learning rate).
Revenue prediction and classification
For this module, we employed two tree-based gradient boosting models: XGBRegressor
for predicting continuous revenue values and XGBClassifier for predicting the categorical
success of a movie. We tested a log transformation of budget and revenue to reduce skewness,
but kept the original values for modeling because the transformation did not improve model performance.
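As a rough illustration of why a log transformation reduces skewness, the sketch below compares the skewness of a handful of budgets before and after the transform. The budget values are hypothetical and not drawn from the project's dataset, and the use of `log1p` (rather than a plain logarithm) is an assumption, since the text only states that a log transformation was applied.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical budgets in USD; one blockbuster creates a long right tail.
budgets = [50_000, 200_000, 1_000_000, 5_000_000, 20_000_000, 250_000_000]

# log1p(x) = log(1 + x), which stays defined for a budget of zero.
log_budgets = [math.log1p(b) for b in budgets]

raw_skew = skewness(budgets)
log_skew = skewness(log_budgets)
print(f"raw skew: {raw_skew:.2f}, log skew: {log_skew:.2f}")
```

Tree-based models split on feature thresholds, so a monotonic transform of an input feature such as budget leaves the candidate splits essentially unchanged, which is consistent with the transformation not improving performance here.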
However, we used the log-transformed budget values to group movies into budget categories