AAI_2025_Capstone_Chronicles_Combined
Cinema Analytics and Prediction System
TF-IDF vectorization served as the baseline model. Sparse vectors were generated from
the engineered text feature, and similarity between movie vectors was calculated using cosine
similarity. While interpretable and simple, this approach struggled with semantic
understanding, often failing to associate movies that share themes but use different
vocabularies (e.g., Inception and The Matrix).
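This baseline can be sketched with scikit-learn. The plot snippets below are illustrative stand-ins for the engineered text feature, not the project's actual data; they show how disjoint vocabulary drives TF-IDF cosine similarity toward zero even when themes align:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical plot summaries standing in for the engineered text feature.
docs = [
    "a thief enters dreams to plant an idea",      # Inception-like
    "a hacker discovers reality is a simulation",  # Matrix-like, no shared terms
    "a thief steals an idea from a dream",         # lexical overlap with doc 0
]

# Sparse TF-IDF vectors, then pairwise cosine similarity between movies.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(docs)
sims = cosine_similarity(tfidf_matrix)

# Docs 0 and 1 are thematically close but share no content words,
# so their TF-IDF similarity is zero; docs 0 and 2 overlap lexically.
```

This is exactly the failure mode described above: without shared vocabulary, the sparse representation offers no signal, regardless of thematic similarity.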
The BERT-based approach used a library that simplifies creating sentence-level
embeddings with transformer models, allowing semantic similarity between texts to be
measured efficiently and accurately. This model encoded each movie’s text features into
dense, context-aware embeddings. These embeddings were compared using cosine similarity to
generate semantically meaningful recommendations. For example, the system accurately
associated The Martian with Interstellar due to shared themes, despite differing vocabulary.
BERT provided the highest semantic fidelity among the models evaluated (Reimers & Gurevych,
2019).
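The comparison step can be illustrated with plain cosine similarity over dense vectors. The toy embeddings below are fabricated stand-ins for what a sentence-transformer encoder would produce; in the actual system each vector would come from encoding a movie's text features with the BERT-based model:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two dense embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d "embeddings" (real sentence embeddings have hundreds of
# dimensions); values are illustrative, not model output.
martian      = np.array([0.9, 0.8, 0.1, 0.0])  # space-survival theme
interstellar = np.array([0.8, 0.9, 0.0, 0.1])  # space-survival theme
romcom       = np.array([0.0, 0.1, 0.9, 0.8])  # unrelated theme

theme_match    = cosine_sim(martian, interstellar)
theme_mismatch = cosine_sim(martian, romcom)
```

Because the encoder places thematically similar texts near each other in embedding space, cosine similarity stays high even when the two plots share no vocabulary, which is what lets the system pair The Martian with Interstellar.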
A sequence-aware recommendation model was built using a single-layer LSTM with 128
hidden units that took GloVe-embedded token sequences as input. While it captured sequential
information, it was less effective than BERT in semantic alignment due to its reliance on word
order and weaker contextual modeling. To enhance LSTM performance, a hybrid model was
constructed that combined textual features with structured metadata (Figures 9 and 10):
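The hybrid architecture can be sketched in PyTorch as follows. The vocabulary size, embedding dimension, and metadata width are assumptions for illustration; only the single-layer LSTM with 128 hidden units comes from the description above:

```python
import torch
import torch.nn as nn

class HybridLSTMRecommender(nn.Module):
    """Sketch of the hybrid model: a single-layer LSTM (128 hidden units)
    over GloVe-style token embeddings, whose final hidden state is
    concatenated with structured metadata before a scoring layer.
    vocab_size, embed_dim, and meta_dim are illustrative assumptions."""

    def __init__(self, vocab_size=10_000, embed_dim=100, meta_dim=8, hidden=128):
        super().__init__()
        # In the project the embedding weights would be initialized from
        # pre-trained GloVe vectors; here they are random placeholders.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden + meta_dim, 1)

    def forward(self, tokens, metadata):
        x = self.embed(tokens)            # (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(x)          # h: (num_layers, batch, hidden)
        fused = torch.cat([h[-1], metadata], dim=1)
        return self.head(fused)           # (batch, 1) relevance score

model = HybridLSTMRecommender()
tokens = torch.randint(0, 10_000, (4, 20))  # 4 movies, 20 tokens each
meta = torch.randn(4, 8)                    # e.g. genre flags, year, rating
scores = model(tokens, meta)
```

Concatenating the LSTM's final hidden state with metadata lets the scoring layer draw on both the sequential text signal and structured attributes the text may not mention.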