AAI_2025_Capstone_Chronicles_Combined

Cinema Analytics and Prediction System

TF-IDF vectorization served as the baseline model. Sparse vectors were generated from

the engineered text feature, and similarity between movie vectors was calculated using cosine

similarity. While this approach was interpretable and simple, it struggled with semantic

understanding, often failing to associate movies with similar themes but different vocabularies

(e.g., Inception and The Matrix).
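This baseline can be sketched as follows (assuming scikit-learn; the movie titles and text below are toy stand-ins for the engineered text feature, not the project's actual data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for each movie's engineered text feature.
movies = {
    "Inception":  "dream heist mind layered subconscious thriller",
    "The Matrix": "simulated reality hacker machines awakening",
    "Toy Story":  "toys friendship adventure animated family",
}

# Sparse TF-IDF vectors, one row per movie.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(movies.values())

# Pairwise cosine similarity between the sparse movie vectors.
sims = cosine_similarity(tfidf)
```

Because Inception and The Matrix share no surface vocabulary in this toy corpus, their TF-IDF cosine similarity is exactly zero, which illustrates the semantic-gap limitation described above.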

The BERT-based approach used a library that simplifies creating sentence-level

embeddings with transformer models, enabling efficient and accurate measurement of

semantic similarity between texts. This model encoded each movie's text features into

dense, context-aware embeddings. These embeddings were compared using cosine similarity to

generate semantically meaningful recommendations. For example, the system accurately

associated The Martian with Interstellar due to shared themes, despite differing vocabulary.

BERT provided the highest semantic fidelity among the models evaluated (Reimers & Gurevych,

2019).
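The comparison step can be sketched as below. The dense vectors here are toy stand-ins for real sentence embeddings; in practice they would come from a sentence-embedding model (the model name in the comment is illustrative):

```python
import numpy as np

# Toy dense embeddings standing in for BERT sentence embeddings.
# In practice, e.g.:
#   from sentence_transformers import SentenceTransformer
#   embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
embeddings = {
    "The Martian":  np.array([0.8, 0.5, 0.1]),
    "Interstellar": np.array([0.7, 0.6, 0.2]),
    "Toy Story":    np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank candidate movies by similarity to a query movie.
query = embeddings["The Martian"]
ranked = sorted(
    ((title, cosine(query, vec)) for title, vec in embeddings.items()
     if title != "The Martian"),
    key=lambda kv: kv[1],
    reverse=True,
)
# ranked[0] is the most semantically similar candidate.
```

With these toy vectors, Interstellar ranks first, mirroring the thematic association described in the text even though the underlying vocabularies differ.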

A sequence-aware recommendation model was built using a single-layer LSTM with 128

hidden units, using GloVe-embedded token sequences as input. While it captured sequential

information, it was less effective than BERT in semantic alignment due to its reliance on word

order and weaker contextual modeling. To enhance LSTM performance, a hybrid model was

constructed that combined textual features with structured metadata (Figures 9 and 10).
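The architecture can be sketched as follows (PyTorch here; the single LSTM layer and 128 hidden units come from the text, while the vocabulary size, embedding dimension, and metadata dimension are illustrative, and the embedding layer would normally be initialized with pretrained GloVe weights):

```python
import torch
import torch.nn as nn

class HybridRecommender(nn.Module):
    """Single-layer LSTM over token sequences, concatenated with metadata."""

    def __init__(self, vocab_size=5000, embed_dim=100, hidden=128, meta_dim=8):
        super().__init__()
        # In the project, this embedding would be loaded with GloVe vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=1, batch_first=True)
        # Fuse the final LSTM state with structured metadata features.
        self.head = nn.Linear(hidden + meta_dim, 1)

    def forward(self, tokens, metadata):
        x = self.embed(tokens)            # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden)
        combined = torch.cat([h_n[-1], metadata], dim=1)
        return self.head(combined)        # one relevance score per movie

model = HybridRecommender()
# Batch of 4 movies: 20-token text sequences plus 8 metadata features each.
scores = model(torch.randint(0, 5000, (4, 20)), torch.randn(4, 8))
```

Concatenating the sequence representation with metadata lets the structured signals compensate for the LSTM's weaker contextual modeling of text alone.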
