AAI_2025_Capstone_Chronicles_Combined

Cinema Analytics and Prediction System


Figure 15: Similarity score example

The TF-IDF approach relied on preprocessing and JSON string manipulation, using cosine similarity to rank the top-N most similar films. While simple and interpretable, it often misses deeper thematic links: for example, it struggles to pair films such as Inception and The Matrix because they share little vocabulary.
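
The ranking step described above can be illustrated with a minimal sketch using only the Python standard library. The toy records, the field names (`overview`, `keywords`), the function names, and the plain `tf * ln(N / df)` weighting are all assumptions made for illustration; they are not the project's data or its exact implementation.

```python
import json
import math
from collections import Counter

# Hypothetical toy records (not the project's data). "keywords" is a JSON
# string, mimicking the kind of column that must be parsed before vectorizing.
MOVIES = [
    {"title": "Dream Heist",
     "overview": "a thief enters dreams to steal secrets",
     "keywords": '[{"name": "dream"}, {"name": "heist"}]'},
    {"title": "Mind Vault",
     "overview": "a thief breaks into a vault of dreams",
     "keywords": '[{"name": "heist"}, {"name": "vault"}]'},
    {"title": "Simulated World",
     "overview": "a hacker discovers that reality is a simulation",
     "keywords": '[{"name": "simulation"}, {"name": "hacker"}]'},
]


def tokenize(movie):
    """Lowercase the overview and append keyword names parsed from JSON."""
    names = [item["name"] for item in json.loads(movie["keywords"])]
    return (movie["overview"] + " " + " ".join(names)).lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n_similar(title, movies, n=2):
    """Rank the other movies by cosine similarity to the given title."""
    vectors = tfidf_vectors([tokenize(m) for m in movies])
    titles = [m["title"] for m in movies]
    query = vectors[titles.index(title)]
    scores = [(t, cosine(query, vec))
              for t, vec in zip(titles, vectors) if t != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


ranking = top_n_similar("Dream Heist", MOVIES)
print(ranking)
```

In this toy setup the third record shares no informative terms with the query, so its score is zero even though a human reader might see a thematic connection. This is the same limitation noted above for vocabulary-based matching.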

To further explore the structure of semantic similarity captured by the TF-IDF vectors, KMeans clustering was applied to the vectorized movie data, and the results were projected into two dimensions using Principal Component Analysis (PCA). The plot below (Figure 16) shows the clustering of movies into five distinct groups based on content similarity.
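
The clustering and projection pipeline can be sketched as follows, assuming scikit-learn is available. The toy overviews, the English stop-word list, `n_init=10`, and `random_state=42` are illustrative assumptions; only the use of KMeans with five clusters and a two-dimensional PCA projection comes from the text.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy overviews standing in for the preprocessed movie text.
overviews = [
    "a thief enters dreams to steal corporate secrets",
    "a hacker discovers that reality is a simulation",
    "a young wizard attends a school of magic",
    "a hobbit carries a magic ring across a dangerous land",
    "a detective hunts a serial killer in a rainy city",
    "a retired detective investigates one last murder case",
    "two strangers fall in love on a sinking ship",
    "a wedding planner falls in love with the groom",
    "astronauts travel through a wormhole to save humanity",
    "a stranded astronaut grows potatoes to survive on mars",
]

# Step 1: vectorize the text with TF-IDF (sparse matrix, one row per movie).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(overviews)

# Step 2: group the TF-IDF vectors into five clusters, as in the text.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(tfidf_matrix)

# Step 3: project the vectors to two dimensions for plotting.
# PCA is given a dense array here, which is fine for a small toy matrix.
pca = PCA(n_components=2, random_state=42)
coords = pca.fit_transform(tfidf_matrix.toarray())

for text, label, (x, y) in zip(overviews, labels, coords):
    print(f"cluster {label}  ({x:+.2f}, {y:+.2f})  {text}")
```

The resulting `coords` and `labels` arrays can be passed to any scatter-plot routine, coloring each point by its cluster label, to produce a figure of the kind described above. Note that the clusters are computed in the full TF-IDF space, so groups may overlap once projected to two dimensions.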
