AAI_2025_Capstone_Chronicles_Combined
Cinema Analytics and Prediction System
Figure 15: Similarity score example
The TF-IDF approach relied on text preprocessing and JSON string manipulation, then used
cosine similarity to rank the top-N most similar films. While simple and interpretable, it often
misses deeper thematic links: it struggles, for example, to pair films such as Inception and
The Matrix because they share little vocabulary.
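The ranking step described above can be illustrated with a minimal, from-scratch sketch. It is not the project's implementation: the toy titles and descriptions, the whitespace tokenizer, the smoothed IDF formula, and the helper names (`tfidf_vectors`, `cosine`, `top_n_similar`) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus; titles and descriptions are illustrative only.
movies = {
    "Space Heist": "crew plans daring heist aboard space station",
    "Orbital Job": "thieves plan heist on orbiting space station",
    "Dream Maze": "agent enters layered dreams to plant idea",
    "Simulated World": "hacker discovers reality is computer simulation",
}


def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumed weighting: raw count times smoothed IDF, ln((1+N)/(1+df)) + 1.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def top_n_similar(title, n=2):
    """Rank the other films by cosine similarity to `title`, highest first."""
    titles = list(movies)
    vectors = tfidf_vectors(list(movies.values()))
    query = vectors[titles.index(title)]
    scores = [
        (other, cosine(query, vec))
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]
```

In this toy corpus, the two heist films share several terms and rank each other first, while the two thematically related "questioning reality" films share no terms at all and receive a similarity of zero, which mirrors the limitation noted above.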
To further explore the structure of semantic similarity captured by the TF-IDF vectors,
KMeans clustering was applied to the vectorized movie data, and the results were projected
into two dimensions using Principal Component Analysis (PCA). The plot below (Figure 16)
shows the clustering of movies into five distinct groups based on content similarity.
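The clustering and projection described above could be sketched as follows. This is a minimal sketch that assumes scikit-learn was the library used, which the text does not state. The toy descriptions, the English stop-word list, and the `random_state` and `n_init` values are illustrative assumptions; only the choice of five clusters and two PCA components comes from the text.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy descriptions standing in for the real movie metadata.
descriptions = [
    "crew plans daring heist aboard space station",
    "thieves plan heist on orbiting space station",
    "detective hunts killer through rainy city streets",
    "rookie detective tracks killer across the city",
    "two gardeners fall in love among spring flowers",
    "florist finds love during spring festival",
    "knight defends castle against dragon attack",
    "young wizard battles dragon near ancient castle",
    "family road trip turns into comedy of errors",
    "friends share comedy of errors on road trip",
]

# Vectorize the text with TF-IDF (stop-word removal is an assumption).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(descriptions)

# Group the films into five clusters, as in the text.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(tfidf)

# Project the TF-IDF vectors to two dimensions for plotting.
# The matrix is converted to dense form, which is fine at this toy scale.
pca = PCA(n_components=2, random_state=42)
coords = pca.fit_transform(tfidf.toarray())

for text, label, (x, y) in zip(descriptions, labels, coords):
    print(f"cluster {label}  ({x:+.2f}, {y:+.2f})  {text}")
```

The two columns of `coords` can serve as the x and y axes of a scatter plot colored by `labels`. For a large vocabulary, converting the sparse TF-IDF matrix to a dense array may be costly; a truncated SVD is a common alternative, though whether the project used it is not stated.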