AAI_2025_Capstone_Chronicles_Combined

Cinema Analytics and Prediction System


(2010) and Toy Story 2 (1999) highest, with similarity scores of 0.935 and 0.928, respectively,

showing BERT’s strength in capturing deep semantic relationships (Figure 18). It also identified

thematically related films like CJ7, Rushmore, and Jimmy Neutron: Boy Genius by recognizing

shared narrative and emotional elements beyond surface language.

Figure 18: BERT Similarity score
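
The similarity ranking described above can be illustrated with a minimal sketch. It assumes the scores are cosine similarities between sentence embeddings, which is a common choice but is not confirmed by the text. The function name `rank_by_cosine`, the placeholder titles, and the three-dimensional toy vectors are hypothetical stand-ins; real BERT embeddings have hundreds of dimensions.

```python
import numpy as np


def rank_by_cosine(query_vec, candidate_vecs, titles, top_n=5):
    """Return (title, score) pairs sorted by cosine similarity to the query."""
    # Normalize to unit length so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q
    # Highest score first, truncated to the top_n entries.
    order = np.argsort(scores)[::-1][:top_n]
    return [(titles[i], float(scores[i])) for i in order]


# Toy vectors standing in for plot-summary embeddings (assumption, not real data).
titles = ["Film A", "Film B", "Film C"]
candidates = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.0, 0.0])

ranked = rank_by_cosine(query, candidates, titles, top_n=2)
for title, score in ranked:
    print(f"{title}: {score:.3f}")
```

In practice the query and candidate vectors would come from whichever BERT-based encoder produced the embeddings; only the ranking step is sketched here.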

To ensure consistency, we fixed the number of clusters at k = 5 across all embedding

methods (Figure 19), enabling direct comparison of their impact on thematic grouping.

Although some overlap occurred in BERT-based clusters (notably Clusters 2 and 4), we

maintained this structure for methodological uniformity. Future research could apply adaptive

clustering or silhouette analysis for refinement.

BERT’s sentence embeddings, clustered with K-Means and visualized via PCA, grouped movies

by deep semantic themes rather than surface keywords. For example (Figure 20), emotional films

such as Stepmom and Submarine clustered together, while comedies, sci-fi, and thrillers formed

separate groups, demonstrating BERT’s strength in capturing contextual meaning beyond

vocabulary.
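
The clustering setup above, together with the silhouette analysis suggested as future work, might look roughly like the following sketch. It uses scikit-learn and synthetic vectors in place of the actual BERT embeddings. The array shapes, random seeds, and the candidate range of k (2 to 8) are assumptions made for illustration and do not reflect the study's configuration beyond k = 5.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic stand-in for sentence embeddings: 5 groups of 40 points in 32 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(5, 32))
embeddings = np.vstack([c + rng.normal(size=(40, 32)) for c in centers])

# Fixed k = 5, matching the setup used across embedding methods.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

# Project to two dimensions for plotting; clustering itself runs in the full space.
coords_2d = PCA(n_components=2, random_state=0).fit_transform(embeddings)

# Silhouette analysis: compare average silhouette scores over candidate values of k.
silhouette_by_k = {}
for k in range(2, 9):
    candidate_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    silhouette_by_k[k] = silhouette_score(embeddings, candidate_labels)

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print("silhouette by k:", {k: round(float(s), 3) for k, s in silhouette_by_k.items()})
print("k with highest silhouette:", best_k)
```

A higher average silhouette score indicates tighter, better-separated clusters, so comparing scores across k is one way to check whether overlapping clusters would be better merged or split.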
