AAI_2025_Capstone_Chronicles_Combined
Cinema Analytics and Prediction System
recommendation and classification tasks, while numeric features, especially popularity, vote
count, and runtime, boosted model performance when integrated in hybrid architectures.
To further explore the textual patterns in the dataset, we generated word clouds for the
first five movies (Figure 4). These visualizations highlighted the most frequent and prominent
words associated with each film, offering insight into recurring themes and keywords. For
instance, the word cloud for Avatar emphasized terms like “space,” “colony,” and “war,” while
Pirates of the Caribbean showcased words like “ocean,” “captain,” and “treasure.” These word
clouds provided a quick, intuitive sense of each movie’s core narrative and confirmed the
relevance of our engineered text feature in capturing meaningful information for
recommendation and classification.
Figure 4: Word clouds for the first five movies
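A word cloud is essentially a rendering of per-document term frequencies, so the idea can be sketched with the standard library alone. The snippet below is a minimal illustration, not the pipeline used in this project: the function name `top_terms`, the short stopword list, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on",
    "for", "is", "with", "by", "at", "as",
}


def top_terms(text, n=5):
    """Return the n most frequent non-stopword tokens in `text` with their counts."""
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Drop stopwords and very short tokens before counting.
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(n)


# Hypothetical text, invented for illustration only.
sample = (
    "The captain sails the ocean to find treasure. "
    "The captain guards the treasure on the ocean, and the captain wins."
)

print(top_terms(sample, 3))
```

The resulting frequencies are what a rendering package such as `wordcloud` would map to font sizes; here they are only printed, which keeps the sketch free of third-party dependencies.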
Text-based features provided strong signals for similarity modeling, while numerical
metadata added extra dimensions of comparison. Some variables, such as title, serve mainly
for indexing and querying; others, such as vote average, will be used to create pseudo-labels for
supervised learning experiments with LSTM. The relationships between variables such as