M.S. Applied Data Science - Capstone Chronicles 2025


Figure 3 Average Views by Day of Week

Figure 4 Title Length vs. Views Scatterplot

Because a video's title is one of the first things viewers see, we explored whether title length affects how often a video is viewed. The scatterplot in Figure 4 compares the number of characters in each video title with the total views the video received. It shows no consistent relationship between title length and views. Although a few longer titles achieved high view counts, most videos, regardless of title length, cluster at the lower end of the view range. This suggests that character count alone is not a strong indicator of video performance. Other factors, such as content relevance or phrasing, may play a larger role in attracting viewers.
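One way to put a number on the visual impression from Figure 4 is a rank correlation between title length and views. The sketch below is illustrative only and is not the analysis code used in this study. It assumes each video is a dictionary with hypothetical `title` and `views` keys, uses invented sample records, and computes Spearman's rho with the standard library. A rank-based measure is chosen here because view counts are typically heavily skewed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def title_length_vs_views(videos):
    """Rank correlation between title character count and view count."""
    lengths = [len(video["title"]) for video in videos]
    views = [video["views"] for video in videos]
    return spearman(lengths, views)


# Invented records for illustration; not drawn from the study's dataset.
sample_videos = [
    {"title": "Quick tips", "views": 1200},
    {"title": "A much longer title about the same topic", "views": 900},
    {"title": "Mid-length video title", "views": 45000},
    {"title": "Short", "views": 300},
    {"title": "Another fairly long and descriptive title", "views": 1500},
]

rho = title_length_vs_views(sample_videos)
print(f"Spearman rho (title length vs. views): {rho:.3f}")
```

A value of rho near zero would be consistent with the scattered pattern described above, while values near 1 or -1 would indicate a strong monotonic relationship.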

4.2 Data Quality

Before feature engineering and modeling, we conducted a data quality assessment to ensure the integrity and usability of the dataset. The data were sourced via the YouTube Data API and included metadata such as titles, timestamps, and engagement metrics (views, likes, comments). Initial inspection confirmed that there were no missing values in critical columns and that all fields conformed to expected data types. Because publicly available video metadata can include extreme cases, any outliers, such as videos with exceptionally high view counts or zero engagement, were preserved to reflect real-world distributional patterns. Text fields such as the title were normalized to lowercase and stripped of special characters to reduce noise. Overall, the dataset demonstrated high completeness and consistency, providing a reliable foundation for downstream feature engineering and model training.

4.3 Feature Engineering

To prepare the dataset for supervised learning tasks, we performed systematic feature
