M.S. Applied Data Science - Capstone Chronicles 2025
counts or likes. Although predictive modeling is planned for a later stage, the initial focus is on exploratory data analysis to identify patterns and compute descriptive statistics. In parallel, natural language preprocessing techniques will be applied to detect patterns within the text-based metadata. If the data supports the hypothesis, the findings may give content creators actionable insights for predicting the performance of their videos from upload-time metadata. Conversely, if the hypothesis is disproven, the results may suggest that metadata and text-based features alone are insufficient, indicating the need to incorporate additional contextual information and conduct further analysis.

3 Literature Review

This literature review examines recent studies on predicting YouTube video popularity using machine learning. Its purpose is to understand existing approaches, highlight commonly used techniques, and identify opportunities for further research. The review is organized thematically to compare how different studies approach the problem. It focuses on shared methods, recurring ideas, and areas that have not been fully explored.

3.1 Using Early View Patterns to Predict the Popularity of YouTube Videos

Pinto et al. (2013) focus on predicting video popularity from early view patterns. They found that views in the first few days were strong indicators of long-term performance. A recurring theme in their findings is the importance of early engagement in forecasting future trends, which supports later work that also uses early metrics as predictors. Unlike newer research, Pinto et al. (2013) do not use machine learning and rely on simpler models. This contrast shows how methods have evolved. The paper uses real YouTube data, which strengthens its credibility. However, it does not consider factors like content
type or algorithm influence. This leaves a gap in understanding how other elements contribute to video success.

3.2 Predicting Popularity of Online Videos Using Support Vector Regression

This study builds on early prediction research by using support vector regression to estimate video popularity. The authors included visual and social features such as thumbnail brightness and comment counts. A key pattern is the shift toward using more diverse data rather than relying only on views. This approach supports the idea that multiple factors contribute to engagement. Compared to Pinto et al. (2013), this paper uses more advanced modeling techniques and a wider set of inputs. The findings align with later research that focuses on content-based and user-based signals. Still, the study does not explore recommendation systems or user behavior, which limits how well the model explains real-world trends.

3.3 Will This Video Go Viral? Explaining and Predicting the Popularity of YouTube Videos

Kong et al. (2018) focused on both explaining and predicting why certain YouTube videos become popular. They used gradient boosting models with features related to social diffusion, timing, and video content. A common pattern in their work is the use of machine learning models that balance accuracy and interpretability. Their results support earlier findings that early engagement matters, but also show that content and timing add value. Compared to previous studies, this one emphasizes understanding why a video goes viral rather than just predicting that it will. The findings align with studies that look beyond basic metrics. However, the paper does not account for how platform algorithms or user feedback affect popularity. This creates a gap in