M.S. Applied Data Science - Capstone Chronicles 2025


upload and use the same predictions to drive informed decisions. To investigate this problem further, natural language processing and predictive modeling were applied to metadata collected using the YouTube Data API v3. The methodology relies on features available at upload time, such as title, description, publish time, and channel name, without analyzing the actual video footage. By examining these pre-upload features, the goal is to understand which characteristics may correlate with engagement. The long-term objective is to identify early indicators of popularity that can be used to build predictive models. This work has the potential to provide insight into the dynamics of digital content performance using accessible and lightweight data inputs.

2 Background

Recent research has shown that early view counts and metadata can help predict how well a video might perform over time. Pinto et al. (2013) demonstrated that early viewing patterns can be used to forecast future video performance. Trzcinski and Rokita (2015), using support vector regression, showed that social and visual features can influence a video’s popularity. Although these studies are valuable, they rely on data that is not publicly available, which raises questions about the real-world applicability of their predictive models. This project builds on these ideas but focuses strictly on publicly accessible data. The primary data will be collected through the YouTube Data API v3, and features derived from video metadata will be examined for their effectiveness in predicting popularity and engagement.

2.1 Problem Identification and Motivation

Although YouTube is a large platform hosting a wide range of content, creators still face challenges in reaching the right audience and increasing user engagement. Many videos uploaded to YouTube receive little engagement, even when effort is put into crafting strong titles or producing high-quality content. This often leads to discouragement and inefficient use of time and resources. The issue is especially common among smaller creators, such as individuals and small businesses. These users often lack access to the advanced analytics or marketing tools available to larger channels, which makes it more difficult to predict future engagement or allocate resources effectively. From a practical standpoint, the ability to forecast engagement using limited features such as title, description, or video length can help creators make more informed decisions before publishing content. The motivation for this project stems from the increasing relevance of digital engagement in an environment where online visibility, influence, and income are closely connected. Creators often use these factors to drive further engagement, but smaller creators may not have equal access to them. This creates a disadvantage that this project aims to help address.

2.2 Definition of Objectives

The primary objective of this project is to determine whether YouTube video views can be predicted using only metadata and text-based features. These features include video title, channel name, video description, video length, and publication time. The aim is to identify whether any of these variables consistently correlate with engagement metrics such as view

