ADS Capstone Chronicles Revised


(2022) explores various levels of oversampling, including 500%, 1200%, and 1800%, using SMOTE across different machine learning models such as C4.5, Artificial Neural Networks, and Random Forest. This approach highlights the impact of class imbalance on classification performance and allows performance metrics to be compared across oversampling levels. Notably, Roumani's (2022) findings show that SMOTE oversampling improved the Random Forest model's performance, as evidenced by gains in AUC, recall, accuracy, and specificity. However, the study offers only limited discussion of other factors that may influence performance, and it does not validate SMOTE oversampling across diverse sets of variables (Roumani, 2022).

3.4 Analyzing the Predictive Power of NFL Statistics

The article examines how NFL statistics can be used to predict game outcomes, blending sports, statistics, and predictive modeling. It highlights key metrics such as offensive yards gained, defensive yards allowed, and turnover differentials as potential indicators of game results. By reporting the accuracy of predictions based on 2021 statistics, the article demonstrates the value of statistical analysis despite the inherent unpredictability of sports. The author uses pseudo R² values to validate the predictive models' effectiveness. The best-performing model used four predictors: yards gained, yards allowed, turnovers forced, and turnovers lost. This model had a pseudo R² value of .4150 and correctly predicted 132 of the first 150 games in the 2021 season (88%). Comparing individual stats, pairs of stats, and multiple variables further validates the methodology and sheds light on the importance of different metrics.
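
The pseudo R² reported in this article can be illustrated with a minimal sketch. The source does not state which pseudo R² variant the author used; the code below assumes McFadden's definition, 1 - LL_model / LL_null, where LL_null is the log-likelihood of a model that predicts the overall win rate for every game. The function names and the toy outcomes and probabilities are hypothetical and are not taken from the article's data.

```python
import math


def log_likelihood(outcomes, probs):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def mcfadden_pseudo_r2(outcomes, probs):
    """McFadden's pseudo R^2 (assumed variant): 1 - LL_model / LL_null.

    The null model predicts the observed base rate for every game.
    The value is not meaningful if all outcomes are identical.
    """
    base_rate = sum(outcomes) / len(outcomes)
    ll_model = log_likelihood(outcomes, probs)
    ll_null = log_likelihood(outcomes, [base_rate] * len(outcomes))
    return 1.0 - ll_model / ll_null


# Hypothetical toy data: 1 = home team won, with model-predicted win probabilities.
outcomes = [1, 0, 1, 1, 0, 0]
probs = [0.8, 0.3, 0.7, 0.6, 0.4, 0.2]
r2 = mcfadden_pseudo_r2(outcomes, probs)
print(f"McFadden pseudo R^2: {r2:.4f}")
```

Under this definition, a model no better than the base rate scores 0 and a model that predicts every game with full confidence approaches 1, so values between these extremes measure the improvement in fit over guessing.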

However, the article acknowledges limitations. It recognizes that rule changes can affect statistical predictors, which leaves fewer data points available for model training, and it suggests exploring additional variables such as red zone conversion rates to improve predictive accuracy. Another limitation discussed is that the predictions stem from completed games and would therefore need halftime information to aid them (Pantle, 2022).

3.5 The Anatomy of American Football: Evidence from 7 Years of NFL Game Data

The article examines predictive modeling within NFL game analysis, focusing on techniques such as the Bradley-Terry model and statistical bootstrap methods. These approaches aim to improve predictive accuracy, particularly when historical data is limited. A notable trend in sports analytics is the use of statistical models for predictive purposes, and statistical bootstrap techniques are becoming popular as a way to account for data constraints. The article emphasizes the importance of validating these models against established benchmarks to gauge their effectiveness. The descriptive model and FPM engine exhibit high accuracy in predicting game outcomes, achieving 84% cross-validation accuracy. However, the article also recognizes certain limitations. Specifically, the methodology omits game day decisions and does not adequately include schedule strength information. The current focus is primarily on historical performance data and simpler statistical measures (Pelechrinis & Papalexakis, 2016).

4 Methodology

To ensure accuracy of our models, we have collected our data from a reputable source,

