ADS Capstone Chronicles Revised


studies in that we aim to predict the outcome of every single game, rather than just identifying the winner of each NFL season. Consequently, we do not encounter the class imbalance issue typically associated with such analyses.

To enhance the predictive power of our model, we incorporated feature engineering, focusing in particular on the home and away teams. This involved creating 64 one-hot encoded features to represent each of the 32 teams in both home and away roles. Through exploratory data analysis (EDA), we observed that teams tend to perform better at home than away and that certain teams consistently outperform others. By incorporating this information into our model, we aim to make more accurate predictions. Additionally, we included the days-of-rest difference between games for each team as a feature. Due to the physical nature of the sport, teams with more days of rest are likely to recover better and potentially perform better in subsequent games. By considering this factor, we expect to improve the model’s predictive capability.

Along with our logistic regression modeling of game outcomes from various features, we also built a linear regression model that performs well at predicting scoring on both the training and testing sets. This model was split into home and away submodels, each with fewer predictive features: sacks, passing yards per attempt, and rushing yards per attempt. This feature reduction lets us focus on the direct impact of passing and rushing on scoring. With comparable mean squared error, mean absolute error, and R-squared metrics on the training and testing data, we are confident in extending the results to unseen future data.

However, our approach also has some limitations. One is the potential removal of columns due to high correlation between features. This process may inadvertently exclude important features related to the performance of home and away teams, thereby limiting the model’s ability to accurately predict game outcomes. Another limitation, specific to the linear regression analysis of scoring, is that we could not include game-script play calling in the model; that is, teams use clock management to favor passing or running toward the end of games, depending on the current score, to increase their chance of winning.

6.1 Conclusion

As it pertains to predicting scoring in the NFL, passing effectiveness appears to carry greater weight than rushing effectiveness. Averaged across the home and away scoring linear regression models, the coefficients are 2.9 for passing yards per attempt and 1.5 for rushing yards per attempt, indicating that each one-yard increase in passing yards per attempt has almost double the impact on scoring (2.9 / 1.5 ≈ 1.9) of the same increase in rushing yards per attempt. With this information, along with supporting correlation values, we can reasonably conclude that passing has a greater influence on scoring, and thus winning, in the NFL. As such, teams should look to focus on the passing side of offense and defense through personnel building and schematics. There are several ways to do so: pass-protection-focused offensive linemen, wide receivers, and quarterbacks on offense; coverage defensive backs and pass rushers on defense; and an offensive coordinator who schemes up more effective passing plays. Whatever the route, building a team centered around the pass is crucial to maximizing performance on the field.

Our binary outcome analysis underscores the robust performance of the logistic regression model in accurately predicting the binary outcome of NFL games. This observation, consistent with prior research, reaffirms logistic regression as a dependable choice for predictive modeling in similar contexts.
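For readers who want to see the shape of the home and away feature engineering discussed above, the following is a minimal sketch and not the code used in this study. The placeholder team labels (`T00` to `T31`), the helper names `one_hot` and `encode_game`, and the sign convention for the rest difference (home rest minus away rest) are all assumptions made for illustration.

```python
# Illustrative sketch only: team labels, function names, and the sign
# convention for the rest difference are assumptions, not the study's code.

TEAMS = [f"T{i:02d}" for i in range(32)]  # placeholder labels for the 32 teams


def one_hot(team, teams=TEAMS):
    """Return a 0/1 list with a single 1 at the position of `team`."""
    if team not in teams:
        raise ValueError(f"unknown team: {team}")
    return [1 if t == team else 0 for t in teams]


def encode_game(home, away, home_rest_days, away_rest_days):
    """Build one feature row for a game.

    Layout: 32 home-team indicators, then 32 away-team indicators,
    then the rest difference (home minus away, an assumed convention).
    """
    rest_diff = home_rest_days - away_rest_days
    return one_hot(home) + one_hot(away) + [rest_diff]


# Hypothetical game: T03 hosts T17, with 10 and 6 days of rest respectively.
row = encode_game("T03", "T17", home_rest_days=10, away_rest_days=6)
print(len(row))       # 65: 64 team indicators plus 1 rest feature
print(sum(row[:64]))  # 2: exactly one home and one away indicator are set
print(row[-1])        # 4: the home team has four more days of rest
```

Rows of this shape, combined with any other per-game features, are the kind of input a logistic regression on game outcome would consume.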
