M.S. Applied Data Science - Capstone Chronicles 2025
severe multicollinearity that necessitated removing npxG from subsequent analyses (see Table 5). Additionally, successful take-ons and attempted take-ons showed a strong correlation (r = .837), presenting moderate multicollinearity concerns.
Figure 3 Forward Position Performance Metrics Correlation Matrix
Figure 4 Midfielder Position Performance Metrics Correlation Matrix
Defender correlations (n = 1,146) revealed moderate to strong relationships, primarily within tackle-related metrics. Total tackles correlated strongly with tackles won (r = .801), producing infinite VIF values that required removing total tackles from the analysis. The composite metric Tkl+Int showed the expected correlations with its components (r = .753 with tackles, r = .661 with interceptions) and was also removed due to multicollinearity. Blocking actions were internally consistent: total blocks correlated strongly with both blocked shots (r = .658) and blocked passes (r = .722), although these relationships remained below the severe multicollinearity threshold.
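An infinite VIF generally indicates that one predictor is an exact linear combination of the others, which is what a composite such as Tkl+Int would be if it were entered alongside both of its components. The following is a minimal sketch of that mechanism, not the analysis code used in this study: the data are synthetic, and the function name `vif_by_regression` and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def vif_by_regression(X, j, tol=1e-10):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    # Add an intercept column so R^2 is measured around the mean of y.
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    unexplained = (resid @ resid) / ((y - y.mean()) ** 2).sum()  # equals 1 - R^2
    # Treat a numerically zero unexplained share as perfect collinearity.
    return np.inf if unexplained < tol else 1.0 / unexplained

# Synthetic defender-like counts (assumed values, not the study's data).
rng = np.random.default_rng(1)
n = 500
tackles = rng.poisson(20, n).astype(float)
interceptions = rng.poisson(12, n).astype(float)
tkl_int = tackles + interceptions  # composite is an exact sum of its parts

X_full = np.column_stack([tackles, interceptions, tkl_int])
vif_full = [vif_by_regression(X_full, j) for j in range(X_full.shape[1])]

# Dropping the composite removes the exact linear dependence.
X_reduced = X_full[:, :2]
vif_reduced = [vif_by_regression(X_reduced, j) for j in range(X_reduced.shape[1])]
```

In this sketch every column of the full matrix receives an infinite VIF, because each one can be reconstructed exactly from the other two, while the reduced matrix yields values close to 1.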
Midfielder metrics (n = 1,079) exhibited the most pronounced collinearity issues, with passing-related variables showing near-perfect correlations: passes attempted and passes completed (r = .992), touches and passes attempted (r = .989), and touches and passes completed (r = .978). These extreme correlations resulted in VIF values exceeding 300 (see Table 5), requiring the removal of touches and passes attempted while retaining passes completed for the final model. Notably, key passes showed a moderate correlation with expected assists (r = .692), supporting the role of key passes as a creative performance indicator.
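To illustrate how near-perfect correlations among passing variables inflate VIF, and how removing the redundant variables resolves it, the sketch below computes VIFs as the diagonal of the inverse correlation matrix. It is a hedged example under stated assumptions: the data are simulated, the coefficients and noise levels are arbitrary choices that produce correlations of roughly .98 to .99, and `vif_from_corr` is a hypothetical helper rather than the study's code.

```python
import numpy as np

def vif_from_corr(X):
    """VIF for each column of X, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated midfielder-like metrics (assumed distributions, not the study's data).
rng = np.random.default_rng(0)
n = 1000
passes_attempted = rng.normal(50, 15, n)
touches = 1.2 * passes_attempted + rng.normal(0, 3, n)
passes_completed = 0.85 * passes_attempted + rng.normal(0, 1.5, n)
key_passes = rng.normal(1.5, 0.8, n)  # generated independently of the passing volume

# Columns: touches, passes attempted, passes completed, key passes.
X_full = np.column_stack([touches, passes_attempted, passes_completed, key_passes])
vif_full = vif_from_corr(X_full)

# Keep only passes completed and key passes, mirroring the removal described above.
X_kept = X_full[:, [2, 3]]
vif_kept = vif_from_corr(X_kept)

print("VIF, all four metrics:", np.round(vif_full, 1))
print("VIF, after removal:   ", np.round(vif_kept, 2))
```

With the three passing-volume variables present, their VIFs are far above the common rule-of-thumb cutoff of 10; once touches and passes attempted are dropped, the remaining VIFs fall close to 1.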