M.S. Applied Data Science - Capstone Chronicles 2025
comprehensive metric that balances theoretical expertise with empirical validation.

Figure 17. Model Performance - xG Prediction - Defense
$$P(\text{Win}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \times \text{Score})}} \tag{6}$$

where P(Win) is the probability of winning, β₀ is the intercept, β₁ is the coefficient of the predictor variable Score (the SPPS), and e is the base of the natural logarithm, approximately equal to 2.718. The complete validation results are presented in Figure 18.

Validating the SPPS against actual match outcomes is a critical step in establishing the metric's predictive validity. We compiled a match-level dataset from FBref containing team performance data and match results (i.e., wins/losses) to assess whether our SHAP-calibrated performance metric captures meaningful player contributions that translate to team success. The dataset was randomly split into training (70%) and testing (30%) sets so that the model is evaluated on matches it was not fitted to, which keeps an overfit model from appearing more accurate than it is.

The S-curve relationship exhibits the expected logistic pattern, with win probability approaching 1 as scores exceed 12 points. The distribution analysis shows clear separation between winning (mean ≈ 8.5) and losing (mean ≈ 7.0) scenarios. The rebalanced score-by-result chart shows a significant difference between losses and wins, with winning teams achieving higher SPPS values across all quartiles. Model calibration shows close agreement between predicted probabilities and observed outcomes, supporting the claim that our position-specific performance scores capture meaningful contributions that translate to competitive advantage in professional football.
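The validation workflow described above (random 70/30 split, a one-predictor logistic fit, and a calibration check) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the scores and outcomes are synthetic, the generating coefficients `TRUE_B0` and `TRUE_B1` are hypothetical, and plain gradient descent stands in for whatever fitting routine was actually used, which the text does not specify.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by batch gradient descent on mean log-loss."""
    n = len(xs)
    mean_x = sum(xs) / n  # center the predictor so the descent is well conditioned
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * (x - mean_x)) - y
            g0 += err
            g1 += err * (x - mean_x)
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    # Convert the intercept back to the uncentered scale.
    return b0 - b1 * mean_x, b1


def calibration_table(probs, labels, n_bins=5):
    """Per probability bin: (count, mean predicted probability, observed win rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows = []
    for b in bins:
        if b:
            rows.append((len(b),
                         sum(p for p, _ in b) / len(b),
                         sum(y for _, y in b) / len(b)))
    return rows


# Synthetic stand-in for the match-level data (assumption: NOT the FBref dataset).
rng = random.Random(0)
TRUE_B0, TRUE_B1 = -4.0, 0.5  # hypothetical generating coefficients
scores = [rng.gauss(7.75, 2.0) for _ in range(1000)]
data = [(s, 1 if rng.random() < sigmoid(TRUE_B0 + TRUE_B1 * s) else 0) for s in scores]

# Random 70/30 train/test split.
rng.shuffle(data)
n_train = (7 * len(data)) // 10
train, test = data[:n_train], data[n_train:]

b0, b1 = fit_logistic([s for s, _ in train], [y for _, y in train])

# Evaluate only on the held-out 30%.
probs = [sigmoid(b0 + b1 * s) for s, _ in test]
labels = [y for _, y in test]
accuracy = sum((p >= 0.5) == (y == 1) for p, y in zip(probs, labels)) / len(test)
rows = calibration_table(probs, labels)

print(f"b0={b0:.3f}  b1={b1:.3f}  test accuracy={accuracy:.3f}")
for count, mean_p, win_rate in rows:
    print(f"n={count:3d}  mean predicted={mean_p:.2f}  observed win rate={win_rate:.2f}")
```

In a well-calibrated model, the mean predicted probability and the observed win rate in each bin should be close to each other; large gaps in a bin would indicate that the fitted curve over- or under-states win probability in that score range.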
Table 6
SHAP Performance Metrics: Model Performance Metrics by Position

Position      R²      MAE     RMSE
Forward       0.947   0.598   0.758
Midfielder    0.913   2.891   5.227
Defender      0.969   0.482   0.896
Goalkeeper    0.993   6.803   8.513
Note. Model performance metrics from XGBoost models trained with xG as the target variable. Sample sizes: forward (n = 1,695), midfielder (n = 1,823), defender (n = 1,823), goalkeeper (n = 396).

4.3 Logistic Regression Validation of SPPS Against Match Outcomes

Logistic regression is a statistical method for modeling the probability of a binary outcome (here, win or loss) from predictor variables. The logistic function maps a linear combination of predictors to a probability bounded between 0 and 1, which makes it well suited to binary classification problems. The fundamental equation for logistic regression is given in Equation 6.
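As a small illustration of how the logistic function turns a linear predictor into a probability, the sketch below evaluates the win-probability formula for a few scores. The coefficients `BETA0` and `BETA1` are hypothetical values chosen only for illustration; they are not the fitted values from this study.

```python
import math


def win_probability(score: float, beta0: float, beta1: float) -> float:
    """Logistic function: P(Win) = 1 / (1 + exp(-(beta0 + beta1 * score)))."""
    z = beta0 + beta1 * score
    # Branch on the sign of z so math.exp never overflows for extreme inputs.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients (assumption: not taken from the paper).
BETA0, BETA1 = -4.0, 0.5

for s in (4.0, 8.0, 12.0):
    print(f"score={s:4.1f}  P(Win)={win_probability(s, BETA0, BETA1):.3f}")
```

Whatever the coefficients, the output stays within the interval from 0 to 1, and with a positive β₁ it rises monotonically with the score, which is what produces the S-shaped curve.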