M.S. Applied Data Science - Capstone Chronicles 2025


4.1.1 Exploratory Data Analysis - Real Madrid Performance Data

Exploratory data analysis (EDA) employed both univariate and multivariate approaches to examine dataset characteristics, identify position-specific performance patterns, and inform subsequent modeling decisions. The analysis characterized player performance distributions, tactical role differentiation, and relationships among performance metrics across positional categories.

4.1.1.1 Performance Distributions

The Real Madrid performance dataset showed substantial distributional variation across key performance metrics, with most variables exhibiting the right-skewed distributions characteristic of football performance data. Forward-specific metrics were concentrated at lower values, with extended tails representing exceptional individual performances. Shots on target had a mean of 0.94 (SD = 1.06), with 68% of observations recording zero or one shot per match and a maximum of five shots, indicating sporadic high-output performances. The modal value of 0 reflects matches in which forwards faced limited scoring opportunities or were deployed as substitutes. This pattern illustrates the intermittent nature of shooting opportunities in modern tactical systems, where forwards must maximize limited chances (see Figure 1).
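The distributional summary described above (mean, standard deviation, mode, share of low-count matches, and direction of skew) could be computed along the following lines. This is a minimal sketch, not the authors' code: the function name `summarize_count_metric` and the toy values are hypothetical and are not drawn from the Real Madrid dataset.

```python
import pandas as pd


def summarize_count_metric(values: pd.Series) -> dict:
    """Return basic distribution statistics for a per-match count metric."""
    return {
        "mean": values.mean(),
        "sd": values.std(),                     # sample standard deviation (ddof=1)
        "mode": values.mode().iloc[0],          # most frequent value
        "share_0_to_1": (values <= 1).mean(),   # proportion of matches with 0 or 1
        "max": values.max(),
        "skewness": values.skew(),              # positive value indicates right skew
    }


# Hypothetical toy values for illustration only (NOT the study data).
shots_on_target = pd.Series([0, 0, 0, 0, 1, 1, 1, 2, 2, 5])
summary = summarize_count_metric(shots_on_target)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A positive skewness together with a mode of 0 is the pattern the text describes: most matches yield few or no shots on target, while a small number of matches form a long right tail.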

As shown in Table 3, defenders over the last two seasons (n = 534) recorded increases in most defensive actions; for example, mean interceptions rose from 0.63 to 0.79, while blocks remained essentially unchanged.

Table 3
Descriptive Statistics of Defender Performance Metrics for 2023–24 and 2024–25 Seasons

                             2023–24 (n = 258)     2024–25 (n = 276)
Metric                       M        SD           M        SD
Interceptions                0.63     0.9          0.79     1.02
Blocks                       0.84     1.13         0.83     1.02
Clearances                   1.98     1.9          2.32     2.36
Tackles won                  0.72     0.92         0.87     1.05
Defensive third tackles      0.67     0.96         0.79     0.98

As shown in Table 4, goalkeepers over the last two seasons (n = 103) recorded a decrease in total pass completion percentage, from 86.47% to 83.39%.

Table 4
Descriptive Statistics of Goalkeeper Performance Metrics for 2023–24 and 2024–25 Seasons

                             2023–24 (n = 52)      2024–25 (n = 51)
Metric                       M        SD           M        SD
Total pass completion %      86.5     8.53         83.39    11.5
Errors leading to shot       0.04     0.19         0.02     0.14
Progressive distance (m)     405.7    131.2        399.6    141.

Data integration was performed with Python's pandas library using concatenation operations, preserving temporal ordering and match-specific contextual information. Duplicate records were removed to maintain data integrity, resulting in a final combined dataset of 1,550 unique match observations with no temporal overlap or statistical redundancy.
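The concatenation and duplicate-removal step might look like the sketch below. It is illustrative only: the column names (`match_date`, `player`, `interceptions`), the choice of `match_date` plus `player` as the key that identifies a unique observation, and the toy rows are all assumptions, since the source does not specify the schema or the deduplication key.

```python
import pandas as pd

# Hypothetical per-season frames; rows and column names are illustrative only.
season_2023_24 = pd.DataFrame({
    "match_date": pd.to_datetime(["2023-08-12", "2023-08-19"]),
    "player": ["Player A", "Player A"],
    "interceptions": [1, 0],
})
season_2024_25 = pd.DataFrame({
    # The second row repeats a 2023-24 observation to simulate an overlap.
    "match_date": pd.to_datetime(["2024-08-18", "2023-08-19"]),
    "player": ["Player A", "Player A"],
    "interceptions": [2, 0],
})

key_columns = ["match_date", "player"]  # assumed unique key for an observation

combined = (
    pd.concat([season_2023_24, season_2024_25], ignore_index=True)
    .drop_duplicates(subset=key_columns)   # keep the first copy of each observation
    .sort_values("match_date")             # restore chronological order
    .reset_index(drop=True)
)

print(len(combined), "unique match observations")
```

Sorting after deduplication keeps the combined frame in chronological order regardless of the order in which the season files are concatenated.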
