M.S. Applied Data Science - Capstone Chronicles 2025


Figure 10 Goalkeeper Position Performance Comparison: Courtois vs. Lunin

These visualizations validate the position-specific metrics selected through correlation and VIF analysis, providing tactical insights for optimal player selection and formation strategies.

4.1.2 Data Quality Assessment and Performance Data

Data quality validation procedures were implemented to ensure analytical reliability and model input appropriateness. The assessment found no missing values (0%) across all critical variables and identified systematic approaches for handling performance measurement inconsistencies.

Data Type Validation: Systematic conversion procedures ensured appropriate data types for subsequent analysis. Numeric variables were validated for mathematical operations, and categorical variables (e.g., position, opponent, competition) were encoded appropriately for machine learning applications.

Missing Value Analysis: Comprehensive examination revealed no missing values in fundamental performance metrics (goals, assists, minutes, defensive actions), indicating robust data collection processes after comprehensive data cleaning and null-value population techniques. Optional advanced metrics showed a limited number of missing values in percentage calculations where denominators approached zero (e.g., completion rates for players with minimal actions). Box plot analysis (see Figure 11) identified performance outliers consistent with exceptional individual performances rather than data collection errors.

Figure 11 Box Plots for Outlier Detection - Showing Box Plots Tackles

The boxplot reveals 55 outliers below the lower whisker in the Passes Completed metric (Mdn = 88.5, IQR = 11.40), indicating numerous match observations where players recorded significantly fewer completed passes than typical, likely representing substitute appearances or matches with limited possession (see Figure 12).
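The whisker-based outlier count described above can be sketched as follows. This is a minimal illustration, not the study's actual code: it assumes the common Tukey rule (whiskers at 1.5 × IQR beyond the quartiles, the default in most plotting libraries), which the text does not state explicitly. The function name `iqr_outliers`, the multiplier `k`, and the sample values are hypothetical.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Summarize a metric and flag points beyond the Tukey fences.

    Assumption: fences sit at Q1 - k*IQR and Q3 + k*IQR with k = 1.5;
    the source does not say which whisker rule was used.
    """
    data = np.asarray(values, dtype=float)
    data = data[~np.isnan(data)]  # ignore missing observations

    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    return {
        "median": median,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # Observations below the lower whisker (e.g., brief substitute appearances)
        "low_outliers": data[data < lower_fence],
        # Observations above the upper whisker
        "high_outliers": data[data > upper_fence],
    }


# Hypothetical per-match values for illustration only (not study data).
sample = [85, 88, 90, 92, 87, 89, 91, 86, 40, 35]
summary = iqr_outliers(sample)
print(len(summary["low_outliers"]), "observations fall below the lower whisker")
```

Counting low and high outliers separately makes it easier to judge whether the flagged observations cluster on one side, as with the low-side Passes Completed outliers noted above.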

