M.S. Applied Data Science - Capstone Chronicles 2025
5.1.1 Gender-Based Attrition Analysis

Because sex had the highest feature importance in the XGBoost model, a gender-based attrition analysis was conducted to explore the difference in attrition between male and female employees in the data. Table 4 shows the proportions of employees who stayed and left, by gender. Male employees had a slightly higher attrition rate (20.04%) than female employees (18.29%), a difference of 1.75 percentage points.

Table 4
Proportions of Gender Attrition

Gender (dsex)   Stay     Leave
Male            79.96%   20.04%
Female          81.71%   18.29%

5.1.2 Chi-Squared Analysis

A chi-squared test was conducted to understand the relationship between gender and employee attrition. Chi-squared tests assess whether two categorical variables are statistically associated (Agresti, 2013). In this case, the test was used to determine whether gender and an employee’s intent to leave were significantly associated. The test resulted in a chi-square statistic of 1503.44 with a p value < 0.001, indicating a statistically significant difference in attrition rates between male and female employees. Figure 8 displays the distribution of employee attrition rate by gender in the data.
Figure 8 Distribution of Employee Attrition by Gender
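As an illustration of the chi-squared test described above, the following is a minimal sketch of a Pearson chi-squared test of independence on a 2x2 gender-by-attrition table, written with the Python standard library only. The function name `chi_squared_2x2` and the counts are hypothetical and are not the study's data or code; the sketch also omits the Yates continuity correction, which some libraries (for example, SciPy's `chi2_contingency`) apply by default to 2x2 tables.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of gender and attrition.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts (NOT the study's data): rows = gender, columns = (stay, leave).
counts = [[800, 200],   # male
          [820, 180]]   # female
stat, p_value = chi_squared_2x2(counts)

# Same proportions with 100 times as many employees.
scaled = [[c * 100 for c in row] for row in counts]
stat_scaled, p_scaled = chi_squared_2x2(scaled)

print(f"small sample: chi2 = {stat:.2f}, p = {p_value:.3f}")
print(f"large sample: chi2 = {stat_scaled:.2f}, p = {p_scaled:.3g}")
```

For fixed proportions, the statistic grows in proportion to the sample size, so a small difference in attrition rates can still yield a large statistic and a very small p value in a large dataset.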
The statistically significant association between gender and employee attrition suggests gender plays an important role in employee turnover and should be considered when developing strategies to minimize turnover in an organization.

5.2 Model’s Performance Comparison

To evaluate the models’ performance, a receiver operating characteristic (ROC) curve analysis was conducted to compare them. Figure 9 depicts the ROC curves. The logistic regression and decision tree models achieved the highest area under the curve (AUC) (0.71), slightly outperforming XGBoost (0.68). Although XGBoost's AUC was slightly lower than that of the other models, XGBoost demonstrated a higher overall accuracy (81%) than the logistic regression and decision tree models (both approximately 68%-69%). Additionally, XGBoost achieved a much higher recall for the majority class (0.96), indicating it can accurately identify employees more likely to stay. Though the recall for the minority class