M.S. Applied Data Science - Capstone Chronicles 2025
A Predictive Model to Strengthen Retention in Government Agencies: Sentiment Factors Driving Employee Exits Using A Predictive Model Approach Segment Risk Level

Sophia Jensen
Applied Data Science Master’s Program
Shiley Marcos School of Engineering / University of San Diego
sophiajensen@sandiego.edu

Duy Nguyen
Applied Data Science Master’s Program
Shiley Marcos School of Engineering / University of San Diego
dnguyen1@sandiego.edu
ABSTRACT

Employee turnover is a persistent challenge for organizations, resulting in lost productivity, increased hiring costs, and reduced team morale (De Winne et al., 2019). This project investigated the factors that lead federal employees to consider leaving their agency by analyzing data from the Federal Employee Viewpoint Survey (FEVS) between 2020 and 2024. Using a combination of data preprocessing, feature engineering, and machine learning techniques, predictive models were developed to classify whether an employee would consider leaving their agency. Key steps included: (a) handling class imbalance through the Synthetic Minority Oversampling Technique (SMOTE), (b) encoding categorical features, and (c) transforming ordinal Likert-scale survey items into binary predictors. Three classification algorithms (logistic regression, decision tree, and XGBoost) were evaluated using accuracy, precision, recall, and F1-score. Although the XGBoost model achieved the highest overall accuracy (81%) and provided meaningful feature insights, its recall for the minority leave class was limited, which highlights the challenge of detecting at-risk employees. Feature importance analysis revealed that employee recognition, role clarity, and supervisor alignment are critical predictors of turnover intent. Notably, gender emerged as a top predictive feature, suggesting the need for future fairness analysis. This study provides a data-driven framework for human resources (HR) teams to proactively identify and support employees who are considering leaving the organization.

KEYWORDS

Retention rates, employee turnover, sentiment survey, engagement, human resources, machine learning, predictive modeling, classification, gradient boosting

1 Introduction

Employee turnover costs U.S. companies billions of dollars annually in lost productivity, training, and recruitment (Tatel & Wigert, 2024). The tech and healthcare industries are especially affected by the downsides of turnover. Specifically, in tech, replacing an employee can cost up to 200% of salary for managers and up to 80% of salary for technical professionals (Tatel & Wigert, 2024). Many organizations collect HR data; however, this data is not being used proactively to prevent voluntary exits.
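Two ideas from the abstract can be illustrated with a small sketch: turning a Likert-scale item into a binary predictor, and seeing how high overall accuracy can coexist with limited recall on the minority leave class. This is not the study's code. The function names, the 1-5 response scale, the cutoff of 4 (treating "agree" and above as favorable), and the toy labels are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def binarize_likert(response: int, threshold: int = 4) -> int:
    """Map a 1-5 Likert response to 1 (favorable) or 0 (not favorable).

    The 1-5 scale and the default threshold of 4 are assumptions for this
    sketch, not values taken from the study.
    """
    if not 1 <= response <= 5:
        raise ValueError(f"Likert response out of range: {response}")
    return 1 if response >= threshold else 0


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy plus precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = considering leaving (minority class), 0 = staying.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)

if __name__ == "__main__":
    print([binarize_likert(r) for r in (1, 3, 4, 5)])
    print(metrics)
```

In this toy example the classifier is right on nine of ten cases yet finds only one of the two employees in the leave class, which is why recall and F1 for that class are reported alongside accuracy.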