M.S. Applied Data Science - Capstone Chronicles 2025
(d) military, (e) racial category, (f) tenure, (g) sex, and (h) supervisory status. Each record also included the respondent's agency and a randomized ID that preserved anonymity. Table 2 in the appendix lists the 29 survey questions asked consistently across all 5 years, along with their corresponding question_ID. The dataset contains categorical and ordinal data. Categorical fields were recorded with a response value of A, B, C, or D where applicable: A represented a ‘yes’ response, B a ‘no’ response, and C and D other categories. These fields were generally binary. The survey questions were answered on an ordinal Likert scale measuring a respondent’s attitude, opinion, or perception of a specific statement. Responses used either an agreement scale (strongly disagree, disagree, neither agree nor disagree, agree, strongly agree) or a frequency scale (never, rarely, sometimes, most of the time, always), coded 1 to 5, respectively (Britannica Academic, 2025). The target variable asked participants, “Are you considering leaving your organization in the next year, and if so, why?” The responses were as follows:
● A = No, not considering leaving
● B = Yes, for other reasons
● C = Yes, to take another job in the Federal Government
● D = Yes, to take another job outside the Federal Government
In 2020, the target variable was split into two columns, which offered an opportunity to examine how the COVID-19 pandemic affected employees’ decisions to leave. To stay consistent with the questions asked across all 5 years, this project did not include these additional insights in the analysis. Figure 1 shows the distribution of employees who were and were not considering leaving.
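The coding scheme above can be expressed as a few lookup tables. The sketch below is a minimal illustration and not the project's code. The dictionary names, the `likert_score` and `is_considering_leaving` helpers, and the choice to group B, C, and D together as "considering leaving" (the leaving/not-leaving split that Figure 1 appears to use) are all assumptions.

```python
from typing import Optional

# Agreement and frequency scales, each coded 1-5 in order (per the text above).
AGREEMENT_SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}
FREQUENCY_SCALE = {
    "never": 1,
    "rarely": 2,
    "sometimes": 3,
    "most of the time": 4,
    "always": 5,
}

# Response codes for "Are you considering leaving your organization in the next year, and if so, why?"
LEAVING_CODES = {
    "A": "No, not considering leaving",
    "B": "Yes, for other reasons",
    "C": "Yes, to take another job in the Federal Government",
    "D": "Yes, to take another job outside the Federal Government",
}


def likert_score(label: str) -> int:
    """Return the 1-5 score for an agreement or frequency label (hypothetical helper)."""
    key = label.strip().lower()
    for scale in (AGREEMENT_SCALE, FREQUENCY_SCALE):
        if key in scale:
            return scale[key]
    raise ValueError(f"unknown Likert label: {label!r}")


def is_considering_leaving(code: Optional[str]) -> Optional[bool]:
    """Collapse the A-D target to a binary flag (assumed grouping); None marks a missing response."""
    if code is None or code.strip() == "":
        return None
    code = code.strip().upper()
    if code not in LEAVING_CODES:
        raise ValueError(f"unknown target code: {code!r}")
    return code != "A"  # B, C, and D all count as considering leaving


if __name__ == "__main__":
    print(likert_score("Most of the time"))  # 4
    print(is_considering_leaving("C"))       # True
    print(is_considering_leaving(None))      # None
```

The binary grouping discards the reason for leaving. Keeping the original A to D codes would instead support a multiclass target if the analysis needs to tell apart moves within and outside the Federal Government.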
Figure 1
Distribution of Considerations of Leaving (Including Missing Values)
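Figure 1 still includes missing values, so a cleaning pass must run before any response percentages are tabulated. The sketch below uses plain Python with hypothetical field names (`year`, `leaving`, and question IDs such as `Q1`). It drops records with a missing target and then computes, for each year, the percentage of 1 to 5 responses pooled across the chosen questions. Pooling across questions and averaging the yearly shares without weighting are assumptions about how a per-year summary and an "on average" figure might be derived. They are not a description of the project's exact procedure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional

# Hypothetical record shape: {"year": 2020, "leaving": "A", "Q1": 4, ...}
Record = Dict[str, Optional[object]]


def drop_missing_target(records: Iterable[Record], target: str = "leaving") -> List[Record]:
    """Keep only records whose target response is present and non-blank."""
    return [r for r in records if r.get(target) not in (None, "")]


def percent_by_year(records: Iterable[Record], question_ids: List[str]) -> Dict[int, Dict[int, float]]:
    """Share (in %) of each 1-5 response per year, pooled across question_ids."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for r in records:
        for q in question_ids:
            value = r.get(q)
            if value in (1, 2, 3, 4, 5):  # skip missing or out-of-range answers
                counts[r["year"]][value] += 1
    table: Dict[int, Dict[int, float]] = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        table[year] = {score: 100.0 * counts[year][score] / total for score in range(1, 6)}
    return table


def mean_share(table: Dict[int, Dict[int, float]], score: int) -> float:
    """Unweighted mean across years of the percentage for one response value."""
    return sum(row[score] for row in table.values()) / len(table)


if __name__ == "__main__":
    records = [
        {"year": 2020, "leaving": "A", "Q1": 4, "Q2": 5},
        {"year": 2020, "leaving": None, "Q1": 1, "Q2": 2},  # dropped: missing target
        {"year": 2021, "leaving": "C", "Q1": 4, "Q2": 4},
        {"year": 2021, "leaving": "", "Q1": 3, "Q2": 3},    # dropped: blank target
    ]
    clean = drop_missing_target(records)
    table = percent_by_year(clean, ["Q1", "Q2"])
    print(mean_share(table, 4))  # 75.0
```

An unweighted mean gives each year equal influence. A pooled share weighted by the number of responses per year would differ whenever yearly response counts vary.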
A large portion of employees answered no when asked whether they were considering leaving their organization; 839,162 employees responded otherwise, and 176,939 records had missing values. Given the size of the dataset, records with missing values were removed and the data cleaned before further exploratory data analysis and data quality analysis. Figure 2 shows the distribution of responses to the survey questions asked consistently from 2020 to 2025. For every question, more employees responded favorably (agree, strongly agree, most of the time, or always) than unfavorably. Table 1 reports the percentage distribution of each response value by year. On average, 41.86% of responses were a 4 and 30.99% were a 5. The top 5 survey questions with the most ‘strongly agree’ or ‘always’ responses were “Supervisors in my work unit support employee development,” “My supervisor
4.1.2 Survey Response Variables