AAI_2025_Capstone_Chronicles_Combined
MENTAL HEALTH RISK DETECTION USING ML
Note. The chart shows that the United States has the highest participation with a male majority, and most countries exhibit a similar male-dominated gender imbalance.
To reduce sampling bias and strengthen the statistical power of gender-related variables, these entries were excluded, resulting in the removal of approximately 25,000 rows. Lastly, entries in several variables were inconsistently formatted, so they were standardized into uniform categorical or Boolean types to ensure compatibility with encoding and model processing. These issues are common in large-scale, self-reported survey data and were addressed to support fairness and model reliability (Jikadara, 2024). After cleaning, the dataset included 260,986 entries and 17 variables. Because the original dataset lacked a target variable, we generated one to enable risk prediction for individuals likely to experience or continue experiencing mental illness. Using k-modes clustering, we grouped responses into three categories (Low, Medium, and High risk) based on behavioral and psychological indicators. This unsupervised method enabled us to