AAI_2025_Capstone_Chronicles_Combined
MENTAL HEALTH RISK DETECTION USING ML
synthesize multiple factors into a meaningful classification outcome. After creating the target, we computed the variance of each feature to identify the variables that most strongly shaped the clusters: occupation, days indoors, care options, change habits, and social weakness. We removed these five variables from the training set to prevent data leakage, so that the models learn generalizable patterns rather than memorize the clustering structure.

Because our variables were primarily categorical, we evaluated their relationships using Chi-square tests of independence (Table 1) and a Cramér's V heatmap (Figure 2). The analysis revealed strong associations between the constructed risk label and family history, treatment, growing stress, coping struggles, and work interest, suggesting that these features are informative for modeling. Conversely, gender, country, self-employment status, and mental health interview showed weak associations, indicating limited predictive value. Importantly, no multicollinearity was detected among the categorical features, so all retained variables can be used together without redundant predictors distorting the model. These findings reinforced our decision to drop the five cluster-defining variables, focusing modeling on more generalizable predictors while preserving interpretability and fairness.
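The leakage-control and association steps described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the snake_case column names, the `rank_associations` helper, and the toy records are hypothetical, and it uses the standard uncorrected Cramér's V, V = sqrt(χ² / (n · (min(r, k) − 1))). It first drops the five cluster-defining columns, then scores each remaining feature against the risk label.

```python
from collections import Counter
from math import sqrt


def contingency_table(x, y):
    """Observed counts O[i][j] for two equal-length categorical sequences."""
    rows, cols = sorted(set(x)), sorted(set(y))
    counts = Counter(zip(x, y))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and its degrees of freedom."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; never zero because every
            # row and column category occurs at least once in the data.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def cramers_v(x, y):
    """Uncorrected Cramér's V in [0, 1]; 0.0 when either variable has a single level."""
    table = contingency_table(x, y)
    stat, _ = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return 0.0 if k == 0 else sqrt(stat / (len(x) * k))


# Hypothetical column names standing in for the five cluster-defining variables.
LEAKY_COLUMNS = {"occupation", "days_indoors", "care_options",
                 "change_habits", "social_weakness"}


def rank_associations(records, target, drop=LEAKY_COLUMNS):
    """Drop leakage-prone columns, then rank the rest by Cramér's V with the target."""
    features = [c for c in records[0] if c != target and c not in drop]
    y = [r[target] for r in records]
    scores = [(f, cramers_v([r[f] for r in records], y)) for f in features]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy records (not study data): family_history tracks risk exactly, gender does not.
records = [
    {"family_history": "Yes", "gender": "F", "occupation": "A", "risk": "High"},
    {"family_history": "Yes", "gender": "M", "occupation": "B", "risk": "High"},
    {"family_history": "No", "gender": "F", "occupation": "A", "risk": "Low"},
    {"family_history": "No", "gender": "M", "occupation": "B", "risk": "Low"},
]
ranking = rank_associations(records, target="risk")
print(ranking)  # [('family_history', 1.0), ('gender', 0.0)]
```

The sketch returns only the χ² statistic and degrees of freedom. If p-values are needed, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom, and expected frequencies. Note that it applies Yates' continuity correction by default when there is one degree of freedom, so its statistic for 2 × 2 tables will differ from the uncorrected value computed here.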
Figure 2 Cramér’s V Heatmap of Categorical Feature Correlation
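For reference, each cell of a Cramér's V heatmap such as Figure 2 is conventionally computed from the Pearson chi-square statistic of the two variables' r × k contingency table, with observed counts O_ij, row totals n_i·, column totals n_·j, and sample size n. The standard, uncorrected definitions are shown below, followed by a small worked example. Whether the study used this form or a bias-corrected variant is not stated. Values near 1 indicate that one variable nearly determines the other, which is the pattern a multicollinearity check on categorical features looks for. Values near 0 indicate near-independence.

```latex
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{k}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{n_{i\cdot}\,n_{\cdot j}}{n},
\qquad V = \sqrt{\frac{\chi^2}{n\,(\min(r,k)-1)}}

% Worked example: O = [[0, 2], [2, 0]], so n = 4 and every E_{ij} = (2 \cdot 2)/4 = 1
\chi^2 = \frac{(0-1)^2}{1} + \frac{(2-1)^2}{1} + \frac{(2-1)^2}{1} + \frac{(0-1)^2}{1} = 4,
\qquad V = \sqrt{\frac{4}{4\,(2-1)}} = 1
```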