AAI_2025_Capstone_Chronicles_Combined


MENTAL HEALTH RISK DETECTION USING ML

The goal is to develop a validated, interpretable machine learning model that classifies individuals into low, moderate, or high risk for mental health concerns. We hypothesize that a combination of demographic, occupational, and sentiment-based features can accurately predict mental health risk. The final product will be a web-based interface where users can complete a survey and receive a risk score along with recommendations for intervention planning and prevention.

Data Summary

The dataset includes approximately 300,000 anonymized survey responses (Jikadara, 2024) across 17 columns. Most variables are categorical and stored as strings, covering demographic (gender, age group, country, state), occupational (occupation, self-employed), and psychological dimensions (family history, treatment, growing stress, coping difficulty, changes in habits, days indoors, etc.). A timestamp column is also present, but it is used only for exploratory analysis and not for model training.

The dataset presented several quality issues that were resolved through systematic cleaning. A small portion of responses contained missing values, likely because participants skipped questions or because of technical errors during submission. These rows were removed to reduce noise and preserve data integrity. Exact duplicate entries, potentially caused by repeated submissions or system export errors, were also identified and deleted. Additionally, several countries were severely underrepresented (Figure 1), contributing fewer than 300 responses each, and showed a strong gender imbalance, often overrepresenting male respondents.

Figure 1 Gender Distribution by Country
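
The cleaning steps described in the Data Summary can be illustrated with a minimal, dependency-free Python sketch. It is not the project's actual pipeline: the column names `country` and `gender`, the function names, and the rule that blank strings count as missing are assumptions made for illustration. Only the 300-response threshold comes from the text. The sketch flags underrepresented countries without deciding how to handle them.

```python
from collections import Counter

# Threshold taken from the text: countries with fewer responses are flagged.
MIN_RESPONSES = 300


def clean_responses(rows):
    """Drop rows with missing values, then exact duplicates (order preserved).

    `rows` is a list of dicts mapping column name to a string value.
    Assumption: None and blank strings both count as missing.
    """
    seen = set()
    cleaned = []
    for row in rows:
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue  # incomplete response
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(row)
    return cleaned


def underrepresented_countries(rows, min_responses=MIN_RESPONSES):
    """Return {country: count} for countries below the response threshold."""
    counts = Counter(row["country"] for row in rows)
    return {c: n for c, n in counts.items() if n < min_responses}


def gender_share_by_country(rows):
    """Return {country: {gender: share}} to inspect gender imbalance."""
    counts = {}
    for row in rows:
        counts.setdefault(row["country"], Counter())[row["gender"]] += 1
    return {
        country: {g: n / sum(c.values()) for g, n in c.items()}
        for country, c in counts.items()
    }
```

In a pandas workflow, the first two steps would typically correspond to `DataFrame.dropna()` followed by `DataFrame.drop_duplicates()`. The per-country gender shares computed by `gender_share_by_country` are the quantities a chart like Figure 1 would visualize.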
