M.S. Applied Data Science - Capstone Chronicles 2025


ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Professor Ebrahim Tarshizi for his valuable guidance and support throughout the ADS-599 Capstone course. I would also like to acknowledge my classmates Tori and Nolan Peters, whose encouragement and help throughout the Applied Data Science program played a meaningful role in my learning and success. Last but not least, I am deeply thankful to my family for their unwavering support, patience, and belief in me throughout this journey. I would like to acknowledge the assistance of ChatGPT (OpenAI, 2025) for support with grammar review, clarity improvements, and language refinement.

References

Centers for Disease Control and Prevention. (2020). National Health and Nutrition Examination Survey (NHANES), 2017–2020 pre-pandemic data files. U.S. Department of Health & Human Services, Centers for Disease Control and Prevention. https://wwwn.cdc.gov/nchs/nhanes/

Chen, H., Liu, K., & Song, M. (2021). Predicting hypertension using machine learning models with electronic health records: A systematic review. International Journal of Medical Informatics, 153, 104524. https://doi.org/10.1016/j.ijmedinf.2021.104524

Huang, J., & Huang, Y. (2024). Predicting obesity status using machine learning and NHANES data: An interpretable approach. PLOS ONE, 19(2), e0304509. https://doi.org/10.1371/journal.pone.0304509

important details such as dosage, adherence, and treatment duration that may influence predictive value. Additionally, models were trained and evaluated on the same NHANES population, which may reduce their generalizability to other demographic or geographic groups. Finally, while interpretability tools like SHAP were applied, complex models such as MLP and XGBoost can still be challenging to fully interpret, which may limit clinical transparency and acceptance.

6.2 Recommend Next Steps/Future Studies

Future research should focus on refining medication categorization by including dosage, adherence, and treatment duration data to capture more clinical nuance. Expanding the dataset to encompass more diverse and longitudinal populations would improve generalizability and enable temporal modeling of metabolic syndrome risk. Advanced modeling approaches, such as Long Short-Term Memory (LSTM) networks, could be explored for time-series health data to detect early risk trajectories. Given the promising performance of MLP and XGBoost, further hyperparameter optimization, combined with feature selection techniques, may enhance predictive efficiency without sacrificing accuracy. Lastly, embedding these models into real-world clinical decision support systems could provide valuable insight into operational feasibility, user adoption, and the potential to guide targeted early interventions.

