Explainable AI (XAI) for Obesity Prediction: An Optimized MLP Approach with SHAP Interpretability on Lifestyle and Behavioral Data


Authors : Darren Kevin T. Nguemdjom; Alidor M. Mbayandjambe; Grevi B. Nkwimi; Fiston Oshasha; Célestin Muluba; Héritier I. Mbengandji; Ibsen G. Bazie

Volume/Issue : Volume 10 - 2025, Issue 4 - April


Google Scholar : https://tinyurl.com/a9p6kx5j

Scribd : https://tinyurl.com/bdfvz23x

DOI : https://doi.org/10.38124/ijisrt/25apr1962



Abstract : Obesity is a major public health challenge that calls for accurate and interpretable predictive tools. This study proposes an optimized Multilayer Perceptron (MLP) that predicts obesity levels from lifestyle data, eating habits, and physiological characteristics, using a Kaggle dataset that combines real and synthetic samples. After preprocessing that includes normalization and class rebalancing, we compare the MLP with four classical algorithms (Logistic Regression, KNN, Random Forest, and XGBoost) on accuracy, precision, recall, F1-score, and AUC-ROC. The optimized MLP outperforms the other models (98.4% accuracy, F1-score of 0.97), with a significant improvement attributable to hyperparameter optimization through GridSearchCV. The XAI analysis via SHAP identifies weight, gender, height, and physical activity as the most influential factors, providing the transparent explanations that clinical applications require. This combination of high predictive performance and interpretability makes the MLP a valuable tool for obesity prevention and diagnosis in public health.

Keywords : Obesity Prediction, Machine Learning, Lifestyle Data, SHAP, AUC-ROC, GridSearchCV, Interpretable AI, MLP, Health Analytics.
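
To make the SHAP step mentioned in the abstract more concrete, the following is a minimal sketch of the Shapley-value idea that SHAP builds on. It is not the authors' pipeline: the function `toy_score`, the three features, the sample values, and the baseline are all hypothetical, and the attributions are computed by brute force over feature coalitions, whereas the SHAP library relies on approximations to scale to real models such as an MLP.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature outside the coalition is replaced by its baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for small illustrative examples.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    """Hypothetical risk score (NOT the paper's model): BMI minus an activity bonus."""
    weight_kg, height_m, activity = z
    return weight_kg / height_m ** 2 - 0.5 * activity


# Hypothetical individual and reference point (assumed values)
x = [95.0, 1.70, 1.0]
baseline = [70.0, 1.70, 2.0]

phi = shapley_values(toy_score, x, baseline)
for name, contribution in zip(["weight", "height", "activity"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the score for `x` and the score for `baseline`, which is the additivity property that makes per-feature explanations of this kind easy to read. Here height receives a zero attribution because it matches the baseline value.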


