Authors :
Darren Kevin T. Nguemdjom; Alidor M. Mbayandjambe; Grevi B. Nkwimi; Fiston Oshasha; Célestin Muluba; Héritier I. Mbengandji; Ibsen G. Bazie
Volume/Issue :
Volume 10 - 2025, Issue 4 - April
DOI :
https://doi.org/10.38124/ijisrt/25apr1962
Abstract :
Obesity is a major public health challenge that calls for accurate and interpretable predictive tools. This study proposes an optimized Multilayer Perceptron (MLP) that predicts obesity levels from lifestyle data, eating habits, and physiological characteristics, using a Kaggle dataset that combines real and synthetic samples. After rigorous preprocessing, including normalization and class rebalancing, we compare the MLP with four classical algorithms (Logistic Regression, K-Nearest Neighbors (KNN), Random Forest, and XGBoost) on accuracy, precision, recall, F1-score, and AUC-ROC. The optimized MLP outperforms the other models (98.4% accuracy, F1-score of 0.97), and hyperparameter optimization with GridSearchCV improves its performance significantly. An explainable AI (XAI) analysis with SHAP identifies weight, gender, height, and physical activity as the most influential factors, providing the transparent explanations that clinical applications require. This combination of high predictive performance and interpretability makes the MLP a valuable tool for obesity prevention and diagnosis in public health.
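The workflow summarized above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code. Synthetic, deliberately imbalanced data from `make_classification` stands in for the Kaggle obesity dataset. `StandardScaler` (z-score standardization) stands in for the unspecified normalization. A hypothetical `random_oversample` helper stands in for the unspecified class-rebalancing method. The MLP grid values are placeholders. XGBoost and SHAP are omitted because they require the separate `xgboost` and `shap` packages.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller class
    until every class is as large as the biggest one (assumed rebalancing method)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    keep = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=counts.max() - n, replace=True)
        keep.append(np.concatenate([members, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]


def evaluate(model, X_te, y_te):
    """Accuracy, macro F1 and one-vs-rest macro AUC-ROC on held-out data."""
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)
    return {
        "accuracy": accuracy_score(y_te, pred),
        "f1_macro": f1_score(y_te, pred, average="macro"),
        "auc_ovr": roc_auc_score(y_te, proba, multi_class="ovr"),
    }


def run(seed=0):
    # Synthetic, imbalanced placeholder data (assumption: not the Kaggle dataset).
    X, y = make_classification(
        n_samples=600, n_features=16, n_informative=8, n_classes=4,
        n_clusters_per_class=1, weights=[0.4, 0.3, 0.2, 0.1], random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    X_bal, y_bal = random_oversample(X_tr, y_tr, seed)

    # Scaler lives inside each pipeline, so it is fitted on training rows only.
    mlp = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=seed))
    grid = {  # placeholder hyperparameter grid
        "mlpclassifier__hidden_layer_sizes": [(32,), (64, 32)],
        "mlpclassifier__alpha": [1e-4, 1e-2],
    }
    # Tune on the original training rows so that copies of one sample never
    # land in both a CV training fold and its validation fold.
    search = GridSearchCV(mlp, grid, cv=3, scoring="f1_macro").fit(X_tr, y_tr)
    # Refit the selected configuration on the rebalanced training set.
    mlp.set_params(**search.best_params_).fit(X_bal, y_bal)

    baselines = {
        "logreg": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "random_forest": RandomForestClassifier(random_state=seed),
    }
    results = {"mlp": evaluate(mlp, X_te, y_te)}
    for name, clf in baselines.items():
        model = make_pipeline(StandardScaler(), clf).fit(X_bal, y_bal)
        results[name] = evaluate(model, X_te, y_te)
    return search.best_params_, results


if __name__ == "__main__":
    best, results = run()
    print("best MLP params:", best)
    for name, scores in results.items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this sketch the grid search runs on the original training rows, and only the selected configuration is refit on the oversampled rows. Oversampling before cross-validation would let duplicates of one sample fall into both a training fold and its validation fold, which can inflate the CV scores. The held-out test set is never resampled. The printed numbers come from synthetic data and say nothing about the results reported in the study.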
Keywords :
Obesity Prediction, Machine Learning, Lifestyle Data, SHAP, AUC-ROC, GridSearchCV, Interpretable AI, MLP, Health Analytics.