A Hybrid Machine Learning and Mathematical Modeling Approach for Predicting Academic Performance Using Latent Dimensions | International Journal of Innovative Science and Research Technology

Authors : Rajoelison Andrianarimalala Abel; Rakotomalala Vololona Harinoro; Rasolomampiandry Gilbert

Volume/Issue : Volume 11 - 2026, Issue 4 - April

Google Scholar : https://tinyurl.com/4f8a45n5

Scribd : https://tinyurl.com/y2c89pa9

DOI : https://doi.org/10.38124/ijisrt/26apr1548

Abstract : Predicting academic performance is a major challenge in higher education, particularly where student monitoring systems are limited. Machine learning models have demonstrated strong predictive capabilities, but their lack of interpretability limits their use in educational settings. This study proposes an interpretable model, the Academic Performance Index (API), built on an integrated framework that combines mathematical modeling, cognitive science, and machine learning. Using real student data, a Random Forest model identifies the most relevant variables and estimates their relative importance. These importance weights are then used to construct a composite index structured around four latent dimensions: cognitive, motivational, socio-economic, and environmental. The API is subsequently used as the predictor in a logistic regression model that estimates the probability of academic success. The proposed model achieves high predictive performance (AUC = 0.897; F1-score = 0.84), comparable to more advanced approaches, while being easier to interpret. The analysis highlights the central role of cognitive factors in academic success.

Keywords : Machine Learning, Mathematical Modeling, Latent Variables, Random Forest, Academic Performance.
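
The pipeline outlined in the abstract (Random Forest importances, a weighted composite index, then a logistic regression on that index) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the assignment of feature columns to the four dimensions in `DIMENSIONS` is hypothetical, and the helper names `build_api` and `dimension_weights` are invented here. Because Random Forest importances are non-negative and carry no direction, the sketch also assumes that each feature's sign can be taken from its training-set correlation with the outcome; the paper may handle orientation differently.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical mapping of the 8 synthetic feature columns to the four
# latent dimensions named in the abstract (an assumption for illustration).
DIMENSIONS = {
    "cognitive": [0, 1],
    "motivational": [2, 3],
    "socio_economic": [4, 5],
    "environmental": [6, 7],
}


def build_api(X_scaled, importances, signs):
    """Composite index: signed, importance-weighted sum of scaled features."""
    weights = signs * importances / importances.sum()
    return X_scaled @ weights


def dimension_weights(importances):
    """Share of total Random Forest importance carried by each dimension."""
    weights = importances / importances.sum()
    return {name: float(weights[cols].sum()) for name, cols in DIMENSIONS.items()}


# Synthetic stand-in for student records (the paper uses real student data).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scale features using training-set statistics only.
scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: Random Forest estimates the relative importance of each variable.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train_s, y_train)
importances = forest.feature_importances_

# Assumed orientation step: sign of each feature's correlation with the outcome.
signs = np.sign(
    [np.corrcoef(X_train_s[:, j], y_train)[0, 1] for j in range(X_train_s.shape[1])]
)

# Step 2: build the composite index from the importance weights.
api_train = build_api(X_train_s, importances, signs)
api_test = build_api(X_test_s, importances, signs)

# Step 3: logistic regression maps the index to a probability of success.
logit = LogisticRegression().fit(api_train.reshape(-1, 1), y_train)
proba = logit.predict_proba(api_test.reshape(-1, 1))[:, 1]

auc = roc_auc_score(y_test, proba)
f1 = f1_score(y_test, (proba >= 0.5).astype(int))
print(dimension_weights(importances))
print(f"AUC={auc:.3f}  F1={f1:.3f}")
```

The scores printed on synthetic data say nothing about the figures reported in the abstract. What the sketch does show is where the interpretability comes from: the final classifier has a single coefficient on the index, and each dimension's contribution can be read directly from its share of the weights.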
