Authors :
Rajoelison Andrianarimalala Abel; Rakotomalala Vololona Harinoro; Rasolomampiandry Gilbert
Volume/Issue :
Volume 11 - 2026, Issue 4 - April
Google Scholar :
https://tinyurl.com/4f8a45n5
Scribd :
https://tinyurl.com/y2c89pa9
DOI :
https://doi.org/10.38124/ijisrt/26apr1548
Abstract :
Predicting academic performance is a major challenge in higher education, particularly where
student monitoring systems are limited. Machine learning models have demonstrated strong predictive capabilities, but
their lack of interpretability limits their use in educational settings. This study proposes an interpretable model,
the Academic Performance Index (API), built on an integrated framework that combines mathematical modeling, cognitive
science, and machine learning. Using real student data, a Random Forest model identifies the most relevant
variables and estimates their relative importance. These importance weights are then used to construct a composite index
structured around four latent dimensions: cognitive, motivational, socio-economic, and environmental. The API is
subsequently used as a predictor in a logistic regression model that estimates the probability of academic success. The
proposed model achieves high predictive performance (AUC = 0.897; F1-score = 0.84), comparable to advanced
approaches, while providing improved interpretability. The analysis highlights the central role of cognitive factors in
academic success.
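
The index construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, their grouping into the four dimensions, the importance values (which stand in for Random Forest output), and the logistic coefficients are all hypothetical, and every feature is assumed to be scaled to [0, 1] with higher values being more favorable.

```python
import math

# Hypothetical importances standing in for Random Forest output.
# Feature names, groupings, and values are illustrative assumptions,
# not figures from the paper.
IMPORTANCES = {
    "cognitive": {"prior_gpa": 0.30, "test_score": 0.20},
    "motivational": {"study_hours": 0.15, "attendance": 0.10},
    "socio_economic": {"household_resources": 0.15},
    "environmental": {"study_space_quality": 0.10},
}


def dimension_weights(importances):
    """Share of the total importance carried by each dimension."""
    total = sum(sum(feats.values()) for feats in importances.values())
    return {dim: sum(feats.values()) / total for dim, feats in importances.items()}


def dimension_scores(student, importances):
    """Importance-weighted mean of the student's features within each dimension."""
    scores = {}
    for dim, feats in importances.items():
        dim_total = sum(feats.values())
        scores[dim] = sum(imp * student[name] for name, imp in feats.items()) / dim_total
    return scores


def api(student, importances):
    """Composite index: dimension scores combined with the dimension weights."""
    weights = dimension_weights(importances)
    scores = dimension_scores(student, importances)
    return sum(weights[dim] * scores[dim] for dim in weights)


def success_probability(api_value, intercept, slope):
    """Logistic model with the index as its single predictor."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * api_value)))


# Example student with features already scaled to [0, 1] (made-up values).
student = {
    "prior_gpa": 0.8,
    "test_score": 0.7,
    "study_hours": 0.6,
    "attendance": 0.9,
    "household_resources": 0.4,
    "study_space_quality": 0.5,
}
index = api(student, IMPORTANCES)
# The intercept and slope are placeholders; in practice they would be
# estimated by fitting the logistic regression to labelled student data.
print(index, success_probability(index, intercept=-3.0, slope=6.0))
```

Because the index is a weighted sum, each dimension's contribution to a student's score can be read directly from `dimension_weights` and `dimension_scores`, which is the kind of interpretability the abstract emphasizes.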
Keywords :
Machine Learning, Mathematical Modeling, Latent Variables, Random Forest, Academic Performance.