Machine Learning Based Classification of POLB Gene Mutations with SHAP Based Interpretability | International Journal of Innovative Science and Research Technology

Machine Learning Based Classification of POLB Gene Mutations with SHAP Based Interpretability

Authors : Baratam Sai Lahari; Balasani Manasini; Rachapudi Reshma; Sri. Ch. Ratna Babu

Volume/Issue : Volume 11 - 2026, Issue 4 - April

Google Scholar : https://tinyurl.com/yc5wckn2

Scribd : https://tinyurl.com/y7nae92t

DOI : https://doi.org/10.38124/ijisrt/26apr1516


Abstract : This study develops a machine learning approach for classifying cancer-associated mutations in the POLB gene using features derived from single nucleotide polymorphisms (SNPs). A dataset of bioinformatics-derived features, including SIFT, PolyPhen2, CADD, and REVEL scores, was pre-processed and used to train predictive models. Five classification algorithms were applied and assessed: Logistic Regression, Random Forest, Support Vector Machine, Multilayer Perceptron, and XGBoost. To obtain robust performance estimates, bootstrap resampling was employed, and accuracy, precision, recall, F1 score, and specificity were calculated. The two ensemble models (Random Forest and XGBoost) achieved the highest accuracy, approximately 83 percent, suggesting that these models can capture complex relationships in SNP data. In addition, SHAP explanation methods were used to interpret model predictions and identify the features with the largest effect on classification decisions. The results indicate that machine learning techniques are applicable to genomic research, particularly for predicting mutations associated with cancer.
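The bootstrap evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the labels and predictions are made up (1 = cancer-associated, 0 = not), the function names are hypothetical, and a percentile bootstrap over (label, prediction) pairs is only one of several ways the resampling could have been done.

```python
import random

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def safe_div(num, den):
    """Avoid division by zero when a resample lacks one of the classes."""
    return num / den if den else 0.0

def metrics(y_true, y_pred):
    """Compute the five metrics named in the abstract."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "specificity": safe_div(tn, tn + fp),
    }

def bootstrap_metrics(y_true, y_pred, n_resamples=1000, seed=0):
    """Resample pairs with replacement; return (mean, 2.5th, 97.5th percentile)."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = {name: [] for name in metrics(y_true, y_pred)}
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        m = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for name, value in m.items():
            samples[name].append(value)
    summary = {}
    for name, values in samples.items():
        values.sort()
        low = values[int(0.025 * (n_resamples - 1))]
        high = values[int(0.975 * (n_resamples - 1))]
        summary[name] = (sum(values) / n_resamples, low, high)
    return summary

# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0]

point = metrics(y_true, y_pred)
summary = bootstrap_metrics(y_true, y_pred)
for name, (mean, low, high) in summary.items():
    print(f"{name}: point={point[name]:.3f} mean={mean:.3f} 95% CI=({low:.3f}, {high:.3f})")
```

Reporting an interval alongside each point estimate shows how much a metric could move under resampling, which matters on small mutation datasets.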

Keywords : POLB, Single Nucleotide Polymorphism, Machine Learning, Cancer Mutation Prediction, Random Forest, XGBoost, SHAP Explainability.
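SHAP attributions are based on Shapley values, and the idea can be sketched by computing them exactly for a toy scoring function. Everything here is an assumption made for illustration: `toy_model`, its weights, the example variant, and the baseline are hypothetical and unrelated to the paper's trained models. Replacing absent features with a single baseline value also simplifies the averaging over background data that the SHAP library performs.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    """Hypothetical pathogenicity score; weights are invented for this sketch.

    Lower SIFT scores indicate a more damaging change, so SIFT enters as (1 - SIFT).
    """
    return (
        0.2 * (1 - x["SIFT"])
        + 0.3 * x["PolyPhen2"]
        + 0.02 * x["CADD"]
        + 0.4 * x["REVEL"]
        + 0.1 * x["PolyPhen2"] * x["REVEL"]  # interaction term
    )

def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value. The cost grows
    as 2^n, so this is only practical for a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

# Hypothetical variant and reference point, for illustration only.
instance = {"SIFT": 0.01, "PolyPhen2": 0.98, "CADD": 28.0, "REVEL": 0.85}
baseline = {"SIFT": 0.40, "PolyPhen2": 0.30, "CADD": 12.0, "REVEL": 0.20}

phi = shapley_values(toy_model, instance, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the model output for the variant and for the baseline. This additivity property is what makes it possible to rank features by their effect on a single prediction.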


