Bioinformatics-Driven Identification of Enriched Pathways in Cancer: GO and KEGG Insights


Authors : Paras R. Parekh; Dr. Nilofer K. Shaikh

Volume/Issue : Volume 11 - 2026, Issue 2 - February


Google Scholar : https://tinyurl.com/bdz9w9jj

Scribd : https://tinyurl.com/5b8rnw8e

DOI : https://doi.org/10.38124/ijisrt/26feb656



Abstract : Breast cancer is a biologically heterogeneous malignancy and the leading cause of cancer-related mortality among women worldwide. Its clinical complexity arises from distinct molecular subtypes (Luminal A, Luminal B, HER2-enriched, and Basal-like), each exhibiting unique gene expression profiles and therapeutic responses. This study presents an integrative bioinformatics and machine learning (ML) pipeline for subtype classification and biomarker discovery using publicly available microarray datasets (GSE65194, GSE42568, GSE45827). The workflow incorporates R-based preprocessing for probe-to-gene annotation, normalization, and differential expression analysis (DEA), followed by survival modeling via Kaplan–Meier curves and protein–protein interaction (PPI) network construction using STRING. Annotated gene features were used to train multiple ML models implemented in Python: Random Forest, XGBoost, Support Vector Machine (SVM), and LASSO regression. Model performance was evaluated using cross-validation and regression metrics, achieving high predictive accuracy (R² > 0.90) across subtypes. The pipeline identified clinically relevant biomarkers, including COL10A1, EGFR, FN1, COL1A1, BGN, ERBB2, COL5A1, COL5A2, and COL11A1, consistent with known subtype characteristics and survival outcomes. Its modular design supports reproducibility, scalability, and adaptation to other cancer types or omics platforms. By integrating statistical rigor with ML interpretability, this study provides a biologically informed framework for precision oncology, aimed at improving diagnostic accuracy, patient stratification, and targeted therapy selection in breast cancer management.
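The survival modeling step mentioned in the abstract relies on the Kaplan–Meier product-limit estimator. The sketch below is only an illustration of that estimator in plain Python, not the authors' R code; the function name `kaplan_meier` and the toy cohort are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time point.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Every patient leaves the risk set at their own follow-up time,
    # whether by event or by censoring.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort (months of follow-up); not data from the study.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored patients never trigger a drop in the curve, but they do shrink the risk set, which is why the censored observations at 8 and 15 months affect later steps.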

Keywords : Breast Cancer Subtypes; Differential Gene Expression; Machine Learning; Biomarker Discovery; Survival Analysis; DBSCAN Clustering; Precision Oncology
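The differential expression step can be pictured with a small sketch. The abstract says this analysis was R-based but does not name the statistical test, so the example below substitutes a plain Welch t-statistic combined with a log2 fold-change cutoff. The function names, the cutoff of 1.0, and the expression values are all assumptions for illustration, not results from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def differential_expression(expr, tumor_idx, normal_idx, min_abs_lfc=1.0):
    """Rank genes by |t| among those passing a log2 fold-change cutoff.

    expr maps a gene symbol to a list of log2 intensities, one per sample.
    Returns a list of (gene, log2_fold_change, t_statistic) tuples.
    """
    rows = []
    for gene, values in expr.items():
        tumor = [values[i] for i in tumor_idx]
        normal = [values[i] for i in normal_idx]
        # On log2-scale data, the difference of means is the log2 fold change.
        lfc = mean(tumor) - mean(normal)
        if abs(lfc) >= min_abs_lfc:
            rows.append((gene, lfc, welch_t(tumor, normal)))
    return sorted(rows, key=lambda r: abs(r[2]), reverse=True)


# Made-up log2 intensities: samples 0-2 are tumor, 3-5 are normal.
expr = {
    "COL10A1": [9.1, 9.4, 8.9, 5.0, 5.3, 4.8],
    "ERBB2": [10.2, 7.1, 7.3, 7.0, 7.2, 6.9],
    "GAPDH": [12.0, 12.1, 11.9, 12.0, 12.2, 11.8],
}
hits = differential_expression(expr, [0, 1, 2], [3, 4, 5])
for gene, lfc, t in hits:
    print(f"{gene:8s} log2FC={lfc:+.2f}  t={t:+.2f}")
```

In this toy input the housekeeping-style gene is filtered out by the fold-change cutoff, while the gene with a consistent shift across samples ranks above the one driven by a single high sample. A real microarray analysis would also correct the resulting p-values for multiple testing.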


