Authors :
Dr. P. Umamaheswari; N. Purusothaman; P. Gayathridevi
Volume/Issue :
Volume 11 - 2026, Issue 1 - January
Google Scholar :
https://tinyurl.com/292j5uk5
Scribd :
https://tinyurl.com/3tzau29r
DOI :
https://doi.org/10.38124/ijisrt/26jan183
Abstract :
A key challenge in modern biology is integrating different types of molecular data. This study examines the
relationship between gene copy number and protein expression levels. Using data from the Cancer Cell Line Encyclopedia
(CCLE), we find that this relationship varies substantially by gene. For the MYC oncogene, copy number is a comparatively
strong predictor of protein levels (R² = 0.37), indicating that more gene copies generally correspond to more protein.
However, for the TP53 tumor suppressor, copy number poorly predicts protein abundance (R² = 0.08), suggesting that other
regulatory mechanisms dominate. These results show that simple statistical models are often insufficient for biological
data and that more advanced approaches are needed to understand complex gene-protein relationships.
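The R² values quoted above are the kind of number a simple least-squares fit of protein level on copy number produces. The sketch below illustrates that calculation only. It is not the authors' pipeline and it does not use CCLE data: the data are synthetic, the gene labels `gene_A` and `gene_B` are hypothetical, and the slopes 0.77 and 0.29 are assumptions chosen so that the theoretical R² of each simulated gene is roughly 0.37 and 0.08.

```python
import random


def r_squared(x, y):
    """Return R^2 of the least-squares line y ~ intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary across samples")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy


def simulate(n, slope, noise_sd, seed):
    """Synthetic (copy number, protein) pairs; NOT real CCLE measurements."""
    rng = random.Random(seed)
    copy_number = [rng.gauss(0.0, 1.0) for _ in range(n)]
    protein = [slope * c + rng.gauss(0.0, noise_sd) for c in copy_number]
    return copy_number, protein


if __name__ == "__main__":
    # Hypothetical genes: one with a stronger copy-number effect, one weaker.
    for label, slope in [("gene_A", 0.77), ("gene_B", 0.29)]:
        cn, prot = simulate(n=500, slope=slope, noise_sd=1.0, seed=0)
        print(f"{label}: R^2 = {r_squared(cn, prot):.2f}")
```

With unit-variance copy number and unit-variance noise, the expected R² is slope² / (slope² + 1), so the same amount of unexplained variation yields very different R² values depending on how strongly copy number drives the protein level.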
Keywords :
Data Integration, Genomics, Proteomics, Copy Number Variation, MYC, TP53, Predictive Modeling, Statistical Analysis.
References :
- Ghandi, M., et al. (2019). Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature, 569(7757), 503–508.
- Liu, Y., Beyer, A., & Aebersold, R. (2016). On the dependency of cellular protein levels on mRNA abundance. Cell, 165(3), 535–550.
- Maier, T., Güell, M., & Serrano, L. (2009). Correlation of mRNA and protein in complex biological samples. FEBS Letters, 583(24), 3966–3973.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.