A Dynamic MPI-Based Memory-Efficient Framework for Longest Common Subsequence Computation on Massive DNA Sequence


Authors : Shubham Kumar Singh ; Dharanya T ; Aarthi N ; Nagadevi S

Volume/Issue : Volume 10 - 2025, Issue 5 - May


Google Scholar : https://tinyurl.com/2umebfhz

DOI : https://doi.org/10.38124/ijisrt/25may1297

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.


Abstract : Abstract—The rapid expansion of genomic datasets, particularly those encompassing entire human chromosomes, presents formidable computational challenges for conventional sequence alignment techniques. Among these, the Longest Common Subsequence (LCS) problem plays a fundamental role in comparative genomics and DNA sequence alignment. However, its inherent time and space complexity—typically quadratic in nature—renders traditional sequential and statically parallelized implementations insufficient for large-scale genomic analysis. In response to this limitation, we propose a dynamic, memory- efficient parallel architecture based on the Message Passing Interface (MPI) framework, specifically optimized for large DNA sequence comparisons. Our approach introduces a master-worker model that dynamically distributes computational workload at runtime. By dividing the input sequences into smaller, manageable segments and assigning these chunks to worker processes on demand, the system ensures effective load balancing across processing units. The architecture leverages a space-optimized dynamic programming technique, where only two rows of the LCS matrix are stored at any time, significantly reducing memory consumption without sacrificing correctness. To evaluate the scalability and performance of our method, we conducted extensive experiments using complete human chromosome datasets across an MPI cluster with eight processes. The results indicate that while the dynamic strategy introduces moderate communication overhead, it consistently outperforms static distribution methods in terms of scalability, adaptability to heterogeneous environments, and memory efficiency. Notably, the proposed solution maintains stability and performance even as sequence sizes grow, making it suitable for deployment in high- performance computing (HPC) environments and cloud-based bioinformatics platforms.

Keywords : Longest Common Subsequence (LCS), Parallel Computing, Message Passing Interface (MPI), Dynamic Load Balancing, DNA Sequence Alignment, Bioinformatics, High- Performance Computing (HPC), Genome-Scale Processing, Memory Optimization, Master-Worker Architecture.

References :

  1. G. Eason, B. Noble, and I. N. Sneddon, “On certain integrals of Lipschitz- Hankel type involving products of Bessel functions,” Phil. Trans. Roy. Soc. London, vol. A247, pp. 529–551, April 1955. (references)
  2. J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol.
  3. Oxford: Clarendon, 1892, pp.68–73.
  4. I. S. Jacobs and C. P. Bean, “Fine particles, thin films and exchange anisotropy,” in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271–350.
  5. K. Elissa, “Title of paper if known,” unpublished.
  6. R. Nicole, “Title of paper with only first word capitalized,” J. Name Stand. Abbrev., in press.
  7. Y. Yorozu, M. Hirano, K. Oka, and Y. Tagawa, “Electron spectroscopy studies on magneto-optical media and plastic substrate interface,” IEEE Transl. J. Magn. Japan, vol. 2, pp. 740–741, August 1987 [Digests 9th Annual Conf. Magnetics Japan, p. 301, 1982].
  8. M. Young, The Technical Writer’s Handbook. Mill Valley, CA: University Science, 1989.
  9. K. Eves and J. Valasek, “ Adaptive control for singularly perturbed systems examples, ” Code Ocean, Aug. 2023. [Online]. Available: https://codeocean.com/capsule/4989235/tree
  10. D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” 2013, arXiv:1312.6114. [Online]. Available: https://arxiv.org/abs/1312.6114
  11. S. Liu, “Wi-Fi Energy Detection Testbed (12MTC),” 2023, gitHub repository. [Online]. Available: https://github.com/liustone99/Wi-Fi-
  12. Energy-Detection-Testbed-12MTC
  13. “Treatment episode data set: discharges (TEDS-D): concatenated, 2006 to 2009.” U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Office of Applied Studies, August, 2013, DOI:10.3886 /ICPSR30122.v2

Abstract—The rapid expansion of genomic datasets, particularly those encompassing entire human chromosomes, presents formidable computational challenges for conventional sequence alignment techniques. Among these, the Longest Common Subsequence (LCS) problem plays a fundamental role in comparative genomics and DNA sequence alignment. However, its inherent time and space complexity—typically quadratic in nature—renders traditional sequential and statically parallelized implementations insufficient for large-scale genomic analysis. In response to this limitation, we propose a dynamic, memory- efficient parallel architecture based on the Message Passing Interface (MPI) framework, specifically optimized for large DNA sequence comparisons. Our approach introduces a master-worker model that dynamically distributes computational workload at runtime. By dividing the input sequences into smaller, manageable segments and assigning these chunks to worker processes on demand, the system ensures effective load balancing across processing units. The architecture leverages a space-optimized dynamic programming technique, where only two rows of the LCS matrix are stored at any time, significantly reducing memory consumption without sacrificing correctness. To evaluate the scalability and performance of our method, we conducted extensive experiments using complete human chromosome datasets across an MPI cluster with eight processes. The results indicate that while the dynamic strategy introduces moderate communication overhead, it consistently outperforms static distribution methods in terms of scalability, adaptability to heterogeneous environments, and memory efficiency. Notably, the proposed solution maintains stability and performance even as sequence sizes grow, making it suitable for deployment in high- performance computing (HPC) environments and cloud-based bioinformatics platforms.

Keywords : Longest Common Subsequence (LCS), Parallel Computing, Message Passing Interface (MPI), Dynamic Load Balancing, DNA Sequence Alignment, Bioinformatics, High- Performance Computing (HPC), Genome-Scale Processing, Memory Optimization, Master-Worker Architecture.

Never miss an update from Papermashup

Get notified about the latest tutorials and downloads.

Subscribe by Email

Get alerts directly into your inbox after each post and stay updated.
Subscribe
OR

Subscribe by RSS

Add our RSS to your feedreader to get regular updates from us.
Subscribe