Authors :
Shubham Kumar Singh ; Dharanya T ; Aarthi N ; Nagadevi S
Volume/Issue :
Volume 10 - 2025, Issue 5 - May
Google Scholar :
https://tinyurl.com/2umebfhz
DOI :
https://doi.org/10.38124/ijisrt/25may1297
Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.
Abstract :
Abstract—The rapid expansion of genomic datasets, particularly those encompassing entire human chromosomes,
presents formidable computational challenges for conventional sequence alignment techniques. Among these, the Longest
Common Subsequence (LCS) problem plays a fundamental role in comparative genomics and DNA sequence alignment.
However, its inherent time and space complexity—typically quadratic in nature—renders traditional sequential and
statically parallelized implementations insufficient for large-scale genomic analysis. In response to this limitation, we
propose a dynamic, memory- efficient parallel architecture based on the Message Passing Interface (MPI) framework,
specifically optimized for large DNA sequence comparisons. Our approach introduces a master-worker model that
dynamically distributes computational workload at runtime. By dividing the input sequences into smaller, manageable
segments and assigning these chunks to worker processes on demand, the system ensures effective load balancing across
processing units. The architecture leverages a space-optimized dynamic programming technique, where only two rows of
the LCS matrix are stored at any time, significantly reducing memory consumption without sacrificing correctness. To
evaluate the scalability and performance of our method, we conducted extensive experiments using complete human
chromosome datasets across an MPI cluster with eight processes. The results indicate that while the dynamic strategy
introduces moderate communication overhead, it consistently outperforms static distribution methods in terms of
scalability, adaptability to heterogeneous environments, and memory efficiency. Notably, the proposed solution maintains
stability and performance even as sequence sizes grow, making it suitable for deployment in high- performance computing
(HPC) environments and cloud-based bioinformatics platforms.
Keywords :
Longest Common Subsequence (LCS), Parallel Computing, Message Passing Interface (MPI), Dynamic Load Balancing, DNA Sequence Alignment, Bioinformatics, High- Performance Computing (HPC), Genome-Scale Processing, Memory Optimization, Master-Worker Architecture.
References :
- G. Eason, B. Noble, and I. N. Sneddon, “On certain integrals of Lipschitz- Hankel type involving products of Bessel functions,” Phil. Trans. Roy. Soc. London, vol. A247, pp. 529–551, April 1955. (references)
- J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol.
- Oxford: Clarendon, 1892, pp.68–73.
- I. S. Jacobs and C. P. Bean, “Fine particles, thin films and exchange anisotropy,” in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271–350.
- K. Elissa, “Title of paper if known,” unpublished.
- R. Nicole, “Title of paper with only first word capitalized,” J. Name Stand. Abbrev., in press.
- Y. Yorozu, M. Hirano, K. Oka, and Y. Tagawa, “Electron spectroscopy studies on magneto-optical media and plastic substrate interface,” IEEE Transl. J. Magn. Japan, vol. 2, pp. 740–741, August 1987 [Digests 9th Annual Conf. Magnetics Japan, p. 301, 1982].
- M. Young, The Technical Writer’s Handbook. Mill Valley, CA: University Science, 1989.
- K. Eves and J. Valasek, “ Adaptive control for singularly perturbed systems examples, ” Code Ocean, Aug. 2023. [Online]. Available: https://codeocean.com/capsule/4989235/tree
- D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” 2013, arXiv:1312.6114. [Online]. Available: https://arxiv.org/abs/1312.6114
- S. Liu, “Wi-Fi Energy Detection Testbed (12MTC),” 2023, gitHub repository. [Online]. Available: https://github.com/liustone99/Wi-Fi-
- Energy-Detection-Testbed-12MTC
- “Treatment episode data set: discharges (TEDS-D): concatenated, 2006 to 2009.” U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Office of Applied Studies, August, 2013, DOI:10.3886 /ICPSR30122.v2
Abstract—The rapid expansion of genomic datasets, particularly those encompassing entire human chromosomes,
presents formidable computational challenges for conventional sequence alignment techniques. Among these, the Longest
Common Subsequence (LCS) problem plays a fundamental role in comparative genomics and DNA sequence alignment.
However, its inherent time and space complexity—typically quadratic in nature—renders traditional sequential and
statically parallelized implementations insufficient for large-scale genomic analysis. In response to this limitation, we
propose a dynamic, memory- efficient parallel architecture based on the Message Passing Interface (MPI) framework,
specifically optimized for large DNA sequence comparisons. Our approach introduces a master-worker model that
dynamically distributes computational workload at runtime. By dividing the input sequences into smaller, manageable
segments and assigning these chunks to worker processes on demand, the system ensures effective load balancing across
processing units. The architecture leverages a space-optimized dynamic programming technique, where only two rows of
the LCS matrix are stored at any time, significantly reducing memory consumption without sacrificing correctness. To
evaluate the scalability and performance of our method, we conducted extensive experiments using complete human
chromosome datasets across an MPI cluster with eight processes. The results indicate that while the dynamic strategy
introduces moderate communication overhead, it consistently outperforms static distribution methods in terms of
scalability, adaptability to heterogeneous environments, and memory efficiency. Notably, the proposed solution maintains
stability and performance even as sequence sizes grow, making it suitable for deployment in high- performance computing
(HPC) environments and cloud-based bioinformatics platforms.
Keywords :
Longest Common Subsequence (LCS), Parallel Computing, Message Passing Interface (MPI), Dynamic Load Balancing, DNA Sequence Alignment, Bioinformatics, High- Performance Computing (HPC), Genome-Scale Processing, Memory Optimization, Master-Worker Architecture.