A Galaxy Workflow for Taxonomic Profiling of Host-Contaminated Microbiomes Using Nanopore Sequencing: Validation with Public ENA Datasets


Authors : Vignesh Kumar Kaipa; Mohammed Bilal M; Sanju H K; Meghana B R; Shivandappa; Narendra Kumar S

Volume/Issue : Volume 10 - 2025, Issue 2 - February


Google Scholar : https://tinyurl.com/s9e7nhmx

Scribd : https://tinyurl.com/56b23kct

DOI : https://doi.org/10.5281/zenodo.14891723


Abstract : Host-contaminated microbiomes, such as those found in mouse fecal samples, are difficult to profile taxonomically because host DNA is highly abundant. Nanopore sequencing improves taxonomic resolution through its long reads, but its higher error rates, combined with host contamination, complicate classification. This study presents a reproducible Galaxy workflow for taxonomic profiling of host-contaminated microbiomes from Nanopore sequencing data. The workflow integrates preprocessing (FastQC, Porechop, fastp), taxonomic classification (Kraken2 with a custom GTDB + mouse gut taxa database), and visualization (Krona pie charts) into a scalable, user-friendly analysis pipeline. Applied to the public ENA dataset PRJNA559386, the workflow processed 365,314 raw reads and retained 267,615 high-quality reads. Taxonomic profiling identified Acetobacterium sp. KB-1 (13%) and Acetivibrio clariflavus DSM 19732 (12%) as the dominant taxa, consistent with their roles as acetogenic and cellulolytic bacteria, respectively. Rare taxa, such as Acetobacter senegalensis (0.8%), were also detected, indicating that the workflow is sensitive to low-abundance organisms. The proposed workflow provides a reproducible and scalable framework for taxonomic profiling of host-contaminated microbiomes and addresses key challenges in Nanopore-based microbiome analysis. The approach is relevant to clinical and environmental studies where host contamination is unavoidable, and it supports more accurate assessment of microbial communities.

Keywords : Nanopore Sequencing, Microbiome, Galaxy Workflow, Kraken2, Taxonomic Profiling, Host Contamination.
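The read counts and abundance figures reported in the abstract can be checked with a short script. The following is a minimal Python sketch, not the authors' Galaxy workflow: it computes the fraction of reads retained after preprocessing and splits species-level rows of a Kraken2-style report (six tab-separated columns: percentage, clade reads, direct reads, rank code, taxonomy ID, name) into dominant and rare taxa. The function names, the 10% and 1% thresholds, and the sample report rows are assumptions made for illustration; the sample rows reuse the percentages quoted in the abstract, with placeholder zeros for read counts and taxonomy IDs.

```python
from dataclasses import dataclass

RAW_READS = 365_314       # raw reads, as reported in the abstract
FILTERED_READS = 267_615  # high-quality reads, as reported in the abstract


def retention(raw: int, kept: int) -> float:
    """Return the fraction of raw reads that survive preprocessing."""
    if raw <= 0 or not 0 <= kept <= raw:
        raise ValueError("expected raw > 0 and 0 <= kept <= raw")
    return kept / raw


@dataclass
class Taxon:
    percent: float  # percentage of reads assigned to this clade
    rank: str       # Kraken2 rank code, e.g. "S" for species
    name: str       # scientific name with report indentation removed


def parse_report(text: str) -> list[Taxon]:
    """Parse a standard six-column, tab-separated Kraken2 report."""
    taxa = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent, _clade, _direct, rank, _taxid, name = fields[:6]
        taxa.append(Taxon(float(percent), rank, name.strip()))
    return taxa


def split_by_abundance(taxa, rank="S", dominant_at=10.0, rare_below=1.0):
    """Split taxa of one rank into dominant and rare name lists.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    at_rank = [t for t in taxa if t.rank == rank]
    dominant = [t.name for t in at_rank if t.percent >= dominant_at]
    rare = [t.name for t in at_rank if t.percent < rare_below]
    return dominant, rare


# Hypothetical report rows: percentages come from the abstract, while the
# read counts and taxonomy IDs are placeholder zeros, not real values.
SAMPLE_REPORT = (
    " 13.00\t0\t0\tS\t0\t      Acetobacterium sp. KB-1\n"
    " 12.00\t0\t0\tS\t0\t      Acetivibrio clariflavus DSM 19732\n"
    "  0.80\t0\t0\tS\t0\t      Acetobacter senegalensis\n"
)

if __name__ == "__main__":
    kept = retention(RAW_READS, FILTERED_READS)
    print(f"Reads retained after preprocessing: {kept:.2%}")
    dominant, rare = split_by_abundance(parse_report(SAMPLE_REPORT))
    print("Dominant taxa:", dominant)
    print("Rare taxa:", rare)
```

With the counts from the abstract, roughly 73% of raw reads are retained and about 27% (97,699 reads) are discarded during preprocessing. In practice the report text would come from the Kraken2 step of the workflow rather than from an inline string.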


