Authors :
Vignesh Kumar Kaipa; Mohammed Bilal M; Sanju H K; Meghana B R; Shivandappa; Narendra Kumar S
Volume/Issue :
Volume 10 - 2025, Issue 2 - February
Google Scholar :
https://tinyurl.com/s9e7nhmx
Scribd :
https://tinyurl.com/56b23kct
DOI :
https://doi.org/10.5281/zenodo.14891723
Abstract :
Host-contaminated microbiomes, such as those in mouse fecal samples, are difficult to profile taxonomically
because host DNA is highly abundant. Nanopore sequencing improves taxonomic resolution through long reads,
but its higher error rates and carried-over host reads complicate analysis. This study presents a reproducible Galaxy
workflow for taxonomic profiling of host-contaminated microbiomes using Nanopore sequencing data. The workflow
integrates preprocessing (FastQC, Porechop, fastp), taxonomic classification (Kraken2 with a custom GTDB + mouse gut
taxa database), and visualization (Krona pie charts) to provide a scalable and user-friendly analysis pipeline. Using the
public ENA dataset PRJNA559386, the workflow processed 365,314 raw reads, yielding 267,615 high-quality reads.
Taxonomic profiling identified Acetobacterium sp. KB-1 (13%) and Acetivibrio clariflavus DSM 19732 (12%) as dominant
taxa, consistent with their roles as acetogenic and cellulolytic bacteria. Rare taxa, such as Acetobacter senegalensis (0.8%),
were also detected, indicating that the workflow can recover low-abundance members. The workflow offers a reproducible and
scalable framework for taxonomic profiling of host-contaminated microbiomes and addresses key challenges in
Nanopore-based microbiome analysis. It is relevant to clinical and environmental studies where host
contamination is unavoidable and accurate microbial community assessment is needed.
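As a rough illustration of the numbers above, the following Python sketch computes the fraction of reads retained after preprocessing and lists the most abundant species-level rows from a Kraken2 report. It is not the authors' Galaxy workflow. The function names `retention_rate` and `top_species` are hypothetical, and the parser assumes Kraken2's standard six-column, tab-separated report layout (percent, clade reads, direct reads, rank code, taxid, indented name).

```python
import csv
import io


def retention_rate(raw_reads: int, kept_reads: int) -> float:
    """Fraction of raw reads that survive preprocessing."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return kept_reads / raw_reads


def top_species(report_text: str, n: int = 5):
    """Return the n most abundant species-level rows of a Kraken2 report.

    Assumption: the standard six-column report layout
    (percent, clade reads, direct reads, rank code, taxid, name).
    """
    rows = []
    for fields in csv.reader(io.StringIO(report_text), delimiter="\t"):
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent, _clade, _direct, rank, _taxid, name = fields[:6]
        if rank == "S":  # keep species-level entries only
            rows.append((name.strip(), float(percent)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


# Read counts reported in the abstract: 365,314 raw, 267,615 retained.
print(f"{retention_rate(365_314, 267_615):.1%}")  # prints 73.3%
```

In practice, `top_species` would be given the text of the report file that Kraken2 writes when run with its `--report` option. Percentages shown in a Krona chart may differ from report percentages, depending on how unclassified and host reads are handled.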
Keywords :
Nanopore Sequencing, Microbiome, Galaxy Workflow, Kraken2, Taxonomic Profiling, Host Contamination.