MAGenie: A bioinformatic pipeline to reconstruct draft metagenome-assembled genomes (MAGs).

Introduction

MAGenie is a bioinformatic pipeline that reconstructs draft MAGs from Illumina short reads or Oxford Nanopore long reads for downstream pathogen identification within a metagenomic context. It consists of three sequential steps, each carried out with publicly available bioinformatic tools: metagenome assembly, taxonomic sequence classification, and classified sequence extraction. The pipeline has not been integrated into a single software package; each step is run separately using established, carefully selected tools.

This pipeline requires the following tools:

  1. Metagenome assembly: MEGAHIT (tested with version 1.2.9), SPAdes (tested with version 3.15.4), or Ray (tested with version 2.3.1) for Illumina short reads; Flye (tested with version 2.9.2) for Oxford Nanopore long reads.
  2. Taxonomic sequence classification: Kraken 2 (tested with version 2.1.3 and the standard database created in 09/2019).
  3. Classified sequence extraction: KrakenTools (tested with version 1.2).
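
Before running the steps, it can help to confirm that the required executables are on the PATH. The following is a minimal sketch, not part of MAGenie itself: the executable names are the ones these tools are commonly installed under (an assumption about your installation), Ray is left out, and the helper name find_missing_tools is hypothetical. Only one short-read assembler is needed, so trim the dictionary to the tools you actually use. The check does not verify versions.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = {
    "MEGAHIT": "megahit",
    "metaSPAdes": "metaspades.py",
    "Flye": "flye",
    "Kraken 2": "kraken2",
    "KrakenTools": "extract_kraken_reads.py",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```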

Usage

  1. Metagenome assembly: Illumina short reads are assembled into contiguous sequences (contigs) with MEGAHIT, SPAdes (metaspades.py), or Ray; Oxford Nanopore long reads are assembled with Flye (--meta). These assemblers were previously benchmarked and selected based on their performance in generating high-quality assemblies for downstream genomic analyses.
  2. Taxonomic sequence classification: The assembled contigs are then classified with Kraken 2, which assigns a taxonomic label to each sequence based on its similarity to reference sequences in a predefined database.
  3. Classified sequence extraction: Contigs assigned to the taxonomic groups of interest are extracted with the sequence extraction module of KrakenTools (extract_kraken_reads.py). The extraction covers sequences classified at both parent (--include-parents) and child (--include-children) taxonomic levels. The extracted sequences are compiled to generate draft MAGs representing the targeted taxonomic groups.
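
The three steps above can be sketched as a small Python script that builds the command line for each tool. This is an illustrative sketch under stated assumptions, not the authors' exact invocation: the function names, file paths, thread count, and output prefix are hypothetical, paired-end input is assumed for the short-read assemblers, --nano-raw assumes uncorrected Nanopore reads, and taxid 590 (the genus Salmonella in the NCBI taxonomy) is only an example target. Ray is omitted. The script prints the commands rather than running them.

```python
import os
import shlex

# Default contig file written by each assembler inside its output directory.
CONTIG_FILES = {
    "megahit": "final.contigs.fa",
    "metaspades": "contigs.fasta",
    "flye": "assembly.fasta",
}


def assembly_command(assembler, reads, out_dir, threads=8):
    """Step 1: build the assembler command (two FASTQ files, or one for Flye)."""
    if assembler == "megahit":
        r1, r2 = reads
        return ["megahit", "-1", r1, "-2", r2, "-o", out_dir, "-t", str(threads)]
    if assembler == "metaspades":
        r1, r2 = reads
        return ["metaspades.py", "-1", r1, "-2", r2, "-o", out_dir, "-t", str(threads)]
    if assembler == "flye":
        (long_reads,) = reads
        # --nano-raw is an assumption about the read type; --meta enables metagenome mode.
        return ["flye", "--nano-raw", long_reads, "--meta",
                "--out-dir", out_dir, "--threads", str(threads)]
    raise ValueError(f"unsupported assembler: {assembler}")


def kraken2_command(contigs, db, prefix, threads=8):
    """Step 2: classify contigs, writing per-sequence output and a report."""
    return ["kraken2", "--db", db, "--threads", str(threads),
            "--output", f"{prefix}.kraken", "--report", f"{prefix}.report", contigs]


def extraction_command(contigs, prefix, taxid, mag_fasta):
    """Step 3: extract contigs for a taxon, including parent and child levels."""
    return ["extract_kraken_reads.py", "-k", f"{prefix}.kraken",
            "-r", f"{prefix}.report", "-s", contigs, "-t", str(taxid),
            "--include-parents", "--include-children", "-o", mag_fasta]


if __name__ == "__main__":
    assembler, out_dir = "flye", "assembly_out"  # hypothetical choices
    contigs = os.path.join(out_dir, CONTIG_FILES[assembler])
    plan = [
        assembly_command(assembler, ["nanopore_reads.fastq"], out_dir),
        kraken2_command(contigs, "kraken2_standard_db", "sample"),
        extraction_command(contigs, "sample", 590, "draft_MAG_taxid590.fasta"),
    ]
    for cmd in plan:
        print(shlex.join(cmd))
```

To execute a step instead of printing it, each command list can be passed to subprocess.run(cmd, check=True). Building the commands as lists avoids shell quoting problems with file paths.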

The draft MAGs serve as valuable resources for downstream genomic analyses, such as identification of mobile genetic elements (plasmids, prophages, etc.) and genes (antimicrobial resistance genes, virulence genes, etc.), serotyping, multilocus sequence typing, and phylogenetic inference.

Citations

If you find MAGenie useful, please cite:

Illumina short reads: Chen, Z., & Meng, J. (2022). Critical assessment of short-read assemblers for the metagenomic identification of foodborne and waterborne pathogens using simulated bacterial communities. Microorganisms, 10(12), 2416.

Oxford Nanopore long reads: Chen, Z., Grim, C. J., Ramachandran, P., & Meng, J. (2024). Advancing metagenome-assembled genome-based pathogen identification: unraveling the power of long-read assembly algorithms in Oxford Nanopore sequencing. Microbiology Spectrum, 12(6), e00117-24.

Shotgun metagenomic data sets of simulated bacterial communities used to benchmark MAGenie:

Illumina short reads: Chen, Z., & Meng, J. (2024). Illumina short read-based shotgun metagenomic data sets of simulated bacterial communities derived from fresh spinach and surface water. Microbiology Resource Announcements, 13(7), e00375-24.

Oxford Nanopore long reads: Chen, Z., Grim, C. J., Ramachandran, P., & Meng, J. (2024). Oxford Nanopore long read-based shotgun metagenomic data sets of simulated bacterial communities originating from fresh spinach and surface water. Microbiology Resource Announcements, 13(9), e00586-24.

You may also consider citing the following (tools used by MAGenie):

MEGAHIT: Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), 1674-1676.

SPAdes: Nurk, S., Meleshko, D., Korobeynikov, A., & Pevzner, P. A. (2017). metaSPAdes: a new versatile metagenomic assembler. Genome Research, 27(5), 824-834.

Ray: Boisvert, S., Raymond, F., Godzaridis, É., Laviolette, F., & Corbeil, J. (2012). Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biology, 13(12), R122.

Flye: Kolmogorov, M., Bickhart, D. M., Behsaz, B., Gurevich, A., Rayko, M., Shin, S. B., ... & Pevzner, P. A. (2020). metaFlye: scalable long-read metagenome assembly using repeat graphs. Nature Methods, 17(11), 1103-1110.

Kraken 2: Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257.

KrakenTools: Lu, J., Rincon, N., Wood, D. E., Breitwieser, F. P., Pockrandt, C., Langmead, B., ... & Steinegger, M. (2022). Metagenome analysis using the Kraken software suite. Nature Protocols, 17(12), 2815-2839.