A bioinformatic pipeline to reconstruct draft metagenome-assembled genomes (MAGs).
MAGenie is a bioinformatic pipeline designed to reconstruct draft MAGs for downstream pathogen identification within a metagenomic context, using either Illumina short reads or Oxford Nanopore long reads. It comprises three sequential steps, each carried out with a publicly available bioinformatic tool: metagenome assembly, taxonomic sequence classification, and classified sequence extraction. The pipeline has not been integrated into a single software package; instead, each step has been carefully curated and is run separately with an established tool.
This pipeline requires the following tools:
- Metagenome assembly: MEGAHIT (tested with version 1.2.9), SPAdes (tested with version 3.15.4), or Ray (tested with version 2.3.1) for Illumina short reads; Flye (tested with version 2.9.2) for Oxford Nanopore long reads.
- Taxonomic sequence classification: Kraken 2 (tested with version 2.1.3 and the standard database created in 09/2019).
- Classified sequence extraction: KrakenTools (tested with version 1.2).
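Before running the steps, it can help to confirm that the required tools are reachable on `PATH`. The following is a minimal sketch, not part of MAGenie itself. The executable names (`megahit`, `metaspades.py`, `Ray`, `flye`, `kraken2`, `extract_kraken_reads.py`) are assumptions based on how each tool is usually installed, and the helper names `find_tools` and `missing_steps` are hypothetical.

```python
import shutil

# Executable names are assumptions based on typical installations;
# adjust them if your environment uses different names or wrappers.
REQUIRED_TOOLS = {
    "metagenome assembly (Illumina)": ["megahit", "metaspades.py", "Ray"],
    "metagenome assembly (Nanopore)": ["flye"],
    "taxonomic sequence classification": ["kraken2"],
    "classified sequence extraction": ["extract_kraken_reads.py"],
}


def find_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Map each pipeline step to {executable: resolved path or None}."""
    return {
        step: {exe: which(exe) for exe in exes}
        for step, exes in required.items()
    }


def missing_steps(found):
    """Return the steps for which none of the listed executables was found."""
    return [step for step, exes in found.items() if not any(exes.values())]


if __name__ == "__main__":
    found = find_tools()
    for step, exes in found.items():
        for exe, path in exes.items():
            print(f"{step}: {exe} -> {path or 'not found'}")
    for step in missing_steps(found):
        print(f"WARNING: no tool available for {step}")
```

Only one Illumina assembler is needed, so a step counts as missing only when none of its listed executables is found. The Nanopore step matters only if you are assembling long reads.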
- Metagenome assembly: MEGAHIT, SPAdes (`metaspades.py`), or Ray is used to assemble Illumina short reads, and Flye (`--meta`) is used to assemble Oxford Nanopore long reads, into contiguous sequences (contigs). The assemblers were previously benchmarked and selected based on their performance in generating high-quality assemblies for downstream genomic analyses.
- Taxonomic sequence classification: Following metagenome assembly, the assembled contigs are taxonomically classified using Kraken 2. This step assigns a taxonomic label to each sequence based on exact k-mer matches to reference sequences in a predefined database.
- Classified sequence extraction: Sequences assigned to the taxonomic groups of interest are then extracted from the assembled contigs using the sequence extraction module of KrakenTools (`extract_kraken_reads.py`). The extraction includes sequences classified at both parent (`--include-parents`) and child (`--include-children`) taxonomic levels of each target group. The extracted sequences are compiled to generate draft MAGs representing the targeted taxonomic groups.
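The three steps above can be sketched as a set of command builders. This is an illustrative sketch under stated assumptions, not the authors' exact invocations, which this README does not list. The file names, thread count, and target taxid (562, *Escherichia coli*) are hypothetical placeholders, `--nano-raw` assumes uncorrected Nanopore reads, and Ray is left out because it is normally launched through MPI. The script only prints the commands; run them yourself once the paths are correct.

```python
import os
import shlex

# Default contig file written by each assembler inside its output directory.
CONTIG_FILES = {
    "megahit": "final.contigs.fa",
    "metaspades": "contigs.fasta",
    "flye": "assembly.fasta",
}


def assembly_command(reads, out_dir, assembler="megahit", threads=8):
    """Build the assembly command.

    reads is an (R1, R2) pair for Illumina assemblers, or a single path for Flye.
    """
    if assembler == "megahit":
        r1, r2 = reads
        return ["megahit", "-1", r1, "-2", r2, "-o", out_dir, "-t", str(threads)]
    if assembler == "metaspades":
        r1, r2 = reads
        return ["metaspades.py", "-1", r1, "-2", r2, "-o", out_dir, "-t", str(threads)]
    if assembler == "flye":
        # Assumption: raw (uncorrected) Nanopore reads.
        return ["flye", "--nano-raw", reads, "--meta",
                "--out-dir", out_dir, "--threads", str(threads)]
    raise ValueError(f"unsupported assembler: {assembler}")


def contigs_path(out_dir, assembler):
    """Path of the contig FASTA produced by the chosen assembler."""
    return os.path.join(out_dir, CONTIG_FILES[assembler])


def kraken2_command(contigs, db, output, report, threads=8):
    """Classify contigs; the report is needed later for parent/child extraction."""
    return ["kraken2", "--db", db, "--threads", str(threads),
            "--output", output, "--report", report, contigs]


def extract_command(contigs, kraken_output, report, taxids, out_fasta):
    """Extract contigs assigned to the target taxids, their parents, and children."""
    return ["extract_kraken_reads.py", "-k", kraken_output, "-s", contigs,
            "-r", report, "-t", *[str(t) for t in taxids],
            "--include-parents", "--include-children", "-o", out_fasta]


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    asm = "megahit"
    contigs = contigs_path("assembly_out", asm)
    steps = [
        assembly_command(("sample_R1.fastq.gz", "sample_R2.fastq.gz"), "assembly_out", asm),
        kraken2_command(contigs, "kraken2_standard_db", "contigs.kraken", "contigs.report"),
        extract_command(contigs, "contigs.kraken", "contigs.report", [562], "draft_mag.fasta"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

The Kraken 2 report is passed to `extract_kraken_reads.py` with `-r` because KrakenTools needs it to resolve parent and child taxa when `--include-parents` or `--include-children` is set.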
The draft MAGs serve as input for downstream genomic analyses, such as identification of mobile genetic elements (plasmids, prophages, etc.) and genes of interest (antimicrobial resistance genes, virulence genes, etc.), serotyping, multilocus sequence typing, and phylogenetic inference.
If you find MAGenie useful, please cite:
Shotgun metagenomic data sets of simulated bacterial communities used to benchmark MAGenie:
You may also consider citing the following (tools used by MAGenie):