metaGEM

A Snakemake pipeline for the generation of MAGs, reconstruction of GEMs, and simulation of cross-feeding interactions within microbial communities from lab cultures, human gut, ocean, plant-associated, and bulk soil microbiomes.

A Snakemake-based workflow to generate high-quality metagenome-assembled genomes from short-read paired-end data, reconstruct genome-scale metabolic models, and perform community metabolic interaction simulations on high-performance computing clusters.

metaGEM integrates an array of existing bioinformatics and metabolic modeling tools using Snakemake to interrogate social interactions in bacterial communities of the human gut microbiome. Metagenome-assembled genomes (MAGs) are reconstructed from whole metagenome shotgun (WMGS) datasets and then converted into genome-scale metabolic models (GEMs) for in silico simulations of cross-feeding interactions within sample-based communities. Additional outputs include abundance estimates, taxonomic assignment, growth rate estimation, pangenome analysis, and eukaryotic MAG identification.

Workflow

Core:

  1. metaGEM setup
  2. Quality filter reads with fastp
  3. Assembly with megahit
  4. Draft bin sets with CONCOCT, MaxBin2, and MetaBAT2
  5. Refine & reassemble bins with metaWRAP
  6. Taxonomic assignment with GTDB-Tk
  7. Relative abundances with bwa and samtools
  8. Reconstruct & evaluate genome-scale metabolic models with CarveMe and memote
  9. Species metabolic coupling analysis with SMETANA

Bonus:

  1. Growth rate estimation with GRiD, soon to be replaced by SMEG or CoPTR
  2. Pangenome analysis with roary
  3. Eukaryotic draft bins with EukRep and EukCC

Usage

_________________________________________________________________________/\\\\\\\\\\\\___/\\\\\\\\\\\\\\\___/\\\\____________/\\\\_        
 _______________________________________________________________________/\\\//////////___\/\\\///////////___\/\\\\\\________/\\\\\\_       
  __________________________________________/\\\________________________/\\\______________\/\\\______________\/\\\//\\\____/\\\//\\\_      
   ____/\\\\\__/\\\\\________/\\\\\\\\____/\\\\\\\\\\\___/\\\\\\\\\_____\/\\\____/\\\\\\\__\/\\\\\\\\\\\______\/\\\\///\\\/\\\/_\/\\\_     
    __/\\\///\\\\\///\\\____/\\\/////\\\__\////\\\////___\////////\\\____\/\\\___\/////\\\__\/\\\///////_______\/\\\__\///\\\/___\/\\\_    
     _\/\\\_\//\\\__\/\\\___/\\\\\\\\\\\______\/\\\_________/\\\\\\\\\\___\/\\\_______\/\\\__\/\\\______________\/\\\____\///_____\/\\\_   
      _\/\\\__\/\\\__\/\\\__\//\\///////_______\/\\\_/\\____/\\\/////\\\___\/\\\_______\/\\\__\/\\\______________\/\\\_____________\/\\\_  
       _\/\\\__\/\\\__\/\\\___\//\\\\\\\\\\_____\//\\\\\____\//\\\\\\\\/\\__\//\\\\\\\\\\\\/___\/\\\\\\\\\\\\\\\__\/\\\_____________\/\\\_ 
        _\///___\///___\///_____\//////////_______\/////______\////////\//____\////////////_____\///////////////___\///______________\///__
        
        
Usage: bash metaGEM.sh [-t|--task TASK] 
                       [-j|--nJobs NUMBER OF JOBS] 
                       [-c|--cores NUMBER OF CORES] 
                       [-m|--mem GB RAM] 
                       [-h|--hours MAX RUNTIME]

Snakefile wrapper/parser for metaGEM. 

Options:
  -t, --task        Specify task to complete:

                        SETUP
                            createFolders
                            downloadToy
                            organizeData

                        WORKFLOW
                            fastp 
                            megahit 
                            crossMap 
                            concoct 
                            metabat
                            maxbin 
                            binRefine 
                            binReassemble 
                            extractProteinBins
                            carveme
                            memote
                            organizeGEMs
                            smetana
                            extractDnaBins
                            gtdbtk
                            abundance 
                            grid
                            prokka
                            roary

                        VISUALIZATION (in development)
                            qfilterVis
                            assemblyVis
                            binningVis
                            taxonomyVis
                            modelVis
                            interactionVis
                            growthVis

  -j, --nJobs       Specify number of jobs to run in parallel
  -c, --nCores      Specify number of cores per job
  -m, --mem         Specify memory in GB required for job
  -h, --hours       Specify number of hours to allocate to job runtime
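
As a rough illustration of how the options above combine, the sketch below prints (without running) one metaGEM.sh call per task for the first three WORKFLOW tasks, in the order they are listed. The helper name `build_cmd` and the resource values (2 jobs, 4 cores, 20 GB, 2 hours) are placeholders chosen for this example, not recommendations from the metaGEM authors; appropriate values depend on your data and cluster.

```shell
# Dry-run sketch: compose metaGEM.sh calls from the documented short flags.
# build_cmd is a hypothetical helper; the resource numbers are placeholders.
build_cmd() {
    # $1 = task, $2 = parallel jobs, $3 = cores per job,
    # $4 = memory in GB, $5 = maximum runtime in hours
    printf 'bash metaGEM.sh -t %s -j %s -c %s -m %s -h %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Print the commands for the first three workflow tasks.
for task in fastp megahit crossMap; do
    build_cmd "$task" 2 4 20 2
done
```

Printing the commands first makes it easy to review task names and resource requests before submitting anything to a cluster.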

Automated installation

Clone this repository to your HPC or local computer and run the env_setup.sh script:

git clone https://github.com/franciscozorrilla/metaGEM.git
cd metaGEM
bash env_setup.sh

This script sets up three conda environments (metagem, metawrap, and prokkaroary), which Snakemake jobs activate as required.
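
One way to confirm that the three environments exist after the script finishes is to compare their names against the output of `conda env list`. The sketch below is an assumption-laden convenience, not part of metaGEM: `missing_envs` is a hypothetical helper that reads the listing on standard input and prints any expected name it does not find in the first column.

```shell
# Print each expected environment name that is absent from a
# `conda env list` style listing supplied on standard input.
# missing_envs is a hypothetical helper, not a metaGEM or conda command.
missing_envs() {
    local listing name
    listing="$(cat)"
    for name in "$@"; do
        # An environment counts as present when its name is the first field of a line.
        if ! printf '%s\n' "$listing" | awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
            echo "$name"
        fi
    done
}

# Usage (requires conda on PATH):
#   conda env list | missing_envs metagem metawrap prokkaroary
```

An empty result suggests all three environments were created; any name printed points to an environment worth setting up manually.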

CheckM

CheckM is used extensively to evaluate the output of various intermediate steps. Although the CheckM package is installed in the metawrap environment, the user is required to download the CheckM database and run checkm data setRoot <db_dir> as outlined in the CheckM installation guide.
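
The database step can be wrapped in a small guard so that a wrong path or an inactive environment is caught early. This is a minimal sketch under stated assumptions: `set_checkm_root` is a hypothetical helper, the database is assumed to have been downloaded and unpacked already as described in the CheckM installation guide, and the directory in the usage comment is a placeholder.

```shell
# Point CheckM at an already-unpacked database directory.
# set_checkm_root is a hypothetical wrapper around `checkm data setRoot`.
set_checkm_root() {
    local db_dir="$1"
    if [ ! -d "$db_dir" ]; then
        echo "CheckM database directory not found: $db_dir" >&2
        return 1
    fi
    if ! command -v checkm >/dev/null 2>&1; then
        echo "checkm not found on PATH; activate the metawrap environment first" >&2
        return 2
    fi
    checkm data setRoot "$db_dir"
}

# Usage, after `source activate metawrap` (placeholder path):
#   set_checkm_root "$HOME/checkm_db"
```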

CPLEX

CPLEX cannot be installed automatically by the env_setup.sh script, so you must install it manually within the metagem conda environment. GEM reconstruction and GEM community simulations require the IBM CPLEX solver, which is free to download with an academic license. Refer to the CarveMe and SMETANA installation instructions for further information or troubleshooting. Note: CPLEX v12.8 is recommended.
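
After installing CPLEX manually, a quick import test can indicate whether its Python bindings are visible from the metagem environment. The sketch below assumes the bindings are exposed as the `cplex` Python module; `python_module_available` and the `PYTHON` variable are illustrative names introduced here, and a successful import does not by itself verify the license or the solver version.

```shell
# Return 0 if the named Python module can be imported by the interpreter
# in $PYTHON (defaults to "python"), non-zero otherwise.
# python_module_available is a hypothetical helper for illustration.
python_module_available() {
    "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1
}

# Usage, after `source activate metagem`:
#   if python_module_available cplex; then
#       echo "CPLEX Python bindings found"
#   else
#       echo "CPLEX Python bindings missing"
#   fi
```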

Manual installation

You can set up the environments manually with the following commands.

metaGEM

conda create -n metagem mamba
source activate metagem
mamba install python snakemake fastp megahit bwa samtools=1.9 kallisto concoct=1.1 metabat2 maxbin2 gtdbtk eukrep eukcc smeg motus
pip install --user memote carveme smetana
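
Once the installation completes, it can be worth checking that the main executables resolve on PATH. The following is a sketch rather than an official check: `missing_tools` is a hypothetical helper, and the tool names in the usage comment assume that those executables share their package names, which may not hold for every package in the install command.

```shell
# Print every command name given as an argument that is not found on PATH.
# missing_tools is a hypothetical helper for illustration.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage, after `source activate metagem`:
#   missing_tools snakemake fastp megahit bwa samtools kallisto
```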

metaWRAP

conda create -n metawrap
source activate metawrap
conda install -c ursky metawrap-mg=1.3.2

prokka-roary

conda create -n prokkaroary
source activate prokkaroary
conda install prokka roary

Tutorial

metaGEM can be used to explore your own gut microbiome based on sequencing data from at-home test kits offered by services such as unseen bio. The following demo showcases the metaGEM workflow on two unseen bio samples.

Publications

The metaGEM workflow has been used in some capacity in the following publications:

Plastic-degrading potential across the global microbiome correlates with recent pollution trends
Jan Zrimec, Mariia Kokina, Sara Jonasson, Francisco Zorrilla, Aleksej Zelezniak
bioRxiv 2020.12.13.422558; doi: https://doi.org/10.1101/2020.12.13.422558 

Please cite

metaGEM: reconstruction of genome scale metabolic models directly from metagenomes
Francisco Zorrilla, Kiran R. Patil, Aleksej Zelezniak
bioRxiv 2020.12.31.424982; doi: https://doi.org/10.1101/2020.12.31.424982