metaGEM
A Snakemake-based workflow to generate high-quality metagenome-assembled genomes from short-read paired-end data, reconstruct genome-scale metabolic models, and perform community metabolic interaction simulations on high-performance computing clusters.
metaGEM integrates an array of existing bioinformatics and metabolic modeling tools using Snakemake, for the purpose of interrogating social interactions in bacterial communities of the human gut microbiome. From whole metagenome shotgun (WMGS) datasets, metagenome-assembled genomes (MAGs) are reconstructed and then converted into genome-scale metabolic models (GEMs) for in silico simulations of cross-feeding interactions within sample-based communities. Additional outputs include abundance estimates, taxonomic assignment, growth rate estimation, pangenome analysis, and eukaryotic MAG identification.
Workflow
Core:
- metaGEM setup
- Quality filter reads with fastp
- Assembly with megahit
- Draft bin sets with CONCOCT, MaxBin2, and MetaBAT2
- Refine & reassemble bins with metaWRAP
- Taxonomic assignment with GTDB-tk
- Relative abundances with bwa and samtools
- Reconstruct & evaluate genome-scale metabolic models with CarveMe and memote
- Species metabolic coupling analysis with SMETANA
Bonus:
- Growth rate estimation with GRiD, soon to be replaced by SMEG or CoPTR
- Pangenome analysis with roary
- Eukaryotic draft bins with EukRep and EukCC
Usage
_________________________________________________________________________/\\\\\\\\\\\\___/\\\\\\\\\\\\\\\___/\\\\____________/\\\\_
_______________________________________________________________________/\\\//////////___\/\\\///////////___\/\\\\\\________/\\\\\\_
__________________________________________/\\\________________________/\\\______________\/\\\______________\/\\\//\\\____/\\\//\\\_
____/\\\\\__/\\\\\________/\\\\\\\\____/\\\\\\\\\\\___/\\\\\\\\\_____\/\\\____/\\\\\\\__\/\\\\\\\\\\\______\/\\\\///\\\/\\\/_\/\\\_
__/\\\///\\\\\///\\\____/\\\/////\\\__\////\\\////___\////////\\\____\/\\\___\/////\\\__\/\\\///////_______\/\\\__\///\\\/___\/\\\_
_\/\\\_\//\\\__\/\\\___/\\\\\\\\\\\______\/\\\_________/\\\\\\\\\\___\/\\\_______\/\\\__\/\\\______________\/\\\____\///_____\/\\\_
_\/\\\__\/\\\__\/\\\__\//\\///////_______\/\\\_/\\____/\\\/////\\\___\/\\\_______\/\\\__\/\\\______________\/\\\_____________\/\\\_
_\/\\\__\/\\\__\/\\\___\//\\\\\\\\\\_____\//\\\\\____\//\\\\\\\\/\\__\//\\\\\\\\\\\\/___\/\\\\\\\\\\\\\\\__\/\\\_____________\/\\\_
_\///___\///___\///_____\//////////_______\/////______\////////\//____\////////////_____\///////////////___\///______________\///__
Usage: bash metaGEM.sh [-t|--task TASK]
[-j|--nJobs NUMBER OF JOBS]
[-c|--cores NUMBER OF CORES]
[-m|--mem GB RAM]
[-h|--hours MAX RUNTIME]
Snakefile wrapper/parser for metaGEM.
Options:
-t, --task Specify task to complete:
SETUP
createFolders
downloadToy
organizeData
WORKFLOW
fastp
megahit
crossMap
concoct
metabat
maxbin
binRefine
binReassemble
extractProteinBins
carveme
memote
organizeGEMs
smetana
extractDnaBins
gtdbtk
abundance
grid
prokka
roary
VISUALIZATION (in development)
qfilterVis
assemblyVis
binningVis
taxonomyVis
modelVis
interactionVis
growthVis
-j, --nJobs Specify number of jobs to run in parallel
-c, --nCores Specify number of cores per job
-m, --mem Specify memory in GB required for job
-h, --hours Specify number of hours to allocate to job runtime
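As a rough illustration of how the wrapper might be driven, the sketch below prints one metaGEM.sh call per task for the first few workflow steps. The task names and short flags come from the usage text above; the helper name `metagem_cmd`, the choice of tasks, and the resource values are assumptions made for illustration only, and the commands are echoed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build (but do not run) a metaGEM.sh call.
# Arguments: task, parallel jobs, cores per job, memory in GB, max hours.
metagem_cmd() {
    printf 'bash metaGEM.sh -t %s -j %s -c %s -m %s -h %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder resource requests; tune these to your cluster and data set.
for task in fastp megahit crossMap; do
    metagem_cmd "$task" 2 4 20 2
done
```

Printing the commands first makes it easy to review the resource requests before submitting anything. Later tasks consume the output of earlier ones, so it is probably safest to let the jobs of one task finish before launching the next.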
Automated installation
Clone this repository to your HPC or local computer and run the env_setup.sh script:
git clone https://github.com/franciscozorrilla/metaGEM.git
cd metaGEM
bash env_setup.sh
This script will set up 3 conda environments, metagem, metawrap, and prokkaroary, which will be activated as required by Snakemake jobs.
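One way to confirm that the setup script finished is to check that all three environments show up in the conda environment listing. The sketch below is a minimal check under that assumption; the function name `missing_envs` is hypothetical, and it only inspects the first column of whatever listing it receives on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a `conda env list`-style listing from standard
# input and print the name of every expected environment that is absent.
missing_envs() {
    awk '
        { seen[$1] = 1 }   # first column of each line is the environment name
        END {
            n = split("metagem metawrap prokkaroary", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}

# Typical use once setup has finished (requires conda on PATH):
#   conda env list | missing_envs
```

No output means all three environments were found; any name printed points to an environment that may need to be created manually.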
CheckM
CheckM is used extensively to evaluate the output of various intermediate steps. Although the CheckM package is installed in the metawrap environment, the user is required to download the CheckM database and run checkm data setRoot <db_dir> as outlined in the CheckM installation guide.
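The database step could look roughly like the sketch below, which unpacks a reference-data archive that has already been downloaded and then registers its location with the `checkm data setRoot` command mentioned above. The function name `setup_checkm_db`, the two variables, and both default paths are placeholders chosen for illustration; the download location itself should be taken from the CheckM installation guide.

```shell
#!/usr/bin/env bash
# Placeholder paths (assumptions, not values prescribed by metaGEM):
# the downloaded archive and the directory that will hold the database.
CHECKM_TARBALL="${CHECKM_TARBALL:-$HOME/checkm_data.tar.gz}"
CHECKM_DB_DIR="${CHECKM_DB_DIR:-$HOME/checkm_db}"

# Hypothetical helper: unpack the archive and point CheckM at the result.
setup_checkm_db() {
    mkdir -p "$CHECKM_DB_DIR" &&
    tar -xzf "$CHECKM_TARBALL" -C "$CHECKM_DB_DIR" &&
    checkm data setRoot "$CHECKM_DB_DIR"
}

# Run inside the metawrap environment, where CheckM is installed:
#   source activate metawrap && setup_checkm_db
```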
CPLEX
GEM reconstruction and GEM community simulations require the IBM CPLEX solver, which is free to download with an academic license. Unfortunately, CPLEX cannot be installed automatically by the env_setup.sh script, so you must install this dependency manually within the metagem conda environment. Refer to the CarveMe and SMETANA installation instructions for further information or troubleshooting. Note: CPLEX v.12.8 is recommended.
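After a manual CPLEX installation, a quick sanity check is to see whether the Python interpreter in the metagem environment can import the `cplex` module. The following is a minimal sketch of such a check; the function name `check_cplex` and the `PYTHON` override variable are hypothetical, and a successful import does not by itself confirm that the license or version is suitable.

```shell
#!/usr/bin/env bash
# Interpreter to test; defaults to whichever `python` is first on PATH.
# (PYTHON is a hypothetical override introduced for this sketch.)
PYTHON="${PYTHON:-python}"

# Hypothetical helper: report whether the cplex Python module is importable.
check_cplex() {
    if "$PYTHON" -c "import cplex" 2>/dev/null; then
        echo "cplex module found"
    else
        echo "cplex module NOT found"
        return 1
    fi
}

# Run inside the metagem environment:
#   source activate metagem && check_cplex
```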
Manual installation
You can manually set up the environments with the following chunks of code.
metaGEM
conda create -n metagem mamba
source activate metagem
mamba install python snakemake fastp megahit bwa samtools=1.9 kallisto concoct=1.1 metabat2 maxbin2 gtdbtk eukrep eukcc smeg motus
pip install --user memote carveme smetana
metaWRAP
conda create -n metawrap
source activate metawrap
conda install -c ursky metawrap-mg=1.3.2
prokka-roary
conda create -n prokkaroary
source activate prokkaroary
conda install prokka roary
Tutorial
metaGEM can be used to explore your own gut microbiome based on sequencing data from at-home test kits offered by services such as unseen bio. The following demo showcases the metaGEM workflow on two unseen bio samples.
Publications
The metaGEM workflow has been used in some capacity in the following publications:
Plastic-degrading potential across the global microbiome correlates with recent pollution trends
Jan Zrimec, Mariia Kokina, Sara Jonasson, Francisco Zorrilla, Aleksej Zelezniak
bioRxiv 2020.12.13.422558; doi: https://doi.org/10.1101/2020.12.13.422558
Please cite
metaGEM: reconstruction of genome scale metabolic models directly from metagenomes
Francisco Zorrilla, Kiran R. Patil, Aleksej Zelezniak
bioRxiv 2020.12.31.424982; doi: https://doi.org/10.1101/2020.12.31.424982