
Snakemake pipeline for Resistome and Mobilome annotation of unassembled metagenomes

Primary language: Python. License: MIT.

ARG and Mobilome annotation - read-based pipeline

Read-based pipeline for the annotation of antibiotic resistance genes (ARGs) and mobilome genes in unassembled metagenomes.

Authors: Ahmed Bargheet, Alise Ponsero, Hanna Noordzij, Ching Jian, Katri Korpela, Mireia Valles-Colomer, Justine Debelius, Alexander Kurilshikov, Veronika K. Pettersen

Installation and requirements

This pipeline requires Snakemake and usearch v11. If they are not already installed, run the following commands:

git clone https://github.com/aponsero/Resistome_ReadBased_Snakemake.git
cd Resistome_ReadBased_Snakemake

## Snakemake installation in a conda environment
conda env create -n snakemake_env --file envs/env_snakemake.yml
# Alternatively you can create the environment manually:
conda create -n snakemake_env
conda activate snakemake_env
conda install -c conda-forge -c bioconda snakemake

## Installation of usearch v11
mkdir bin
cd bin
wget https://drive5.com/downloads/usearch11.0.667_i86linux32.gz
gunzip usearch11.0.667_i86linux32.gz
mv usearch11.0.667_i86linux32 usearch
chmod +x usearch
# return to the repository root
cd ..

Additionally, download the custom Shortbred marker databases from Zenodo and the Metaphlan4 database files:

mkdir databases
cd databases
wget https://zenodo.org/records/11491353/files/CARD_V.3.2.7_markers.faa
wget https://zenodo.org/records/11491353/files/mobile_OG_V.1.6_markers.faa
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
tar -xvf mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.tar
tar -xvf mpa_vOct22_CHOCOPhlAnSGB_202212.tar
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212_marker_info.txt.bz2
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212_species.txt.bz2
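
Before launching the pipeline it can help to confirm that the downloads completed. The following is a minimal sketch, not part of the pipeline, that checks for the files named in the wget commands above. It assumes the files sit in a "databases" folder relative to the working directory; the function name `missing_database_files` is hypothetical. The contents extracted from the two tar archives are not checked, because their file names are not listed here.

```python
from pathlib import Path

# File names taken from the wget commands above. The location of the
# "databases" folder relative to the working directory is an assumption.
EXPECTED_DATABASE_FILES = [
    "CARD_V.3.2.7_markers.faa",
    "mobile_OG_V.1.6_markers.faa",
    "mpa_vOct22_CHOCOPhlAnSGB_202212_marker_info.txt.bz2",
    "mpa_vOct22_CHOCOPhlAnSGB_202212_species.txt.bz2",
]


def missing_database_files(database_dir, expected=EXPECTED_DATABASE_FILES):
    """Return the expected file names that are absent or empty in database_dir."""
    folder = Path(database_dir)
    missing = []
    for name in expected:
        path = folder / name
        # An empty file usually indicates an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_database_files("databases")
    if missing:
        print("Missing or empty database files:")
        for name in missing:
            print("  " + name)
    else:
        print("All listed database files are present.")
```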

Overview of the pipeline

The steps of the pipeline are outlined below. The pipeline expects reads that have already been quality-controlled and trimmed. It includes 3 steps:

  • Taxonomic profiling of the microbial community using Metaphlan4 v4.0.6
  • ARG profiling using Shortbred v0.9.5 against the CARD database v.3.2.7
  • Mobilome profiling using Shortbred v0.9.5 against the MobileOG database v1.6
graph LR
    reads[(QC reads)] --> concat_reads ;
    concat_reads --> Metaphlan4 ;
    concat_reads --> Shortbred_ARG ;
    concat_reads --> Shortbred_MobileOG;
    Metaphlan4 --> Taxonomic_profiles ;
    Shortbred_ARG --> Resistome_profiles ;
    Shortbred_MobileOG --> Mobilome_profiles ;
    subgraph Rule Concat
        concat_reads ;
    end
    subgraph Rule Metaphlan
        Metaphlan4 ;
    end
    subgraph Rule Resistome
        Shortbred_ARG ;
        Resistome_profiles ;
    end
    subgraph Rule Mobilome
        Shortbred_MobileOG ;
        Mobilome_profiles ;
    end

How to run the pipeline

The pipeline expects its inputs in the "input" folder as paired files ({sample_id}_1.fastq.gz and {sample_id}_2.fastq.gz). It can be run in its entirety by requesting the expected final output: results/{sample_id}_finallog.txt

As an example, to run the pipeline on the test files (TestPipeline_1.fastq.gz and TestPipeline_2.fastq.gz):

# run snakemake
snakemake --cores 8 --use-conda results/TestPipeline_finallog.txt
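
When the "input" folder holds many samples, the list of targets can be derived from the file names. The sketch below is one possible helper, not something shipped with the pipeline: it pairs files that follow the {sample_id}_1.fastq.gz / {sample_id}_2.fastq.gz convention and prints a snakemake command to review. The function names are hypothetical, and the target pattern is taken from the example command.

```python
import re
from pathlib import Path

# Naming convention from the text: {sample_id}_1.fastq.gz and {sample_id}_2.fastq.gz
READ_FILE_PATTERN = re.compile(r"^(?P<sample_id>.+)_(?P<mate>[12])\.fastq\.gz$")


def paired_sample_ids(file_names):
    """Return the sorted sample IDs that have both a _1 and a _2 read file."""
    mates_by_sample = {}
    for name in file_names:
        match = READ_FILE_PATTERN.match(name)
        if match:
            mates_by_sample.setdefault(match["sample_id"], set()).add(match["mate"])
    # Samples with only one mate are skipped rather than reported.
    return sorted(s for s, mates in mates_by_sample.items() if mates == {"1", "2"})


def final_log_targets(sample_ids, results_dir="results"):
    """Build one Snakemake target per sample, following the example command."""
    return [f"{results_dir}/{sample_id}_finallog.txt" for sample_id in sample_ids]


if __name__ == "__main__":
    input_dir = Path("input")
    names = [p.name for p in input_dir.iterdir()] if input_dir.is_dir() else []
    targets = final_log_targets(paired_sample_ids(names))
    # Print the command for review; nothing is executed here.
    print(" ".join(["snakemake", "--cores", "8", "--use-conda", *targets]))
```

Printing the command instead of running it keeps the helper free of side effects; the output can be pasted into a shell once the sample list looks right.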

Expected outputs

The pipeline generates 3 main output files per sample in the results folder:

  • Taxonomic profile for each sample: results/Metaphlan/{sample_id}_profile.txt
  • Resistome profile for each sample: results/Resistome/{sample_id}_resistome.txt
  • Mobilome profile for each sample: results/Mobilome/{sample_id}_mobilome.txt
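
After a run, the three paths listed above can be checked per sample. This is a small sketch under the assumption that outputs follow exactly the path patterns in the list; the names `expected_outputs` and `missing_outputs` are hypothetical and not part of the pipeline.

```python
from pathlib import Path

# Path templates copied from the list of expected outputs above.
OUTPUT_TEMPLATES = {
    "taxonomic": "Metaphlan/{sample_id}_profile.txt",
    "resistome": "Resistome/{sample_id}_resistome.txt",
    "mobilome": "Mobilome/{sample_id}_mobilome.txt",
}


def expected_outputs(sample_id, results_dir="results"):
    """Map each profile type to the path where it is expected for one sample."""
    return {
        kind: Path(results_dir) / template.format(sample_id=sample_id)
        for kind, template in OUTPUT_TEMPLATES.items()
    }


def missing_outputs(sample_id, results_dir="results"):
    """Return the profile types whose output file does not exist yet."""
    outputs = expected_outputs(sample_id, results_dir)
    return sorted(kind for kind, path in outputs.items() if not path.is_file())


if __name__ == "__main__":
    for kind, path in expected_outputs("TestPipeline").items():
        status = "found" if path.is_file() else "missing"
        print(f"{kind}: {path} ({status})")
```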