Source code for analysis of hospital wastewater metagenomic data
Welcome to ATTACK-AMR: a bioinformatics pipeline for analyzing metagenomic data on antimicrobial resistance in hospital wastewater.
This pipeline requires the package manager Conda and the workflow management system Snakemake. Additional dependencies not handled by Snakemake are described in Section 1.3.
Download the Miniconda3 installer for Linux from here; installation instructions are here. Once installation is complete, you can test your Miniconda installation by running:
$ conda list
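Snakemake recommends installing it with Conda. The commands below first install Mamba, a faster reimplementation of the Conda package manager, into the base environment, then use it to create a dedicated Snakemake environment:
$ conda install -n base -c conda-forge mamba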
$ mamba create -c conda-forge -c bioconda -n snakemake snakemake
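This creates an isolated environment containing the latest release of Snakemake. To activate it, run the commands below; after mamba init, open a new terminal (or restart your shell) so the change takes effect before running mamba activate: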
$ mamba init
$ mamba activate snakemake
To test Snakemake:
$ snakemake --help
ATTACK-AMR also requires gawk (GNU Awk) for the database filtering stage. On Debian or Ubuntu systems, install it with:
$ sudo apt update && sudo apt upgrade
$ sudo apt install gawk
To test gawk:
$ gawk --version
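Before cloning the pipeline, it can help to confirm that every tool used so far (plus git, which the clone step below uses) is on your PATH. The following is a minimal sketch built on the POSIX command -v builtin; check_deps is a hypothetical helper for illustration, not part of ATTACK-AMR.

```shell
# Hypothetical helper (not part of ATTACK-AMR): report which of the given
# tools are missing from PATH. Returns 0 if all are found, 1 otherwise.
check_deps() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools used in this README; run inside the activated snakemake environment.
check_deps conda mamba snakemake gawk git || echo "Install the missing tools listed above before running ATTACK-AMR." >&2
```

Run it after mamba activate snakemake, since snakemake is only on PATH inside that environment.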
Download ATTACK-AMR from the online repository, or clone it from the command line:
$ git clone https://github.com/bioinfodlsu/attack_amr
At minimum, the pipeline requires (1) metagenomic sequences (sample sequences can be downloaded from the European Nucleotide Archive, ENA) and (2) reference protein databases (ResFinder and MGE). These and other input parameters are specified in a YAML-format config file; an example config.yaml is provided in the config folder.
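The run command passes --configfile config.yaml from the top-level directory, while the provided config.yaml sits in the config folder. If you want to start from that provided file, one minimal sketch is to copy it to the top level only when no config.yaml exists there yet. prepare_config is a hypothetical helper, not part of ATTACK-AMR, and it assumes the file lives at config/config.yaml inside the cloned repository.

```shell
# Hypothetical helper (not part of ATTACK-AMR): copy <repo>/config/config.yaml
# to <repo>/config.yaml, where "--configfile config.yaml" run from the
# top-level directory will look for it. An existing top-level config.yaml is
# never overwritten.
prepare_config() {
    repo_dir="${1:-.}"
    example="$repo_dir/config/config.yaml"
    target="$repo_dir/config.yaml"
    if [ -f "$target" ]; then
        echo "Keeping existing $target"
    elif [ -f "$example" ]; then
        cp "$example" "$target"
        echo "Copied $example to $target; edit it before running the pipeline."
    else
        echo "No example config found at $example" >&2
        return 1
    fi
}
```

For example, running prepare_config attack_amr from the directory where you cloned the repository leaves a config.yaml at the top level for you to edit. Alternatively, passing the path config/config.yaml to --configfile should avoid the copy altogether.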
Once you have constructed config.yaml and activated the snakemake Conda environment created earlier, run the pipeline from the top-level directory of ATTACK-AMR:
$ cd attack_amr
$ snakemake --configfile config.yaml --use-conda --cores all
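Snakemake's -n (--dry-run) flag prints the jobs it would run without executing any of them, which is a cheap way to catch a wrong path in config.yaml before a long run. The following is a minimal sketch of a wrapper that dry-runs first and only then launches the same command as above; run_attack_amr is a hypothetical helper, not part of ATTACK-AMR, and it assumes the snakemake environment is active and you are in the top-level directory.

```shell
# Hypothetical wrapper (not part of ATTACK-AMR): dry-run the workflow, then
# run it for real with the same options as the command above.
# Assumes the snakemake Conda environment is active and the current directory
# is the top-level directory of ATTACK-AMR.
run_attack_amr() {
    configfile="${1:-config.yaml}"
    if ! command -v snakemake >/dev/null 2>&1; then
        echo "snakemake not found; activate the environment with: mamba activate snakemake" >&2
        return 1
    fi
    if [ ! -f "$configfile" ]; then
        echo "Config file not found: $configfile" >&2
        return 1
    fi
    # -n is short for --dry-run: print the planned jobs without executing them.
    snakemake --configfile "$configfile" --use-conda --cores all -n || return 1
    # The dry run succeeded, so launch the actual workflow.
    snakemake --configfile "$configfile" --use-conda --cores all
}
```

For example, run_attack_amr config.yaml stops early if the dry run fails, so no jobs start until the workflow resolves cleanly.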
Outputs are written to the top-level directory of ATTACK-AMR.