
Pipeline to pull microbial reads from WGS data and perform metagenomic analysis

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0)

MInUUR - Microbial INsight Using Unmapped Reads


MInUUR is still in development - any feedback is welcome! Please contact: 248064@lstmed.ac.uk or DM me on Twitter: https://twitter.com/fooheesoon

MInUUR is a Snakemake pipeline to extract unmapped reads from whole genome shotgun sequencing data and apply a range of metagenomic analyses to characterise host-associated microbes. Originally, MInUUR was intended for the extraction of mosquito-associated bacterial symbionts; however, it can be applied to other host-associated WGS data. MInUUR aims to leverage pre-existing WGS data to 'scavenge' for microbial information pertaining to host-associated microbiomes - the key advantage being that metagenomic reads are used as inputs to produce genus- and species-level classifications, functional inference and assembly of metagenome-assembled genomes (MAGs).

MInUUR utilises several pieces of software in its pipeline:

  • Kraken2 to classify microbial taxa to species level from read sequences
  • KrakenTools to extract classified reads pertaining to microbes for downstream analysis
  • Bracken to re-estimate taxonomic abundance from the Kraken2 taxonomic report
  • MetaPhlAn3 to classify microbial taxa using marker genes
  • HUMAnN3 to functionally profile read sequences against the ChocoPhlAn and UniRef databases
  • Megahit to perform metagenome assembly
  • Quast to generate assembly statistics
  • MetaBat2 to bin assembled contigs
  • CheckM to assess bin quality
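The classification stages above chain together roughly as follows - a hedged sketch, not MInUUR's actual rules: it assumes paired-end FASTQ inputs, a combined Kraken2/Bracken database at `$DB`, taxid 2 (Bacteria) as the clade to extract, and 150 bp reads; all file names are placeholders:

```shell
# Classify read pairs against a Kraken2 database
kraken2 --db "$DB" --paired --report sample.k2report --output sample.kraken \
    sample_R1.fastq.gz sample_R2.fastq.gz

# Extract reads classified under Bacteria (taxid 2, including child taxa)
# with KrakenTools
extract_kraken_reads.py -k sample.kraken -r sample.k2report \
    -s sample_R1.fastq.gz -s2 sample_R2.fastq.gz \
    -o bacterial_R1.fastq -o2 bacterial_R2.fastq \
    -t 2 --include-children --fastq-output

# Re-estimate species-level (-l S) abundances from the Kraken2 report
bracken -d "$DB" -i sample.k2report -o sample.bracken -r 150 -l S
```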

In addition, MInUUR will produce 'tidy' data suitable for parsing in R or Python.
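For example, a Kraken2-style report (tab-separated: percentage, clade reads, direct reads, rank code, NCBI taxid, indented name) can be flattened into one tidy record per taxon. This is an illustrative Python sketch, not MInUUR's actual output format - the function name and sample report are made up:

```python
def tidy_kraken_report(text):
    """Parse a Kraken2-style report string into a list of flat dicts."""
    rows = []
    for line in text.strip().splitlines():
        pct, clade_reads, direct_reads, rank, taxid, name = line.split("\t")
        rows.append({
            "percent": float(pct),
            "clade_reads": int(clade_reads),
            "direct_reads": int(direct_reads),
            "rank": rank,
            "taxid": int(taxid),
            "name": name.strip(),  # drop the depth-indicating indentation
        })
    return rows

# Tiny made-up report: unclassified, root, and one species line
report = (
    "90.00\t9000\t9000\tU\t0\tunclassified\n"
    "10.00\t1000\t50\tR\t1\troot\n"
    " 9.50\t950\t950\tS\t562\t    Escherichia coli\n"
)
rows = tidy_kraken_report(report)
```

Each dict is then one row of a data frame, ready for `pandas.DataFrame(rows)` or the R equivalent.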


Installation of Snakemake

MInUUR is run using the bioinformatics workflow manager Snakemake.

Snakemake is best installed using the package manager Mamba.

Once Mamba is installed, run:

mamba create -c bioconda -c conda-forge --name snakemake snakemake
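After creating the environment, activate it before running the pipeline; a quick version check confirms the install:

```shell
mamba activate snakemake
snakemake --version
```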

Installation of MInUUR

Clone the repository with git clone https://github.com/aidanfoo96/MINUUR/ and navigate to the workflow directory with cd MINUUR/workflow. This is the reference point from which the pipeline is run. See the WIKI page for a full tutorial on setting up the configuration file to run the pipeline.

Host Genome

MINUUR separates unmapped reads from typical host whole genome sequences. A high-quality host reference genome is required to separate host from non-host reads in the raw FASTQ inputs. Download a high-quality reference genome of your choosing (in FASTA format) and follow the Bowtie2 build tutorial here to create the index.
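A minimal sketch of this host-removal step, assuming a host reference called host_genome.fasta and paired-end reads (all file names are placeholders; MINUUR's own rules handle this internally):

```shell
# Build the Bowtie2 index from the host reference genome
bowtie2-build host_genome.fasta host_index

# Map reads to the host; pairs that fail to align concordantly are written
# to sample_unmapped_R1/R2.fastq.gz for downstream metagenomic analysis
bowtie2 -x host_index -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
    --un-conc-gz sample_unmapped_R%.fastq.gz -S /dev/null
```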

Database requirements

For host removal, read classification, taxonomic abundance estimation and functional read profiling, MINUUR requires several databases to be installed on the user's system. All databases can be downloaded from the repositories listed below.

Kraken2 Database

Download an indexed Kraken2 (and matching Bracken) database of your choosing. The standard Kraken2 database may omit important taxa; as such, MINUUR also supports classification using a larger database of bacterial and archaeal sequences available from the Struo2 GitHub repository, which provides indexed Kraken2 databases built from the GTDB taxonomy, available here.

MetaPhlAn3 Database

MetaPhlAn3 requires a database of clade-specific marker genes. Installation instructions for MetaPhlAn are found here.
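One way to fetch the marker-gene database is MetaPhlAn's built-in installer; the target directory below is a placeholder:

```shell
metaphlan --install --bowtie2db /path/to/metaphlan_db
```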

HUMAnN3 Database

HUMAnN3 requires two databases: the ChocoPhlAn pangenome database and a UniRef90 translated-search database. The database choices and download links are available on the GitHub page above.
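Both can be fetched with HUMAnN3's bundled downloader; the target directory is a placeholder, and other UniRef builds (e.g. UniRef50) are also available:

```shell
humann_databases --download chocophlan full /path/to/humann_dbs
humann_databases --download uniref uniref90_diamond /path/to/humann_dbs
```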

Running Snakemake

Once the configuration file has been set up to the user's choosing (see WIKI), navigate to the workflow directory and run snakemake -np to perform a dry run and check the pipeline will run as expected. If all rules generate the desired output, run the pipeline with snakemake --cores N --use-conda, where N denotes the number of cores for parallelisation. If no parallelisation is required, use --cores 1. Each rule of the pipeline runs within an individual conda environment that deploys the appropriate software where required.
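Collected in one place, the invocations described above look like this (4 cores chosen as an arbitrary example):

```shell
cd MINUUR/workflow
snakemake -np                      # dry run: print the planned jobs
snakemake --cores 4 --use-conda    # run, with per-rule conda environments
```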

Docker image repositories & hosting

We host all of our Docker images on two different repositories and periodically sync the images between the two:

  1. Docker repo for MINUUR - https://hub.docker.com/repository/docker/lcerdeira/minuur