Metagenome_assembly

WDL Workflow for metagenome assembly

Python script to generate a mapping between the non-redundant gene catalogue and MAGs

Introduction to WDL workflow

This pipeline uses a Docker image.

All inputs required by the workflow are supplied through a JSON file. A template for this file can be generated with Womtool using the following command:

java -jar womtool.jar inputs workflow.wdl > inputs.json
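The template written by Womtool still has to be filled in by hand. As a rough aid, the sketch below lists the keys in inputs.json whose values still look like untouched placeholders. It assumes that Womtool writes the WDL type of each input (for example "File" or "Array[File]") as the placeholder value; the pattern is illustrative, not an exhaustive list of WDL types, and the function names are hypothetical.

```python
import json
import re

# Assumption: an unfilled value is a bare WDL type name such as "File",
# "Array[File]" or "String? (optional)". This pattern is illustrative only.
PLACEHOLDER = re.compile(
    r"(File|String|Int|Float|Boolean|Object|(Array|Map|Pair)\[.*\])"
    r"[?+]*( \(optional.*\))?"
)


def unfilled_inputs(inputs):
    """Return the sorted keys whose values still look like type placeholders."""
    return sorted(
        key
        for key, value in inputs.items()
        if isinstance(value, str) and PLACEHOLDER.fullmatch(value)
    )


def load_inputs(path):
    """Read an inputs JSON file into a dictionary."""
    with open(path) as handle:
        return json.load(handle)
```

Calling unfilled_inputs(load_inputs("inputs.json")) before starting the run gives a quick list of the entries that still need a real value.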

The pipeline can then be run using Cromwell:

java -jar cromwell.jar run workflow.wdl -i inputs.json
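When the run is started from a script rather than typed by hand, the same command can be wrapped in Python. This is a minimal sketch, assuming that java is on the PATH and that cromwell.jar, workflow.wdl and inputs.json are in the working directory; the function names are hypothetical.

```python
import subprocess


def cromwell_command(workflow="workflow.wdl", inputs="inputs.json", jar="cromwell.jar"):
    """Build the same Cromwell command line shown above as an argument list."""
    return ["java", "-jar", jar, "run", workflow, "-i", inputs]


def run_cromwell(**kwargs):
    """Run Cromwell and raise CalledProcessError if it exits with an error."""
    return subprocess.run(cromwell_command(**kwargs), check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.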

The pipeline produces the following directories and files:

assemble; assembled contigs
predictgenes; gene coordinates file (GFF), protein translations and nucleotide sequences in FASTA format
metabat2; binned contigs and a summary report
CheckM; genome assessment summary report
gtdbtk; taxonomic classification summary file
cluster_genes; representative sequences and list of clusters
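A quick way to confirm that a run completed is to check that each of the directories listed above exists. The sketch below assumes that the outputs have been collected under a single root directory, which depends on how Cromwell is configured; the function name is hypothetical.

```python
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_DIRS = ("assemble", "predictgenes", "metabat2", "CheckM", "gtdbtk", "cluster_genes")


def missing_outputs(root):
    """Return the expected output directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list means that every expected directory is present. It says nothing about whether the files inside are complete.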

Python3 script to map the non-redundant gene catalogue back to contigs, MAGs and eggNOG annotations

The following software is required by the Python script:

python genes_MAGS_eggNOG_mapping.py --help

The script produces a mapping table (TSV file) that links the non-redundant gene catalogue back to contigs, MAGs and eggNOG annotations.
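The mapping table can be read with the standard csv module. The sketch below groups genes by another column, for example by MAG. It assumes that the table has a header row; the column names are passed in by the caller because the actual header is not documented here, and the names used in the usage note are hypothetical.

```python
import csv
from collections import defaultdict


def genes_per_group(handle, group_column, gene_column):
    """Group the values of gene_column by the values of group_column."""
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row[group_column]].append(row[gene_column])
    return dict(groups)
```

With an open file handle for the table, genes_per_group(handle, "MAG", "gene") would return a dictionary from each MAG to its list of genes, provided that the table really has columns with those names.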