metaGT: A Nextflow repository from Ex Center for Algorithmic Biotechnology

Assembly and quantification of metatranscriptomes using metagenome data.

Version: see VERSION

Introduction

MetaGT is a bioinformatics analysis pipeline for improving and quantifying metatranscriptome assemblies using metagenome data. The pipeline accepts either Illumina sequencing reads or existing metagenome and metatranscriptome assemblies as input. It aligns the metatranscriptome assembly to the metagenome assembly and then extracts the CDSs that are covered by transcripts.
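The CDS extraction step can be illustrated with a small bash/awk sketch. It assumes CDS coordinates come from a Prokka GFF3 file and transcript-to-metagenome alignments from minimap2 in PAF format, and it keeps a CDS when a single alignment spans at least a chosen fraction of it. The function name `covered_cds` and the `min_cov` threshold are hypothetical, and metaGT's own selection rule may differ.

```shell
#!/usr/bin/env bash
# Sketch: print CDSs from a Prokka GFF3 file that are covered by transcript
# alignments in a minimap2 PAF file (transcripts = queries, metagenome = target).
# The min_cov threshold (fraction of a CDS spanned by one alignment) is a
# hypothetical parameter, not a metaGT option.
covered_cds() {
  local gff=$1 paf=$2 min_cov=${3:-0.9}
  awk -F'\t' -v OFS='\t' -v min_cov="$min_cov" '
    # PAF (first file): index alignments by target contig.
    # PAF target coordinates (columns 8-9) are 0-based, half-open.
    FILENAME == ARGV[1] {
      k = ++cnt[$6]; ts[$6, k] = $8; te[$6, k] = $9
      next
    }
    # GFF3 (second file): Prokka appends the contig sequences after ##FASTA.
    /^##FASTA/ { exit }
    /^#/ || $3 != "CDS" { next }
    {
      cs = $4 - 1; ce = $5              # 1-based inclusive -> 0-based half-open
      m = cnt[$1] + 0
      for (i = 1; i <= m; i++) {
        s = (ts[$1, i] > cs) ? ts[$1, i] : cs    # overlap start
        e = (te[$1, i] < ce) ? te[$1, i] : ce    # overlap end
        if (e - s >= min_cov * (ce - cs)) {
          split($9, attr, ";")
          print $1, $4, $5, attr[1]     # contig, start, end, ID=...
          break
        }
      }
    }' "$paf" "$gff"
}
```

For example, `covered_cds prokka_out/mg.gff aln.paf 0.8` would list the CDSs of which at least 80% is spanned by one transcript alignment. Requiring a single alignment keeps the sketch short; merging overlapping alignments first would be a natural refinement.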

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.

Quick Start

Install Nextflow
Install Conda for full pipeline reproducibility
Download the pipeline, e.g. by cloning the metaGT GitHub repository:
```
git clone git@github.com:ablab/metaGT.git
```
Test it on a minimal dataset by running:
```
nextflow run metaGT -profile test,conda
```

Start running your own analysis!

Typical command for analysis using reads:

```
nextflow run metaGT -profile conda --dna_reads '*_R{1,2}.fastq.gz' --rna_reads '*_R{1,2}.fastq.gz'
```
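The glob is wrapped in single quotes so that the shell passes it to Nextflow unexpanded. Assuming the pipeline pairs the matches roughly the way Nextflow's `Channel.fromFilePairs` does (files sharing the part matched by `*` form one sample), the hypothetical helper below previews which pairs a directory would yield before launching a run.

```shell
#!/usr/bin/env bash
# Sketch: list the R1/R2 pairs that '*_R{1,2}.fastq.gz' would form, grouping
# files that share the prefix matched by '*' (the sample name).
list_read_pairs() {
  local dir=${1:-.} r1 r2 sample
  for r1 in "$dir"/*_R1.fastq.gz; do
    [[ -e $r1 ]] || continue            # no match: skip the literal pattern
    sample=$(basename "$r1" _R1.fastq.gz)
    r2="$dir/${sample}_R2.fastq.gz"
    if [[ -e $r2 ]]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      printf 'warning: no R2 mate for %s\n' "$r1" >&2
    fi
  done
}
```

Running `list_read_pairs /path/to/reads` prints one tab-separated line per sample and warns on stderr about R1 files without a mate.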

Typical command for analysis using multiple read files listed in YAML files:

```
nextflow run metaGT -profile conda --dna_reads '*.yaml' --rna_reads '*.yaml' --yaml
```
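The expected contents of the YAML files are not described here. One plausible layout, given that reads are assembled with SPAdes, is the SPAdes dataset YAML format; the hypothetical helper below writes such a file for one paired-end library split across several R1/R2 files. Treat the format as an assumption and check it against the pipeline's test data before use.

```shell
#!/usr/bin/env bash
# Sketch: write a SPAdes-style dataset YAML (one paired-end library, several
# R1/R2 files). That metaGT's --yaml expects this layout is an assumption.
write_dataset_yaml() {
  local out=$1; shift                 # remaining arguments: R1 R2 [R1 R2 ...]
  (( $# > 0 && $# % 2 == 0 )) || { echo "expected R1/R2 file pairs" >&2; return 1; }
  local left="" right="" sep=""
  while (( $# >= 2 )); do
    left+="${sep}\"$1\""
    right+="${sep}\"$2\""
    sep=", "
    shift 2
  done
  cat > "$out" <<EOF
[
  {
    orientation: "fr",
    type: "paired-end",
    left reads: [${left}],
    right reads: [${right}]
  }
]
EOF
}
```

For example, `write_dataset_yaml sample1.yaml /data/run1_R1.fastq.gz /data/run1_R2.fastq.gz /data/run2_R1.fastq.gz /data/run2_R2.fastq.gz`; absolute paths are the safer choice.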

Typical command for analysis using assemblies:

```
nextflow run metaGT -profile conda --genome '*.fasta' --transcriptome '*.fasta'
```

Pipeline Summary

Optionally, if raw reads are used:

Sequencing quality control (FastQC)
Metagenome and/or metatranscriptome assembly (metaSPAdes, rnaSPAdes)
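When reads are supplied, these optional steps roughly correspond to the standalone commands sketched below. This is not the pipeline's own process code: the function names and output directory names are hypothetical, and by default the commands are only printed (set `DRY_RUN=0` to execute them).

```shell
#!/usr/bin/env bash
# Sketch of the optional read-based steps; prints the commands unless DRY_RUN=0.
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then printf '%s\n' "$*"; else "$@"; fi
}

assemble_from_reads() {
  local dna1=$1 dna2=$2 rna1=$3 rna2=$4
  # Quality control; FastQC does not create its output directory itself
  run mkdir -p qc
  run fastqc -o qc "$dna1" "$dna2" "$rna1" "$rna2"
  # Metagenome assembly (contigs in metaspades_out/contigs.fasta)
  run metaspades.py -1 "$dna1" -2 "$dna2" -o metaspades_out
  # Metatranscriptome assembly (transcripts in rnaspades_out/transcripts.fasta)
  run rnaspades.py -1 "$rna1" -2 "$rna2" -o rnaspades_out
}
```

Calling `assemble_from_reads dna_R1.fastq.gz dna_R2.fastq.gz rna_R1.fastq.gz rna_R2.fastq.gz` prints the four commands, which can be inspected without any of the tools installed.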

By default, the pipeline currently performs the following:

Metagenome annotation (Prokka)
Alignment of the metatranscriptome to the metagenome (minimap2)
Annotation of unaligned transcripts (TransDecoder)
Clustering of covered CDSs together with CDSs from unaligned transcripts (MMseqs2)
Quantification of transcript abundances (kallisto)
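The default steps can likewise be sketched as standalone commands. This is an illustration under stated assumptions rather than the pipeline's implementation: the minimap2 preset `asm10`, using the MMseqs2 representative sequences as the kallisto reference, and all file and function names are guesses; only the tools themselves come from the list above. Commands are printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch of the default steps as standalone commands; prints them unless
# DRY_RUN=0. File names, the minimap2 preset and the kallisto reference are
# illustrative assumptions, not metaGT's actual settings.
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then printf '%s\n' "$*"; else "$@"; fi
}

core_steps() {
  # genome/transcripts: assemblies; unaligned: transcripts without alignments
  # (extracting them from the PAF is not shown); rna1/rna2: RNA reads
  local genome=$1 transcripts=$2 unaligned=$3 rna1=$4 rna2=$5
  # Metagenome annotation (writes prokka_out/mg.gff, mg.ffn, mg.faa, ...)
  run prokka --metagenome --outdir prokka_out --prefix mg "$genome"
  # Align metatranscriptome contigs (queries) to metagenome contigs (reference)
  run minimap2 -x asm10 -o aln.paf "$genome" "$transcripts"
  # Predict ORFs/CDSs in unaligned transcripts
  run TransDecoder.LongOrfs -t "$unaligned"
  run TransDecoder.Predict -t "$unaligned"
  # Cluster covered CDSs together with TransDecoder CDSs; the merged input
  # cds_merged.fasta is hypothetical
  run mmseqs easy-cluster cds_merged.fasta clusters tmp
  # Quantify against the cluster representatives (clusters_rep_seq.fasta)
  run kallisto index -i cds.idx clusters_rep_seq.fasta
  run kallisto quant -i cds.idx -o kallisto_out "$rna1" "$rna2"
}
```

Printing by default lets the sketch be read and checked without any of the tools installed; switching to `DRY_RUN=0` runs the same commands in order.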

Citation

MetaGT was developed by Daria Shafranskaya and Andrey Prjibelski. If you use it in your research, please cite:

MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data

Feedback and bug reports

If you have any questions, please open an issue on our GitHub page.