HMM-guided metagenomic gene-targeted assembler using iterative de Bruijn graphs

Primary language: C++

MegaGTA

Getting Started

git clone https://github.com/HKU-BAL/megagta.git
cd megagta
git submodule update --init --recursive
./make.sh

# assemble with default parameters
bin/megagta.py -r reads.fq -g gene_list.txt -o output_dir

# post-processing with clustering similarity threshold 0.99 (i.e. 1-0.01)
bin/post_proc.sh -g gene_resources_dir -d output_dir/contigs -m 16G -c 0.01

Required Tools

Simple Settings

  • In bin/post_proc.sh, modify the paths to HMMER and UCHIME so that they point to your own installations.

  • gene_list.txt is a config file. Each line specifies one gene of interest and the information the assembler needs for it, in this order: the gene name, the forward HMM, the reverse HMM, and the reference multiple sequence alignment of that gene family. For example:

rplB share/RDPTools/Xander_assembler/gene_resource/rplB/for_enone.hmm share/RDPTools/Xander_assembler/gene_resource/rplB/rev_enone.hmm share/RDPTools/Xander_assembler/gene_resource/rplB/ref_aligned.faa
nirK share/RDPTools/Xander_assembler/gene_resource/nirK/for_enone.hmm share/RDPTools/Xander_assembler/gene_resource/nirK/rev_enone.hmm share/RDPTools/Xander_assembler/gene_resource/nirK/ref_aligned.faa
  • gene_resources_dir is the input folder for post-processing. It follows the format used by the Xander assembler and is located at:
megagta/share/RDPTools/Xander_assembler/gene_resource
  • If a gene of interest is not included, follow the Xander README to prepare its resources.
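
Because a malformed gene_list.txt is easy to produce by hand, a small check before launching the assembler may help. The following is only a minimal sketch and is not part of MegaGTA: the helper names (GeneEntry, parse_gene_list, missing_files, gene_list_line) are hypothetical. It assumes four whitespace-separated fields per line, as in the example above, and that each gene folder holds for_enone.hmm, rev_enone.hmm and ref_aligned.faa. Skipping blank lines is also an assumption, since the format description does not mention them.

```python
from pathlib import Path
from typing import NamedTuple


class GeneEntry(NamedTuple):
    """One line of gene_list.txt (hypothetical helper type, not part of MegaGTA)."""
    name: str
    forward_hmm: Path
    reverse_hmm: Path
    ref_alignment: Path


def parse_gene_list(text: str) -> list[GeneEntry]:
    """Parse gene_list.txt content, assuming four whitespace-separated fields per line."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:  # assumption: blank lines are ignored
            continue
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        name, fwd, rev, ref = fields
        entries.append(GeneEntry(name, Path(fwd), Path(rev), Path(ref)))
    return entries


def missing_files(entries: list[GeneEntry]) -> list[Path]:
    """Return every referenced path that does not exist as a regular file."""
    return [p for e in entries for p in e[1:] if not p.is_file()]


def gene_list_line(gene: str, resource_dir: Path) -> str:
    """Build one config line, assuming the file names seen in the example above."""
    d = resource_dir / gene
    parts = [d / "for_enone.hmm", d / "rev_enone.hmm", d / "ref_aligned.faa"]
    return " ".join([gene, *(p.as_posix() for p in parts)])
```

A possible use is to call parse_gene_list(Path("gene_list.txt").read_text()) and stop if missing_files(...) returns anything. gene_list_line("rplB", Path("share/RDPTools/Xander_assembler/gene_resource")) reproduces the first example line.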