
Pipeline for predicting coding and non-coding genes from assembled genomes


Gene Prediction Pipeline

Overview:

This pipeline predicts coding and non-coding genes from assembled genomes using a combination of ab initio and homology-based tools. Coding genes are predicted with GeneMarkS-2 and Prodigal; non-coding genes are predicted with ARAGORN, BARRNAP, RNAmmer and Infernal. BLAST is then used to validate the predicted coding genes, and the results are written as true positives and false positives in FASTA (.fna) format.
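The validation step can be pictured with a small sketch. The source does not say how the pipeline decides between true and false positives, so everything here is an assumption for illustration: BLAST output is taken to be in tabular form (`-outfmt 6`, where the third column is percent identity), a predicted gene counts as a true positive if it has at least one hit at or above a hypothetical `min_identity` threshold, and the function names are invented for this example.

```python
import csv
import io


def hit_ids_from_blast_tabular(blast_tsv, min_identity=90.0):
    """Return IDs of query sequences with a hit at or above min_identity.

    Assumes BLAST tabular output (-outfmt 6): column 1 is the query ID
    and column 3 is the percent identity. The threshold is hypothetical.
    """
    hits = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, identity = row[0], float(row[2])
        if identity >= min_identity:
            hits.add(query_id)
    return hits


def split_fasta_by_hits(fasta_text, hit_ids):
    """Split FASTA records into (true_positives, false_positives) text."""
    true_pos, false_pos = [], []
    target = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            # Records with a qualifying BLAST hit go to the true positives.
            target = true_pos if record_id in hit_ids else false_pos
        if target is not None and line.strip():
            target.append(line)
    return "\n".join(true_pos), "\n".join(false_pos)
```

In practice the two returned strings would be written to separate .fna files; the actual pipeline may use different criteria (for example, e-value or coverage) than the identity cutoff assumed here.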

Pipeline Requirements:

  1. Prodigal. Or: conda install -c bioconda prodigal
  2. GeneMarkS-2.

NOTE: If GeneMarkS-2 is downloaded and run on macOS, you must also download the "64 bit key" along with GeneMarkS-2. Once the files have been downloaded, run the following command: cp gm_key_64 ~/.gm_key_64

  3. BLAST. Or: conda install -c bioconda blast
  4. BEDTools. Or: conda install -c bioconda bedtools
  5. Perl. Or: conda install -c anaconda perl

NOTE: The pipeline assumes that all tools, once installed, are available on your PATH.
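Before running the pipeline, it can help to confirm that the tools are actually reachable on your PATH. The following is a minimal sketch using Python's standard `shutil.which`. The executable names are assumptions (for example, `gms2.pl` for GeneMarkS-2 and `blastn` for BLAST) and may differ from what pipeline.py calls on your system.

```python
import shutil

# Assumed executable names for the tools listed above; adjust as needed.
REQUIRED_TOOLS = ("prodigal", "gms2.pl", "blastn", "bedtools", "perl")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tool names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```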

Script execution:

python3 pipeline.py -i <assembled genome(s)> -org_cds <organism of interest's CDS file> -o <output directory name>
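For reference, the command-line interface above could be parsed roughly as follows. This is a sketch with `argparse`, not the actual code in pipeline.py: the flag names come from the command shown, while the destination names, the choice to accept one or more genome files for `-i`, and marking every option as required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags in the usage line (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="pipeline.py",
        description="Predict coding and non-coding genes from assembled genomes.",
    )
    # Assumption: -i accepts one or more assembled genome files.
    parser.add_argument("-i", dest="genomes", nargs="+", required=True,
                        help="assembled genome file(s)")
    parser.add_argument("-org_cds", dest="org_cds", required=True,
                        help="CDS file of the organism of interest")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="name of the output directory")
    return parser


if __name__ == "__main__":
    # Example arguments only; the file names are placeholders.
    example = ["-i", "genome1.fasta", "genome2.fasta",
               "-org_cds", "reference_cds.fna", "-o", "results"]
    print(build_parser().parse_args(example))
```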