Gene Prediction Pipeline
Overview:
This pipeline predicts coding and non-coding genes from assembled genomes using a combination of ab initio and homology-based tools. Coding genes are predicted with GeneMarkS-2 and Prodigal; non-coding genes are predicted with ARAGORN, BARRNAP, RNAmmer and Infernal. BLAST is then used to validate the predicted coding genes, and each prediction is reported as a true positive or a false positive in FASTA (.fna) format.
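To illustrate the validation step, the sketch below splits predicted gene IDs into true and false positives depending on whether BLAST reported a hit for them. It is not the code used by pipeline.py: it assumes BLAST was run with tabular output (-outfmt 6, where the first column is the query ID and the third is percent identity), and the function name and the 90% identity cutoff are hypothetical choices made for the example.

```python
import csv
import io


def split_by_blast_hits(predicted_ids, blast_tabular, min_identity=90.0):
    """Partition predicted gene IDs by whether they have a qualifying BLAST hit.

    predicted_ids: iterable of gene IDs produced by the gene predictors.
    blast_tabular: text of a BLAST tabular report (-outfmt 6).
    min_identity:  hypothetical percent-identity cutoff for a valid hit.
    """
    hit_ids = set()
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        # Skip blank, comment, or truncated lines.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        query_id, identity = row[0], float(row[2])
        if identity >= min_identity:
            hit_ids.add(query_id)

    # Preserve the input order in both output lists.
    true_positives = [g for g in predicted_ids if g in hit_ids]
    false_positives = [g for g in predicted_ids if g not in hit_ids]
    return true_positives, false_positives
```

Genes with no row in the report, or only rows below the cutoff, end up in the false-positive list.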
Pipeline Requirements:
- Prodigal. Or:
conda install -c bioconda prodigal
- GeneMarkS-2.
NOTE: If GeneMarkS-2 is downloaded and run on macOS, you must also download the "64 bit key" along with GeneMarkS-2 and run the following command once the files have been downloaded:
cp gm_key_64 ~/.gm_key_64
- BLAST. Or:
conda install -c bioconda blast
- BEDTools. Or:
conda install -c bioconda bedtools
- Perl. Or:
conda install -c anaconda perl
NOTE: The pipeline assumes that, once installed, all of these tools are available on your PATH.
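Before running the pipeline, it can help to confirm that the required executables are actually on your PATH. The snippet below is a minimal sketch of such a check and is not part of pipeline.py. The executable names (prodigal, gms2.pl, blastn, bedtools, perl) are assumptions about how each tool is invoked on a typical install; adjust them to match your system.

```python
import shutil

# Assumed executable names for the required tools; edit to match your install.
REQUIRED_TOOLS = ["prodigal", "gms2.pl", "blastn", "bedtools", "perl"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be tested without the tools installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```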
Script execution:
python3 pipeline.py -i <assembled genome(s)> -org_cds <organism of interest's CDS file> -o <output directory name>
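For reference, the command-line options above could be parsed along the following lines. This is only a sketch of the interface, not the actual argument handling in pipeline.py. The destination names (genomes, org_cds, output_dir), the help texts, and the choice to make every option required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the -i, -org_cds and -o options."""
    parser = argparse.ArgumentParser(
        prog="pipeline.py",
        description="Predict coding and non-coding genes from assembled genomes.",
    )
    # One or more assembled genome files.
    parser.add_argument("-i", dest="genomes", nargs="+", required=True,
                        help="assembled genome file(s)")
    # CDS file of the organism of interest, used for BLAST validation.
    parser.add_argument("-org_cds", dest="org_cds", required=True,
                        help="CDS file of the organism of interest")
    # Name of the directory where results are written.
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="output directory name")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.genomes, args.org_cds, args.output_dir)
```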