/glimmer

================================
Instructions to run GLIMMER
================================

NOTE: You must have a FASTA sequence file and a coding sequences file. Both can be downloaded from GenBank. This program can be run on eukaryotic genomes, but the results will not be very accurate.
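
Before running anything, it can help to confirm that the genome file really is in FASTA format. The snippet below is a minimal sketch of a FASTA reader, not the parser used by this repository; the function name read_fasta and the example data are hypothetical.

```python
def read_fasta(lines):
    """Return {header: sequence} from an iterable of FASTA-formatted lines."""
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is its header.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are accumulated and upper-cased.
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical in-memory example; for a real file, pass an open file handle.
example = [">demo sequence", "ATGAAA", "cccTAA", ""]
sequences = read_fasta(example)
```

For a real file, something like `with open(path) as fh: read_fasta(fh)` should work, since a file handle iterates over its lines.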

First, you must parse the coding sequences file to isolate the gene bounds. To do this, run
> ./readCodingSeqs.py path/to/coding/seqs.txt output_file

To run GLIMMER:
> ./build-imm --genome path/to/genome/sequence.fasta --max-length 8 --trueORFs path/to/trueORFs

where the --max-length argument can be set as desired and --trueORFs should be the path to the file generated by readCodingSeqs.py. Two optional flags are also available. If --fixed is present, a fixed-length Markov model is run instead of the IMM. If --iterative is present, the algorithm runs the Markov model for multiple iterations.
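
If you want to script several runs with different settings, the command line above can be assembled in Python. This is only a sketch: the helper build_imm_command and the file names are hypothetical, and only the flags described above are used.

```python
def build_imm_command(genome, true_orfs, max_length=8, fixed=False, iterative=False):
    """Assemble the build-imm argument list from the options described above."""
    cmd = [
        "./build-imm",
        "--genome", genome,
        "--max-length", str(max_length),
        "--trueORFs", true_orfs,
    ]
    if fixed:
        cmd.append("--fixed")      # fixed-length Markov model instead of IMM
    if iterative:
        cmd.append("--iterative")  # run the model for multiple iterations
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_imm_command("genome.fasta", "trueORFs.txt", max_length=6, fixed=True)
# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list, rather than a single string, avoids shell quoting problems when paths contain spaces.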


=================================
Instructions to run HIMM
=================================

NOTE: You must have a FASTA sequence file and a coding sequences file. Both can be downloaded from GenBank. This program is intended for use on eukaryotic genomes, since it finds exons and introns.

First, you must parse the coding sequences file to obtain a list of genes with their introns and exons. To do this, run
> ./parseGenes.py path/to/coding/seqs.txt output_file

To run HIMM:
> ./findExons.py path/to/genome/sequence.fasta path/to/genes

where the genes argument should be the path to the file generated by parseGenes.py.
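
The two HIMM steps depend on each other: the output file of parseGenes.py is the second argument to findExons.py. The sketch below makes that dependency explicit. The helper himm_pipeline and the file names are hypothetical.

```python
def himm_pipeline(coding_seqs, genome, genes_file="genes.txt"):
    """Return the two HIMM commands in the order they must be run."""
    return [
        # Step 1: parse the coding sequences into a genes file.
        ["./parseGenes.py", coding_seqs, genes_file],
        # Step 2: find exons, reading the genes file written by step 1.
        ["./findExons.py", genome, genes_file],
    ]


# Hypothetical file names, for illustration only.
steps = himm_pipeline("coding_seqs.txt", "genome.fasta")
# To run them in order:
#   import subprocess
#   for step in steps:
#       subprocess.run(step, check=True)
```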