A CLI DNA sequence analyzer tool

					EEEEEE                    EEEEEEE
					EE   EE	                  EE
					EE   EE   EEEEEE  EEEEEE  EE       EE   EE
					EEEEE	  EE  EE  EE  EE  EEEEE     EE EE
					EE  EE    EEEEEE  EEEEEE  EE         EEE
					EE   EE   EE      EE      EE        EE EE
					EE    EE  EEEEEE  EE      EEEEEEE  EE   EE
					
						    General Instructions
						   ======================
Abstract
========
Genomic sequences are far from random; they are made up of systematically ordered, information-rich patterns. These repeated sequence patterns are widely used because of their fundamental importance in understanding how the genome functions and is organized. To this end, a comprehensive toolkit, RepEx, has been developed to extract all possible repeat patterns (direct, inverted, everted and mirror) from given genomic DNA sequences. The toolkit can also be used to fetch the direct and inverted repeats present in protein sequences. Further, RepEx can extract identical repeats as well as gapped similar/degenerate repeats within user-defined spacer limits. The toolkit has been found to be more robust and efficient than existing methods.
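
To make the repeat types named above concrete, here is a minimal Python sketch. It uses the common textbook conventions (a mirror repeat pairs a pattern with its reverse, an inverted repeat pairs it with its reverse complement); these definitions and the helper names are assumptions for illustration and are not taken from the RepEx source. Everted repeats are left out because this README does not define them.

```python
# Illustrative only: common textbook definitions, not RepEx's own code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq):
    """Return the sequence read backwards."""
    return seq[::-1]


def reverse_complement(seq):
    """Return the complement of the sequence, read backwards."""
    return seq.translate(COMPLEMENT)[::-1]


def partner(seq, repeat_type):
    """Return the copy that would pair with `seq` for a given repeat type.

    `repeat_type` is a hypothetical label used only in this sketch.
    """
    if repeat_type == "direct":
        return seq
    if repeat_type == "mirror":
        return reverse(seq)
    if repeat_type == "inverted":
        return reverse_complement(seq)
    raise ValueError(f"unsupported repeat type: {repeat_type!r}")


for kind in ("direct", "mirror", "inverted"):
    print(kind, "GATTC", partner("GATTC", kind))
```

Under these assumed definitions, a repeat is simply an occurrence of `seq` followed, after some spacer, by `partner(seq, kind)`.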

Usage
=====

repex [options]

Program options
---------------

-m	Type of molecular sequence(s): DNA [n] or Protein [p]. {Default: [n]}
-t	Type of repeat to be extracted: Direct [d], Inverted [i], Palindrome [ip], Mirror [m] or Everted [e]. Mirror and everted repeats do not exist in protein sequences, so -t m and -t e will not work with -m p. {Default: [i]}
-l	Minimum length of the repeats to be extracted [positive integer]. Caution: values below about 15 can significantly increase the number of spurious matches and therefore the runtime. {Default: [20]}
-s	Spacer interval, i.e., the number of bases or residues between the repeat pattern and its copy: All [a], Local [l] (within 100 bases), Global [g] (outside 100 bases) or Manual. For the manual option, give the spacer length x preceded by the appropriate letter (greater than: g, less than: l, equal to: e), i.e., -s gx, -s lx or -s ex. {Default: [l] for DNA and [a] for proteins}
-c	Class of repeat to be extracted: Identical [i], Degenerative [d] or both [b]. {Default: [i]}
-f	Input file path.
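
The option rules listed above can be checked before RepEx is launched. The following Python sketch assembles a `repex` argument list and rejects the one combination this README rules out (mirror or everted repeats with protein input). The function name `build_repex_args` and its keyword names are hypothetical; only the flags, their values and their defaults come from the list above.

```python
def build_repex_args(path, molecule="n", repeat_type="i", min_length=20,
                     spacer=None, repeat_class="i"):
    """Build an argument list for repex from the documented options.

    Hypothetical helper: it only mirrors the rules stated in this README.
    """
    if molecule not in ("n", "p"):
        raise ValueError("-m must be n (DNA) or p (protein)")
    if repeat_type not in ("d", "i", "ip", "m", "e"):
        raise ValueError("-t must be one of d, i, ip, m, e")
    if molecule == "p" and repeat_type in ("m", "e"):
        raise ValueError("mirror and everted repeats need DNA input")
    if min_length < 1:
        raise ValueError("-l must be a positive integer")
    if repeat_class not in ("i", "d", "b"):
        raise ValueError("-c must be one of i, d, b")
    if spacer is None:
        # Documented defaults: local for DNA, all for proteins.
        spacer = "l" if molecule == "n" else "a"
    return ["repex", "-m", molecule, "-t", repeat_type,
            "-l", str(min_length), "-s", spacer,
            "-c", repeat_class, "-f", path]


print(" ".join(build_repex_args("/home/User/Documents/fasta.seq",
                                repeat_type="m", spacer="l50",
                                repeat_class="b")))
```

The resulting list could be handed to `subprocess.run` once `repex` is installed; that step is omitted here so the sketch runs without the binary.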

Example: repex -m n -t m -l 20 -s l50 -c b -f /home/User/Documents/fasta.seq

The above example can be used to extract both identical and degenerative (-c b) mirror repeats (-t m) from DNA sequences (-m n) with a minimum length of 20 (-l 20), where the spacer interval is less than 50 bases (-s l50), from the file fasta.seq located at /home/User/Documents.
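
One way to read the -s values, such as the `l50` used in the example, is sketched below in Python. The function `spacer_allowed` is hypothetical, and whether the 100-base local/global boundary is inclusive is an assumption: the README only says "within" and "outside" 100 bases.

```python
import re

LOCAL_LIMIT = 100  # from the README: local means "within 100 bases"


def spacer_allowed(spec, spacer_len):
    """Return True if a spacer of `spacer_len` satisfies the -s value `spec`.

    Hypothetical helper illustrating the documented -s semantics.
    """
    if spec == "a":                      # all spacer lengths
        return True
    if spec == "l":                      # local (assumed inclusive of 100)
        return spacer_len <= LOCAL_LIMIT
    if spec == "g":                      # global (assumed strictly above 100)
        return spacer_len > LOCAL_LIMIT
    match = re.fullmatch(r"([gle])(\d+)", spec)
    if match is None:
        raise ValueError(f"unrecognised spacer value: {spec!r}")
    op, limit = match.group(1), int(match.group(2))
    if op == "g":                        # greater than x
        return spacer_len > limit
    if op == "l":                        # less than x
        return spacer_len < limit
    return spacer_len == limit           # equal to x


print(spacer_allowed("l50", 49), spacer_allowed("l50", 50))
```

With `-s l50`, a repeat pair separated by 49 bases is accepted under this reading, while one separated by 50 bases is not.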

Installation Guide
==================

For installation, see INSTALL.

System Requirements
===================

   The RepEx package requires the following utilities to run successfully. In
the absence of one or more of them, RepEx may fail to run correctly. The
minimum version of each utility is listed in parentheses. These versions, or
any later versions, should ensure the proper execution of RepEx.

    - make (GNU make 3.79.1)
    - perl (PERL     5.6.0)
    - sh   (GNU sh   1.14.7)
    - csh  (tcsh     6.10.00)
    - g++  (GNU gcc  2.95.3)
    - sed  (GNU sed  3.02)
    - awk  (GNU awk  3.0.4)
    - ar   (GNU ar   2.9.5)

RepEx can be run on almost any Linux-based operating system, provided the utilities listed above are installed properly.
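
As a quick pre-installation check, the Python sketch below reports which of the utilities listed above cannot be found on the PATH. It uses only the standard-library function `shutil.which`; the function name `missing_tools` is hypothetical, and the check covers presence only, not the minimum versions.

```python
import shutil

# Utility names taken from the list above.
REQUIRED_TOOLS = ["make", "perl", "sh", "csh", "g++", "sed", "awk", "ar"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing utilities:", ", ".join(missing))
    else:
        print("All required utilities were found on the PATH.")
```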

   Sufficient memory and disk space are necessary. Be aware of your disk and
memory usage, because insufficient capacity will result in incorrect or
missing output. The resources required depend on the input size, but in
general 512 MB of RAM and 1 GB of disk space are sufficient.
   It is possible to port the toolkit to any system with a C++ compiler but
this has not been tested and will not be supported. In addition, you may need
to alter the Makefile to direct 'make' to your native compiler and other
system resources.
   For Mac OSX, the Mac development kit must be downloaded and installed. This
kit will include 'gcc', 'ar', and 'make' which are necessary for building
RepEx. RepEx is not supported for any Mac operating system other than OSX.
   For Windows users, Cygwin or another Unix-like environment and command-line
interface for Microsoft Windows can be installed with the utilities listed
above, but this has not been tested and will not be supported. We apologize
for the inconvenience.

Availability 
=============

RepEx should be used for academic purposes only; commercial use is strictly prohibited.

#####################################################################################################

				  Thank you for doing Science !!!

#####################################################################################################

Please address queries and bug reports to: <satnamrsm@gmail.com> with "RepEx" as the subject heading.

Last update: Jan 31, 2013