RepARK: A Perl repository from PhKoch

## License ##
This software was developed at the Leibniz Institute on Aging - Fritz Lipmann Institute (FLI; http://www.leibniz-fli.de/) under a mixed licensing model. This means that researchers at academic and non-profit organizations can use it for free, while for-profit organizations are required to purchase a license. By downloading the package you agree with conditions of the FLI Software License Agreement for Academic Non-commercial Research (FLI-LICENSE).


## Description ##
RepARK.pl is a wrapper script for constructing a repeat library from sequencing reads using Jellyfish and either Velvet or CLC.


## System Requirements ##
- Jellyfish (tested with versions 1.1.6, 2.1.1, 2.1.4)
	http://www.cbcb.umd.edu/software/jellyfish
- Velvet (tested with version 1.2.08)
	http://www.ebi.ac.uk/~zerbino/velvet
- CLC Assembler (optional, tested with version 4.4.0)
	http://www.clcbio.com/

These programs should be in the PATH, or the appropriate variables (e.g. $jellyfish_path) in RepARK.pl should be modified accordingly.


## Input Data ##
Input data can be one or more fasta or fastq files. They can be given as arguments with the option -l (multiple times) or included in the RepARK.pl
The reads can be in gzipped format ending with ".gz". Note that it is currently not supported to mix compressd and uncompressed libraries.

## Example Demo Data ##
In order to test the pipeline run ./RepARK.pl without arguments.
The command line parameters are optional. If either are omitted, RepARK.pl will use the defaults which are adapted to the demo data.
Check if the assembly results in 133 sequences of a size between 27,902-27,917bp. Due to velvet creates slightly different assemblies with the same data, we provided this size range and 10 libraries based on the demo data with defauld parameters that can be downloaded from [1]. If you discover greater disagreements please contact the authors (see Support).


## Output ##
The repeat consensuses can be found in the file repeat_lib.fasta within the working directory.


## Usage ##

RepARK.pl [ options ]

Options: [default_values]
  -o  output dir [RepARK_working]
  -l  library file - can be used multiple times [pe400.fq.gz]
  -a  assembler  (options: clc,velvet) [velvet]
  -s  jellyfish hash size [100000000]
  -k  jellyfish kmer size [31]
  -p  number of threads used by jellyfish [1]
  -t  set manually a threshold, if automatic prediction is not working (this step will then be skipped)
  -d  debug mode gives some more information
  -n --nojf  skip Jellyfish computation (jf_RepARK.kmers|histo must exist in the working dir)
  -h --help  this help


## Notes ##
For 40x human data, we set -s to 54100000000 which reserves 370Gb of RAM for jellyfish (v 1.1.6) but there is no merging of the intermediate files needed.

In order to discriminate between the error peak and the unique-kmer peak, sufficient read coverage depth is required (>10x).

## Support ##
In case of problems, please run the script in debugging mode (-d) and send the output to philippk@fli-leibniz.de.



[1] ftp://genome.fli-leibniz.de/pub/repeat-assemblies/RepARKdemoruns.tar.gz


Change Log
---------
1.3.0	- clc_assembler support
	- gzipped input file support
	
1.2.2	- Checking for k <= velvet's hash size

1.2.1	- Included support for jellyfish v2.x

1.2	- Initial release
PhKoch/RepARK