/Trawler

Trawler is a integrated pipeline for the analysis of large scale chromatin-immunoprecipitation (ChIP) data and other large scale DNA-RNA datasets.

OtherNOASSERTION

===============
Welcome to Trawler standalone 1.2
===============

The BSD/MIT license.
     Authors: L. Ettwiller, B. Paten, M. Ramialison, Y. Haudry


Reference: Ettwiller L., Paten B.,Ramialison M., Birney E., Wittbrodt J.
     Trawler: de novo regulatory motif discovery pipeline for chromatin immunoprecipitation.
          Nat Methods. 2007 Jul;4(7):563-5. Epub 2007 Jun 24.
          PMID: 17589518


More information about Trawler can be found at:

     http://ani.embl.de/laurence/trawler/


*****************************************************************
***                                                           ***
***               Installation procedure                      ***
***                                                           ***
***                                                           ***
***                                                           ***
***       System Requirements                                 ***
***       Installation                                        ***
***       Dependencies                                        ***
***       Licensing                                           ***
***       Usage                                               ***
***                                                           ***
***    Documentation available through                        ***
***       http://ani.embl.de/trawler/doc                      ***
***                                                           ***
*****************************************************************

*********************************
***                           ***
***     System Requirements   ***
***                           ***
*********************************

The program has been tested successfully on Unix/Linux (Fedora 9),
Mac OS X (versions 10.4, 10.5) and Windows (XP).

The following software packages are required for running Trawler:
perl 5.6 or above
Java 5.0 or above

*********************************
***                           ***
***     Installation          ***
***                           ***
*********************************

please see the INSTALL.txt file for detailed installation instructions.


*********************************
***                           ***
***     Dependencies          ***
***                           ***
*********************************

The following software packages are required:
- Algorithm::Cluster (CPAN: http://search.cpan.org/~mdehoon/)

The following software packages are directly provided in the Trawler distribution:
- treg_comparator (http://treg.molgen.mpg.de)
- Jalview (http://www.jalview.org/)
- WebLogo (http://weblogo.berkeley.edu/)


*********************************
***                           ***
***     Licensing             ***
***                           ***
*********************************

Please see the file called LICENSE.txt


*********************************
***                           ***
***     USAGE                 ***
***                           ***
*********************************

USAGE
=====

trawler -sample [file containing the enriched sequences] -background [file containing the background sequences]

    -sample (FASTA format) better to be repeat-masked.
    -background (FASTA format)

OPTIONAL PARAMETERS
===================

    [MOTIF DISCOVERY]
    -occurrence (optional) minimum occurrence in the sample sequences. [DEFAULT = 10]
    -mlength (optional) minimum motif length. [DEFAULT = 6]
    -wildcard (optional) number of wild card in motifs. [DEFAULT = 2]
    -strand (optional) single or double. [DEFAULT = single]

    [CLUSTERING]
    -overlap (optional) in percentage. [DEFAULT = 70]
    -motif_number (optional) total number of motifs to be clustered. [DEFAULT = 200]
    -nb_of_cluster (optional) fixed number of cluster. if this option is used, the k-mean clustering algorithm with fixed k will be used instead of the self organizing map (SOM). [DEFAULT = NULL]

    [VISUALIZATION]
    -directory (optional) output directory. [DEFAULT = "$TRAWLER_HOME/myResults"]
    -dir_id (optional) gives an id to the results directory. [DEFAULT = NULL]
    -xtralen (optional) add bases around the motifs for the logo. [DEFAULT = 0]
    -alignments (optional) file containing the list of files containing the aligned sequences (see README file for more info) [DEFAULT = NULL]
    -ref_species (optional) name of the reference species [DEFAULT = NULL]
    -clustering (optional) if 1 the program clusters the instances, if 0 no clustering. [DEFAULT = 1]
    -web (optional) if 1 the output will be a web page with all the information [DEFAULT = 1]

USAGE EXAMPLES
==============

Trawler-standalone comes in two flavors: [1] with the conservation information or
[2] without. For the conservation you need to provide Trawler-standalone with a file
containing all the orthologous sequences (not aligned).

[1] with conservation

> trawler.pl -sample file_sample -background file_background -alignments file_alignments -ref_species reference_species_name

[2] without conservation

> trawler.pl -sample file_sample -background file_background


By default Trawler-standalone performs the full analysis nevertheless it is possible to

[1] get only the over-represented motifs:

> trawler.pl -sample file_sample -background file_bakground -clustering 0

[2] get only the over-represented motifs and the cluster (PWM):

> trawler.pl -sample file_sample -background file_bakground -web 0

OPTIONS DESCRIPTION
===================

    [MOTIF DISCOVERY]

    -occurrence : [DEFAULT = 10] The minimum number of time the motif occurs in the sample sequence.
            The lower limit defined by the suffix tree is 2 motifs. Nevertheless if your chromatin IP
            contains more than just a few sequences, you would expect to see the motif occurring quite often.
            For typical chromatin IP data, a good minimum number of motif occurrence is about 10 to 20.
            If you do not get anything interesting with this values, try a higher or lower number according
            to your sample size.
    -mlength : [DEFAULT = 6] Trawler-standalone does not assess motifs of fixed length.
            The maximum length has been set to 20 nucleotides but the minimum length is a user defined parrameter.
            A good minimum length is around 5-6 bp for typical transcription factor binding sites.
    -wildcard : [DEFAULT = 2] The number of wildcards is the maximum number of positions in the motif
            that can have a degenerate nucleotide. In IUPAC code, the possible wildcards used in Trawler-standalone are
            the following :
            M = [AC], R = [AG], Y = [CT], W = [AT], S = [CG], K = [GT], N = [ATCG]
            (the latest counts as 2 mismatches). A possible motif with -wildcard option set
            to 2 can be AGCMTWA for example.
    -strand : [DEFAULT = single] The analysis can be performed either on single-stranded or double-stranded sequences.

    [CLUSTERING]

    -overlap : [DEFAULT = 70] The minimum percentage overlap for two motifs to be considered in the same cluster.
            The value is by default 70 %; lowering down this value will cluster more distant motifs.
    -motif_number : [DEFAULT = 200] The maximum number of motifs to be considered for clustering.
            If the option is set to 200, the best 200 motifs returned by the discovery step will be used
            for the clustering step.
    -nb_of_cluster : [DEFAULT = NULL] By default, the number of cluster(s) is defined by the
            self organizing map algorithm (SOM). By using this option, the user set the number of
            expected cluster(s). In this case, the SOM algorithm is replaced by the k-mean clustering algorithm.

    [VISUALIZATION]

    -directory : [DEFAULT = "$TRAWLER_HOME/myResults"] name of the directory where all the outputs
            will be stored. If no directory is specified, then a default directory "myResults" will be created,
            Individual results will be stored in a separate directory with the following pattern
            "tmp_yyyy-mm-dd_hh:mm_ss" (ex : tmp_2009-05-20_12h13_20).
    -dir_id : [DEFAULT = NULL] gives a meaningful ID to the results directory. For instance, if this option
            is set to "myID", this will create a directory named "myID_yyyy-mm-dd_hh:mm_ss".
    -xtralen : [DEFAULT = 0] The number of additional nucleotides included in the PWM flanking the core motif.
    -alignments : [DEFAULT = NULL] Alignment file in the correct format. If alignments are provided,
            the motifs are visualized in the context of an alignment using Jalview.
            If the -alignment option is activated, the option -ref_species needs to be set.
    -ref_species : [DEFAULT = NULL] Name of the reference species. Only use this option if
            the -alignment option is activated.
    -clustering : [DEFAULT = 1]. By default, Trawler proceeds to the clustering step.
            To only obtain the over-represented motifs, set the -clustering option to 0.
            If the -clustering option is set to 0, no webpage is produced.
    -web : [DEFAULT = 1] By default, Trawler proceeds to create a web page summarizing the result.
            If this option is set to 0, Trawler proceeds until the clustering step only.
            Useful for batch  analysis.


*********************************
***                           ***
*** Additional features       ***
***                           ***
*********************************


Additional reference matrices can be added to the known matrices used by Treg comparator by adding
to the file matrix_set_all your PWM in the following format :

ID#name#text#text#0.3548828125 ...
Matrix entries (float) are separated by a whitespace. The matrix entries
start with the frequency of "A", followed by that of "C", "G", and "T" in the
first position (5'), then the same for the second position and so on.


*********************************
***                           ***
*** Trawler Team              ***
***                           ***
*********************************

Laurence Ettwiller  <ettwille@embl.de>
Benedict Paten      <benedict@soe.ucsc.edu>
Mirana Ramialison   <ramialis@embl.de>
Yannick Haudry      <haudry@embl.de>