Description: AW is an implementation of a non-trivial linear-time and linear-space algorithm to compute all avoided or all overabundant words in a given DNA or protein sequence. The definitions used for expectation and variance are described and biologically justified in:
V. Brendel, J.S. Beckmann, and E.N. Trifonov:
Linguistics of nucleotide sequences: morphology and comparison of vocabularies.
Journal of Biomolecular Structure and Dynamics 4(1), 11-21 (1986).
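To make the notion of expectation and deviation concrete, here is a minimal Python sketch. It is not AW's code and not its linear-time algorithm: it counts occurrences naively. It assumes the definitions as they are commonly stated in the avoided-words literature: for a word w of length m >= 3, the expected frequency is f(prefix) * f(suffix) / f(infix), where prefix = w[0..m-2], suffix = w[1..m-1] and infix = w[1..m-2], and the deviation is (f(w) - E(w)) / max(sqrt(E(w)), 1). The function names are hypothetical and chosen for illustration only.

```python
from math import sqrt


def count_occurrences(text: str, word: str) -> int:
    """Count (possibly overlapping) occurrences of word in text, naively."""
    return sum(
        1
        for i in range(len(text) - len(word) + 1)
        if text.startswith(word, i)
    )


def expected_frequency(text: str, word: str) -> float:
    """E(w) = f(prefix) * f(suffix) / f(infix); assumed definition, see lead-in."""
    if len(word) < 3:
        raise ValueError("this sketch only handles words of length >= 3")
    prefix, suffix, infix = word[:-1], word[1:], word[1:-1]
    f_infix = count_occurrences(text, infix)
    if f_infix == 0:
        # If the infix never occurs, neither do the prefix and the suffix.
        return 0.0
    return (
        count_occurrences(text, prefix) * count_occurrences(text, suffix) / f_infix
    )


def deviation(text: str, word: str) -> float:
    """(f(w) - E(w)) / max(sqrt(E(w)), 1); assumed definition, see lead-in."""
    expected = expected_frequency(text, word)
    observed = count_occurrences(text, word)
    return (observed - expected) / max(sqrt(expected), 1.0)


if __name__ == "__main__":
    sequence = "AGCGCGACGTCTGTGT"
    print(deviation(sequence, "CGT"))
```

Under these assumptions, for the sequence AGCGCGACGTCTGTGT the word CGT occurs once while its expected frequency is 3 * 3 / 6 = 1.5, so its deviation is about -0.408. A word counts as avoided when its deviation is sufficiently negative and as overabundant when it is sufficiently positive, relative to the chosen threshold.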
Installation: To compile AW, please follow the instructions given in file INSTALL.
Usage: aw <options>
 Standard (Mandatory):
  -a, --alphabet      <str>   `DNA' for nucleotide sequences or `PROT'
                              for protein sequences.
  -i, --input-file    <str>   (Multi)FASTA input filename.
  -o, --output-file   <str>   Output filename.
  -t, --threshold     <dbl>   The threshold.
 Optional:
  -w, --words-class   <int>   `0' to check for AVOIDED words or `1' to
                              check for OVERABUNDANT (default: 0).
  -k, --length        <int>   Fixed length of words (default: search all).
  -A, --absent        <int>   `1' to check also for ABSENT AVOIDED words
                              or `0' otherwise (default: 0).
  -r, --reverse       <int>   `1' to check for the reverse complement or
                              `0' otherwise (default: 0).
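As an illustration of how the options above combine, the following Python sketch assembles an AW command line from the documented flags. It only builds and prints the argument list and does not run anything. The helper name build_aw_command, the file names and the threshold value are hypothetical; the executable is assumed to be reachable as `aw', as in the usage line.

```python
import shlex
from typing import List, Optional


def build_aw_command(
    alphabet: str,
    input_file: str,
    output_file: str,
    threshold: float,
    words_class: int = 0,
    length: Optional[int] = None,
    absent: int = 0,
    reverse: int = 0,
) -> List[str]:
    """Build an argument list using only the flags listed in the usage text."""
    if alphabet not in ("DNA", "PROT"):
        raise ValueError("alphabet must be 'DNA' or 'PROT'")
    for name, value in (("words_class", words_class), ("absent", absent), ("reverse", reverse)):
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")

    command = [
        "aw",  # assumed executable name, taken from the usage line
        "-a", alphabet,
        "-i", input_file,
        "-o", output_file,
        "-t", str(threshold),
        "-w", str(words_class),
        "-A", str(absent),
        "-r", str(reverse),
    ]
    if length is not None:
        # Omitting -k keeps the default behaviour (search all lengths).
        command += ["-k", str(length)]
    return command


if __name__ == "__main__":
    # Hypothetical file names and threshold, for illustration only.
    cmd = build_aw_command("DNA", "input.fasta", "avoided.txt", -3.5, length=4)
    print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run once AW has been compiled; keeping the arguments as a list avoids shell quoting problems with file names.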
Citations:
Y. Almirantis, P. Charalampopoulos, J. Gao, C. S. Iliopoulos, M. Mohamed, S. P. Pissis, D. Polychronopoulos:
Optimal Computation of Avoided Words.
WABI 2016: 1-13.
Y. Almirantis, P. Charalampopoulos, J. Gao, C. S. Iliopoulos, M. Mohamed, S. P. Pissis, D. Polychronopoulos:
Optimal Computation of Overabundant Words.
WABI 2017: 4:1-4:14.
License: GNU GPLv3 License; Copyright (C) 2016 Jia Gao and Solon P. Pissis