Currently implements Needleman-Wunsch, Smith-Waterman and Needleman-Wunsch-Gotoh algorithms based on the Dynamic Programming (DP).
To treat (i,j) as 1-based coordinates (instead of Ruby's 0-based), use the following accessors:
- DP::Matrix#get(i,j), DP::Matrix#set(i,j)
- DP::Sequence#[i], DP::Sequence#[j]
M(i,j) | Target A C C A G T
-------+--------------------------------------------------
Query | (0,0) i=1 i=2 i=3 i=4 i=5 i=6
A | j=1 (1,1) (2,1) (3,1) (4,1) (5,1) (6,1)
C | j=2 (1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
A | j=3 (1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
G | j=4 (1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
C | j=5 (1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
Each sub-class has the following accessors to modify parameters before the calculation of DP:
aligner = DP.new(target, query)
aligner.gap_penalty = 2 # gap open penalty
aligner.ext_penalty = 1 # gap extention penalty
aligner.match = 1
aligner.mis_match = -1
In the DP#dp method, init_matrix, calc_matrix and trace_back functions are called which are required to be implemented in each sub-class.
aligner.dp
# => call init_matrix(), calc_matrix() and trace_back() then print alignment
- Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48:443-53.
- Smith TF, Waterman MS: Identification of common molecular subsequences. J Mol Biol 1981, 147:195-7.
- Gotoh O: An improved algorithm for matching biological sequences. J Mol Biol 1982, 162:705-8.
MIT license (C) 2019 Toshiaki Katayama ktym@dbcls.jp and Toshiyuki Yokoyama