Failing simple alignment with gop=10 (default) but not with gop=8
Opened this issue · 8 comments
from gotoh2 import Aligner, map_coordinates
g1 = Aligner(gop=10) # default gop=10
g2 = Aligner(gop=8)
ref = 'TACGTA'
query = 'TACTA' # G removed
a1 = g1.align(ref, query)
a2 = g2.align(ref, query)
print(a1)
print(a2)
('TACGTA', 'TACT-A', 14)
('TACGTA', 'TAC-TA', 16)
Thanks for reporting this @davidchampredon
I've reproduced the issue.
Score is consistent with the correct alignment
T A C G T A
T A C - T A
+5 +5 +5 -11 +5 +5 = 14
It wasn't those lines - the problem was that local alignment requires one to zero out positive costs in the R matrix.
Crap. Traceback failed while I was validating these changes on SARS-COV-2 sequences:
Process 0 of 1 running hCoV19/pangolin/Guangxi/P4L/2017|EPI_ISL_410538|2017
traceback failed: i=1 j=1 bit=0
Traceback (most recent call last):
File "scripts/first-align.py", line 135, in <module>
main()
File "scripts/first-align.py", line 119, in main
trim_seq, inserts = pair_align(refseq, s, threshold=args.threshold)
File "scripts/first-align.py", line 66, in pair_align
aref, aquery, ascore = aligner.align(ref, query)
File "/usr/local/lib/python3.6/dist-packages/gotoh2-0.1-py3.6-linux-x86_64.egg/gotoh2.py", line 99, in align
self.matrix
RuntimeError: Traceback failed, try local alignment
Temporary patch:
-
revert these lines of code:
Lines 190 to 192 in 65a29a5
-
SARS-COV-2 alignments looked fine but let's switch to another alignment program and write a wrapper for gotoh2 to use that program by default (C program optional)
-
time permitting, diagnose traceback issue...
Cost matrix:
* A C T A
* 0 0 0 0 0
A 0 -5 0 0 -5
C 0 0 -10 0 0
G 0 0 0 -6 0
T 0 0 0 -4 -2
A 0 -5 0 0 -9
Bit matrix before edge assignment:
* A C T A
* 80 80 80 80 16 4
A 80 84 44 44 20 4
C 80 44 84 50 10 4
G 80 44 73 84 14 4
T 80 44 105 76 28 4
A 64 68 33 37 4 4
4 4 4 4 4 4
Bit matrix after edge assignment:
* A C T A
* 0 80 80 0 0 4
A 80 4 8 32 4 4
C 80 32 4 2 34 4
G 80 40 1 4 6 4
T 0 8 41 21 4 4
A 0 4 9 13 4 4
4 4 4 4 4 4
I think that 21
in the last matrix is problematic - encoding 16, 4 and 1 (e, c, a), where a
indicates a vertical edge is on the optimal path.