[Bug]: `Variation`s from different starting places give incorrect positions
MillironX opened this issue · 0 comments
MillironX commented
This bug is best explained by an example. Consider a conserved substitution at a reference position on two reads. These reads start at different positions in the reference sequence. We would expect that the Variation
s comprising the Variant
s of these reads would be the same.
Steps to reproduce
using BioAlignments, BioSequences, SequenceVariation
refseq = dna"ACAACTTTATCT"
mutseq = dna"ACATCTTTATCT"
@show PairwiseAlignment(AlignedSequence(mutseq, refseq), refseq)
read01 = AlignedSequence(mutseq[1:10], Alignment("10M", 1, 1))
read02 = AlignedSequence(mutseq[3:12], Alignment("10M", 1, 3))
aln01 = PairwiseAlignment(read01, refseq)
aln02 = PairwiseAlignment(read02, refseq)
@show Variant(aln01)
@show Variant(aln02)
Results
PairwiseAlignment(AlignedSequence(mutseq, refseq), refseq) = PairwiseAlignment{LongDNASeq, LongDNASeq}:
seq: 1 ACATCTTTATCT 12
||| ||||||||
ref: 1 ACAACTTTATCT 12
Variant(aln01) = Variant{LongDNASeq, DNA} with 1 edit:
A4T
Variant(aln02) = Variant{LongDNASeq, DNA} with 1 edit:
C2T
Expected results
PairwiseAlignment(AlignedSequence(mutseq, refseq), refseq) = PairwiseAlignment{LongDNASeq, LongDNASeq}:
seq: 1 ACATCTTTATCT 12
||| ||||||||
ref: 1 ACAACTTTATCT 12
Variant(aln01) = Variant{LongDNASeq, DNA} with 1 edit:
A4T
Variant(aln02) = Variant{LongDNASeq, DNA} with 1 edit:
A4T
Possible fix
Variant(::PairwiseAlignment)
assumes a starting reference and sequence position of 0.
If it instead pulled the refpos
and seqpos
of the first (starting) AlignmentAnchor
, then that should fix the problem.
I'll get started on a PR for that.