AlignmentPosition() is wrong with hard clips
Ahhgust opened this issue · 3 comments
There's a bug in the AlignmentPosition code in the BamRecord object (code below).
Hard clipping doesn't impact the start position in the read (removing the other half of the or statement should make the code correct)-- those bases are just missing from the reported read.
e.g., I have a read that looks like:
8:32868238-32874773_1151_1695_2:0:0_3:0:0_3cb 2048 1 1151 0 90H60M 0 -1 0 ACAGCTTAAAACTCAAAGGACTTGGCAGTGGTTTATATCCCTCTAGAGGAGCCTGTTCTA
For which the bamrecord AlignmentPosition() and AlignmentPositionEnd() is 90 and 60, respectively.
/** Get the start of the alignment on the read, by removing soft-clips
/
inline int32_t AlignmentPosition() const {
uint32_t c = bam_get_cigar(b);
int32_t p = 0;
for (size_t i = 0; i < b->core.n_cigar; ++i) {
// Remove the code after the ||
if ( (bam_cigar_opchr(c[i]) == 'S') || (bam_cigar_opchr(c[i]) == 'H'))
p += bam_cigar_oplen(c[i]);
else // not a clip, so stop counting
break;
}
return p;
}
On second thought, it's (theoretically) possible to have both hard and soft clipping on the same end of the read. I don't know why you'd do that, but perhaps:
if ( (bam_cigar_opchr(c[i]) == 'S') || (bam_cigar_opchr(c[i]) == 'H'))
p += bam_cigar_oplen(c[i]);
should be
if ( (bam_cigar_opchr(c[i]) == 'S')
p += bam_cigar_oplen(c[i]);
else if ( bam_cigar_opchr(c[i]) != 'H')
break;
Good catch. I agree that this is the right solution -- just pushed the commit.
Thanks!
It looks like different flavors of the same bug are found elsewhere. Similar patches are needed at lines:
612
627
656
Thanks!
-August