walaj/SeqLib

AlignmentPosition() is wrong with hard clips

Ahhgust opened this issue · 3 comments

There's a bug in the AlignmentPosition code in the BamRecord object (code below).
Hard clipping doesn't impact the start position in the read (removing the other half of the or statement should make the code correct)-- those bases are just missing from the reported read.

e.g., I have a read that looks like:
8:32868238-32874773_1151_1695_2:0:0_3:0:0_3cb 2048 1 1151 0 90H60M 0 -1 0 ACAGCTTAAAACTCAAAGGACTTGGCAGTGGTTTATATCCCTCTAGAGGAGCCTGTTCTA

For which the bamrecord AlignmentPosition() and AlignmentPositionEnd() is 90 and 60, respectively.

/** Get the start of the alignment on the read, by removing soft-clips
/
inline int32_t AlignmentPosition() const {
uint32_t
c = bam_get_cigar(b);
int32_t p = 0;
for (size_t i = 0; i < b->core.n_cigar; ++i) {
// Remove the code after the ||
if ( (bam_cigar_opchr(c[i]) == 'S') || (bam_cigar_opchr(c[i]) == 'H'))
p += bam_cigar_oplen(c[i]);
else // not a clip, so stop counting
break;
}
return p;
}

On second thought, it's (theoretically) possible to have both hard and soft clipping on the same end of the read. I don't know why you'd do that, but perhaps:
if ( (bam_cigar_opchr(c[i]) == 'S') || (bam_cigar_opchr(c[i]) == 'H'))
p += bam_cigar_oplen(c[i]);

should be

if ( (bam_cigar_opchr(c[i]) == 'S')
p += bam_cigar_oplen(c[i]);
else if ( bam_cigar_opchr(c[i]) != 'H')
break;

walaj commented

Good catch. I agree that this is the right solution -- just pushed the commit.

Thanks!
It looks like different flavors of the same bug are found elsewhere. Similar patches are needed at lines:
612
627
656
Thanks!
-August