betteridiot/bamnostic

Read attributes/parameters do not agree with what pysam would show and with what one would expect

Closed this issue · 2 comments

Describe the bug
Sorry for having this in the comment only and apparently not in the main textbox, apparently due to some technical issue.
I would expect bamnostic to deliver read parameters that are, well, nearly the same
as the ones that pysam provides, maybe except for when there is softclipping involved.
Using the provided example.bam, the read with
Queryn : EAS56_57:6:190:289:82
(which is the second read, namely the first mapped one)
has the following parameters when using bamnostics:

query length : 35
query_alignment_start: 99
Query_alignment_leng : 35
query_alignment_end : 134
query alignment seq : AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC
query alignment len : 35
Unmapped? : False
Refnam : chr1
RefStart : 99
RefEnd : 134
Queryseq : AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC
len(Queryseq) : 35
CIGAR : 35M

As the cigar is 35M, there are no clips. Queryseq and alignmentseq are identical.
So I would expect to have query_alignment start to be 0, and query_alignment_end to be 35,
as those are start and end position in the query that are mapped.
In the reference sequence (which we don't see), apparently, that sequence is from RefStart to RefEnd, i.e., 99 to 134 (the latter exlclusive). Having query_alignment_start and query_alignment_end having those values doesn't make sense (at least to me).
Actually, using pysam on Linux shows the parameters of that read
as follows which agrees with what I would expect to see.

Queryn : EAS56_57:6:190:289:82
query length : 35
query_alignment_start: 0
Query_alignment_leng : 35
query_alignment_end : 35
query alignment seq : AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC
query alignment len : 35
Unmapped? : False
Refnam : chr1
RefStart : 99
RefEnd : 134
Refseq : AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC
Queryseq : AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC
len(Queryseq) : 35
CIGAR : 35M

To Reproduce
Make a for loop over the reads of the example.bam, and stop after the second read.
For each read, show the attributes of the read similar to the following:

print "Queryn               :  " + read.query_name
print "query length         :  " + str(read.query_length)
print "query_alignment_start:  " + str(read.query_alignment_start)
print "Query_alignment_leng :  " + str(read.query_alignment_length)
print "query_alignment_end  :  " + str(read.query_alignment_end)
print "query alignment seq  :  " + str(read.query_alignment_sequence)
try:
    print "query alignment len  :  " + str(len(read.query_alignment_sequence))
except Exception as ex:
    pass
print "Unmapped?            :  " + str(read.is_unmapped)
print "RefStart             :  " + str(read.reference_start)
print "RefEnd               :  " + str(read.reference_end)

Expected behavior
I'd expect to see the parameters 0 and 35 for the query_alignment_start and end, and not 99 and 134.

Desktop (please complete the following information):

  • OS: Windows 7 and Linux Ubuntu 14.04, Linux with pysam, Win with bamnostics
  • Python Version: 2.7
  • bamnostic Version: 1.0.11

You are correct. I should be able to get to this sometime next week. Sorry for any inconvenience.

This is fixed now. Thankfully it only required a changing a single line of code. Thank you for bringing this to my attention.