ReadExtractor fails if the first read in bam file is an unpaired read2
Opened this issue · 1 comments
A small subset of samples were failing with:
circlemap/extract_circle_SV_reads.py", line 123, in extract_sv_circleReads
if read.is_read2 and read.qname == read1.qname:
AttributeError: 'str' object has no attribute 'qname'
The loop structure for iterating over the bam file in both utils.py: insert_size_dist()
and extract_circle_SV_reads.py: extract_sv_circleReads()
assumes paired reads, alternating read1-read2-read1-read2.
The case of an unpaired read is handled correctly, as long as any read1 has already been encountered. The first read1 overwrites the initialization string read1 = ''
to a pysam alignment.
However, if a read2 is encountered before any read1, then read1 is still an empty string with no attribute qname
.
My fix was to edit utils.py
and extract_circle_SV_reads.py
such that:
if read.is_read1:
read1 = read
# Checks for initialization read1 = '' by looking for read1 type == string.
# read1 type == pysam.Alignment after first read.is_read1 is encountered
# Condition should only be met if unpaired read2 begins BAM file
elif isinstance(read1, str):
pass
else:
if read.is_read2 and read.qname == read1.qname:
...
An alternative fix that would probably run faster would be to initialize read1 as type pysam.alignment with read1.qname = "fakename". This would remove the elif statement from each loop iteration, but I don't know which pysam function could be used to generate a fake read / qname like that.
If I find time, I'll try to roll my collection of small fixes into a PR.
Cheers,
Michael
Hey,
I received the same error with 6 out of 9 samples.
Is there another workaround when using the Circle-Map terminal version? I included Circle-Map into a nextflow pipeline and don't have access to the raw scripts.
Best,
Daniel