iprada/Circle-Map

ReadExtractor fails if the first read in bam file is an unpaired read2

Opened this issue · 1 comments

A small subset of samples were failing with:

circlemap/extract_circle_SV_reads.py", line 123, in extract_sv_circleReads
    if read.is_read2 and read.qname == read1.qname:
AttributeError: 'str' object has no attribute 'qname'

The loop structure for iterating over the bam file in both utils.py: insert_size_dist() and extract_circle_SV_reads.py: extract_sv_circleReads() assumes paired reads, alternating read1-read2-read1-read2.

The case of an unpaired read is handled correctly, as long as any read1 has already been encountered. The first read1 overwrites the initialization string read1 = '' to a pysam alignment.

However, if a read2 is encountered before any read1, then read1 is still an empty string with no attribute qname.

My fix was to edit utils.py and extract_circle_SV_reads.py such that:

        if read.is_read1:
            read1 = read

        # Checks for initialization read1 = '' by looking for read1 type == string.
        # read1 type == pysam.Alignment after first read.is_read1 is encountered
        # Condition should only be met if unpaired read2 begins BAM file
        elif isinstance(read1, str):
            pass

        else:
            if read.is_read2 and read.qname == read1.qname:
                ...

An alternative fix that would probably run faster would be to initialize read1 as type pysam.alignment with read1.qname = "fakename". This would remove the elif statement from each loop iteration, but I don't know which pysam function could be used to generate a fake read / qname like that.

If I find time, I'll try to roll my collection of small fixes into a PR.

Cheers,
Michael

Hey,
I received the same error with 6 out of 9 samples.
Is there another workaround when using the Circle-Map terminal version? I included Circle-Map into a nextflow pipeline and don't have access to the raw scripts.

Best,
Daniel