beliveau-lab/OligoMiner

error in outputClean.py output


I'm having an issue where OligoMiner does not correctly handle chromosome names that contain hyphens. I've read the discussion threads about specifying a particular header with -H in blockParse.py, and I've used that successfully. The problem is that when I map the resulting FASTQ with bowtie2, the SAM file it generates is not parsed correctly by outputClean.py, and that script does not seem to have a corresponding parameter for handling the header.

The offending code is line 91 of outputClean.py. Replacing the first line below with the second seems to fix the problem for me:

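```python
# Original (line 91): splitting the whole read name on '-' picks the wrong
# field when the chromosome name itself contains a hyphen
stop = file_read[i].split('\t')[0].split('-')[1].strip(' ')
```

```python
# Proposed: split off the coordinates at ':' first, then take the part after '-'
stop = file_read[i].split('\t')[0].split(':')[1].split('-')[1].strip()
```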

This should also fix the problem for anyone else whose chromosome names contain hyphens. That said, the script could probably be made more robust by using a well-established SAM parser such as pysam:
https://github.com/pysam-developers/pysam
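A quick way to see the difference between the two lines above, assuming read names of the form chrom:start-stop (as the split calls imply) and a hypothetical chromosome named chr-2L. The helpers parse_probe_name and stop_from_sam_line are hypothetical, slightly more defensive variants that use str.rpartition. This is a sketch, not OligoMiner code.

```python
def parse_probe_name(qname):
    """Split a read name of the assumed form 'chrom:start-stop'.

    str.rpartition splits on the LAST ':' and then the LAST '-', so hyphens
    (or colons) inside the chromosome name itself are left intact.
    """
    chrom, _, coords = qname.strip().rpartition(':')
    start, _, stop = coords.rpartition('-')
    return chrom, int(start), int(stop)


def stop_from_sam_line(line):
    """Return the stop coordinate encoded in the QNAME (first field) of a SAM line."""
    return parse_probe_name(line.split('\t')[0])[2]


if __name__ == '__main__':
    # Hypothetical SAM record whose read name contains a hyphenated chromosome
    sam_line = 'chr-2L:1000-1035\t0\tchr-2L\t1000\t42\t36M\t*\t0\t0\t*\t*\n'
    name = sam_line.split('\t')[0]
    print(name.split('-')[1])                # original line 91 logic -> 2L:1000
    print(name.split(':')[1].split('-')[1])  # proposed fix -> 1035
    print(stop_from_sam_line(sam_line))      # rpartition variant -> 1035
```

Using rpartition means only the final ':' and '-' are treated as delimiters, so a chromosome name containing either character still parses. If the actual read-name format differs from chrom:start-stop, the delimiters would need adjusting.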

Hi, we do not have plans to alter our parsing logic to support hyphenated scaffold IDs at this time. I would suggest altering these names prior to running OligoMiner.
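The suggested workaround is to rename hyphenated scaffolds before running OligoMiner. Below is a minimal sketch of that step. It assumes a plain FASTA input and that replacing '-' with '_' yields unique names. rename_hyphenated_ids is a hypothetical helper, not part of OligoMiner, and it raises an error rather than silently merging two scaffolds that would collide after renaming.

```python
def rename_hyphenated_ids(fasta_lines, replacement='_'):
    """Replace '-' in FASTA sequence IDs (hypothetical helper, not part of OligoMiner).

    Only the ID (header text up to the first whitespace) is changed; any
    description is kept, separated by a single space. Returns the new lines
    and a {new_id: old_id} map for translating results back afterwards.
    """
    renamed, mapping = [], {}
    for line in fasta_lines:
        if not line.startswith('>'):
            renamed.append(line)  # sequence lines pass through unchanged
            continue
        parts = line[1:].rstrip('\r\n').split(None, 1)
        old_id = parts[0] if parts else ''
        new_id = old_id.replace('-', replacement)
        # Refuse to merge two different sequences under the same new ID
        if mapping.get(new_id, old_id) != old_id:
            raise ValueError(f'{old_id!r} and {mapping[new_id]!r} both map to {new_id!r}')
        mapping[new_id] = old_id
        desc = ' ' + parts[1] if len(parts) > 1 else ''
        renamed.append(f'>{new_id}{desc}\n')
    return renamed, mapping


if __name__ == '__main__':
    lines = ['>chr-2L assembled scaffold\n', 'ACGT\n', '>chrX\n', 'GGCC\n']
    new_lines, id_map = rename_hyphenated_ids(lines)
    print(''.join(new_lines), end='')
    print(id_map)
```

In practice you would read the FASTA with open(...).readlines(), write new_lines to a new file, and keep id_map (for example as a two-column TSV) so the original scaffold names can be restored in OligoMiner's output.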

Changes to the scaffold parsing logic are planned for a future release.