SushiLab/mVIRs

Issue with 'read_seq_file' function.

sgsutcliffe opened this issue · 3 comments

I'm using unzipped fastq files for my read sequence files, and I get this error:
  File "mVIRs/mVIRs/mvirs.py", line 515, in read_seq_file
    if lines[0].startswith('@'):
IndexError: list index out of range

I think it is because the function only reads the file when it is gzipped, so for an uncompressed file `lines` stays empty:
def read_seq_file(seq_file):
    lines = []
    if seq_file.endswith('gz'):
        with gzip.open(seq_file, 'rt') as handle:
            for line in handle:
                lines.append(line.strip())
                if len(lines) == 1000:
                    break
    modulo = 2
    if lines[0].startswith('@'):
        modulo = 4
    seq_headers = []
    for cnt, line in enumerate(lines):
        if cnt % modulo == 0:
            seq_headers.append(line.split()[0])
    return seq_headers

I gzipped the reads and no longer had the issue.
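
For anyone who wants to keep their reads uncompressed, here is a minimal sketch of how the header check could handle both cases. It is not the project's actual fix: `open_maybe_gzip` and `read_seq_headers` are hypothetical names, and the sketch assumes (like the function above) that a first line starting with '@' means 4-line FASTQ records and anything else means 2-line FASTA records.

```python
import gzip


def open_maybe_gzip(seq_file):
    """Open a sequence file as text; use gzip only when the name ends in 'gz'."""
    if seq_file.endswith('gz'):
        return gzip.open(seq_file, 'rt')
    return open(seq_file, 'rt')


def read_seq_headers(seq_file, max_lines=1000):
    """Return the first token of each record header in the first max_lines lines."""
    lines = []
    with open_maybe_gzip(seq_file) as handle:
        for line in handle:
            lines.append(line.strip())
            if len(lines) == max_lines:
                break

    # Fail with a clear message instead of an IndexError on lines[0].
    if not lines:
        raise ValueError(f'{seq_file} contains no lines')

    # Assumption: '@' on the first line means FASTQ (4 lines per record),
    # otherwise FASTA with one sequence line per record (2 lines per record).
    modulo = 4 if lines[0].startswith('@') else 2

    seq_headers = []
    for cnt, line in enumerate(lines):
        # Skip blank header lines so line.split()[0] cannot fail.
        if cnt % modulo == 0 and line:
            seq_headers.append(line.split()[0])
    return seq_headers
```

With this, `read_seq_headers('reads.fastq')` and `read_seq_headers('reads.fastq.gz')` should return the same headers for the same reads.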

Hi There

You're right. This was a case of "one bug fix creates another bug" :). The tool itself also works with uncompressed files, but the check function requires compressed input. This will be fixed in the next version. There is also already a second version in the clipped_reads branch, which uses the clipped alignment information to extract potential phage sequences. It is WIP, but give it a try if you like.

Best,
Hans

This issue should be fixed in af46762

Best,
Hans