marbl/parsnp

in <module> if hdr[0] != ">": IndexError: string index out of range error in 1.7.2 release

mf116 opened this issue · 3 comments

mf116 commented

Hi,

I am trying to build a phylogeny from around 2,700 sequences. The run log is below:
11:31:54 - INFO - |--Parsnp 1.7.2--|

Ref *.fasta
12:25:58 - INFO -


SETTINGS:
|-refgenome: *.fasta
|-genomes:
*.fasta
*.fasta
...2674 more file(s)...
*.fasta
*.fasta
|-aligner: muscle
|-outdir: */P_2022_06_17_113154357377
|-OS: Linux
|-threads: 6


12:25:58 - INFO - <>
12:25:58 - INFO - No genbank file provided for reference annotations, skipping..
Traceback (most recent call last):
File "*/anaconda3/envs/parsnp/bin/parsnp", line 819, in <module>
if hdr[0] != ">":
IndexError: string index out of range

This error keeps occurring. Please help me solve it. Thank you.

Hi @mf116

Thanks for opening an issue! I think the problem here is atypical formatting somewhere in your FASTA files. I agree that Parsnp should be able to handle this, and I'll fix it in the next release (which should be out this week). It looks like one of your FASTA files is completely empty, which causes Parsnp to fail when it tries to look up the sequence header.
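A minimal Python sketch of the failure mode described above, assuming (as the traceback suggests) that `hdr` holds the first line of a FASTA file as a string. Indexing an empty string raises `IndexError`, whereas `str.startswith` simply returns `False`. The helper `looks_like_header` is hypothetical and is not Parsnp's actual fix.

```python
def looks_like_header(line: str) -> bool:
    """Return True if line is a FASTA header (starts with '>').

    Hypothetical guard for illustration only: unlike line[0] == ">",
    str.startswith does not raise IndexError when line is empty.
    """
    return line.startswith(">")


if __name__ == "__main__":
    first_line = ""  # what reading the first line of an empty file yields
    try:
        first_line[0] != ">"  # the unguarded check from the traceback
    except IndexError as exc:
        print("unguarded check fails:", exc)  # string index out of range
    print("guarded check:", looks_like_header(first_line))  # False
```

Run directly, it prints the same `string index out of range` message seen in the log, followed by `guarded check: False`.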

mf116 commented

Hi @bkille,

Thank you for the reply. Yes, we figured that out afterwards, but now we are having another issue.
Our PC has 32 cores, 200 GB of RAM and 4 TB of disk, of which 1.3 TB is available for storage. With the same number of strains we get a different error; the log is below:
07:27:15 - INFO - |--Parsnp 1.7.2--|

Ref *.fasta
09:04:02 - INFO -


SETTINGS:
|-refgenome: *.fasta
|-genomes:
*.fasta
*.fasta
...2674 more file(s)...
*.fasta
*.fasta
|-aligner: muscle
|-outdir: */P_2022_06_22_072715418693
|-OS: Linux
|-threads: 12


09:04:02 - INFO - <>
09:04:02 - INFO - No genbank file provided for reference annotations, skipping..
09:04:21 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
Traceback (most recent call last):
File "/miniconda3/envs/parsnp/bin/parsnp", line 1230, in <module>
if header[0] != ">":
IndexError: string index out of range

We tried building a phylogeny from 700 randomly selected isolates from the same set used in the log above, and it worked. The issue only appears with the larger number of isolates.
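With thousands of input files, a quick pre-flight scan can locate files like the empty one described earlier in the thread. The sketch below is a hypothetical helper, not part of Parsnp; it assumes the genomes sit in one directory with a `.fasta` extension and only inspects the first line of each file. Whether a malformed file explains why the 700-isolate subset succeeded is not confirmed in this thread.

```python
import sys
from pathlib import Path


def find_suspect_fasta(directory, pattern="*.fasta"):
    """Return (filename, reason) pairs for FASTA files whose first line
    is missing or is not a '>' header.

    Hypothetical helper for illustration; not part of Parsnp.
    """
    suspects = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open() as fh:
            first = fh.readline()  # read only the first line of each file
        if first == "":
            # readline() returns "" only at end of file: the file is empty.
            suspects.append((path.name, "empty file"))
        elif not first.startswith(">"):
            # Covers a blank first line as well as non-header text.
            suspects.append((path.name, "first line is not a '>' header"))
    return suspects


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, reason in find_suspect_fasta(target):
        print(f"{name}: {reason}")
```

Saved under an illustrative name such as `check_fasta.py`, it can be run as `python check_fasta.py /path/to/genomes`; each flagged file is printed with the reason. Checking only the first line keeps the scan fast on large genome sets but will not catch problems deeper in a file.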

Hi @mf116,

This should be fixed now. Thanks for opening an issue, and please let me know if it persists!