gtonkinhill/panaroo

Broken error handling in multithread_codonalign_build


An IndexError occurred when running the method below. Because codon_alignment is never assigned when an exception is caught, an UnboundLocalError was raised as well: local variable 'codon_alignment' referenced before assignment

I am not sure why there was an index error, but can this method be updated so that it gracefully continues? I will have to run the code with the problem inputs in the debugger to investigate the index error.

I am using panaroo version 1.5.

def multithread_codonalign_build(dna, protein, name):
    try:
        codon_alignment = codonalign.build(dna, protein)
    except RuntimeError as e:
        print(e)
        print(name)
        print(dna)
        print(protein)
    except IndexError as e:
        print(e)
        print(name)
        print(dna)
        print(protein)
    return(name, codon_alignment)
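
One way the method could continue gracefully is to bind codon_alignment before the try block, so the return statement always has a value. The following is only a minimal sketch of that idea, not the maintainers' fix. The `build` parameter and the use of `logging` are hypothetical additions, included so the function can be exercised without Biopython installed. Callers would also need to handle a `None` alignment.

```python
import logging

logger = logging.getLogger(__name__)


def multithread_codonalign_build(dna, protein, name, build=None):
    """Build a codon alignment, returning (name, None) if the build fails."""
    if build is None:
        # Imported lazily; `build` is a hypothetical hook for testing only.
        from Bio import codonalign
        build = codonalign.build

    # Bind the name up front so the return below can never raise
    # UnboundLocalError, whichever exception is caught.
    codon_alignment = None
    try:
        # Argument order is kept exactly as in the method quoted above.
        codon_alignment = build(dna, protein)
    except (RuntimeError, IndexError) as e:
        logger.warning("codon alignment failed for %s: %s", name, e)
    return name, codon_alignment
```

With this shape, a failing gene yields `(name, None)` instead of crashing the worker, and the downstream code can skip entries whose alignment is `None`.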

Hello, I investigated the cause of the index error by passing the problem aligned protein FASTA and unaligned DNA FASTA to Bio.codonalign.build, and I got the stack trace below. It appears that there are some mistranslation issues in addition to a mismatch between refound DNA and protein sequences. Until this gets solved, I will turn off codon alignments.

/home/user/miniconda3/envs/panaroo/lib/python3.9/site-packages/Bio/codonalign/__init__.py:627: BiopythonWarning: GENOME_ID_1;1243_20_0(M 0) does not correspond to GENOME_ID_1;1243_20_0(GTG)
  warnings.warn(
/home/user/miniconda3/envs/panaroo/lib/python3.9/site-packages/Bio/codonalign/__init__.py:627: BiopythonWarning: GENOME_ID_2;1490_12_0(M 0) does not correspond to GENOME_ID_2;1490_12_0(GTG)
  warnings.warn(
/home/user/miniconda3/envs/panaroo/lib/python3.9/site-packages/Bio/codonalign/__init__.py:627: BiopythonWarning: GENOME_ID_3;1609_19_51(M 0) does not correspond to GENOME_ID_3;1609_19_51(TTG)
  warnings.warn(
/home/user/miniconda3/envs/panaroo/lib/python3.9/site-packages/Bio/codonalign/__init__.py:382: BiopythonWarning: middle frameshift detection failed for GENOME_ID_4;101_refound_2380
  warnings.warn(
Traceback (most recent call last):
  File "/local/home/user/.pycharm_helpers/pydev/pydevd.py", line 1551, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/local/home/user/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/tmp/pycharm_project_499/panaroo/bug.py", line 30, in <module>
    codon_alignment = codonalign.build(protein, dna)# load the final pangenome graph
  File "/home/user/miniconda3/envs/panaroo/lib/python3.9/site-packages/Bio/codonalign/__init__.py", line 169, in build
    corr_span = _check_corr(
  File "/home/user/miniconda3/envs/panaroo/lib/python3.9/site-packages/Bio/codonalign/__init__.py", line 435, in _check_corr
    raise RuntimeError(
RuntimeError: Protein SeqRecord (GENOME_ID_4;101_refound_2380) and Nucleotide SeqRecord (GENOME_ID_4;101_refound_2380) do not match!
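
To narrow down which records disagree before calling Bio.codonalign.build, a rough pre-check along the following lines might help. It is a sketch under stated assumptions, not part of panaroo or Biopython: `translate` and `find_mismatches` are hypothetical helpers, the inputs are plain dicts mapping sequence ID to string, the standard genetic code is used, and a leading GTG or TTG is treated as a Met start because the warnings above pair `M` with those codons.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: alternative start codons are read as Met at position 0.
ALT_STARTS = {"GTG", "TTG"}


def translate(dna):
    """Translate a DNA string, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    protein = [CODON_TABLE.get(codon, "X") for codon in codons]
    if codons and codons[0] in ALT_STARTS:
        protein[0] = "M"
    return "".join(protein).rstrip("*")


def find_mismatches(dna_by_id, aligned_protein_by_id):
    """Return IDs whose DNA is missing or does not translate to the protein."""
    mismatched = []
    for seq_id, aligned in aligned_protein_by_id.items():
        # Remove alignment gaps and trailing stops before comparing.
        expected = aligned.replace("-", "").upper().rstrip("*")
        dna = dna_by_id.get(seq_id)
        if dna is None or translate(dna) != expected:
            mismatched.append(seq_id)
    return mismatched
```

IDs returned by `find_mismatches` are candidates for the "do not match" error; records that differ only in an alternative start codon are not flagged under the assumption above.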