sanger-pathogens/seroba

TypeError: argument of type 'NoneType' is not iterable

HarryHung opened this issue · 5 comments

On a sample, SeroBA encounters a fatal Python error

cluster detected 1 threads available to it
cluster reported completion
cluster_3 detected 1 threads available to it
cluster_3 reported completion
cluster_4 detected 1 threads available to it
cluster_4 reported completion
cluster_6 detected 1 threads available to it
cluster_6 reported completion

0.013121071707115657
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/11F/11F intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.03264454465908326
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/06C/06C intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.034005116366132154
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/10A/10A intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.018366189193022266
15C
{'15A': 0, '15B': 0, '15C': 0, '15F': 16}
15A
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15B
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15C
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15C
15F
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
{'15A': -1, '15B': 0, '15C': -2.5, '15F': 15}
{'15A': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}, '15B': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}, '15C': {'genes': [], 'pseudo': ['wciZ'], 'allele': [], 'snps': []}, '15F': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}}
['15A', '15C']
15A
15B/15C
15C
None
Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.2', 'seroba')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/tasks/sero_run.py", line 19, in run
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 481, in run
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 453, in _prediction
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 397, in _find_serotype
TypeError: argument of type 'NoneType' is not iterable

The relevant part in the code is

seroba/seroba/serotyping.py

Lines 392 to 397 in 8138dc8

if mixed_serotype != None:
for key in min_keys:
print(key)
print(mixed_serotype)
if key not in mixed_serotype:
mixed_serotype = None

I think this piece of code has a logic flaw.

While iterating through min_keys in line 393:

  • when if key not in mixed_serotype at line 396 is true, mixed_serotype is therefore set to None.
  • In the next loop, if key not in mixed_serotype at line 396 is effectively turns into if key not in None and leads to the Python error TypeError: argument of type 'NoneType' is not iterable

Fixing this seems to be trivial, but I am not sure which one is the right approach:

  • looping through all min_keys, only when all keys are not in mixed_serotype, then mixed_serotype should be set to None
  • looping through all min_keys, when any key is not in mixed_serotype, mixed_serotype should be set to None and exit the loop

Only when all in min_keys are not in mixed_serotype, then mixed_serotype should be set to None:

if mixed_serotype != None:
    for key in min_keys:
        print(key)
        print(mixed_serotype)
        if key in mixed_serotype:
            break
    else:
        mixed_serotype = None

When any in min_keys is not in mixed_serotype, mixed_serotype should be set to None

if mixed_serotype != None:
    for key in min_keys:
        print(key)
        print(mixed_serotype)
        if key not in mixed_serotype:
           mixed_serotype = None
           break
flass commented

hi Harry,
thanks for pointing this out, it is indeed a flaw in the logic and for proposing fixes.
in my view, mixed_serotype should be just an empty string when not populated, but I think we should keep it as a None to avoid introducing new bugs.
I think of your two proposed fixes, the first one makes more sense as in the second one, a serotype not being mentioned in the mixed_serotype string generated by the previous _detect_mixed_samples() call will lead to erase the rest of the information, which sounds wrong, even though I'm not too sure exactly what is the rationale behind this bit of code.

If you (and others in the team) agree, we can implement the fix.

Best wishes,

Florent

Agreed with your view that mixed_serotype should never be None to begin with, but also agreed that better keep it as-is at this point.

I also think the first fix makes more sense at a glance, but I got to be honest that I have not deep dived into the codes to have a complete understanding.

Thank you!

flass commented

n worries. We'll review this and introduce a fix. if you do't mind then, we'll ask you to test the fixed branch. is that OK?

Sure, thanks for your help!