fteufel/signalp-6.0

Numpy error with 6.0b for some sequences

Closed this issue · 2 comments

Hi there!

Thanks for your great work.
I was testing the update on the SignalP 5 benchmark set and came across an issue during the marginal conflict resolution step.

This sequence:

>A0R1E8|POSITIVE|LIPO|2
MTQNCVAPVAIIGMACRLPGAINSPQQLWEALLRGDDFVTEIPTGRWDAEEYYDPEPGVPGRSVSKWGAF

from https://services.healthtech.dtu.dk/services/SignalP-6.0/public_data/benchmark_set_sp5.fasta
appears to trigger the issue.

Running version 6.0b in "fast" mode on this sequence, with the organism set to either "other" or "euk", causes the following error:

$ signalp6 --output_dir test --format txt --organism euk --mode fast --fastafile test.fasta

/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/torch/nn/modules/module.py:1051: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at  /tmp/pip-req-build-1ky46svp/aten/src/ATen/native/TensorCompare.cpp:255.)
  return forward_call(*input, **kwargs)
Predicting: 100%|| 1/1 [00:00<00:00,  1.53batch/s]
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/sp6/bin/signalp6", line 8, in <module>
    sys.exit(predict())
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/__init__.py", line 6, in predict
    main()
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/predict.py", line 235, in main
    resolve_viterbi_marginal_conflicts(global_probs, marginal_probs, cleavage_sites, viterbi_paths)
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/utils.py", line 254, in resolve_viterbi_marginal_conflicts
    cleavage_sites[i] = sp_idx.max() +1
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/numpy/core/_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
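
For context, the final ValueError is what NumPy raises whenever `.max()` is called on an empty array. The snippet below is a minimal sketch that is independent of SignalP: `sp_idx` here is only a hypothetical stand-in for the array named in the traceback, on the assumption that no indices were found for this sequence.

```python
import numpy as np

# Hypothetical stand-in for the index array in the traceback:
# assume no positions were found, so the array is empty.
sp_idx = np.array([], dtype=int)

try:
    # Same expression as utils.py line 254 in the traceback.
    cleavage_site = sp_idx.max() + 1
except ValueError as err:
    # NumPy cannot take the maximum of zero elements.
    message = str(err)
    print(message)
```

So the failure probably does not depend on the NumPy version, only on the array being empty at that point.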

This doesn't appear to be an issue with the previous version available for download.
Both have identical main dependency versions:

python 3.6.13
numpy 1.19.5
pytorch 1.9.1
tqdm 4.62.3

Thanks in advance,
Darcy

Hi Darcy, thanks a lot for raising this!

Turns out there was an issue in the conflict resolution function when processing Sec/SPII and Tat/SPII lipoproteins. I added logic to handle those as a separate case, using the predicted modified cysteine after the cleavage site to impute the cleavage site when it's missing.
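
A rough sketch of the idea, not the actual patch: the label ids, the `NO_SITE` sentinel, and the function name `impute_cleavage_site` are all hypothetical and chosen only for illustration. It assumes the Viterbi path is an integer array with one label per residue, and that the modified cysteine sits directly after the cleavage site.

```python
import numpy as np

# Hypothetical label ids, for illustration only (not SignalP's real encoding).
SP_LABELS = (1, 2, 3)  # signal peptide region states
CYS_LABEL = 4          # modified cysteine right after the cleavage site
NO_SITE = -1           # sentinel for "no cleavage site could be determined"


def impute_cleavage_site(path, sp_labels=SP_LABELS, cys_label=CYS_LABEL):
    """Return the 1-based position of the last signal peptide residue."""
    path = np.asarray(path)

    # Usual case: the site follows the last signal peptide position.
    sp_idx = np.where(np.isin(path, sp_labels))[0]
    if sp_idx.size > 0:
        return int(sp_idx.max()) + 1

    # Lipoprotein fallback: no signal peptide positions were found, so use
    # the cysteine instead. Its 0-based index equals the 1-based position
    # of the residue just before it.
    cys_idx = np.where(path == cys_label)[0]
    if cys_idx.size > 0:
        return int(cys_idx.min())

    # Nothing to go on: report a sentinel instead of calling max() on an
    # empty array.
    return NO_SITE
```

With these made-up labels, `impute_cleavage_site([0, 0, 0, 4, 0])` falls back to the cysteine and returns the same site as `impute_cleavage_site([1, 1, 2, 4, 0])`, where the signal peptide positions are present.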

[Image: output_A0R1E8_POSITIVE_LIPO_2_plot]

The online version is patched; I'll close the issue once the updated downloads go live.

Cool, thanks!