r2dt/traveller segfault for some sequences/models
afg1 opened this issue · 3 comments
I've noticed some segfaults when running our pipeline which result in errors later when r2dt tries to find an svg that wasn't created.
I'm running a singularity container (converted from the dockerhub container) within nextflow, and the command is
#!/bin/bash -ue
esl-sfetch --index test.fasta
r2dt.py draw test.fasta output/
Which gives:
stdout
Creating SSI index for test.fasta... done.
Indexed 1 sequences (1 names).
SSI index written to file test.fasta.ssi
Creating SSI index for output/subset.fasta... done.
Indexed 1 sequences (1 names).
SSI index written to file output/subset.fasta.ssi
# R2DT :: visualise RNA secondary structure using templates
# Version 1.2 (August 10, 2021)
# https://github.com/RNAcentral/R2DT
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Analysing 1 sequences with Rfam
Repeating using Traveler mapping
traveler --verbose --target-structure output/rfam/URS00025E7400.fasta --template-structure --file-format traveler /rna/r2dt/data/rfam/RF01577/traveler-template.xml /rna/r2dt/data/rfam/RF01577/RF01577-traveler.fasta --all output/rfam/URS00025E7400-RF01577 > output/rfam/URS00025E7400-RF01577.log
stderr
Error: HMM banded truncated alignment mxes need 2668.75 Mb > 1028.00 Mb limit.
Use --mxsize, --maxtau or --tau.
Segmentation fault (core dumped)
test.fasta
contains a single sequence, stored as DNA, which you can find here: https://test.rnacentral.org/rna/URS00025E7400. The traveller log is attached here: URS00025E7400-RF01577.log
Looking at the core that was dumped (can provide if useful), the segfault happens in traveler
Let me know if there is any other information I can provide to be useful
Looks like that is a chicken lncRNA sequence being aligned to a plasmodium RNase P model, so the resulting structure wouldn't be biologically meaningful.
@davidhoksza : The segfault is probably due to the cmalign error (starting "HMM banded truncated alignment..."). I'm guessing cmalign is probably being called with the --mxsize 1028
option from within traveler. I think using --mxsize 4096
would allow cmalign to run successfully, but again the structure won't be biologically meaningful.
@afg1 , @blakesweeney : What happens if an RNAcentral sequence has no match in ribovore? This sequence should not match the RF01577 model, but ribovore is saying it does (a false positive). If traveler fails due to this cmalign error, can R2DT treat it like a non-match? (I understand a segfault is in general bad, but treating it like a non-match could be a way of handling this and avoiding providing incorrect structures for false positive ribovore matches like this one.)
R2DT can treat this as it have not matches if ribovore does not match. Since it is checking Rfam families it is probably pretty far down the possible attempts at finding a match and after not matching here it can just say the sequence does not match a known family.
Okay, so sounds like another option for traveler might be to detect if cmalign fails with this error and exit cleanly, but I'm not sure how hard that is to implement. Then in R2DT, it would have to detect when traveler exits in this way, and report no match.