r2dt-bio/R2DT

r2dt/traveller segfault for some sequences/models

afg1 opened this issue · 3 comments

afg1 commented

I've noticed some segfaults when running our pipeline which result in errors later when r2dt tries to find an svg that wasn't created.

I'm running a singularity container (converted from the dockerhub container) within nextflow, and the command is

#!/bin/bash -ue
esl-sfetch --index test.fasta
r2dt.py draw test.fasta output/

Which gives:
stdout

Creating SSI index for test.fasta...    done.
Indexed 1 sequences (1 names).
SSI index written to file test.fasta.ssi
Creating SSI index for output/subset.fasta...    done.
Indexed 1 sequences (1 names).
SSI index written to file output/subset.fasta.ssi
# R2DT :: visualise RNA secondary structure using templates
# Version 1.2 (August 10, 2021)
# https://github.com/RNAcentral/R2DT
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Analysing 1 sequences with Rfam
Repeating using Traveler mapping
traveler --verbose --target-structure output/rfam/URS00025E7400.fasta --template-structure --file-format traveler /rna/r2dt/data/rfam/RF01577/traveler-template.xml /rna/r2dt/data/rfam/RF01577/RF01577-traveler.fasta --all output/rfam/URS00025E7400-RF01577 > output/rfam/URS00025E7400-RF01577.log

stderr

Error: HMM banded truncated alignment mxes need 2668.75 Mb > 1028.00 Mb limit.
Use --mxsize, --maxtau or --tau.
Segmentation fault (core dumped)

test.fasta contains a single sequence, stored as DNA, which you can find here: https://test.rnacentral.org/rna/URS00025E7400. The traveller log is attached here: URS00025E7400-RF01577.log

Looking at the core that was dumped (can provide if useful), the segfault happens in traveler

Let me know if there is any other information I can provide to be useful

Looks like that is a chicken lncRNA sequence being aligned to a plasmodium RNase P model, so the resulting structure wouldn't be biologically meaningful.

@davidhoksza : The segfault is probably due to the cmalign error (starting "HMM banded truncated alignment..."). I'm guessing cmalign is probably being called with the --mxsize 1028 option from within traveler. I think using --mxsize 4096 would allow cmalign to run successfully, but again the structure won't be biologically meaningful.

@afg1 , @blakesweeney : What happens if an RNAcentral sequence has no match in ribovore? This sequence should not match the RF01577 model, but ribovore is saying it does (a false positive). If traveler fails due to this cmalign error, can R2DT treat it like a non-match? (I understand a segfault is in general bad, but treating it like a non-match could be a way of handling this and avoiding providing incorrect structures for false positive ribovore matches like this one.)

R2DT can treat this as it have not matches if ribovore does not match. Since it is checking Rfam families it is probably pretty far down the possible attempts at finding a match and after not matching here it can just say the sequence does not match a known family.

Okay, so sounds like another option for traveler might be to detect if cmalign fails with this error and exit cleanly, but I'm not sure how hard that is to implement. Then in R2DT, it would have to detect when traveler exits in this way, and report no match.