r2dt-bio/R2DT

R2DT fails with complex header lines

blakesweeney opened this issue

R2DT can't handle some complex FASTA header lines, such as:

>DB|TEXT|MORE Some nice name

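Having several | characters in the ID appears to make the esl-sfetch step fail. The "sh: 1: TEXT: not found" lines suggest that the ID is passed to the shell unquoted. Each | is then treated as a pipe, so esl-sfetch is only asked for "DB":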

Creating SSI index for /rna/r2dt/temp/sequence.fasta...    done.
Indexed 1 sequences (1 names).
SSI index written to file /rna/r2dt/temp/sequence.fasta.ssi
sh: 1: TEXT: not found
sh: 1: MORE: not found
seq DB not found in SSI index for file /rna/r2dt/temp/sequence.fasta

# R2DT :: visualise RNA secondary structure using templates
# Version 1.2 (August 10, 2021)
# https://github.com/RNAcentral/R2DT
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Visualising sequence DB|TEXT|MORE using the HS_LSU_3D model from ribovision_lsu
Traceback (most recent call last):
  File "/rna/r2dt/r2dt.py", line 583, in <module>
    cli()
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/rna/r2dt/r2dt.py", line 163, in draw
    force_draw(force_template, fasta_input, output_folder, seq_id)
  File "/rna/r2dt/r2dt.py", line 517, in force_draw
    ribovision.visualise('lsu', fasta_input, output, seq_id, model_id)
  File "/rna/r2dt/utils/ribovision.py", line 51, in visualise
    raise ValueError("Failed esl-sfetch for: %s" % rnacentral_id)
ValueError: Failed esl-sfetch for: DB|TEXT|MORE
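This issue doesn't show how utils/ribovision.py actually builds its esl-sfetch call. Assuming it interpolates the ID into a shell command string, the sketch below shows two ways to keep DB|TEXT|MORE as a single argument. The first quotes each value with shlex.quote. The second passes an argument list to subprocess without shell=True. The helper names build_sfetch_command and build_sfetch_args are hypothetical, as is the output path.

```python
import shlex


def build_sfetch_command(fasta_path, seq_id, out_path):
    """Hypothetical helper: build an esl-sfetch shell command with quoted arguments."""
    # shlex.quote wraps a value in single quotes when it contains shell
    # metacharacters such as '|', so the shell sees one literal argument
    # instead of a pipeline.
    return "esl-sfetch {} {} > {}".format(
        shlex.quote(fasta_path), shlex.quote(seq_id), shlex.quote(out_path)
    )


def build_sfetch_args(fasta_path, seq_id):
    """Hypothetical alternative: an argument list for subprocess.run(args, shell=False)."""
    # No shell is involved here, so '|' is never interpreted as a pipe.
    return ["esl-sfetch", fasta_path, seq_id]


cmd = build_sfetch_command(
    "/rna/r2dt/temp/sequence.fasta",
    "DB|TEXT|MORE",
    "/rna/r2dt/temp/DB|TEXT|MORE.fasta",
)
print(cmd)
# esl-sfetch /rna/r2dt/temp/sequence.fasta 'DB|TEXT|MORE' > '/rna/r2dt/temp/DB|TEXT|MORE.fasta'
```

With quoting, esl-sfetch should receive the full key DB|TEXT|MORE, which matches the single name the SSI index reports. Quoting only addresses the shell. Other tools in the pipeline may still handle such IDs badly, so it is a narrower fix than renaming the headers.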

A complete fix is to rewrite every FASTA header to a simple ID that is always safe (e.g. 1, 2, 3) and run all steps on this transformed file. Before the final results are produced for the user, the temporary IDs should be replaced with the original ones.
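The renumber-and-restore approach could look roughly like the sketch below. It assumes the ID is the first whitespace-separated token after ">", as in the example header. It also drops the description from the transformed file. Both function names are hypothetical and not part of R2DT. Restoration re-keys results by exact temporary ID rather than doing text substitution, so an ID like "1" cannot accidentally match inside unrelated output.

```python
def rename_fasta_ids(fasta_text):
    """Hypothetical helper: rewrite each FASTA header to a safe numeric ID.

    Returns the transformed FASTA text and a dict mapping new ID -> original ID.
    """
    mapping = {}
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Original ID = first whitespace-separated token after '>'.
            fields = line[1:].split(None, 1)
            original_id = fields[0] if fields else ""
            new_id = str(len(mapping) + 1)
            mapping[new_id] = original_id
            out_lines.append(">" + new_id)  # description is dropped
        else:
            out_lines.append(line)  # sequence lines are copied unchanged
    return "\n".join(out_lines) + "\n", mapping


def restore_ids(results, mapping):
    """Hypothetical helper: re-key per-sequence results back to the original IDs."""
    # Exact key lookup instead of string replacement in output files.
    return {mapping[new_id]: value for new_id, value in results.items()}


fasta = ">DB|TEXT|MORE Some nice name\nACGU\n>plain_id\nGGCC\n"
safe_fasta, mapping = rename_fasta_ids(fasta)
print(safe_fasta)  # >1, ACGU, >2, GGCC on separate lines
print(mapping)     # {'1': 'DB|TEXT|MORE', '2': 'plain_id'}
```

In this sketch every downstream step (esl-sfetch, template matching, drawing) would see only "1", "2", and so on. restore_ids would be applied once at the end, when output names are chosen for the user.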