bcgsc/RNA-Bloom

questions about how to get genes from the output


Please report

  • version of RNA-Bloom with java -jar RNA-Bloom.jar -version
    RNA-Bloom v2.0.1
  • version of java with java -version
    openjdk version "18.0.1" 2022-04-19
  • exact command used to run RNA-Bloom
    rnabloom -long ${FILE} -t 48 -outdir ${NAME}

Hi Ka Ming,

I'm using RNA-Bloom2 to assemble long-read cDNA RNA-seq data, and I have a question about the output. The transcripts.fa file contains the sequence of each transcript, but how can I tell which transcripts come from the same gene?
I don't see that information in the headers. Here are some example headers:

>rb_90719 l=1982 c=0.25546062 path=[94775+,95098+]
>rb_90720 l=407 c=0.21744472 s=103012

Also, why do some headers show s= while others show path=? What is the difference?

Thank you so much if you could help explain this.

Cheers,
Alex

kmnip commented

RNA-Bloom does not infer genes.

path= indicates that the transcript was assembled from the listed sequences from the previous step of the assembly.
s= indicates that it originates from a single sequence.
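To make the two header styles concrete, here is a minimal sketch of a parser for headers like the examples in the question. It assumes the fields are always space-separated key=value pairs (l=, c=, and either path= or s=), which is only inferred from the two example headers. The names `RBHeader`, `parse_rb_header`, and `origin` are hypothetical, not part of RNA-Bloom. The meaning of c= is not explained in this thread, so it is stored as a plain float, and the trailing +/- on each path element is kept as-is (it may denote orientation, but that is an assumption).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RBHeader:
    name: str    # transcript name, e.g. "rb_90719"
    length: int  # value of l=
    c: float     # value of c= (meaning not explained in the thread)
    # (sequence id, trailing sign) pairs from path=; empty for s= transcripts
    path: list[tuple[str, str]] = field(default_factory=list)
    # sequence id from s=; None for path= transcripts
    single_source: Optional[str] = None


def parse_rb_header(line: str) -> RBHeader:
    """Parse one header line, assuming space-separated key=value fields."""
    tokens = line.strip().lstrip(">").split()
    name = tokens[0]
    fields = dict(tok.split("=", 1) for tok in tokens[1:])
    path = []
    if "path" in fields:
        for elem in fields["path"].strip("[]").split(","):
            if elem:  # skip empty entries such as "path=[]"
                # "94775+" -> ("94775", "+"); the sign's meaning is assumed, not documented here
                path.append((elem[:-1], elem[-1]))
    return RBHeader(
        name=name,
        length=int(fields["l"]),
        c=float(fields["c"]),
        path=path,
        single_source=fields.get("s"),
    )


def origin(h: RBHeader) -> str:
    """'path' if assembled from several earlier sequences, 'single' if from one."""
    if h.path:
        return "path"
    if h.single_source is not None:
        return "single"
    return "unknown"
```

For example, parse_rb_header(">rb_90720 l=407 c=0.21744472 s=103012") yields a record whose origin is "single". To tally both kinds, you could loop over the lines of transcripts.fa that start with ">" and call origin() on each parsed header.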

Thank you so much for your reply. From your experience, do you have any suggestions on how to infer genes from RNA-Bloom2 output?

Cheers,
Alex

kmnip commented

You can possibly try this:
http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/sra2genes4v_testdrive/

If you are interested in crude gene groupings of assembled transcripts, I can make it a feature request (but it would be very low priority).
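As a rough illustration of what a crude grouping could look like, here is a sketch of one possible approach. It is not an RNA-Bloom feature and not the developer's planned design. The sketch links any two transcripts that share at least `min_shared` canonical k-mers (a k-mer or its reverse complement, whichever sorts first) and reports connected components via union-find. The function names and the defaults `k=25` and `min_shared=10` are hypothetical choices. Reading transcripts.fa into the `seqs` dictionary is left to any FASTA reader.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Set of k-mers, each replaced by the smaller of itself and its reverse complement."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        out.add(min(kmer, rc))
    return out


def group_transcripts(seqs: dict[str, str], k: int = 25, min_shared: int = 10) -> list[set[str]]:
    """Crude grouping: transcripts sharing >= min_shared canonical k-mers end up together."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index: canonical k-mer -> transcripts containing it
    index = defaultdict(list)
    for name, seq in seqs.items():
        for km in canonical_kmers(seq, k):
            index[km].append(name)

    # Count shared k-mers for every transcript pair that co-occurs in the index
    shared = Counter()
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    for (a, b), n in shared.items():
        if n >= min_shared:
            union(a, b)

    groups = defaultdict(set)
    for name in seqs:
        groups[find(name)].add(name)
    return list(groups.values())
```

This is deliberately crude. Transcripts from paralogs or with shared repeats can be merged into one group, and the pairwise counting can become expensive for highly repetitive k-mers, so it is only a starting point for exploration.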

Thank you so much. I would definitely like to have this feature in the future.