questions about how to get genes from the output

Question

Opened this issue a year ago · 4 comments

Please report

version of RNA-Bloom with java -jar RNA-Bloom.jar -version
RNA-Bloom v2.0.1
version of java with java -version
openjdk version "18.0.1" 2022-04-19
exact command used to run RNA-Bloom
rnabloom -long ${FILE} -t 48 -outdir ${NAME}

Hi Ka Ming,

I'm using RNA-Bloom2 to assemble long-read cDNA RNA-seq data, and I have a question about the output. The transcripts.fa files contain the sequence of each transcript, but how can I tell which transcripts come from the same gene?
I don't see that information in the headers. Some example headers are shown here:

>rb_90719 l=1982 c=0.25546062 path=[94775+,95098+]
>rb_90720 l=407 c=0.21744472 s=103012

Also, I'm not sure why some headers show s while others show path. Is there any difference?

I would really appreciate it if you could explain this.

Cheers,
Alex

Answer 1 · 2023-08-03T05:34:34.000Z

RNA-Bloom does not infer genes.

path indicates that the transcript was assembled from the listed sequences, which come from the previous step of the assembly.
s indicates that the transcript originates from a single sequence.
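
For anyone who wants to pull these fields out of transcripts.fa, here is a minimal Python sketch that parses the two header styles quoted in the question. It is not part of RNA-Bloom; the function name parse_header and the regular expression are my own, and the pattern is inferred only from the two example headers above, so other header variants may not match.

```python
import re

# Pattern inferred from the two example headers in this thread (an assumption,
# not an official format specification).
HEADER_RE = re.compile(
    r"^>?(?P<id>\S+)\s+l=(?P<l>\d+)\s+c=(?P<c>[\d.eE+-]+)"
    r"(?:\s+path=\[(?P<path>[^\]]*)\])?"
    r"(?:\s+s=(?P<s>\S+))?\s*$"
)


def parse_header(line):
    """Split one FASTA header line into its fields (hypothetical helper)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    path = match.group("path")
    return {
        "id": match.group("id"),
        "l": int(match.group("l")),
        "c": float(match.group("c")),
        # path: sequences from the previous assembly step; empty if absent
        "path": path.split(",") if path else [],
        # s: the single source sequence; None if the header has path instead
        "s": match.group("s"),
    }


if __name__ == "__main__":
    for header in (
        ">rb_90719 l=1982 c=0.25546062 path=[94775+,95098+]",
        ">rb_90720 l=407 c=0.21744472 s=103012",
    ):
        print(parse_header(header))
```

The path entries are kept as strings with their trailing + so that nothing is lost; what the suffix means is not explained in this thread.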

Answer 2 · 2023-08-06T11:51:22.000Z

Thank you so much for your reply. From your experience, do you have any suggestions on how to infer genes from RNA-Bloom2 output?

Cheers,
Alex

Answer 3 · 2023-08-09T02:21:55.000Z

You could try this:
http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/sra2genes4v_testdrive/

If you are interested in a crude gene grouping of the assembled transcripts, I can make it a feature request (but at very low priority).

Answer 4 · 2023-08-09T05:51:08.000Z

Thank you so much. I would definitely like to have this feature in the future.