questions about how to get genes from the output
Please report
- version of RNA-Bloom with
java -jar RNA-Bloom.jar -version
RNA-Bloom v2.0.1
- version of java with
java -version
openjdk version "18.0.1" 2022-04-19
- exact command used to run RNA-Bloom
rnabloom -long ${FILE} -t 48 -outdir ${NAME}
Hi Ka Ming,
I'm using RNA-Bloom2 to assemble long-read cDNA RNA-seq data, and I have a question about the output. I can see that the transcripts.fa file has the sequence for each transcript, but how can I tell which transcripts come from the same gene?
I don't see that information in the headers. Some example headers are shown here:
>rb_90719 l=1982 c=0.25546062 path=[94775+,95098+]
>rb_90720 l=407 c=0.21744472 s=103012
Also, I'm not sure why some headers show s while others show path. Is there any difference?
Thank you so much if you could help explain this.
Cheers,
Alex
There is no inference about genes.
path indicates that the transcript was assembled from the listed sequences from the previous step of the assembly.
s indicates that it originates from a single sequence.
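For anyone who wants to separate the two header types programmatically, here is a minimal Python sketch based only on the two example headers quoted above. The function name parse_header is hypothetical and not part of RNA-Bloom. The meaning of l and c is not stated in this thread, so the sketch stores them under their raw keys. Treating the trailing + in each path element as an orientation sign that could also be - is likewise an assumption.

```python
import re

def parse_header(line):
    """Split an RNA-Bloom transcript header into its fields (hypothetical helper)."""
    fields = line.lstrip(">").split()
    record = {"id": fields[0], "l": None, "c": None, "path": None, "s": None}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        if key == "l":
            record["l"] = int(value)      # meaning not confirmed in this thread
        elif key == "c":
            record["c"] = float(value)    # meaning not confirmed in this thread
        elif key == "path":
            # "[94775+,95098+]" -> [("94775", "+"), ("95098", "+")]
            # Assumption: each element is a numeric ID followed by + or -.
            record["path"] = re.findall(r"(\d+)([+-])", value)
        elif key == "s":
            record["s"] = value           # ID of the single source sequence
    return record

multi = parse_header(">rb_90719 l=1982 c=0.25546062 path=[94775+,95098+]")
single = parse_header(">rb_90720 l=407 c=0.21744472 s=103012")
print(multi["path"])   # sequences from the previous assembly step
print(single["s"])     # single originating sequence
```

A record with a non-empty path was assembled from several earlier sequences. A record with s set came from one sequence. Neither field says anything about gene membership.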
Thank you so much for your reply. From your experience, do you have any suggestions on how to infer genes from RNA-Bloom2 output?
Cheers,
Alex
You can possibly try this:
http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/sra2genes4v_testdrive/
If you are interested in a crude gene grouping of assembled transcripts, I can make it a feature request (but very low priority).
Thank you so much. Would definitely like to have this feature in the future.