Dealing with Polycistronic transcripts
Hello!
Recently I had to sequence and analyse an RNA-Seq dataset from T. cruzi. For that, I used Salmon in the alignment-independent mode (mapping the reads to a reference transcriptome).
Typical issues aside, I read afterwards that this organism has polycistronic mRNA: the genomic sequence is transcribed into long pre-mRNAs that contain more than one gene, and these are then cut into individual mRNAs by a specific mechanism before being translated.
Considering I might have some of these in my dataset, how does Salmon deal with them?
Say you have multiple matches for a single read (we used Nanopore sequencing). Is the rest of the read ignored? Is it all mapped and classified? How would I go about dealing with these kinds of reads?
I also work on organisms within that group (Trypanosoma and Leishmania species), and I can confirm that polycistronic transcription is not an issue when doing RNA-seq with these organisms. What you get is the same as for any other conventional organism, i.e. individual transcripts made of UTRs + CDS.
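If you want to check this on your own Nanopore reads, here is a minimal sketch of one way to do it. It is not something Salmon does itself. It assumes you have separately aligned the reads to the transcriptome with minimap2 and written PAF output, and it flags reads whose alignments to two different transcripts cover largely separate parts of the read, which is what a read spanning more than one gene would look like. The function names and the 20-base `max_overlap` tolerance are illustrative choices, not established defaults.

```python
from collections import defaultdict


def parse_paf(lines):
    """Yield (read, query_start, query_end, target) from PAF-formatted lines.

    PAF columns used (0-based): 0 = query name, 2 = query start,
    3 = query end, 5 = target name.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns
            continue
        yield fields[0], int(fields[2]), int(fields[3]), fields[5]


def multi_transcript_reads(lines, max_overlap=20):
    """Return {read: [targets]} for reads that hit different transcripts
    on (nearly) non-overlapping parts of the read.

    max_overlap is an arbitrary tolerance, in bases, chosen for illustration.
    """
    hits = defaultdict(list)
    for read, q_start, q_end, target in parse_paf(lines):
        hits[read].append((q_start, q_end, target))

    flagged = {}
    for read, alns in hits.items():
        alns.sort()  # order alignments by start position on the read
        targets = set()
        for i, (s1, e1, t1) in enumerate(alns):
            for s2, e2, t2 in alns[i + 1:]:
                # Different transcripts, second alignment starts at or
                # near the end of the first one.
                if t1 != t2 and s2 >= e1 - max_overlap:
                    targets.update((t1, t2))
        if targets:
            flagged[read] = sorted(targets)
    return flagged
```

You would call it as `multi_transcript_reads(open("reads_vs_transcriptome.paf"))`, where the file name is only a placeholder. Reads that simply multi-map to similar transcripts over the same region are not flagged, since their alignments overlap on the read. If the result is empty or close to it, your reads behave like those from any conventional organism.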
I hope this helps!