vastgroup/vast-tools

Question about stranded RNA-seq data

bontus opened this issue · 5 comments

Dear all,

I was wondering how vast-tools handles stranded RNA-seq libraries, as I could not find any mention of this in the description, and some of our current results appear to go in the opposite direction compared to the PCR validation.

To explain the situation a bit more, the region in question roughly corresponds to hg19 coordinates chr5:177,631,107-177,650,933, and the observed behavior could be attributed to read-through of a gene (HNRNPAB) located on the opposite strand, directly downstream of our target event in PHYKPL.
Both genes have overlapping 3' ends, and in both cases we see continued transcription past the TES / polyA site after treatment when compared to control cells. Thus, it could be that reads originating from these read-through / PolII run-off events are counted toward splicing events of the gene located on the opposite strand if they are not handled properly.

Thus my question is: does vast-tools discard reads that are not properly stranded for its calculations?

Thank you in advance and best regards

Hey,

vast-tools doesn't really distinguish between stranded and unstranded reads. It aligns to the forward strand and to the reverse complement. (This could actually be changed easily.) The thing is that if there were an event annotated on the opposite strand using the exact same exon-exon junctions (which I think would be fairly difficult), then both strands would cancel each other out and these events would not pass the mappability filters. This could be different for intron retention, though. What we did in our paper about IR (Braunschweig et al., Genome Res 2014) was to remove all introns overlapping with other genes (i.e. this is a perfect case). What event is it? (If you don't mind sending me the EventID.)

I will see if I can quickly add an option to map only to the forward strand (i.e. use --norc in bowtie, if I'm not mistaken).

Cheers
Manu

Hi Manu,

It's the HsaINT0005801 event, which has no overlapping features on the other strand. What happens for us is that PolII seems to continue ~20kb past the TES of HNRNPAB (hence no overlapping features anymore), so if we used unstranded RNA-seq and did not know the strand orientation, the reads would logically be assigned to PHYKPL. I don't think changing the bowtie mapping parameters will help here. Instead, it would make more sense to add a command-line option "--stranded", similar to featureCounts, to tell which mate originates from the sense and which from the antisense strand, and then decide per event which reads are in the correct orientation.

For reference, the featureCounts option looks like this:
-S <ff:fr:rf> Orientation of the two reads from the same pair, 'fr' by default.
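
To make the per-event decision concrete, here is a minimal sketch of how the transcript strand could be inferred for a single aligned mate. Everything in it is an assumption for illustration: the function names are hypothetical, the library labels follow the common "fr-firststrand" / "fr-secondstrand" naming, and none of this is existing vast-tools or featureCounts code.

```python
# Hypothetical helpers for illustration only; not part of vast-tools.
LIBRARY_TYPES = ("fr-firststrand", "fr-secondstrand")


def transcript_strand(is_read1: bool, is_reverse: bool, library: str) -> str:
    """Infer the strand of the transcript a mate was sequenced from.

    is_read1   -- True for the first mate of the pair, False for the second
    is_reverse -- True if the mate aligned to the reverse strand of the genome
    library    -- "fr-firststrand": read 2 is sense to the transcript (dUTP-style)
                  "fr-secondstrand": read 1 is sense to the transcript
    """
    if library not in LIBRARY_TYPES:
        raise ValueError(f"unknown library type: {library!r}")
    # Genomic strand this mate aligned to.
    read_strand = "-" if is_reverse else "+"
    # The sense mate reports the transcript strand directly;
    # the other mate reports its complement.
    sense_mate_is_read1 = library == "fr-secondstrand"
    if is_read1 == sense_mate_is_read1:
        return read_strand
    return "+" if read_strand == "-" else "-"


def supports_event(is_read1: bool, is_reverse: bool,
                   library: str, event_strand: str) -> bool:
    """True if the mate is in the correct orientation for an event on event_strand."""
    return transcript_strand(is_read1, is_reverse, library) == event_strand
```

With something like this, a read-through read from the gene on the other strand would fail `supports_event` for the intron in question and could be dropped before counting.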

Best

Yeah, I just took a quick look at the corresponding code, but it can't actually be implemented either way. vast-tools pools both PE mates together, so the strandedness is forever lost.

I don't think vast-tools will be helpful here. The only protection it has against these artifacts is that for IR events (as is the case here), it does a binomial test on the inclusion reads for each IE junction, and in this case I suspect they'll be significantly different (this is the 5th value in the Q column; it corresponds to the p-value of the binomial test. We discard introns with p < 0.05). But, again, this is a way to get rid of potential problems; it won't give you a proper quantification of both co-occurring events. Sorry!
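
As a rough illustration of the kind of check described above, the sketch below runs an exact two-sided binomial test on the read counts of the two junctions flanking an intron. The details are assumptions: the thread does not say how vast-tools sets up the test, so this assumes an expected 50/50 split between the two junctions, and the function names are hypothetical. Only the p < 0.05 cutoff comes from the comment above.

```python
from math import comb


def balance_p_value(upstream_reads: int, downstream_reads: int) -> float:
    """Exact two-sided binomial p-value for a 50/50 split of junction reads.

    Assumption: under clean intron retention, both flanking junctions
    are expected to collect reads at the same rate.
    """
    n = upstream_reads + downstream_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    observed = comb(n, upstream_reads)
    # With p = 0.5 every outcome has probability comb(n, i) / 2**n, so
    # "at least as extreme" means a binomial coefficient no larger than
    # the observed one. Integer arithmetic keeps the comparison exact.
    tail = sum(c for c in (comb(n, i) for i in range(n + 1)) if c <= observed)
    return tail / 2**n


def keep_intron(upstream_reads: int, downstream_reads: int,
                alpha: float = 0.05) -> bool:
    """Keep the intron unless the two junctions differ significantly."""
    return balance_p_value(upstream_reads, downstream_reads) >= alpha
```

For example, `keep_intron(50, 48)` keeps the intron, while a lopsided case such as `keep_intron(40, 5)` discards it, which is the pattern one would expect when read-through from a neighbouring gene covers only one end of the intron.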

M

Ok, I guess one way of tackling this issue in our case could be to identify problematic regions a priori based on gene distance / 3' overlap and to filter out any identified events nearby or to store them in a separate table.
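
A minimal sketch of that a priori filter, under stated assumptions: the `Gene` record, the function names, and the example coordinates are made up for illustration, and the 20 kb default simply mirrors the read-through distance we observed. Each gene is extended downstream of its 3' end and any opposite-strand gene touching that window is flagged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def with_readthrough(gene: Gene, readthrough: int) -> tuple[int, int]:
    """Gene interval extended past its 3' end by `readthrough` bases."""
    if gene.strand == "+":
        return gene.start, gene.end + readthrough
    return max(1, gene.start - readthrough), gene.end


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def readthrough_risk_pairs(genes: list[Gene],
                           readthrough: int = 20_000) -> list[tuple[str, str]]:
    """Pairs of opposite-strand genes where read-through from one could reach the other."""
    flagged = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if a.chrom != b.chrom or a.strand == b.strand:
                continue
            a_hits_b = overlaps(*with_readthrough(a, readthrough), b.start, b.end)
            b_hits_a = overlaps(*with_readthrough(b, readthrough), a.start, a.end)
            if a_hits_b or b_hits_a:
                flagged.append((a.name, b.name))
    return flagged


# Made-up coordinates, only to show the call.
example = [
    Gene("geneA", "chr1", 1_000, 5_000, "+"),
    Gene("geneB", "chr1", 4_500, 9_000, "-"),      # 3' ends overlap with geneA
    Gene("geneC", "chr1", 100_000, 110_000, "-"),  # too far away
]
risky = readthrough_risk_pairs(example)
```

Events falling inside the flagged genes could then be filtered out or written to a separate table. The pairwise loop is quadratic, so for a full annotation a sorted sweep or an interval tree would be the better choice.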

Thanks for the swift response though!

Hey,

After >6 months, we have actually managed to implement this. The new v2.0.0 should recognize the type of reads and map them accordingly. Keep in mind that new VASTDB libraries are needed if you wish to try this.

Cheers
Manu