zero-based stop positions in BED files for splice junctions
Hi.
Please check my math, but it looks like the stop positions in recount BED files for splice junctions are 0-based instead of 1-based.
Here is an example. I chose a small study at random and clicked on jx_bed. Here are two KnownGene junctions from this file:
chr1 1407019 1407129 ... -
chr1 1486235 1486542 ... +
In real life (UCSC browser 1-based coordinates), the first intron goes from 1407020 to 1407130, and the second from 1486236 to 1486543, so each stop in the file is one less than expected. Stop positions are 1-based in BED format. I hate that too. :O)
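To make the arithmetic concrete, here is a minimal Python sketch. It assumes that both the start and stop columns in these files are 0-based and inclusive, which is what the two rows above suggest; the function name `fix_junction` is made up for illustration and is not part of recount or Rail-RNA.

```python
def fix_junction(start0, end0):
    """Convert a junction whose start and stop are both 0-based inclusive
    (the assumed layout of the current files) into:
      - standard BED coordinates (0-based start, 1-based / exclusive stop)
      - 1-based inclusive coordinates, as shown in the UCSC browser
    """
    bed = (start0, end0 + 1)            # only the stop column shifts
    one_based = (start0 + 1, end0 + 1)  # both columns shift
    return bed, one_based


# The two example rows quoted above (chrom, start, stop, strand).
rows = [
    ("chr1", 1407019, 1407129, "-"),
    ("chr1", 1486235, 1486542, "+"),
]

for chrom, start0, end0, strand in rows:
    bed, one_based = fix_junction(start0, end0)
    print(chrom, bed, one_based, strand)
```

If the assumption holds, the only change needed to get standard BED is adding 1 to the stop column.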
It would be useful to include a comment in the files themselves or in the file info column to the effect that these are zero-based intronic hg38 coordinates, and assuming you don't want to re-create all the files, to call the files something other than BED.
Thanks.
Hi,
Thanks for creating an issue.
This has been listed under the known issues on the main website, https://jhubiostatistics.shinyapps.io/recount/, since we noticed it (see https://github.com/leekgroup/recount-website/blob/master/website/ui.R#L123-L124).
We don't really plan on remaking the BED files. The issue was fixed in Rail-RNA a while back. So I suspect that we won't update the BED files until the samples are re-processed (if/whenever we update recount2). @nellore might have more to say.
I'll leave the issue open as a low priority one.
This will be fixed in recount3, though that will involve completely new files, so the ones in recount2 won't be updated/fixed.
Best,
Leo