arq5x/bedtools

bedSort fails for 0 length features


`bedSort` outputs the following for the UCSC SNP dataset:

```
...
chr22	17586594	17586595	rs34484815	0	+
chr22	17586605	17586605	rs536619616	0	+
chr22	17586604	17586605	rs560126106	0	+
...
```

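I guess the problem is the zero-length features (start == end), which do not make sense. But bedtools should still output sorted data: here rs536619616 (start 17586605) is printed before rs560126106 (start 17586604).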
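As a sanity check on the expected order, sorting on chromosome, then numeric start, then numeric end puts the zero-length record after rs560126106. Below is a minimal Python sketch that assumes tab-separated BED input and plain string ordering of chromosome names (similar in spirit to `sort -k1,1 -k2,2n -k3,3n`). `sort_bed_lines` is a hypothetical helper and is not part of bedtools or the UCSC tools.

```python
def sort_bed_lines(lines):
    """Sort BED lines by chromosome name, then numeric start, then numeric end.

    Zero-length features (start == end) need no special handling: they are
    ordered by their start coordinate like any other interval.
    """
    def key(line):
        fields = line.split("\t")
        # Chromosome is compared as a string; start and end as integers.
        return (fields[0], int(fields[1]), int(fields[2]))

    return sorted(lines, key=key)


# The three records from the report, in the order bedSort emitted them.
records = [
    "chr22\t17586594\t17586595\trs34484815\t0\t+",
    "chr22\t17586605\t17586605\trs536619616\t0\t+",
    "chr22\t17586604\t17586605\trs560126106\t0\t+",
]

for line in sort_bed_lines(records):
    print(line)
```

With this key the output order is rs34484815, rs560126106, rs536619616, i.e. the zero-length insertion lands after the SNP that ends where it starts.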

UCSC's note on the validity of zero-length SNPs:

> We consider point insertions into the genome to be zero length features. You can see the SNP in question in the following Genome Browser view:
> http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=chmalee&hgS_otherUserSessionName=hg19_chr22PointInsertion
>
> where the highlighted SNP indicates a G or GG insertion between bases 17586605 and 17586606 on chromosome 22. Because we internally store our coordinates as zero-based half open coordinates, these point insertions end up as zero length coordinates. For more information on our coordinate system please see the following blog post:
> http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/
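The arithmetic behind the note can be sketched in a few lines of Python. The helper names below are hypothetical. They only illustrate the 1-based to zero-based half-open conversion the note describes and are not UCSC or bedtools functions.

```python
def one_based_base_to_bed(pos_1based):
    """Zero-based, half-open (start, end) covering a single 1-based base."""
    return pos_1based - 1, pos_1based


def insertion_to_bed(base_before_1based):
    """Zero-based, half-open (start, end) for a point insertion placed between
    1-based bases `base_before_1based` and `base_before_1based + 1`.

    In half-open coordinates the gap after 1-based base N sits at offset N,
    so the insertion becomes the empty interval [N, N).
    """
    return base_before_1based, base_before_1based


snp_start, snp_end = one_based_base_to_bed(17586605)
ins_start, ins_end = insertion_to_bed(17586605)
print(snp_start, snp_end)                     # 17586604 17586605
print(ins_start, ins_end, ins_end - ins_start)  # 17586605 17586605 0
```

The first interval matches rs560126106 and the second matches rs536619616, so in a correctly sorted file the insertion (start 17586605) follows the single-base SNP (start 17586604).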