overlaps with character arguments as seqlevels?
I was recently in the situation where I needed to write some code that would detect whether a GRanges or GRangesList contained elements on a particular chromosome. Well, no problem, I'll just look at the seqnames:
example(GenomicRanges, echo=FALSE)
as.logical(seqnames(gr) %in% "chr1")
## [1] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
So far so good. But then I realized that the same approach would not work properly for GRangesLists:
set.seed(10)
grl <- split(gr, sample(3, length(gr), replace=TRUE))
seqnames(grl) %in% "chr1"
## RleList of length 3
## $`1`
## logical-Rle of length 1 with 1 run
## Lengths: 1
## Values : FALSE
##
## $`2`
## logical-Rle of length 2 with 2 runs
## Lengths: 1 1
## Values : FALSE TRUE
##
## $`3`
## logical-Rle of length 7 with 3 runs
## Lengths: 2 1 4
## Values : FALSE TRUE FALSE
This breaks the GRanges* abstraction that I was hoping to use. As such, I need to write GRanges- and GRangesList-specific code to check whether the entries contain any intervals on my desired chromosome, which is not great.
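For illustration, the class-specific code I mean would look roughly like the sketch below. The helper name onSeqlevel() and the toy objects are made up for this example and are not part of GenomicRanges; the sketch only assumes that any() collapses each element of the logical RleList to a single value.

```r
library(GenomicRanges)

# Hypothetical helper (not a GenomicRanges function): one branch per class,
# because seqnames() gives an Rle for a GRanges but an RleList for a GRangesList.
onSeqlevel <- function(x, seqlevel) {
    hit <- seqnames(x) %in% seqlevel
    if (is(x, "GRangesList")) {
        unname(any(hit))   # collapse each list element to a single logical
    } else {
        as.logical(hit)    # expand the Rle into a plain logical vector
    }
}

# Toy data built by hand so the result is easy to check by eye.
toy_gr <- GRanges(c("chr1", "chr2", "chr1"), IRanges(c(10, 20, 30), width = 5))
toy_grl <- GRangesList(a = toy_gr[2], b = toy_gr[c(1, 3)])

onSeqlevel(toy_gr, "chr1")
onSeqlevel(toy_grl, "chr1")
```

The branch on the class is exactly the part I would like to avoid.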
However, it occurred to me that an elegant solution would be to repurpose overlapsAny(), which always returns a logical vector. To wit, the following gives me the desired result for both objects:
chr1 <- GRanges("chr1:1-1000")
overlapsAny(gr, chr1)
## [1] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
overlapsAny(grl, chr1)
## [1] FALSE TRUE TRUE
The above is not quite perfect, as it still requires us to construct chr1, which in turn requires knowing the range of entries on chromosome 1. A more user-friendly version would allow us to just do:
overlapsAny(gr, "chr1")
overlapsAny(grl, "chr1")
This would achieve the same effect. It would simply require new methods for GRanges(List),character, with the understanding that all character arguments are interpreted as seqlevels by the GenomicRanges overlap infrastructure.
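In the meantime, a user-level workaround might look like the sketch below. It is only a rough approximation of the proposed behaviour: overlapsAnySeqlevel() is a hypothetical wrapper, not an existing or planned GenomicRanges method, and it assumes that a range from 1 to .Machine$integer.max is an acceptable stand-in for "the whole chromosome" when the chromosome length is unknown.

```r
library(GenomicRanges)

# Hypothetical wrapper around overlapsAny(); not part of GenomicRanges.
overlapsAnySeqlevel <- function(x, seqlevel) {
    # One range per requested seqlevel, spanning every representable position,
    # so the actual chromosome length does not need to be known.
    whole <- GRanges(seqlevel, IRanges(1L, .Machine$integer.max))
    # Strand is irrelevant when asking "is anything on this chromosome?".
    overlapsAny(x, whole, ignore.strand = TRUE)
}

# Toy data built by hand so the result is easy to check by eye.
demo_gr <- GRanges(c("chr1", "chr2", "chr1"), IRanges(c(10, 20, 30), width = 5))
demo_grl <- GRangesList(a = demo_gr[2], b = demo_gr[c(1, 3)])

overlapsAnySeqlevel(demo_gr, "chr1")
overlapsAnySeqlevel(demo_grl, "chr1")
```

The same call works for both classes, which is the behaviour the character methods would provide directly.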