`iv_between()` (and friends) variant that returns a vector the size of `haystack`
Like https://stackoverflow.com/questions/74874828/identify-intervals-where-a-given-vector-of-dates-occurs
i.e. you want to return `TRUE` if the `haystack[i]` interval contains any of the `needles`. Another way to say it: if any of the `needles` are between `haystack[i]`, return `TRUE`.
Right now it is: if `needles[i]` is between any of the `haystack` intervals, return `TRUE`, giving us something the size of `needles`.
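To make the two result shapes concrete, here is a small base R sketch (no ivs required) with plain numbers standing in for `[start, end)` intervals. The names `starts`, `ends`, and `hits` are made up for illustration. One needle-by-interval match matrix gives the current `needles`-sized answer when collapsed by row, and the requested `haystack`-sized answer when collapsed by column.

```r
# Plain numeric stand-ins for three [start, end) intervals (hypothetical data)
starts <- c(1, 5, 10)
ends <- c(3, 8, 12)
needles <- c(2, 2.5, 20, 11)

# hits[i, j] is TRUE when needles[i] falls in [starts[j], ends[j])
hits <- outer(needles, starts, `>=`) & outer(needles, ends, `<`)

# Current iv_between() shape: one value per needle
rowSums(hits) > 0
#> [1]  TRUE  TRUE FALSE  TRUE

# Requested shape: one value per haystack interval
colSums(hits) > 0
#> [1]  TRUE FALSE  TRUE
```

The full matrix grows with `length(needles) * length(starts)`, so this is only meant to show the semantics, not to replace the ivs matching.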
This seems fairly straightforward, though. Use `iv_locate_between()` to get all matches, dropping unmatched `needles` values, and use the `$haystack` location column to identify the intervals that surround at least one date.
We should check whether this works in all cases, considering duplicates in both inputs and missing values, but I think it is promising enough that we might not need a new function. Maybe we can just add an example; that is better than yet another family of functions.
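One way to pin down those edge cases is a naive reference implementation to compare the `iv_locate_between()` approach against. A minimal base R sketch, where `surrounds_any()` is a hypothetical helper (not an ivs function). Its missing-value rules are assumptions chosen for illustration: missing needles are dropped, and a missing interval yields `NA`. ivs has its own rules for missing values, which may differ.

```r
# Hypothetical helper, not part of ivs: does each [start, end) interval
# contain at least one needle? Returns one value per interval.
surrounds_any <- function(needles, starts, ends) {
  # Assumption: a missing needle should not affect any interval
  needles <- needles[!is.na(needles)]
  vapply(
    seq_along(starts),
    # Duplicated needles are harmless here: any() only asks "at least one?"
    function(j) any(needles >= starts[j] & needles < ends[j]),
    logical(1)
  )
}

# Duplicate and missing needles, plus one missing interval
surrounds_any(
  needles = c(2, 2, NA, 11),
  starts = c(1, 5, NA, 10),
  ends = c(3, 8, NA, 12)
)
#> [1]  TRUE FALSE    NA  TRUE
```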
```r
df1 <- data.frame(
  diveno = c(1, 2, 3, 4, 5),
  start = c("2018-08-01 08:20:40", "2018-08-01 08:40:50", "2018-08-01 10:01:00", "2018-08-01 15:45:30", "2018-08-01 17:06:00"),
  fin = c("2018-08-01 08:39:20", "2018-08-01 08:53:40", "2018-08-01 10:16:30", "2018-08-01 15:58:20", "2018-08-01 17:18:20")
)
df1$start <- as.POSIXct(df1$start, format = "%Y-%m-%d %H:%M:%S", tz = "CET")
df1$fin <- as.POSIXct(df1$fin, format = "%Y-%m-%d %H:%M:%S", tz = "CET")

df2 <- data.frame(date = c("2018-08-01 08:30:00", "2018-08-01 15:47:00", "2018-08-02 17:10:00"))
df2$date <- as.POSIXct(df2$date, format = "%Y-%m-%d %H:%M:%S", tz = "CET")

df1
#> diveno start fin
#> 1 1 2018-08-01 08:20:40 2018-08-01 08:39:20
#> 2 2 2018-08-01 08:40:50 2018-08-01 08:53:40
#> 3 3 2018-08-01 10:01:00 2018-08-01 10:16:30
#> 4 4 2018-08-01 15:45:30 2018-08-01 15:58:20
#> 5 5 2018-08-01 17:06:00 2018-08-01 17:18:20
df2
#> date
#> 1 2018-08-01 08:30:00
#> 2 2018-08-01 15:47:00
#> 3 2018-08-02 17:10:00

# Find every (date, dive) match, dropping dates that fall in no dive
locs <- ivs::iv_locate_between(
  needles = df2$date,
  haystack = ivs::iv(df1$start, df1$fin),
  no_match = "drop"
)

# Mark every dive interval that contains at least one date
df1$surrounds <- FALSE
df1$surrounds[locs$haystack] <- TRUE
df1
#> diveno start fin surrounds
#> 1 1 2018-08-01 08:20:40 2018-08-01 08:39:20 TRUE
#> 2 2 2018-08-01 08:40:50 2018-08-01 08:53:40 FALSE
#> 3 3 2018-08-01 10:01:00 2018-08-01 10:16:30 FALSE
#> 4 4 2018-08-01 15:45:30 2018-08-01 15:58:20 TRUE
#> 5 5 2018-08-01 17:06:00 2018-08-01 17:18:20 FALSE
```
Created on 2022-12-21 with reprex v2.0.2.9000
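The last two steps of that reprex also hold up when one dive surrounds several dates, because assigning `TRUE` to a repeated index is harmless. A small base R sketch of that point, using made-up locations in place of `locs$haystack`, which also shows that `%in%` builds the same vector in one step:

```r
# Hypothetical stand-in for locs$haystack: interval 1 matched twice, interval 4 once
haystack_locs <- c(1L, 1L, 4L)
n_intervals <- 5L

# Approach from the reprex: start all FALSE, then flip matched rows to TRUE
surrounds <- rep(FALSE, n_intervals)
surrounds[haystack_locs] <- TRUE

# Equivalent one-liner; duplicated locations are harmless in both
surrounds2 <- seq_len(n_intervals) %in% haystack_locs

identical(surrounds, surrounds2)
#> [1] TRUE
```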
Ideally, it would look like this, with one count per row of `table`:
```r
library(dplyr)
library(ivs)
library(lubridate)

table <- tibble(
  start = as.Date(c("2022-08-02", "2022-10-06", "2023-01-11")),
  end = as.Date(c("2022-08-04", "2023-02-06", "2023-02-04"))
)

events <- c(
  ymd("2022-08-07"),
  ymd("2022-10-17"),
  ymd("2023-01-17"),
  ymd("2023-02-02")
)

table %>%
  mutate(range = iv(start, end), .keep = "unused") %>%
  mutate(count = iv_count_between(range, events))
```
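For reference, the counts that call is meant to produce can be computed by hand. A minimal base R sketch using the same dates, with a hypothetical `count_events_in()` helper (not an ivs function) and ivs-style right-open `[start, end)` ranges, which is an assumption: an event falling exactly on an `end` date would not be counted.

```r
starts <- as.Date(c("2022-08-02", "2022-10-06", "2023-01-11"))
ends <- as.Date(c("2022-08-04", "2023-02-06", "2023-02-04"))
events <- as.Date(c("2022-08-07", "2022-10-17", "2023-01-17", "2023-02-02"))

# Hypothetical helper: number of events inside each [start, end) range
count_events_in <- function(events, starts, ends) {
  vapply(
    seq_along(starts),
    function(j) sum(events >= starts[j] & events < ends[j]),
    integer(1)
  )
}

count_events_in(events, starts, ends)
#> [1] 0 3 2
```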
Maybe the `iv_*_between()` functions should just require that one of the two inputs is an iv?
Deprecate the `iv_between()` family in favor of two families like:

- `iv_within()`
- `iv_contains()`

where `iv_within(needles, haystack)` replaces `iv_between()`, but `needles` is allowed to be a vector or an iv, and `iv_contains()` is new but allows the same thing for `haystack`.

Both are nice because they are named after the `type` options in `iv_overlaps(type =)`.
`iv_within()` needs to use `c(">=", "<")` for vector `needles` and `c(">=", "<=")` for iv `needles`.
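The two conditions differ because of the right-open `[start, end)` intervals: a point equal to `end` lies outside the interval, but an interval that ends at `end` still fits inside it. A base R sketch with hypothetical helpers `point_within()` and `interval_within()` (not ivs functions):

```r
# Hypothetical helpers (not ivs functions) for right-open [start, end) intervals

# Point needle x: compare with c(">=", "<") against (start, end)
point_within <- function(x, start, end) x >= start & x < end

# Interval needle [x_start, x_end): compare with c(">=", "<=") against (start, end)
interval_within <- function(x_start, x_end, start, end) x_start >= start & x_end <= end

point_within(5, 1, 5)        # the point 5 is not in [1, 5)
#> [1] FALSE
interval_within(3, 5, 1, 5)  # but [3, 5) is within [1, 5)
#> [1] TRUE
```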