DavisVaughan/ivs

requirement for iv subclass (non rcrd type)


I want to build an age-based interval vector, and it makes sense to do this on top of {ivs}. It needs to work with data.table, though, so I'm considering building on top of a character vector and handling the coercion via iv_proxy.

A rough sketch of the implementation is given below. My question is whether I'm likely to run into unforeseen problems because the class is not a strict subclass (i.e. the class below is just "age_iv", not "age_iv" "iv" "vctrs_rcrd" "vctrs_vctr"). Or should everything (save printing) be handled through iv_proxy, so that any issues that arise could be considered bugs?

Hope this makes sense.

library(ivs)

new_age_iv <- function(start, end,...) {
    out <- sprintf("[%d, %d)", start, end)
    structure(out, class = "age_iv")
}

iv_proxy.age_iv <- function(x, ...) {
    # Parse the bounds back out of "[start, end)"; the literal ")" is escaped
    start <- as.integer(sub("\\[([0-9]+), [0-9]+\\)", "\\1", x))
    end <- as.integer(sub("\\[[0-9]+, ([0-9]+)\\)", "\\1", x))
    new_iv(start, end)
}

format.age_iv <- function(x, ...) format(iv_proxy.age_iv(x))

iv_restore.age_iv <- function(x, to, ...) new_age_iv(iv_start(x), iv_end(x))

# seems to work
dat <- new_age_iv(c(1, 7, 5), c(2, 8, 9))
iv_groups(dat)
#> [1] "[1, 2)" "[5, 9)"
#> attr(,"class")
#> [1] "age_iv"
iv_complement(dat)
#> [1] "[2, 5)"
#> attr(,"class")
#> [1] "age_iv"

You are allowed to do this, and everything should work. I test for this case using this helper: https://github.com/DavisVaughan/ivs/blob/main/R/helper.R

But it will be very slow. I wonder if it would work better for you to keep your start/end vectors in separate columns and write a few wrappers around ivs that return what you need for data.table. Like:

library(data.table)
library(ivs)

my_groups <- function(start, end) {
  x <- iv(start, end)
  groups <- iv_groups(x)
  list(start = iv_start(groups), end = iv_end(groups))
}

dt <- data.table(start = c(1, 7, 5), end = c(2, 8, 9))

dt[, my_groups(start, end)]
#>    start end
#> 1:     1   2
#> 2:     5   9

Created on 2022-09-13 with reprex v2.0.2
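
The same wrapper should also work per group through data.table's `by`. A minimal sketch, assuming a hypothetical `id` column that is not part of the original example:

```r
library(data.table)
library(ivs)

# Same wrapper as above: merge overlapping intervals and return plain columns
my_groups <- function(start, end) {
  x <- iv(start, end)
  groups <- iv_groups(x)
  list(start = iv_start(groups), end = iv_end(groups))
}

# `id` is a made-up grouping column used only for illustration
dt <- data.table(
  id    = c("a", "a", "a", "b", "b"),
  start = c(1, 7, 5, 1, 3),
  end   = c(2, 8, 9, 4, 6)
)

# One set of merged intervals per id
res <- dt[, my_groups(start, end), by = id]
res
```

Because the wrapper returns a plain list of start/end vectors, data.table never has to store an iv column, which sidesteps the rcrd issue entirely.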

Using a complex type as the base object is probably a bit faster, but Re() and Im() still allocate, so it isn't free.

This also lets you get the typical vctrs subclass benefits.

Not fully tested or thought through though, so use with care :)

library(data.table)
library(ivs)
library(vctrs)

new_age_iv <- function(start, end,...) {
  out <- complex(real = start, imaginary = end)
  new_vctr(out, class = "age_iv", inherit_base_type = FALSE)
}

iv_proxy.age_iv <- function(x, ...) {
  x <- unclass(x)
  start <- Re(x)
  end <- Im(x)
  new_iv(start, end)
}

format.age_iv <- function(x, ...) {
  format(iv_proxy.age_iv(x))
}

iv_restore.age_iv <- function(x, to, ...) {
  new_age_iv(iv_start(x), iv_end(x))
}

dat <- new_age_iv(c(1, 7, 5), c(2, 8, 9))
dat
#> <age_iv[3]>
#> [1] [1, 2) [7, 8) [5, 9)

dt <- data.table(dat = dat)
dt
#>       dat
#> 1: [1, 2)
#> 2: [7, 8)
#> 3: [5, 9)

dt[, .(groups = iv_groups(dat))]
#>    groups
#> 1: [1, 2)
#> 2: [5, 9)

Created on 2022-09-13 with reprex v2.0.2
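
To get a rough feel for the difference in proxy cost between the two representations, something like the following could be timed. This is only a sketch: `parse_chr()` and `parse_cpl()` are hypothetical stand-ins for the two `iv_proxy` methods, the data is made up, and the timings will vary by machine.

```r
library(ivs)

set.seed(123)
n <- 1e5
start <- sample.int(100L, n, replace = TRUE)
end <- start + sample.int(10L, n, replace = TRUE)

# The two storage formats discussed in this thread
chr <- sprintf("[%d, %d)", start, end)
cpl <- complex(real = start, imaginary = end)

# Stand-in for the character-based proxy: two regex passes plus coercion
parse_chr <- function(x) {
  new_iv(
    as.integer(sub("\\[([0-9]+), [0-9]+\\)", "\\1", x)),
    as.integer(sub("\\[[0-9]+, ([0-9]+)\\)", "\\1", x))
  )
}

# Stand-in for the complex-based proxy: Re() and Im() each allocate a double vector
parse_cpl <- function(x) {
  new_iv(Re(x), Im(x))
}

system.time(from_chr <- parse_chr(chr))
system.time(from_cpl <- parse_cpl(cpl))
```

Both functions recover the same bounds, so any gap in the timings comes from the parsing itself.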

Oh cool. I'd been trying to think of a way to avoid the character parsing overhead, to no avail. I'll try that out. Using complex should help with date intervals too. Cheers!
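
For date intervals, the complex trick might look roughly like this. It is an untested-in-anger sketch that assumes a hypothetical `date_iv` class and stores each Date as its number of days since 1970-01-01; none of these names come from ivs itself.

```r
library(ivs)
library(vctrs)

# Hypothetical constructor: real part = start date, imaginary part = end date,
# both stored as days since 1970-01-01
new_date_iv <- function(start, end, ...) {
  out <- complex(real = as.numeric(start), imaginary = as.numeric(end))
  new_vctr(out, class = "date_iv", inherit_base_type = FALSE)
}

# Rebuild Date vectors from the stored doubles before handing off to ivs
iv_proxy.date_iv <- function(x, ...) {
  x <- unclass(x)
  new_iv(new_date(Re(x)), new_date(Im(x)))
}

format.date_iv <- function(x, ...) {
  format(iv_proxy.date_iv(x))
}

iv_restore.date_iv <- function(x, to, ...) {
  new_date_iv(iv_start(x), iv_end(x))
}

dat <- new_date_iv(
  as.Date(c("2020-01-01", "2020-03-01", "2020-02-15")),
  as.Date(c("2020-02-01", "2020-04-01", "2020-03-15"))
)

# The second and third intervals overlap, so they should merge into one group
groups <- iv_groups(dat)
groups
```

This only holds for whole-day Dates; anything with a time zone or sub-day precision would need its own handling when restoring.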