Any possibility of being less strict with sequence methods?

Question

Any possibility of being less strict with sequence methods?

TimTaylor opened this issue a year ago · 2 comments

I'm aware you've made some very deliberate design decisions in regards to the calendar classes, and what can be done with them, but would you consider being a little more flexible in some of the methods you provide? The motivation for this is to avoid downstream packages providing multiple, similar methods to work around some of the strictness.

As a concrete example, consider <iso_year_week_day> objects of varying precision. Quite often you would want something akin to calendar_spanning_seq that corresponds to valid timepoints. At present, due to the strictness of the class, I could envisage multiple implementations popping up similar to:

library(clock)

days <- iso_year_week_day(2021:2022, c(1L, 52L), 1L)
wks  <- calendar_narrow(days, "week")
yrs <- calendar_narrow(days, "year")

my_calendar_spanning_seq.clock_iso_year_week_day <- function(x) {
    
    df <- function(x) {
        out <- as_naive_time(x)
        out <- seq(min(out), max(out), by = 1L)
        as_iso_year_week_day(out)
    }
    
    wf <- function(x) {
        out <- calendar_widen(x, "day")
        out <- as_naive_time(out)
        out <- seq(min(out), max(out), by = 7L)
        out <- as_iso_year_week_day(out)
        calendar_narrow(out, "week")
    }
    
    precision <- calendar_precision(x)
    
    switch(
        precision,
        "day" = df(x),
        "week" = wf(x),
        "year" = calendar_spanning_seq(x),
        stop("Unsupported precision")
    )
}

I'm aware that you're still working on how you think {clock} is best utilised by other packages and for end users the benefit of the current approach is to make them explicitly handle these situations. However I'm hoping there is scope to maintain the minimalist class constructors but slightly increase the flexibility of the methods (balanced with increased documentation as to what the methods are doing).

I hope all this makes sense and no problem if you want to keep things as is (I just wanted to ensure I had raised it as a discussion point).

Answer 1 · 2023-05-30T14:12:49.000Z

The main issue with this comes with the iso_year_week_day -> naive_time conversion. This is only always possible for this class at "year" precision (for year_month_day that conversion is possible up to "month" precision). Starting at "week" precision for this class, you can begin to have invalid year-week dates, like:

library(clock)

my_calendar_spanning_seq.clock_iso_year_week_day <- function(x) {
  
  df <- function(x) {
    out <- as_naive_time(x)
    out <- seq(min(out), max(out), by = 1L)
    as_iso_year_week_day(out)
  }
  
  wf <- function(x) {
    out <- calendar_widen(x, "day")
    out <- as_naive_time(out)
    out <- seq(min(out), max(out), by = 7L)
    out <- as_iso_year_week_day(out)
    calendar_narrow(out, "week")
  }
  
  precision <- calendar_precision(x)
  
  switch(
    precision,
    "day" = df(x),
    "week" = wf(x),
    "year" = calendar_spanning_seq(x),
    stop("Unsupported precision")
  )
}

days <- iso_year_week_day(2021:2022, c(1L, 53L), 1L)
invalid_detect(days)
#> [1] FALSE  TRUE
my_calendar_spanning_seq.clock_iso_year_week_day(days)
#> Error in `as_sys_time()`:
#> ! Can't convert `x` to another type because some dates are invalid.
#> ℹ The following locations are invalid: 2.
#> ℹ Resolve invalid dates with `invalid_resolve()`.

To allow the user to handle this, we'd have to expose invalid to calendar_spanning_seq(), and I have avoided that like the plague. Currently the only place invalid is exposed in the low level API is invalid_resolve(). To cross the type barrier between <calendar> <-> <time-point> you have to call invalid_resolve() first to ensure there aren't any invalid values. This avoids a proliferation of invalid, nonexistent, and ambiguous arguments in the low level API (nonexistent and ambiguous are only exposed in as_zoned_time.clock_naive_time() in the low level API). The high level API, on the other hand, exposes these "as needed" to gloss over some of the details.

I find the fact that invalid and nonexistent/ambiguous are only exposed in a single place in the low level API pretty "beautiful" from a software perspective, but I am empathetic to the fact that this is somewhat annoying, especially with the week based class.

In the ggplot2 PR that is still open, I had to add a little custom helper that allows you to add weeks to a year-week-day. It "assumes" the user won't provide invalid year-week dates.
https://github.com/r-lib/clock/pull/345/files#diff-7a8566ced0cab53e90292e8d01f43dd2b355afb2fa00d99689dd7225e7b4073dR300

Without that you can't do a week precision plot, which would suck.

Typically this "invalid" date precision boundary also has a performance component too. You can't add days to a year-month-day because:

At day precision it could have invalid dates
Converting to a time point, which is internally an int64_t count, and then doing addition is massively faster than doing it on individual year/month/day components.

See also https://github.com/HowardHinnant/date/wiki/FAQ#day_arithmetic

So I'm not sure what the right solution is right now, but I am thinking about it

Answer 2 · 2023-05-30T14:43:54.000Z

Cheers - Sorry for being the (pesky) user wanting to ruin the beautiful code! I appreciate the thought you're putting in to this!

Another benefit from the user perspective is that you can implement methods (such as calendar_spanning_seq) more efficiently internally than we can via the exposed API.

Plot functionality looks great btw!