reconverse/incidence2

Change behaviour of keep_first() and keep_last()

thibautjombart opened this issue · 3 comments

I think the basic understanding of keep_first() and keep_last() is that the first/last xxx time steps are kept. Currently behaviour keeps the first/last xxx items in the data, which is only the same if data has gone through complete_counts() first. Here is an example:

library(incidence2)
x <- incidence(data.frame(dates = c(1, 0, 8, 10, 11, 12, 12, 10)), "dates")
x
#> # incidence: 6 x 3
#>   date_index count_variable count
#> *      <dbl> <chr>          <int>
#> 1          0 dates              1
#> 2          1 dates              1
#> 3          8 dates              1
#> 4         10 dates              2
#> 5         11 dates              1
#> 6         12 dates              2

## current behaviour
keep_first(x, 3)
#> # incidence: 3 x 3
#>   date_index count_variable count
#> *      <dbl> <chr>          <int>
#> 1          0 dates              1
#> 2          1 dates              1
#> 3          8 dates              1
plot(keep_first(x, 3)) # first 9 days are actually kept

keep_last(x, 4)
#> # incidence: 4 x 3
#>   date_index count_variable count
#> *      <dbl> <chr>          <int>
#> 1          8 dates              1
#> 2         10 dates              2
#> 3         11 dates              1
#> 4         12 dates              2
plot(keep_last(x, 4)) # last 5 days are actually kept

## expected behaviour
y <- complete_counts(x)
keep_first(y, 3)
#> # incidence: 3 x 3
#>   date_index count_variable count
#> *      <dbl> <chr>          <int>
#> 1          0 dates              1
#> 2          1 dates              1
#> 3          2 dates              0
plot(keep_first(y, 3))

keep_last(y, 4)
#> # incidence: 4 x 3
#>   date_index count_variable count
#> *      <dbl> <chr>          <int>
#> 1          9 dates              0
#> 2         10 dates              2
#> 3         11 dates              1
#> 4         12 dates              2
plot(keep_last(y, 4))

Created on 2023-01-09 with reprex v2.0.2

The issue comes in knowing what a continuous sequence of "dates" is. For {grate} objects and dates this is straightforward (we increase by 1 units as a sensible default) but for POSIXt objects it is a little trickier. Is a user intending it to be a sequence by seconds, minutes, hours, days, etc etc. We need to think more about this.

You would need to know the above to get the desired sequence. It may be that we need to customise complete_counts() for POSIXt objects but this makes me think it is a concious thing to push on to the user and they need to call that function first.

Wonder if this should actually be renamed get_first() if we change the behaviour as suggested?

closed by 1ef0420