round_date / floor_date / ceiling_date: week start when using a period as unit
TimBMK opened this issue · 2 comments
The round_date()
-functions (like ceiling_date()
and floor_date()
) documentation states that periods, such as weeks(1)
, can be provided as unit
. This is great for use in higher level functions, where an arbitrary time period may be chosen to floor or ceiling a date. However, the results between unit = "weeks"
and unit = weeks(1)
differ. Specifically, unit = weeks(1)
seems to set the week_start to Friday, and ignores the week_start
argument. See reprex below.
Am I missing something here or is this a bug?
library(tidyverse)
example <-
tibble(
datetime = c(
"2022-07-05 04:32:55 UTC",
"2022-07-09 04:42:47 UTC",
"2022-07-18 04:45:42 UTC"
)
)
# see how the results differ between the two unit arguments
example %>% mutate(
period_unit = ceiling_date(as.Date(datetime), unit = weeks(1)),
period_str = ceiling_date(as.Date(datetime), unit = "weeks")
)
# weeks(1) also ignores the week_start argument
example %>% mutate(
period_unit = ceiling_date(as.Date(datetime), unit = weeks(1), week_start = 3),
period_str = ceiling_date(as.Date(datetime), unit = "weeks", week_start = 3)
)
> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.1 tidyverse_2.0.0 lubridate_1.9.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 quanteda_3.2.2 pillar_1.9.0 compiler_4.2.2 tools_4.2.2 stopwords_2.3 digest_0.6.31 bit_4.0.5
[9] gtable_0.3.0 timechange_0.2.0 lifecycle_1.0.3 lattice_0.20-45 pkgconfig_2.0.3 rlang_1.1.1 Matrix_1.5-1 fastmatch_1.1-3
[17] cli_3.6.1 rstudioapi_0.14 parallel_4.2.2 withr_2.5.0 hms_1.1.3 generics_0.1.3 vctrs_0.6.2 globals_0.16.2
[25] bit64_4.0.5 grid_4.2.2 tidyselect_1.2.0 glue_1.6.2 listenv_0.9.0 R6_2.5.1 fansi_1.0.4 parallelly_1.35.0
[33] vroom_1.6.3 tzdb_0.3.0 magrittr_2.0.3 scales_1.2.0 codetools_0.2-19 colorspace_2.0-3 future_1.32.0 utf8_1.2.3
[41] stringi_1.7.12 munsell_0.5.0 RcppParallel_5.1.5 crayon_1.5.2
This is because week(1)
is internally a "7d 0H 0M 0S"
> weeks(1)
[1] "7d 0H 0M 0S"
> str(weeks(1))
Formal class 'Period' [package "lubridate"] with 6 slots
..@ .Data : num 0
..@ year : num 0
..@ month : num 0
..@ day : num 7
..@ hour : num 0
..@ minute: num 0
So you get a ceiling and flooring to the multiple of 7 days from the begining of the month. I agree that this is a bit confusing but I am afraid there is no solution for this given the current internal representation of the periods.
I am clarifying the documentation regarding exactly this point.
Thanks for the clarification! Yes, it makes sense that it starts from the first day of the month. It's a little confusing when you expect weeks(1) to behave like an equivalent of the "weeks" unit, especially considering the week_start argument that is silently getting ignored in this case.
In practice, I've used the weeks() unit specifically to avoid the fiddly month start dates, e.g. by replacing months(3) with weeks(12) to get more consistent timeframe slices over longer periods. So I'm a little surprised weeks() is tied to the monthly start date here, but that might just be me misunderstanding the exact nature of the weeks() period.
Anyways, thank you for this great package. It certainly makes life a lot easier when dealing with dates!