reconverse/incidence2

Where to put labels on the x-axis?

Closed this issue · 7 comments

There have been debates in the past about where dates should appear on the x-axis. I will try to sum up the views and things to take into account below, and maybe others will add their thoughts.

Original incidence package

  • epicurves were treated as histograms; bars represent case counts between 2 time points, so that e.g. for monthly incidence, a date on the x-axis marks the left-hand side of the bin (label to the left)
  • for fitting, a single date needs to be associated with a case count; thus we were using the middle of the time interval (label in the middle)
  • we did not have options for plotting epicurves as points / lines
  • several users complained about the left-hand labels, saying that a label in the middle was more intuitive
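
To make the two conventions concrete, here is a minimal base R sketch. The variable names are made up for illustration and the bin width is assumed to be a fixed 7 days; it is not code from the original incidence package.

```r
# Left edges of three weekly bins with their case counts
bin_start <- as.Date(c("2014-04-07", "2014-04-14", "2014-04-21"))
counts <- c(1L, 1L, 5L)
interval <- 7  # assumed bin width in days

# Histogram convention: the label marks the left-hand side of the bin
label_left <- bin_start

# Fitting convention: one date per count, taken at the middle of the bin.
# Adding 3.5 gives a fractional Date, which still prints as a calendar day.
fit_date <- bin_start + interval / 2

print(data.frame(bin_start, label_left, fit_date, counts))
```

The same counts therefore end up attached to two different dates depending on whether the object is being plotted or fitted, which is where the inconsistency comes from.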

Current considerations

  • I suspect most epis do not read epicurves as histograms, so label in the middle would make sense
  • if we add geom_point and geom_line as options for plot and facet_plot, it is preferable to have consistent label positioning, which works the same for all geoms; label in the middle seems better for this: it still makes sense with geom_bar
  • model predictions will probably work better with label in the middle
  • devel-wise, it is safer to go with the least amount of fiddling with ggplot2's handling of the x-axis

Initial thoughts: I think we may need to go the tricky route. I think date labels should be on the left (although I wouldn't object to them being in the middle if it was a daily incidence object). For the custom labels (e.g. 2018-W05) from the aweek package I'd be ok with the middle. Interestingly, we don't use custom labels for "quarter" intervals, instead showing them as dates.

library(incidence2)
library(dplyr, warn.conflicts = FALSE)

data(ebola_sim_clean, package = "outbreaks")
dat <- 
  ebola_sim_clean$linelist %>% 
  filter(date_of_onset <= "2014-07-07")


inci_10 <- incidence(dat, date_index = date_of_onset, interval = 10)

inci_week <- incidence(dat, date_index = date_of_onset, interval = "week")

# Dates should be on the left
plot(inci_10)

# Groupings could be in the middle rather than as shown here
plot(inci_week)

Created on 2020-07-07 by the reprex package (v0.3.0)

I agree, we may need to go for a bit of complexity here. Would it make sense to have a function generating information for the x-axis (position and label) from get_dates(x) and get_interval(x)? And a behaviour along the lines of:

  • if interval is: 1L / "day", 7L / "week", "month", "quarter", "year"

    • x_coord = get_dates(x) + interval / 2
    • x_label = [default label depending on time interval, overridden by user-specified format]
      • 1L / "day": as.character(get_dates(x))
      • 7L / "week": iso week
      • "month": format(get_dates(x), "%b %y")
      • "quarter": like "month" except Q1-4 replaces %b
      • "year": format(get_dates(x), "%Y")
  • other intervals:

    • x_coord = get_dates(x)
    • x_label = get_dates(x)
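
A rough sketch of what such a function could look like, in base R only. Both axis_info() and bin_width() are hypothetical names, not part of incidence2; the function takes a plain Date vector and an interval in place of get_dates(x) and get_interval(x); and the ISO week label uses format() with "%G-W%V" as a stand-in for aweek.

```r
# Hypothetical helpers for illustration; neither is part of incidence2.

# Width in days of the bin starting at each date, measured with seq() so that
# months, quarters and years of different lengths are handled.
bin_width <- function(dates, by) {
  vapply(seq_along(dates), function(i) {
    edges <- seq(dates[i], by = by, length.out = 2)
    as.numeric(edges[2] - edges[1])
  }, numeric(1))
}

axis_info <- function(dates, interval, label_format = NULL) {
  by <- switch(
    as.character(interval),
    "1" = , "day" = "day",
    "7" = , "week" = "week",
    "month" = "month",
    "quarter" = "quarter",
    "year" = "year",
    NA_character_  # any other interval: no centring, plain date labels
  )
  if (is.na(by)) {
    return(data.frame(x_coord = dates, x_label = as.character(dates),
                      stringsAsFactors = FALSE))
  }
  x_label <- if (!is.null(label_format)) {
    format(dates, label_format)  # user-specified format overrides the default
  } else {
    switch(
      by,
      day = as.character(dates),
      week = format(dates, "%G-W%V"),  # ISO year and ISO week number
      month = format(dates, "%b %y"),  # month name depends on the locale
      quarter = paste(quarters(dates), format(dates, "%y")),
      year = format(dates, "%Y")
    )
  }
  data.frame(x_coord = dates + bin_width(dates, by) / 2, x_label = x_label,
             stringsAsFactors = FALSE)
}

print(axis_info(as.Date(c("2014-04-07", "2014-04-14")), "week"))
print(axis_info(as.Date(c("2014-04-07", "2014-04-17")), 10))
```

Measuring the width per bin rather than using a constant is a choice made here so that "interval / 2" also has a meaning for months, quarters and years.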

Does it sound sensible?

Yep exactly what I was thinking!

Still not sure I like centring dates but here's the current implementation I'm working on (I've not yet pushed this to devel). Still a bug to fix with the intervals for monthly/quarterly groupings and in the weekly grouping (shouldn't show a date_group for 2 week intervals):

library(incidence2)
library(dplyr, warn.conflicts = FALSE)

# get some data
data(ebola_sim_clean, package = "outbreaks")
dat <-
  ebola_sim_clean$linelist %>%
  filter(date_of_onset <= "2014-05-30")

dat2 <-
  ebola_sim_clean$linelist %>%
  filter(date_of_onset <= "2014-09-30")

# day (centered)
iday <- incidence(dat, date_index = date_of_onset)
iday
#> <incidence object>
#> [72 cases from days 2014-04-07 to 2014-05-30]
#> [interval: 1 day]
#> [cumulative: FALSE]
#> 
#>    bin_date   count
#>    <date>     <int>
#>  1 2014-04-07     1
#>  2 2014-04-08     0
#>  3 2014-04-09     0
#>  4 2014-04-10     0
#>  5 2014-04-11     0
#>  6 2014-04-12     0
#>  7 2014-04-13     0
#>  8 2014-04-14     0
#>  9 2014-04-15     1
#> 10 2014-04-16     0
#> # … with 44 more rows
plot(iday, color = "white")

# week (centered)
iweek <- incidence(dat2, date_index = date_of_onset, interval = "1 week")
iweek
#> <incidence object>
#> [2088 cases from days 2014-04-07 to 2014-09-29]
#> [interval: 1 week]
#> [cumulative: FALSE]
#> 
#>    bin_date   date_group count
#>    <date>     <aweek>    <int>
#>  1 2014-04-07 2014-W15       1
#>  2 2014-04-14 2014-W16       1
#>  3 2014-04-21 2014-W17       5
#>  4 2014-04-28 2014-W18       4
#>  5 2014-05-05 2014-W19      12
#>  6 2014-05-12 2014-W20      17
#>  7 2014-05-19 2014-W21      15
#>  8 2014-05-26 2014-W22      19
#>  9 2014-06-02 2014-W23      23
#> 10 2014-06-09 2014-W24      21
#> # … with 16 more rows
plot(iweek, color = "white")

# 2 weeks (not centered)
i2week <- incidence(dat2, date_index = date_of_onset, interval = "2 weeks")
i2week
#> <incidence object>
#> [2088 cases from days 2014-04-07 to 2014-09-22]
#> [interval: 2 weeks]
#> [cumulative: FALSE]
#> 
#>    bin_date   date_group count
#>    <date>     <aweek>    <int>
#>  1 2014-04-07 2014-W15       2
#>  2 2014-04-21 2014-W17       9
#>  3 2014-05-05 2014-W19      29
#>  4 2014-05-19 2014-W21      34
#>  5 2014-06-02 2014-W23      44
#>  6 2014-06-16 2014-W25      52
#>  7 2014-06-30 2014-W27      72
#>  8 2014-07-14 2014-W29     120
#>  9 2014-07-28 2014-W31     166
#> 10 2014-08-11 2014-W33     255
#> 11 2014-08-25 2014-W35     369
#> 12 2014-09-08 2014-W37     558
#> 13 2014-09-22 2014-W39     378
plot(i2week, color = "white")

# month (centered)
imonth <- incidence(dat2, date_index = date_of_onset, interval = "month")
imonth
#> <incidence object>
#> [2088 cases from days 2014-04-01 to 2014-09-01]
#> [interval: 1 month]
#> [cumulative: FALSE]
#> 
#>   bin_date   date_group count
#>   <date>     <chr>      <int>
#> 1 2014-04-01 Apr 14         7
#> 2 2014-05-01 May 14        67
#> 3 2014-06-01 Jun 14       102
#> 4 2014-07-01 Jul 14       228
#> 5 2014-08-01 Aug 14       540
#> 6 2014-09-01 Sep 14      1144
plot(imonth)
#> Warning: position_stack requires non-overlapping x intervals

# 2 months (not centered)
i2month <- incidence(dat2, date_index = date_of_onset, interval = "2 months")
i2month
#> <incidence object>
#> [2088 cases from days 2014-04-01 to 2014-08-01]
#> [interval: 2 months]
#> [cumulative: FALSE]
#> 
#>   bin_date   count
#>   <date>     <int>
#> 1 2014-04-01    74
#> 2 2014-06-01   330
#> 3 2014-08-01  1684
plot(i2month)

Created on 2020-07-09 by the reprex package (v0.3.0)
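
One thing that makes monthly and quarterly bins harder to centre than days or weeks is that their widths differ. The base R sketch below only illustrates that point for the six months in the output above; whether unequal widths are what triggers the position_stack warning is an assumption, not something established in this thread.

```r
# Bin edges for April to September 2014, plus the start of October
edges <- seq(as.Date("2014-04-01"), by = "month", length.out = 7)
bin_start <- edges[-length(edges)]

# Width of each monthly bin in days: months are not all the same length
width <- as.numeric(diff(edges))

# Centre of each bin when the width is measured per month
centre <- bin_start + width / 2

print(data.frame(bin_start, width, centre))
```

With a single fixed width, bars for shorter and longer months cannot all line up with their true bin edges, so any centring has to use the per-bin width.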

Nice! Seeing this, oddly enough, I do like the centring for days and weeks, but not so much for month. Still, I think it makes sense as a default, but there will be different opinions on this. Maybe we could add an argument to override the default behaviour, so that people can choose?

That seems sensible. Currently refining and tidying the code so will leave this open for the time being.

Have added a centre_labels flag within the two plot functions. This defaults to FALSE for the moment but will change.
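
For anyone reading later, here is a minimal sketch of how a flag like this could work with plain ggplot2. The functions label_breaks() and plot_counts() are made up for illustration and are not the incidence2 implementation; the data frame mimics the bin_date and count columns shown above, and the interval is assumed to be a fixed number of days.

```r
library(ggplot2)

# Hypothetical helper: where to put the axis labels for each bin
label_breaks <- function(bin_date, interval, centre_labels = FALSE) {
  if (centre_labels) bin_date + interval / 2 else bin_date
}

# Hypothetical plot function: bars always span [bin_date, bin_date + interval);
# only the position of the labels changes with centre_labels.
plot_counts <- function(df, interval, centre_labels = FALSE) {
  df$x_mid <- df$bin_date + interval / 2
  ggplot(df, aes(x = x_mid, y = count)) +
    geom_col(width = interval, colour = "white") +
    scale_x_date(
      breaks = label_breaks(df$bin_date, interval, centre_labels),
      labels = format(df$bin_date, "%Y-%m-%d")
    ) +
    labs(x = NULL, y = "count")
}

df <- data.frame(
  bin_date = as.Date(c("2014-04-07", "2014-04-14", "2014-04-21", "2014-04-28")),
  count = c(1L, 1L, 5L, 4L)
)

p_left <- plot_counts(df, interval = 7)                          # labels on the left edge
p_centred <- plot_counts(df, interval = 7, centre_labels = TRUE) # labels mid-bin
```

Keeping the bars fixed and moving only the breaks means the same flag could apply to points and lines as well, since the geoms never need to know where the labels sit.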