Where to put labels on the x-axis?
Closed this issue · 7 comments
There have been debates in the past about where dates should appear on the x-axis. I will try to sum up the views and things to take into account below, and maybe others will add their thoughts.
Original incidence package
- epicurves were treated as histograms; bars represent case counts between 2 time points, so that e.g. for monthly incidence, a date on the x-axis marks the left-hand side of the bin (label to the left)
- for fitting, a single date needs to be associated with each case count; thus we were using the middle of the time interval (label in the middle)
- we did not have options for plotting epicurves as points / lines
- several users complained about labels on the left, finding a label in the middle more intuitive
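To illustrate the fitting point above, here is a minimal sketch in base R, using made-up counts, of how the date attached to each bin affects a log-linear fit. Moving from the left edge to the midpoint leaves the estimated growth rate unchanged and only shifts the intercept by r * width / 2, so the choice matters for predictions on a given date rather than for the slope. The names bins, t_left and t_mid are illustrative and not part of the incidence package.

```r
# Made-up counts for six 10-day bins (illustration only)
width <- 10
left  <- seq(as.Date("2014-04-07"), by = "10 days", length.out = 6)
bins  <- data.frame(left = left, mid = left + width / 2,
                    count = c(2, 5, 9, 20, 41, 80))

# Days since the first bin start, measured at the left edge or at the midpoint
bins$t_left <- as.numeric(bins$left - bins$left[1])
bins$t_mid  <- as.numeric(bins$mid  - bins$left[1])

# Log-linear fits of the counts against each choice of date
fit_left <- lm(log(count) ~ t_left, data = bins)
fit_mid  <- lm(log(count) ~ t_mid,  data = bins)

r_left <- unname(coef(fit_left)["t_left"])  # growth rate with left-edge dates
r_mid  <- unname(coef(fit_mid)["t_mid"])    # growth rate with midpoint dates

# Difference between the two intercepts
shift <- unname(coef(fit_left)[1] - coef(fit_mid)[1])

print(c(r_left = r_left, r_mid = r_mid, intercept_shift = shift))
```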
Current considerations
- I suspect most epis do not read epicurves as histograms, so label in the middle would make sense
- if we add geom_point and geom_line as options for plot and facet_plot, it is preferable to have consistent label positioning that works the same for all geoms; label in the middle seems better for this, and it still makes sense with geom_bar
- model predictions will probably work better with label in the middle
- devel-wise, it is safer to go with the least amount of fiddling with ggplot2's handling of the x-axis
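As a rough sketch of the consistency argument above, the snippet below puts bars, lines and points on the same x position (the bin midpoint) and labels each tick with the bin's start date. It uses ggplot2 directly with made-up weekly counts; the data frame weekly and its columns are assumptions for illustration, not the package's internals.

```r
library(ggplot2)

# Made-up weekly counts (illustration only)
width  <- 7
weekly <- data.frame(
  left  = seq(as.Date("2014-04-07"), by = "week", length.out = 6),
  count = c(1, 1, 5, 4, 12, 17)
)
weekly$mid <- weekly$left + width / 2   # one x position shared by every geom

p <- ggplot(weekly, aes(x = mid, y = count)) +
  geom_col(width = width, fill = "grey80", colour = "white") +  # bar spans the whole bin
  geom_line() +
  geom_point() +
  # ticks sit at the midpoints but are labelled with the start of each bin
  scale_x_date(breaks = weekly$mid, labels = format(weekly$left, "%Y-%m-%d")) +
  labs(x = "Week starting", y = "Cases")

print(p)
```

With the midpoint as the shared x, the point for each bin sits in the centre of its bar, so no geom-specific offset is needed.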
Initial thoughts: I think we may need to go the tricky route. I think date labels should be on the left (although I wouldn't object to them being in the middle for a daily incidence object). For the custom labels (e.g. 2018-W05) from the aweek package I'd be OK with the middle. Interestingly, we don't use custom labels for "quarter" intervals, instead showing them as dates.
library(incidence2)
library(dplyr, warn.conflicts = FALSE)
data(ebola_sim_clean, package = "outbreaks")
dat <-
ebola_sim_clean$linelist %>%
filter(date_of_onset <= "2014-07-07")
inci_10 <- incidence(dat, date_index = date_of_onset, interval = 10)
inci_week <- incidence(dat, date_index = date_of_onset, interval = "week")
# Dates should be on the left
plot(inci_10)
# Groupings could be in the middle rather than as shown here
plot(inci_week)
Created on 2020-07-07 by the reprex package (v0.3.0)
I agree, we may need to go for a bit of complexity here. Would it make sense to have a function generating information for the x-axis (position and label) from get_dates(x) and get_interval(x)? And a behaviour along the lines of:
- if interval is: 1L / "day", 7L / "week", "month", "quarter", "year"
  x_coord = get_dates(x) + interval / 2
  x_label = [default label depending on time interval, overridden by user-specified format]
  - 1L / "day": as.character(get_dates(x))
  - 7L / "week": ISO week
  - "month": format(get_dates(x), "%b %y")
  - "quarter": like "month" except Q1-4 replaces %b
  - "year": format(get_dates(x), "%Y")
- other intervals:
  x_coord = get_dates(x)
  x_label = get_dates(x)
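A minimal base R sketch of the behaviour outlined above, to make it concrete. The function axis_info() and its arguments are hypothetical, not an existing incidence2 function; it takes the bin start dates and the interval directly instead of calling get_dates(x) and get_interval(x), and it only recognises the interval spellings listed above (anything else, e.g. "2 weeks", falls into the "other intervals" branch).

```r
# Hypothetical helper (not part of incidence2): x-axis positions and labels
# for a vector of bin start dates and an interval specification.
axis_info <- function(dates, interval, label_format = NULL) {
  # Map the interval onto one of the standard names, or "other"
  key <- if (is.numeric(interval)) {
    switch(as.character(interval), "1" = "day", "7" = "week", "other")
  } else if (interval %in% c("day", "week", "month", "quarter", "year")) {
    interval
  } else {
    "other"
  }

  # Other intervals: position and label both sit on the bin start
  if (key == "other") {
    return(data.frame(x_coord = dates, x_label = as.character(dates)))
  }

  # Standard intervals: centre the position within each bin. Months, quarters
  # and years vary in length, so each bin's end is computed with seq().
  ends <- as.Date(
    vapply(seq_along(dates),
           function(i) as.numeric(seq(dates[i], by = key, length.out = 2)[2]),
           numeric(1)),
    origin = "1970-01-01"
  )
  x_coord <- dates + as.numeric(ends - dates) / 2

  # Default label per interval, overridden by a user-specified format
  x_label <- if (!is.null(label_format)) {
    format(dates, label_format)
  } else {
    switch(key,
      day     = as.character(dates),
      week    = format(dates, "%G-W%V"),                      # ISO year and week
      month   = format(dates, "%b %y"),
      quarter = paste(quarters(dates), format(dates, "%y")),  # e.g. "Q2 14"
      year    = format(dates, "%Y")
    )
  }

  data.frame(x_coord = x_coord, x_label = x_label)
}

axis_info(as.Date(c("2014-04-07", "2014-04-14")), "week")
axis_info(as.Date(c("2014-04-07", "2014-04-17")), 10)
```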
Does it sound sensible?
Yep exactly what I was thinking!
Still not sure I like centring dates, but here's the current implementation I'm working on (I've not yet pushed this to devel). There is still a bug to fix with the intervals for monthly/quarterly groupings and in the weekly grouping (it shouldn't show a date_group for 2-week intervals):
library(incidence2)
library(dplyr, warn.conflicts = FALSE)
# get some data
data(ebola_sim_clean, package = "outbreaks")
dat <-
ebola_sim_clean$linelist %>%
filter(date_of_onset <= "2014-05-30")
dat2 <-
ebola_sim_clean$linelist %>%
filter(date_of_onset <= "2014-09-30")
# day (centered)
iday <- incidence(dat, date_index = date_of_onset)
iday
#> <incidence object>
#> [72 cases from days 2014-04-07 to 2014-05-30]
#> [interval: 1 day]
#> [cumulative: FALSE]
#>
#> bin_date count
#> <date> <int>
#> 1 2014-04-07 1
#> 2 2014-04-08 0
#> 3 2014-04-09 0
#> 4 2014-04-10 0
#> 5 2014-04-11 0
#> 6 2014-04-12 0
#> 7 2014-04-13 0
#> 8 2014-04-14 0
#> 9 2014-04-15 1
#> 10 2014-04-16 0
#> # … with 44 more rows
plot(iday, color = "white")
# week (centered)
iweek <- incidence(dat2, date_index = date_of_onset, interval = "1 week")
iweek
#> <incidence object>
#> [2088 cases from days 2014-04-07 to 2014-09-29]
#> [interval: 1 week]
#> [cumulative: FALSE]
#>
#> bin_date date_group count
#> <date> <aweek> <int>
#> 1 2014-04-07 2014-W15 1
#> 2 2014-04-14 2014-W16 1
#> 3 2014-04-21 2014-W17 5
#> 4 2014-04-28 2014-W18 4
#> 5 2014-05-05 2014-W19 12
#> 6 2014-05-12 2014-W20 17
#> 7 2014-05-19 2014-W21 15
#> 8 2014-05-26 2014-W22 19
#> 9 2014-06-02 2014-W23 23
#> 10 2014-06-09 2014-W24 21
#> # … with 16 more rows
plot(iweek, color = "white")
# 2 weeks (not centered)
i2week <- incidence(dat2, date_index = date_of_onset, interval = "2 weeks")
i2week
#> <incidence object>
#> [2088 cases from days 2014-04-07 to 2014-09-22]
#> [interval: 2 weeks]
#> [cumulative: FALSE]
#>
#> bin_date date_group count
#> <date> <aweek> <int>
#> 1 2014-04-07 2014-W15 2
#> 2 2014-04-21 2014-W17 9
#> 3 2014-05-05 2014-W19 29
#> 4 2014-05-19 2014-W21 34
#> 5 2014-06-02 2014-W23 44
#> 6 2014-06-16 2014-W25 52
#> 7 2014-06-30 2014-W27 72
#> 8 2014-07-14 2014-W29 120
#> 9 2014-07-28 2014-W31 166
#> 10 2014-08-11 2014-W33 255
#> 11 2014-08-25 2014-W35 369
#> 12 2014-09-08 2014-W37 558
#> 13 2014-09-22 2014-W39 378
plot(i2week, color = "white")
# month (centered)
imonth <- incidence(dat2, date_index = date_of_onset, interval = "month")
imonth
#> <incidence object>
#> [2088 cases from days 2014-04-01 to 2014-09-01]
#> [interval: 1 month]
#> [cumulative: FALSE]
#>
#> bin_date date_group count
#> <date> <chr> <int>
#> 1 2014-04-01 Apr 14 7
#> 2 2014-05-01 May 14 67
#> 3 2014-06-01 Jun 14 102
#> 4 2014-07-01 Jul 14 228
#> 5 2014-08-01 Aug 14 540
#> 6 2014-09-01 Sep 14 1144
plot(imonth)
#> Warning: position_stack requires non-overlapping x intervals
# 2 months (not centered)
i2month <- incidence(dat2, date_index = date_of_onset, interval = "2 months")
i2month
#> <incidence object>
#> [2088 cases from days 2014-04-01 to 2014-08-01]
#> [interval: 2 months]
#> [cumulative: FALSE]
#>
#> bin_date count
#> <date> <int>
#> 1 2014-04-01 74
#> 2 2014-06-01 330
#> 3 2014-08-01 1684
plot(i2month)
Created on 2020-07-09 by the reprex package (v0.3.0)
Nice! Seeing this, oddly enough, I do like the centring for days and weeks, but not so much for month. Still, I think it makes sense as a default, but there will be different opinions on this. Maybe we could add an argument to override the default behaviour, so that people can choose?
That seems sensible. Currently refining and tidying the code so will leave this open for the time being.
Have added a centre_labels flag within the two plot functions. This defaults to FALSE for the moment but will change.
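For readers who want to see what such a flag could do, here is a rough sketch written directly against ggplot2. It is not the incidence2 implementation: plot_bins() and its arguments are hypothetical, and the counts are the first four rows of the two-week table above. The bars always span their bin; the flag only moves the tick marks between the left edge and the centre of each bin.

```r
library(ggplot2)

# Hypothetical helper (not the incidence2 code): bars always cover
# [bin_date, bin_date + width]; centre_labels only moves the tick marks.
plot_bins <- function(bins, width, centre_labels = FALSE) {
  bins$mid <- bins$bin_date + width / 2        # geom_col centres bars on x
  breaks <- if (centre_labels) bins$mid else bins$bin_date
  ggplot(bins, aes(x = mid, y = count)) +
    geom_col(width = width, colour = "white") +
    scale_x_date(breaks = breaks,
                 labels = format(bins$bin_date, "%Y-%m-%d")) +
    labs(x = "Bin start", y = "Cases")
}

bins <- data.frame(
  bin_date = seq(as.Date("2014-04-07"), by = "2 weeks", length.out = 4),
  count    = c(2, 9, 29, 34)
)

p_left <- plot_bins(bins, width = 14)                        # ticks on the left edge
p_mid  <- plot_bins(bins, width = 14, centre_labels = TRUE)  # ticks in the centre

print(p_left)
print(p_mid)
```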