setting axis aesthetics to `x/y = NULL` turns default axis labels into `"x" / "y"` instead of keeping the variable names of the next layer

Question

Closed this issue 20 days ago · 6 comments

I found a problem with the handling of default axis labels when a layer ignores an aesthetic by setting it to NULL. I expected the previous behavior, where the default axis labels come from the first layer that defines them.

Here is a small code snippet to reproduce this bug. In all 3 cases below, the x and y axis labels should be cty and hwy, but the x = NULL, y = NULL aesthetics in geom_rect turn them into x and y instead of taking the labels from the top-level aes or the second-layer geom_point() aes.

library(dplyr)
library(ggplot2)

# ggplot2 version
packageVersion("ggplot2")
#> [1] '4.0.0'

# this correctly sets the x / y axis label defaults to cty and hwy
mpg |>
  ggplot() +
  geom_rect(
    data = function(df) {
      df |>
        filter(drv == "f") |>
        summarize(xmin = min(cty), xmax = max(cty), ymin = min(hwy), ymax = max(hwy))
    },
    map = aes(
      xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax
    )
  ) +
  geom_point(map = aes(cty, hwy))

# this no longer works the way it did previously:
# define x/y aesthetics at top level and turn them off in geom_rect with x = NULL and y = NULL
# the resulting x/y axis label defaults are now x/y instead of cty and hwy
# the same happens if the aesthetics are defined in geom_point rather than at top level,
# so the x = NULL / y = NULL seems to cause it
mpg |>
  ggplot() +
  aes(cty, hwy) +
  geom_rect(
    data = function(df) {
      df |>
        filter(drv == "f") |>
        summarize(xmin = min(cty), xmax = max(cty), ymin = min(hwy), ymax = max(hwy))
    },
    map = aes(
      x = NULL, y = NULL,
      xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax
    )
  ) +
  geom_point()

# if it's just the geom point without the geom_rect layer, it works correctly again
mpg |>
  ggplot() +
  aes(cty, hwy) +
  geom_point()

Created on 2025-09-15 with reprex v2.1.1

Thank you for all the wonderful new features in ggplot2 4.0.0!

Answer 1 · 2025-09-29T11:23:17.000Z

To add to this, we also get this behaviour if we add a geom that uses a subset of the data:

babynames::babynames |>
  dplyr::filter(name %in% c("Mary", "Hannah"), sex == "F") |>
  ggplot(aes(x = year, y = n, group = name, color = name)) +
  geom_vline(xintercept = 1985) +
  ggtext::geom_textbox(
    data = head(babynames::babynames, 1),
    aes(x = 1985, y = 65000, label = "Something interesting happened in 1985"),
    hjust = 1
  ) +
  geom_line() +
  geom_point()

Removing the ggtext::geom_textbox() layer restores the axis titles to what they should be:

babynames::babynames |>
  dplyr::filter(name %in% c("Mary", "Hannah"), sex == "F") |>
  ggplot(aes(x = year, y = n, group = name, color = name)) +
  geom_vline(xintercept = 1985) +
  geom_line() +
  geom_point()

Thank you!

Answer 2 · 2025-09-29T11:44:20.000Z

@cararthompson Thanks for this additional example. We changed the automatic labelling to use the first instance of an aesthetic (the global aesthetics don't necessarily instantiate). If we just consider the x-axis title, the x = 1985 is causing the title to be x. Leaving out the textbox layer will cause geom_line() to contribute the label, which derives the label from the global aesthetics ('year').

I'm open to the argument that atomic mappings, where you declare for example aes(x = 1:5), should give the label x only as a fallback. Does that rhyme with your intuitions?

Answer 3 · 2025-09-29T12:55:21.000Z

Thank you for getting back to me. That makes sense of the behaviour. It feels unintuitive to me that despite setting aes() within the initial ggplot() call it would be changed by the first layer.

Typically, it's helpful to put annotation elements behind the data, so we need to add them as the first layer and then add the other geoms. For example, if you want to add a rectangle to highlight a particular area in the data, you don't want that rectangle to be in front of the relevant data geoms. Having the axis titles derive from the first layer rather than from the aes() within the original ggplot() call seems problematic for that reason.

My preference would be for the axis titles which correspond to variables in the data to remain unchanged regardless of what different layers do in terms of specifying x and y directly. I appreciate my terminology may be clumsy here, but I hope it makes enough sense. Happy to try to clarify where need be!

Answer 4 · 2025-09-29T13:14:07.000Z

Here's an example in which the labels currently get changed unhelpfully imo by the first layer:

babynames::babynames |>
  dplyr::filter(name %in% c("Mary", "Hannah"), sex == "F") |>
  ggplot(aes(x = year, y = n)) +
  geom_segment(
    data = data.frame(),
    aes(
      x = c(1950, 1965, 1980),
      xend = c(1950, 1965, 1980),
      y = -Inf,
      yend = Inf
    ),
    linewidth = 10
  ) +
  geom_line(aes(group = name, color = name)) +
  geom_point(aes(group = name, color = name))

Answer 5 · 2025-09-29T13:31:36.000Z

Thanks for elaborating!

> Typically, it's helpful to put annotation elements like this vertical line behind the data

I'm not opposed to this point, which is why we make an exception for layers generated with annotate(). I'm not saying you must use these, but they were designed for such purposes.

library(ggplot2)

babynames::babynames |>
  dplyr::filter(name %in% c("Mary", "Hannah"), sex == "F") |>
  ggplot(aes(x = year, y = n)) +
  annotate(
    geom = "segment",
    x = c(1950, 1965, 1980),
    xend = c(1950, 1965, 1980),
    y = -Inf,
    yend = Inf,
    linewidth = 10
  ) +
  geom_line(aes(group = name, color = name)) +
  geom_point(aes(group = name, color = name))

Created on 2025-09-29 with reprex v2.1.1

As a counter-example, this should in my mind clearly display the cty title. Using the global mapping would be misleading in such a case.

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(y = cty))

Created on 2025-09-29 with reprex v2.1.1

> My preference would be for the axis titles which correspond to variables in the data to remain unchanged regardless of what different layers do in terms of specifying x and y directly.

I'm not saying this is impossible, but it'd require quite some extra logic to populate such defaults, which complicates maintenance in the long haul. Now if it is worth it in some way, i.e. we get genuinely better labels in most cases, that'd be a reasonable choice to make. As it stands though, I'm not yet convinced the trade-off falls in favour of doing this (see counter-example).

Another thing to keep in mind is that these are 'just' defaults. Any finished plot should likely have custom titles anyway and there are plenty of places where the titles can be changed (labs(), scales, guides).
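As a minimal sketch of the labs() route mentioned above (not a fix for the default behaviour itself), the snippet below rebuilds a plot like the one in the original report and sets the axis titles explicitly. The helper objects fwd and box, and the fill / colour settings, are illustrative choices that do not appear in the thread.

```r
library(ggplot2)

# Bounding box of the front-wheel-drive cars, computed up front
# (fwd and box are illustrative names, not from the original report).
fwd <- subset(mpg, drv == "f")
box <- data.frame(
  xmin = min(fwd$cty), xmax = max(fwd$cty),
  ymin = min(fwd$hwy), ymax = max(fwd$hwy)
)

p <- ggplot(mpg, aes(cty, hwy)) +
  geom_rect(
    data = box,
    mapping = aes(
      x = NULL, y = NULL,
      xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax
    ),
    fill = NA, colour = "grey40"
  ) +
  geom_point() +
  # Explicit titles replace whatever defaults the layers would produce.
  labs(x = "cty", y = "hwy")

p
```

Because the titles are set explicitly, they no longer depend on which layer comes first.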

Answer 6 · 2025-09-29T13:55:18.000Z

That's fair re: using annotate - my example was slightly forced after I realised that annotate gets round the issue.

And I also appreciate the point about labs() - a very valid point - it's rare to stick to the default label anyway.

Happy to trust your judgement on the trade-off and try to stick within the normal uses of these things rather than hacks which worked despite the package creators' best intentions. Going forward, if this is the axis labelling behaviour, it's probably going to be a case of being clever with how we filter or transform the data for annotations which rely on a different data input.

For example, creating annotations for which the coordinates are the means of the x value for different groups in the data. I've typically created variable names that make it clear that we're looking at the mean (see below). If I do this in a workshop, it will confuse folks, but so long as I understand why this is happening, we can talk through it. We'll eventually get to labelling the axes using labs(), but in the steps towards building the graphs, the change in axis titles will take a bit of getting used to.

Really grateful for your time in helping me get my head round this!


beak_means_df <- penguins |>
  dplyr::group_by(species) |>
  # I just need to change this name...
  dplyr::summarise(mean_length = mean(bill_len, na.rm = TRUE))

penguins |>
  ggplot(aes(x = bill_len, y = species)) +
  # ... and this accordingly
  geom_segment(
    data = beak_means_df,
    aes(x = mean_length, xend = mean_length, y = -Inf, yend = species),
    linetype = 3
  ) +
  geom_jitter(
    aes(
      fill = species,
      size = body_mass
    ),
    shape = 21,
    width = 0,
    height = 0.15,
    colour = "#1A242F",
    stroke = 0.5
  ) +
  ggtext::geom_textbox(
    data = beak_means_df,
    aes(
      # ... and this
      x = mean_length,
      y = species,
      label = paste0(
        species,
        " mean<br>**",
        janitor::round_half_up(mean_length),
        "mm**"
      )
    ),
    hjust = 0,
    fill = NA,
    nudge_y = -0.3,
    box.colour = NA
  )