tidyverse/ggplot2

Change in aesthetic column order in get_layer_data() output in ggplot2 v4.0.0

Closed this issue · 3 comments

Hello ggplot2 team,

First, thanks for the work on the 4.0.0 release and the S7 migration!

I’m seeing a change in the ordering of columns returned by get_layer_data() depending on whether aesthetics are specified in a single aes() call or split across multiple aes() calls.

library(ggplot2)

# Case 1: all aesthetics in one aes()
p1 <- ggplot(penguins) +
  aes(x = bill_dep, y = bill_len, colour = species, shape = island) +
  geom_point()

# Case 2: aesthetics split across two aes() calls
p2 <- ggplot(penguins) +
  aes(x = bill_dep, y = bill_len) +
  aes(colour = species, shape = island) +
  geom_point()

ld1 <- get_layer_data(p1)
ld2 <- get_layer_data(p2)

print(names(ld1)[1:4])
print(names(ld2)[1:4])

On ggplot2 v4.0.0, I observe:

[1] "x"      "y"      "colour" "shape" 
[1] "colour" "shape"  "x"      "y"  

So when aesthetics are added across multiple aes() calls, the merged mapping appears to reorder the columns in the layer data (non-positional aesthetics like colour/shape come before x/y).

Questions

  • Is this reordering intentional in v4.0.0 (e.g., due to internal changes with S7/merging of mappings), or an unintended side effect?
  • For downstream code/packages that inspect get_layer_data(), should we avoid assuming any column order and always access columns by name?
  • If the change is unintentional, is there a plan to restore the previous ordering in a patch release—or should we consider the new ordering as the correct behavior going forward?

Thanks a lot for your guidance and for all your work on ggplot2!

Thanks for the report!

In my mind get_layer_data() is a user-facing function that retrieves an internal data structure. If the internals change, then the output of get_layer_data() changes, but get_layer_data()'s API is still stably retrieving that structure. So the 'promise' that ggplot2 makes is that this structure is retrievable, but not that the shape/order is consistent across versions.

Is this reordering intentional

No, but it may be inconvenient to preserve the old order.

should we avoid assuming any column order

Yes. If you're building tests based on this structure, I recommend testing only the variables under your package's control: for example, if you extend a Stat, test for the presence of your computed variables. If the test is more of a proxy for 'can this layer be built', I'd recommend using snapshot tests instead.
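As a rough sketch of what "testing only variables under your control" could look like, the example below checks computed variables by name rather than by position. It uses the built-in stat_density() and the mtcars dataset purely as stand-ins for a package's own Stat extension and data; those choices are assumptions for illustration and are not from the thread.

```r
library(ggplot2)
library(testthat)

# stat_density() stands in for a custom Stat; mtcars is an arbitrary dataset.
p <- ggplot(mtcars, aes(x = mpg)) + stat_density()
ld <- get_layer_data(p)

test_that("computed variables are present, regardless of column order", {
  # Look columns up by name; never rely on names(ld)[1:n].
  expect_true(all(c("density", "count", "scaled") %in% names(ld)))
  expect_type(ld$density, "double")
})
```

A test written this way should keep passing if a future release reorders or adds columns, since it only asserts on the names the Stat itself is responsible for.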

is there a plan to restore the previous ordering in a patch release

None that I'm aware of. The new order is as correct as the old order for reasons outlined at the top.

Thanks for the detailed clarification!

That makes sense — I understand that get_layer_data() reflects internals and that column order is not guaranteed.

I still find it useful in my workflow to test against the whole get_layer_data object (not just individual columns), but I’ve discovered that testthat::expect_mapequal() works well here since it compares ignoring column order. That solves my immediate problem.
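For anyone landing here later, this is roughly the pattern, shown as a minimal sketch. The two small data frames are hypothetical stand-ins for layer data from two ggplot2 versions (same columns, different order), so the example does not depend on a particular ggplot2 release.

```r
library(testthat)

# Hypothetical stand-ins for get_layer_data() output: same columns, different order.
ld_old <- data.frame(x = c(1, 2), y = c(3, 4), colour = c("red", "blue"))
ld_new <- ld_old[c("colour", "x", "y")]

test_that("layer data matches regardless of column order", {
  # The column order really does differ...
  expect_false(identical(names(ld_old), names(ld_new)))
  # ...but expect_mapequal() matches elements by name, not position.
  expect_mapequal(ld_new, ld_old)
})
```

Note that expect_mapequal() still fails if a column is added or removed, so a test like this remains sensitive to changes in the set of columns.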

No need for further action on your side, so feel free to close this issue. Thanks again for the explanation and guidance!

Lovely, I didn't know about testthat::expect_mapequal(), but this sounds like a good use case for it!