tidyverse/ggplot2

De-deprecate numeric vector for legend.position

Closed this issue · 3 comments

The ggplot2 readme states:

ggplot2 is now over 10 years old and is used by hundreds of thousands of people to make millions of plots. That means, by-and-large, ggplot2 itself changes relatively little. When we do make changes, they will be generally to add new functions or arguments rather than changing the behaviour of existing functions, and if we do make changes to existing behaviour we will do them for compelling reasons.

In ggplot2 3.5.0, supplying a numeric vector to legend.position was soft-deprecated. Hard-deprecating this option will break around 100,000 plots on GitHub, a figure I estimated by searching* for two strings: legend.position = c(0. and legend.position=c(0. (both end with the decimal point). Together these searches yielded 42,000 file hits, and I estimated the number of relevant uses of legend.position per hit by counting the appearances of the search string per file on pages 1, 3, and 5 of each set of search results. There were just over 2.3 uses of a numeric vector legend.position per file hit, giving an estimate of about 100,000 uses of legend.position that will be deprecated.

I then searched for ggplot(, yielding 1.3M file hits. With a similar methodology, and discounting appearances of "ggplot()" on its own in explanatory text, I estimate that there are around 19M uses of ggplot() on GitHub (14.65 appearances per file hit times 1.3M file hits).

This suggests that around 0.5% of uses of ggplot() (roughly 100,000 out of 19M, counting code that is either used to generate a plot or used as an example of how one might generate a plot) could be broken when a numeric vector legend.position is hard-deprecated.
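For anyone who wants to check the arithmetic, here is a minimal sketch in base R. It uses only the figures quoted above; the variable names are mine, and the inputs are rough estimates from search results rather than exact counts.

```r
# Figures quoted in this issue (rough estimates from GitHub code search)
legend_file_hits <- 42000    # file hits for the two legend.position searches
legend_per_file  <- 2.3      # "just over 2.3" numeric-vector uses per file hit
ggplot_file_hits <- 1.3e6    # file hits for ggplot(
ggplot_per_file  <- 14.65    # appearances of ggplot( per file hit

legend_uses <- legend_file_hits * legend_per_file   # about 96,600, i.e. ~100,000
ggplot_uses <- ggplot_file_hits * ggplot_per_file   # about 19 million

# Share of ggplot() uses that set a numeric legend.position, in percent
share_pct <- 100 * legend_uses / ggplot_uses
round(share_pct, 1)   # about 0.5
```

Using the unrounded 96,600 instead of 100,000 changes the result only in the second decimal place, so the 0.5% figure does not depend on the rounding.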

I have read the blog post explaining the rationale for deprecating this usage, which states:

In previous versions of ggplot2, you could set the legend.position to a coordinate to control the placement. However, doing this would change the default legend position, which is not always desirable.

It may not always be desirable, but it has seemed to work well enough so far. I wonder if it is really worth hard-deprecating this feature in favour of a wordier alternative (supplying both legend.position = "inside" and legend.position.inside = c(...)) that, as best I can tell, will only simplify usage of ggplot() in relatively unusual cases where there are multiple legends that need to be placed independently. For that reason, it's unclear to me what the compelling reason for deprecating this usage might be.
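To make the comparison concrete, here is a minimal sketch of the two spellings side by side. It assumes ggplot2 3.5.0 or later is installed; the mtcars plot and the coordinates c(0.9, 0.8) are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point()

# Older spelling: a numeric vector passed directly to legend.position.
# Soft-deprecated as of ggplot2 3.5.0, so it may emit a deprecation warning.
p_old <- p + theme(legend.position = c(0.9, 0.8))

# Spelling introduced in 3.5.0: the position keyword plus a separate
# argument holding the coordinates.
p_new <- p + theme(
  legend.position = "inside",
  legend.position.inside = c(0.9, 0.8)
)
```

Both are intended to place the legend at the same spot inside the panel; the difference is one extra theme argument per plot.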

Thanks!

* I now realise that this underestimates the prevalence of this usage by around a third, since there are many examples of dropping the leading zero before the decimal point, or just using 0, for the first element of the numeric vector.

The compelling reason is as you point out (multiple legend positions), and I get why this may not be compelling to everyone. For me, being able to place legends at various, differing positions is important and not merely an unusual case.

It gives more control for users and it is easier for us to maintain, because we don't have to consider at every step of building a guide whether a position is numeric and thus treat the guide as having position = 'inside'.
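As a rough illustration of the branching involved, here is a sketch of a small helper that maps a legacy value onto the two-argument form. The function split_legend_position() is hypothetical and written for this comment only; it is not ggplot2's internal code and does not exist in the package.

```r
# Hypothetical helper, NOT part of ggplot2: translate a legacy
# legend.position value into the arguments used since 3.5.0.
split_legend_position <- function(position) {
  if (is.numeric(position)) {
    # Legacy form: a pair of coordinates means "inside the panel"
    stopifnot(length(position) == 2)
    list(legend.position = "inside", legend.position.inside = position)
  } else {
    # Keyword form ("top", "bottom", "none", ...) passes through unchanged
    list(legend.position = position)
  }
}

inside_args <- split_legend_position(c(0.9, 0.8))
bottom_args <- split_legend_position("bottom")
```

Doing this translation once, at the point where the theme is specified, is what removes the need to repeat the is.numeric() check later on. A user with many old scripts could presumably apply the same idea and pass the resulting list to theme(), for example via do.call().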

Besides, it is currently just soft-deprecated, meaning that packages that use ggplot2 will get warnings but the users of those packages will not. Only in a few years will it become hard-deprecated, at which point everybody will get the warning. That still does not necessarily break packages.

Thanks @teunbrand. I agree that being able to place legends at differing positions is useful. But my argument here is that the need to do so will be comparatively less common than the need to place a single legend (or having multiple legends, but giving them one placement) and the old numeric vector usage for placing a legend is nice and intuitive for end users.

I wasn't so much thinking about breaking packages, which one expects to have to maintain, but breaking projects, e.g. figures for scientific papers with code archived on GitHub or in other archives. I think I have probably used this feature in dozens of plots over many years. Now of course there will be other packages besides ggplot2 that drift out of compatibility with old projects, and I'm not an active contributor to ggplot2 so it's not my place to make a judgement on the maintainability. But the "old" system has a really convenient syntax, so it seems like a shame to lose it.

I did get confused over the difference between hard-deprecation and removal of the feature; it seems like removing it entirely is still several years away.

Anyway, thank you for explaining the issues here. Happy for you to close the issue if you don't have anything to add.