Binned scales are silently incorrect in 4.0.0

Question

Opened this issue 2 months ago · 5 comments

I found a problem with ggplot2::binned_scale(). With 5 bins, it now requires 6 colors. This broke the stable version of ggredist on CRAN, and it appears that revdep checks did not catch it. Most importantly, if you provide a plain character vector of colors (instead of a vctrs vector), you get an incorrect plot with no error, because the bins are off. With a vctrs vector, the stricter subsetting fails loudly.

I expected five colors to map onto five bins. I don't see this breaking change in the NEWS or in existing issues. I believe it is a subtle error in the subsetting logic and not intended.

Here is the code to reproduce the bug:

library(ggplot2)
library(ggredist)
data(oregon)

scale_fill_538 <- function(...) {
  ggplot2::binned_scale(aesthetics = 'fill',
                        palette = function(x) ggredist$fivethirtyeight,
                        breaks = c(0, 0.35, 0.45, 0.55, 0.65, 1),
                        limits = c(.25, .75),
                        oob = scales::squish,
                        guide = 'colourbar',
                        ...
  )
}

oregon |> 
  ggplot(aes(fill = ndv / (ndv + nrv))) +
  geom_sf() +
  scale_fill_538()
#> Error in `vec_slice()`:
#> ! Can't subset elements past the end.
#> ℹ Location 6 doesn't exist.
#> ℹ There are only 5 elements.

What's going on here?
scale_fill_538() is a binned scale based on 538's old electoral maps. It has 5 colors, stored as a vctrs vector of class palette:

structure(c("#FA5A50", "#FF998A", "#EAE3EB", "#A1A9ED", "#5768AC"
), class = c("palette", "vctrs_vctr"))

If we update the scale to simply repeat the first color, it now fills in everything as expected.

library(ggplot2)
library(ggredist)
data(oregon)

scale_fill_538 <- function(...) {
  ggplot2::binned_scale(aesthetics = 'fill',
                        palette = function(x) c(ggredist$fivethirtyeight[1], ggredist$fivethirtyeight),
                        breaks = c(0, 0.35, 0.45, 0.55, 0.65, 1),
                        limits = c(.25, .75),
                        oob = scales::squish,
                        guide = 'colourbar',
                        ...
  )
}

oregon |> 
  ggplot(aes(fill = ndv / (ndv + nrv))) +
  geom_sf() +
  scale_fill_538()

Created on 2025-09-15 with reprex v2.1.1

Digging a little deeper: When we don't use a vctrs class for the colors, we see the following:

library(ggplot2)
library(ggredist)
data(oregon)

scale_fill_538 <- function(...) {
  ggplot2::binned_scale(aesthetics = 'fill',
                        palette = function(x) c("#FA5A50", "#FF998A", "#EAE3EB", "#A1A9ED", "#5768AC"
                        ),
                        breaks = c(0, 0.35, 0.45, 0.55, 0.65, 1),
                        limits = c(.25, .75),
                        oob = scales::squish,
                        guide = 'colourbar',
                        ...
  )
}

oregon |> 
  ggplot(aes(fill = ndv / (ndv + nrv))) +
  geom_sf() +
  scale_fill_538()

Created on 2025-09-15 with reprex v2.1.1

That is, it produces an incorrect plot but doesn't error. It silently drops the first color, so the 5 bins use 4 colors plus white in the legend.
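
The two failure modes follow from how out-of-range positions are handled when subsetting. Here is a minimal sketch, assuming the scale looks up positions 2:6 in a five-color palette (the positions worked out further down this thread) and that the vctrs package is installed. It applies vctrs::vec_slice() to a plain character vector purely to illustrate the stricter rule:

```r
# Five colours, as a plain character vector
colours <- c("#FA5A50", "#FF998A", "#EAE3EB", "#A1A9ED", "#5768AC")

# Assumed lookup positions: shifted by one relative to the intended 1:5
idx <- 2:6

# Base subsetting pads out-of-range positions with NA and raises no error
plain <- colours[idx]
print(plain)

# vctrs subsetting refuses positions past the end of the vector
strict <- tryCatch(
  vctrs::vec_slice(colours, idx),
  error = function(e) e
)
print(inherits(strict, "error"))
```

With base subsetting the first color is never used and the last position comes back as NA, which is consistent with a plot that renders without complaint but with the wrong fills.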

And again, we can correct the chart by repeating the first color:

library(ggplot2)
library(ggredist)
data(oregon)

scale_fill_538 <- function(...) {
  ggplot2::binned_scale(aesthetics = 'fill',
                        palette = function(x) c("#FA5A50", "#FA5A50", "#FF998A", "#EAE3EB", "#A1A9ED", "#5768AC"
                        ),
                        breaks = c(0, 0.35, 0.45, 0.55, 0.65, 1),
                        limits = c(.25, .75),
                        oob = scales::squish,
                        guide = 'colourbar',
                        ...
  )
}

oregon |> 
  ggplot(aes(fill = ndv / (ndv + nrv))) +
  geom_sf() +
  scale_fill_538()

Created on 2025-09-15 with reprex v2.1.1

Finally, this is an important issue because it silently produces incorrect plots. For comparison, here is a manually binned version, using dplyr to assign the bins and a manual scale keyed on the class names.

library(ggplot2)
library(ggredist)
data(oregon)

oregon |> 
  dplyr::mutate(class = dplyr::case_when(
    ndv / (ndv + nrv) < 0.35 ~ "Strong R",
    ndv / (ndv + nrv) < 0.45 ~ "Lean R",
    ndv / (ndv + nrv) < 0.55 ~ "Tossup",
    ndv / (ndv + nrv) < 0.65 ~ "Lean D",
    TRUE ~ "Strong D"
  )) |>
  ggplot(aes(fill = class)) +
  scale_fill_manual(
    values = c(
      "Strong R" = "#FA5A50",
      "Lean R" = "#FF998A",
      "Tossup" = "#EAE3EB",
      "Lean D" = "#A1A9ED",
      "Strong D" = "#5768AC"
    )
  ) +
  geom_sf()

Created on 2025-09-15 with reprex v2.1.1

I'm relying on data from ggredist because it allows me to directly link an example pkgdown generated under the old ggplot version with the expected map, here: https://alarm-redist.org/ggredist/reference/scale_538.html (Sorry, I know it's never optimal to introduce a second package into the mix while reporting issues, but I figure the reference is net helpful.)

Answer 1 · 2025-09-15T21:02:40.000Z

Thanks for the report!
I have to dig a little bit deeper in this specific example, but in my digging into reverse dependencies I've often found the following. When the palette output had a class that was incompatible with na.value, there were some issues here and there that needed fixing.

Answer 2 · 2025-09-24T16:17:20.000Z

OK so it seems to me that the palette argument is misspecified. From the docs:

A palette function that when called with a numeric vector with values between 0 and 1 returns the corresponding output values.

Which is not what your example palettes are doing. They return a fixed number of colours regardless of the (length of the) input, which violates the 'corresponding' part of the docs.
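
To make the distinction concrete, here is a small sketch contrasting a fixed-length palette with one that returns one color per input value. The names fixed_pal and step_pal are hypothetical, and step_pal splits [0, 1] into equal-width steps, so it illustrates the documented contract rather than a drop-in fix for the scale in the question:

```r
colours <- c("#FA5A50", "#FF998A", "#EAE3EB", "#A1A9ED", "#5768AC")

# Fixed-length palette: ignores its input, like the palettes in the question
fixed_pal <- function(x) colours

# Hypothetical step palette: one colour per input value in [0, 1]
step_pal <- function(x) {
  # Map each value to one of length(colours) equal-width steps, clamped to range
  i <- pmin(pmax(ceiling(x * length(colours)), 1), length(colours))
  colours[i]
}

x <- c(0.1, 0.5, 0.9)
print(length(fixed_pal(x)))  # always the full palette, whatever the input
print(step_pal(x))           # one colour for each element of x
```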

Answer 3 · 2025-09-24T19:18:52.000Z

Thanks @teunbrand.

Though, the desire here is to have a fixed number of bins (5) with fixed breakpoints, each corresponding to one of the 5 colors. Is there a recommended way to set that up now that the mapping can't be known in advance? (I.e., sometimes it will add an extra bin to the start and end, depending on the relationship between the breaks and the limits.)

To expand on that point:

What happens when you pass a character/vctrs vector

If you passed a character vector as palette, it would get processed here:

scaled <- pal[x_binned]

where

x_binned <- cut(
  x,
  breaks,
  labels = FALSE,
  include.lowest = TRUE,
  right = self$right
)

This hasn't changed.

Past behavior

The definition of breaks has changed, as it used to pass through something like:

breaks <- self$get_breaks(limits)
breaks <- sort(unique0(c(limits[1], breaks, limits[2])))
#> 0.25 0.35 0.45 0.55 0.65 0.75

and then rescale so:

breaks <- self$rescale(breaks, limits)
#> 0.0 0.2 0.4 0.6 0.8 1.0

4.0.0 behavior

However, it now gives you:

breaks <- self$get_breaks(limits)
breaks <- sort(unique0(c(limits[1], breaks, limits[2])))
#> 0 0.25 0.35 0.45 0.55 0.65 0.75 1

and then rescale so:

breaks <- self$rescale(breaks, limits)
#> -0.5 0.0 0.2 0.4 0.6 0.8 1.0 1.5

which gets passed to cut() producing values of 2:6, instead of 1:5, since the input data is rescaled to be within c(0, 1).
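
The arithmetic above can be reproduced with base R alone. This is a sketch rather than the ggplot2 internals: rescale01 is a hypothetical stand-in for self$rescale(), the five vote shares are made-up values chosen to fall one per intended bin, and the "past" branch assumes that breaks outside the limits were dropped before the limits were added back, which matches the values quoted above:

```r
# Values taken from the example scale in this thread
limits <- c(0.25, 0.75)
breaks <- c(0, 0.35, 0.45, 0.55, 0.65, 1)

# Hypothetical helper: linear rescaling of `from` onto c(0, 1)
rescale01 <- function(x, from) (x - from[1]) / (from[2] - from[1])

# Made-up vote shares, one per intended bin, already inside the limits
x <- rescale01(c(0.30, 0.40, 0.50, 0.60, 0.70), limits)

# Past behaviour (assumption: breaks outside the limits were dropped first)
inside <- breaks[breaks >= limits[1] & breaks <= limits[2]]
old_breaks <- rescale01(sort(unique(c(limits[1], inside, limits[2]))), limits)

# 4.0.0 behaviour: all breaks are kept, so two extra outer bins appear
new_breaks <- rescale01(sort(unique(c(limits[1], breaks, limits[2]))), limits)

old_idx <- cut(x, old_breaks, labels = FALSE, include.lowest = TRUE)
new_idx <- cut(x, new_breaks, labels = FALSE, include.lowest = TRUE)

print(old_idx)  # positions used to index the palette before
print(new_idx)  # positions used to index the palette in 4.0.0
```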

Remaining difficulty

I could write a fixed function (in this one case where I know the relationship between breaks and limits), but not in general.

For example, setting limits here to c(0, 0.75) would now produce "okay" 1:5 values because the lower limit exactly matches the lowest specified break, so an offset of 1 would not fix this.
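
A quick way to see that the offset depends on the limits is to wrap the 4.0.0 steps quoted above in a function and vary only the limits. The helper bin_positions is hypothetical, squishing is approximated with pmin()/pmax(), and the vote shares are made-up values:

```r
# Hypothetical helper mirroring the 4.0.0 steps quoted above
bin_positions <- function(x, breaks, limits) {
  rescale01 <- function(v) (v - limits[1]) / (limits[2] - limits[1])
  # Approximate scales::squish(): clamp values into the limits
  squished <- pmin(pmax(x, limits[1]), limits[2])
  all_breaks <- sort(unique(c(limits[1], breaks, limits[2])))
  cut(rescale01(squished), rescale01(all_breaks),
      labels = FALSE, include.lowest = TRUE)
}

shares <- c(0.30, 0.40, 0.50, 0.60, 0.70)  # one value per intended bin
breaks <- c(0, 0.35, 0.45, 0.55, 0.65, 1)

# Lower limit falls strictly between breaks: positions are shifted by one
shifted <- bin_positions(shares, breaks, limits = c(0.25, 0.75))

# Lower limit coincides with the lowest break: no shift
aligned <- bin_positions(shares, breaks, limits = c(0, 0.75))

print(shifted)
print(aligned)
```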

This also explains the issue above, which (in this case) is fixed by repeating the first color.

Answer 4 · 2025-09-30T08:50:32.000Z

I think the solution would have to be to find a palette that satisfies your criteria.
If you know breaks and limits in advance, you can use these to find the correct intervals.

library(ggplot2)
library(ggredist)
data(oregon)

scale_fill_538 <- function(..., right = TRUE) {
  colours <- c("#FA5A50", "#FF998A", "#EAE3EB", "#A1A9ED", "#5768AC")
  breaks <- c(0, 0.35, 0.45, 0.55, 0.65, 1)
  limits <- c(0.25, 0.75)
  ivals <- scales::rescale(breaks, from = limits)
  force(right)
  ggplot2::binned_scale(
    aesthetics = 'fill',
    palette = function(x) {
      i <- findInterval(x, ivals, all.inside = TRUE, rightmost.closed = right)
      colours[i]
    },
    breaks = breaks,
    limits = limits,
    oob = scales::squish,
    guide = 'colourbar',
    ...
  )
}

oregon |> 
  ggplot(aes(fill = ndv / (ndv + nrv))) +
  geom_sf() +
  scale_fill_538()

If you don't, you may indeed have to repeat the first/last colour and be sure to return a vector of the same length as the input.

scale_fill_538 <- function(...) {
  colours <- c("#FA5A50", "#FF998A", "#EAE3EB", "#A1A9ED", "#5768AC")
  ggplot2::binned_scale(
    aesthetics = 'fill',
    palette = function(x) {
      c(colours[1], colours, colours[length(colours)])[seq_along(x)]
    },
    breaks = c(0, 0.35, 0.45, 0.55, 0.65, 1),
    limits = c(0.25, 0.75),
    oob = scales::squish,
    guide = 'colourbar',
    ...
  )
}

oregon |> 
  ggplot(aes(fill = ndv / (ndv + nrv))) +
  geom_sf() +
  scale_fill_538()

Created on 2025-09-30 with reprex v2.1.1
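
To check how the findInterval() palette from the first solution lines up with the bins, the lookup can be traced in base R. This sketch assumes that the scale evaluates the palette at the midpoints of its rescaled bins and then indexes the result with the bin positions; rescale01 is a hypothetical stand-in for scales::rescale():

```r
colours <- c("#FA5A50", "#FF998A", "#EAE3EB", "#A1A9ED", "#5768AC")
breaks <- c(0, 0.35, 0.45, 0.55, 0.65, 1)
limits <- c(0.25, 0.75)

# Hypothetical helper: rescale the limits onto c(0, 1)
rescale01 <- function(x) (x - limits[1]) / (limits[2] - limits[1])

# Interval edges used by the palette (same values as `ivals` above)
ivals <- rescale01(breaks)

# Assumption: the palette is evaluated at the midpoints of the rescaled bins
scale_breaks <- rescale01(sort(unique(c(limits, breaks))))
midpoints <- scale_breaks[-1] - diff(scale_breaks) / 2

i <- findInterval(midpoints, ivals, all.inside = TRUE, rightmost.closed = TRUE)
pal <- colours[i]

# Data inside the limits falls in bins 2:6, which map to the five colours in order
print(i)
print(pal[2:6])
```

Under these assumptions the palette returns seven colors, with the first and last repeated for the two outer bins, so positions 2:6 recover the five intended colors.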

Answer 5 · 2025-09-30T12:45:57.000Z

Thanks for the follow-up. The first solution looks like a good approach for my case. I'm a bit hesitant with the second, as it depends on returning a longer vector, which seems like (potentially unintended) internal behavior that has changed before without warning.

I appreciate it!