tidymodels/spatialsample

Autoplot for `spatial_block_cv()` with geographic data draws the wrong grid

Issue closed · 3 comments

The problem

The autoplot method for spatial_block_cv() draws a grid that is completely misaligned with the splits it actually produces:

set.seed(123); spatial_block_cv(ames_sf, v = 2, method = "continuous") |> autoplot()

[Screenshot: autoplot() output for the call above; the grid does not line up with the fold colors.]

With `method = "continuous"`, these blocks should alternate colors; they clearly do not.
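A rough way to see why the colors should alternate: the base-R sketch below numbers a hypothetical 10 x 10 grid row by row (left to right, starting from the bottom row) and cycles fold labels 1..v over that order. The helper `continuous_folds()` is made up for illustration, and both the numbering order and the `rep_len()` assignment are assumptions about how `method = "continuous"` behaves, not spatialsample's actual code.

```r
# Hypothetical sketch of "continuous" fold labelling on a grid.
# Assumption: cells are numbered left to right within a row, starting
# from the bottom row, and folds are assigned by cycling through 1..v.
continuous_folds <- function(nx, ny, v) {
  # expand.grid() varies its first argument fastest, so `col` runs
  # 1..nx within each `row` before `row` increments
  cells <- expand.grid(col = seq_len(nx), row = seq_len(ny))
  # cycle fold labels 1, 2, ..., v, 1, 2, ... over the cell order
  cells$fold <- rep_len(seq_len(v), nrow(cells))
  cells
}

folds <- continuous_folds(nx = 10, ny = 10, v = 2)

# fold labels along the bottom row
folds$fold[folds$row == 1]
#> [1] 1 2 1 2 1 2 1 2 1 2
```

Under these assumptions, with v = 2 neighboring cells within a row always land in different folds, which is the alternating pattern the plot ought to show.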

Under the hood, however, the folds appear to be created as expected. For instance, dropping into a debug() session at this line:

grid_blocks <- sf::st_make_grid(grid_box, ...)

And running plot(sf::st_geometry(data)); plot(grid_blocks, add = TRUE) produces:

[Screenshot: the data's geometry with grid_blocks overlaid.]

The grid lines align with the folds, which are correctly assigned from left to right. Judged against this grid, the folds are correct.

So somewhere in autoplot() the grid is being drawn incorrectly, and that should be fixed.

It's also worth flagging that the default grid differs a lot depending on whether the data are geographic or projected. For instance, with ames in a projected CRS, we get:

set.seed(123); spatial_block_cv(sf::st_transform(ames_sf, 5070), v = 2, method = "continuous") |> autoplot()

[Screenshot: autoplot() output for the projected data.]

This is the behavior I expect when no arguments are passed to sf::st_make_grid(): 10 cells in each direction.
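For reference, sf::st_make_grid() defaults to `n = c(10, 10)` and derives its default cell size from the bounding-box extent divided by `n`. The hypothetical `grid_cell_size()` below reproduces only that arithmetic for a made-up projected bounding box; it does not call sf.

```r
# Hypothetical helper: the cell size sf::st_make_grid() uses by default,
# i.e. the bounding-box extent divided by n = c(10, 10).
grid_cell_size <- function(bbox, n = c(10, 10)) {
  c(
    x = unname(bbox["xmax"] - bbox["xmin"]) / n[1],
    y = unname(bbox["ymax"] - bbox["ymin"]) / n[2]
  )
}

# made-up projected bounding box, in meters
bbox <- c(xmin = 0, ymin = 0, xmax = 2000, ymax = 1000)
grid_cell_size(bbox)
# x = 200, y = 100
```

Because the cells are sized from the box itself, data that fill the box will spread across all 10 columns and rows.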

This might come down to our attempt to keep points "within" the straight edges of their bounding box on a spherical surface:

if (sf::st_is_longlat(data)) {

Think we need a few more zeros on that expansion factor. This is what autoplot() draws if we expand the bounding box by the same amount as stplanr:

[Screenshot: autoplot() output with the bounding box expanded by the same amount as stplanr.]

So on the one hand, staying on the sunny side, the grid is right and the splits are doing what they're told. On the other, we're telling them to do a silly thing, and we should probably add two zeros to our expansion factor here 😂
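To get a feel for why the expansion factor matters so much, the sketch below pads a longitude/latitude bounding box by a fraction of each coordinate's absolute value and compares two factors. `expand_bbox()`, the bounding box, and the factors `1e-3` and `1e-5` are all illustrative assumptions, not the code or values used in spatialsample or stplanr.

```r
# Hypothetical helper: pad each side of a bounding box by a fraction of
# the absolute value of its coordinates. This illustrates the idea of an
# "expansion factor"; it is not spatialsample's actual code.
expand_bbox <- function(bbox, expansion) {
  pad <- abs(bbox) * expansion
  lower <- c("xmin", "ymin")
  upper <- c("xmax", "ymax")
  bbox[lower] <- bbox[lower] - pad[lower]
  bbox[upper] <- bbox[upper] + pad[upper]
  bbox
}

# width of a bounding box, in the units of its coordinates
bbox_width <- function(bbox) unname(bbox["xmax"] - bbox["xmin"])

# made-up longitude/latitude box about 0.1 degrees on a side
bbox <- c(xmin = -93.70, ymin = 41.98, xmax = -93.60, ymax = 42.08)

bbox_width(bbox)                      # about 0.1
bbox_width(expand_bbox(bbox, 1e-3))   # about 0.287
bbox_width(expand_bbox(bbox, 1e-5))   # about 0.102

# how many of 10 default grid columns the data would span
10 * bbox_width(bbox) / bbox_width(expand_bbox(bbox, 1e-3))   # about 3.5
```

Since the padding scales with the coordinate values (about 93 degrees of longitude here) rather than with the data's extent (0.1 degrees), the larger factor makes the box almost three times as wide, so the points would cover only about 3.5 of the 10 default columns. The smaller factor leaves the box nearly unchanged.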

Yeah, after changing the expansion factor to add two zeros:

x <- spatial_block_cv(ames_sf); autoplot(x)
[Screenshot: autoplot(x) output after the change.]

all(purrr::map_lgl(x$splits, ~ length(.x$in_id) + length(.x$out_id) == nrow(ames))) # TRUE
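The check above only compares counts. A stricter sketch, assuming each split carries explicit `in_id` and `out_id` vectors as that check implies, also confirms that every row index appears exactly once. The splits here are plain lists standing in for rsplit objects, and `is_partition()` is a hypothetical helper.

```r
# Hypothetical stricter check: in_id and out_id together must contain
# every row index 1..n exactly once (no overlaps, no gaps).
is_partition <- function(split, n) {
  ids <- c(split$in_id, split$out_id)
  length(ids) == n && !anyDuplicated(ids) && all(sort(ids) == seq_len(n))
}

# stand-ins for rsplit objects over a 6-row data set
n <- 6
splits <- list(
  list(in_id = c(1, 2, 3), out_id = c(4, 5, 6)),
  list(in_id = c(4, 5, 6), out_id = c(1, 2, 3)),
  list(in_id = c(1, 2, 3), out_id = c(3, 4, 5))  # repeats row 3, misses row 6
)
vapply(splits, is_partition, logical(1), n = n)
#> [1]  TRUE  TRUE FALSE
```

The third split has six ids for six rows, so a length-only test would accept it, but it repeats row 3 and omits row 6.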

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.