r-spatialecology/landscapemetrics

Patch centroids

Closed this issue · 15 comments

In the function Gyrate, the centroid is based on cell center-to-cell center distances.

It would be great if a separate function could be created to get the patch centroid of each patch and force the centroid to be within the patch (i.e. the closest raster cell to calculated mean coordinate).

        classes <- get_unique_values(landscape)[[1]]

        # get connected patches
        landscape_labeled <- get_patches(landscape,
                                         class = patches_class,
                                         directions = directions,
                                         return_raster = FALSE)[[1]]

        # transpose to get same direction of ID
        landscape_labeled <- t(landscape_labeled)

        # get coordinates of current class
        points <- matrix(points[which(!is.na(landscape_labeled)), ],
                         ncol = 3)

        # set ID from class ID to unique patch ID
        points[, 3] <- landscape_labeled[!is.na(landscape_labeled)]

        # conver to tibble -> do we still need to do this?
        points <- tibble::as_tibble(points)
        names(points) <- c("x", "y", "id")

        # calculate the centroid of each patch (mean of all coords)
        centroid <- stats::aggregate(points[, c(1, 2)],
                                     by = list(id = points$id),
                                     FUN = mean)

        #make new raster with only cell centroids
        #search for closest raster cells of coordinate list and set all other cells to NA

        #this could be based on the raster function
        dist <- distanceFromPoints(object, centroid)

        #make overlay and set all values that are not patches to NA
        centroid_dist <- overlay(landscape_labeled , dist, fun = function(x, dist) {
          landscape_labeled[!is.na(dist[])] <- NA
          return(x)
        })

Then it would be needed to get the minimum value for each patch of the rst1_mod and only keep that raster cell and give it the value of the landscape_labeled raster.

Not sure it would be efficient this way but now there is no other way than to transform the raster to polygon...

Hey there,

I'm not 100% sure if I get your idea correctly. The centroid is always within the patch, it's just not necessarily the cell centre of a patch. So what you want are the coordinates of the cell centre (and/or the cell id maybe) closest to the centroid coordinates?

Wait...your are correct. The centroid can actually be outside the patch. I think I get what you mean. I will think about how to do that.

I tried to make an example of what I think you mean:

Centroid

But there are potentially more than one patches "nearest", e.g. in @mhesselbarth 's example three. How do you decide which to choose?

Thank you for your help!

The drawing is exactly what I mean! This problem is related with preparing a large amount of input rasters for connectivity analysis in circuitscape. 3 inputs are needed and vectorizing the rasters on a continental scale is very inefficient in R.

  • calculate patches with clump (raster package) => short circuits for dispersal
  • calculate inner centroid => important that the centroid is within the patch for analysis
  • calculate nearest neighbour pairs. For this it would probably be needed to convert patches with the boundaries function (outer) and calculate the nearest neighbour of each leftover raster cell. The minimum distance to a patch with another ID would then be a "pair". The functionality would be related to the arcgis Near table function. As I have 436945 patches per raster, I am not sure it would be feasible this way.

@bitbacchus the example of @mhesselbarth would be 1 patch with clump. If distances are the same to multiple patches than it could either return all options or just take a random pair based on a "min" distance function.

@bitbacchus:

But there are potentially more than one patches "nearest", e.g. in @mhesselbarth 's example three. How do you decide which to choose?

I don't get it, sorry. In my example, there is only one patch and that patch has one centroid (mean location of all cells). Now, there should be in this example only one cell of the patch again that is the nearest to the mean location and this cell could be defined as within-patch centroid.

However, there might be cases in which several cells have the same distance to the mean location, e.g. for a square with a "hole" in the middle. In this case, the centroid would be exactly in the middle and all cells would have the same distance to it.

@frederikvand:
Are you aware of the get_patches() function? Should give you the same output as clumps but is faster and easier to use.

Thanks for the suggestion! I will try the get_patches function out.
I think what @bitbacchus means is that there would be rare occasions that the exact distance from the red coordinate (mean centroid) would the same to multiple blue (within patch) coordinates.
Allthough, if the red coordinate will be very precise (many digits), I think this would be rare.
image

Yes, that also what I tried to explain above. My example was a perfect square in which the distances actually would be exactly the same (sorry for the bad drawing 😄):

Centroid_square

But since I think the return would be a tibble anyways, we could simply return all cells with the same distance and a warning.

46960bc

Okay...I came up with a function called get_centroid() which returns a tibble with the layer id, patch id, class id and the coordinates of the centroid.

To use the function, you need to install the development branch using remotes::install_github("r-spatialecology/landscapemetrics", ref = "gyrate_inside_patch").

The function has an argument that allows to force the centroid to be the cell-center of a patch and by that to be inside the patch. However, because the cell-center centroid is defined as the cell-center with the shortest distance to the mean location of all cells (normally this is used as the centroid), this has the problem that for some patches, there are several cells the possible cell-center centroid. In such cases, all possible cells are returned as well as a warning. I tried to show this behaviour with the plot below. You can see that for some patches, there are several centroids for cell_center = TRUE (indicated by the points).

Please have a look at the function and see if that is what you are looking for. But, please be aware that I didn't test the function yet. So there still might be bugs or computational issues.

library(landscapemetrics)
library(ggplot2)

centroids_mean <- get_centroid(landscape, cell_center = FALSE)
centroids_cell <- get_centroid(landscape, cell_center = TRUE)
#> Warning: For some patches several cell centers are returned as centroid.

ggplot(data = raster::as.data.frame(landscape, xy = TRUE)) + 
    geom_raster(aes(x = x, y = y, fill = factor(clumps))) + 
    geom_point(data = centroids_mean, 
               aes(x = x, y = y, shape = factor(class), col = "Mean centroid")) + 
    geom_point(data = centroids_cell, 
               aes(x = x, y = y, shape = factor(class), col = "Cell centroid")) + 
    coord_equal() + 
    scale_fill_viridis_d(option = "A") + 
    scale_shape_manual(values = c(1,2,3)) +
    theme_dark()

Created on 2020-05-26 by the reprex package (v0.3.0)

84c7464

Also included for all gyrate metric functions.

@mhesselbarth Thank you for the great function. The functionality looks very promising.

However when I try to install the remote branch with remotes::install_github("r-spatialecology/landscapemetrics", ref = "gyrate_inside_patch"), I get Error: Could not find tools necessary to compile a package Call pkgbuild::check_build_tools(debug = TRUE) to diagnose the problem. And running the debug utility gives the exact same error. Do you know a solution?

What operation system are you using? Also, please try to update devtools and/or remotes.

Thank you kindly for all the help! It worked after I suppressed the error with options(buildtools.check = function(action) TRUE ) on windows and it worked directly with centos 7 on the cluster!

#some questions for the workflow to get patches, centroids and pairs with NN distances:

  1. Would be great to include a return_raster function similar to the get patches function in the centroid function!
  2. get_nearestneighbour doesn't return the Near patch ID. This is necessary to be useful to compare to resistance and genetic distance. Could this be changed?
  3. The workflow took approx. 450 seconds for a 1 million cell raster. As my workflow needs to be repeated many times on 2 billion cell rasters, I am not sure this is a feasible replacement for arcgis yet.

A small example for benchmarking with a raster of 1 million cells:
#step 1: get patches
system.time({PE_test_patch_ID <- get_patches(
Pe_small,
class = "all",
directions = 8,
#to_disk = getOption("to_disk", default = FALSE),
return_raster = TRUE
)})
user system elapsed
0.09 0.00 0.10

#step 2: get centroids and rasterize
system.time({centroids_cell <- get_centroid(PE_test_patch_ID, cell_center = TRUE)})
user system elapsed
148.78 22.75 172.06
Warning messages:
1: For some patches several cell centers are returned as centroid.
2: For some patches several cell centers are returned as centroid.

#remove duplicate centroids
library(dplyr)
centroids_cell_uni <- centroids_cell %>% group_by(id) %>% filter (! duplicated(id))

#convert to raster
center_matrix <- as.data.frame(centroids_cell_uni[,4:6])
coordinates(center_matrix) <- ~ x + y
crs(center_matrix) <-crs(PE_test_patch_ID[[2]])

centroid_raster <- rasterize(center_matrix, PE_test_patch_ID[[2]], field = "id", fun="first")

system.time({patch_distance <- get_nearestneighbour(PE_test_patch_ID)})
user system elapsed
246.16 0.06 246.52

  1. I think it would make more sense to allow to return sp SpatialPoints. That would be probably more useful for most people.

  2. This is a great idea. I will see how hard that would be to implement.

  3. We always try to increase the computational speed whenever we find some issues. However, at this point, I wouldn't expect any significant improvements in the nearer future I fear. But I had a look at your code and I think a) there is a faster way to do it (~5x faster, see attached code) and b) I also think there is a bug in your example somewhere. You drop too many centroids. There are 27 patches in the landscape, so you should also have 27 centroids at the end (using the example landscape).

library(bench)
library(dplyr)
library(landscapemetrics)
library(raster)

foo_a <- function(){
    
    #step 1: get patches
    PE_test_patch_ID <- get_patches(landscape, class = "all", directions = 8)

    #step 2: get centroids and rasterize
    centroids_cell <- get_centroids(PE_test_patch_ID, cell_center = TRUE, 
                                    verbose = FALSE)
    
    #step 3: remove duplicate centroids
    centroids_cell_uni <- centroids_cell %>% group_by(id) %>% filter(!duplicated(id))
    
    #step 4: convert to raster
    center_matrix <- as.data.frame(centroids_cell_uni[,4:6])
    coordinates(center_matrix) <- ~ x + y
    crs(center_matrix) <-crs(PE_test_patch_ID[[2]])
    centroid_raster <- rasterize(center_matrix, PE_test_patch_ID[[2]], field = "id", fun="first")

    return(centroid_raster)
    }

foo_b <- function(){
    centroids_cell <- get_centroids(landscape, cell_center = TRUE, 
                                    verbose = FALSE) %>% 
        dplyr::distinct(id, .keep_all = TRUE) %>% 
        dplyr::select(x, y, id) %>% 
        raster::rasterFromXYZ()
    return(centroids_cell)
}

bench::mark(
foo_a(),
foo_b(),
iterations = 250, relative = TRUE, check = FALSE)
#> # A tibble: 2 x 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 foo_a()     5.02   5.02      1         3.10     1   
#> 2 foo_b()     1      1         4.98      1        1.01

Created on 2020-05-27 by the reprex package (v0.3.0)

Thank you for the debug!

On my test case with 2033 patches it gave an improvement of 10 seconds (141.68 vs 151.12)!

Do you want me to close this issue and submit a new request for get_nearestneighbour to return the Near patch ID? Or will you submit a pull request if you find a solution for that one?

I will open a new issue and close this one.

Thanks