`util_extract_multibuffer` as a separate function

Question

`util_extract_multibuffer` as a separate function

Opened this issue 3 years ago · 10 comments

Hi,

discussing with @Nowosad by email, I was looking for a function in which I can calculate a more generic function based on raster values for multiple buffers around points. He pointed me to the show_shareplot() function.

So I started to build upon that. Currently it uses internally defined functions (.extract_multibuffer, that calls .share()) to create a data.frame (tibble, indeed) and plot it. Alternatively, it can also return the data.frame if return_df = TRUE.

I propose this internal function could be a separate function, in which one can calculate the landscape share (freq. of raster values). I extended this a little to allow (so far) the direct calculation of the relative frequency of raster values within each buffer (if rel_freq = TRUE) and a function (any, given by the user) of these values within each buffer (if fun is given).
I also made it more flexible the way the series of buffer sizes is created - now a vector of values can be passed to the buffer_width, and they are used if max_width = NULL. As an example, this might be useful when one wants to calculate the share from 250m to 50 km, but not for all increments of 250m, which can be quite computationally demanding.
I've put this as new function called util_extract_multibuffer(), which is found in the branch multibuffer in my fork of the repo, here: https://github.com/bniebuhr/landscapetools/tree/multibuffer
From my experience in landscape ecology, I think this can be quite useful, beyond just calculating the number or proportion of each raster class - and landscapetools a good place for that.

Based on that, a possibility is to set a new argument for this calculated data.frame to be a parameter in the show_shareplot function - in case this data.frame was already calculated through util_extract_multibuffer. I have also implemented that. So the main thing here would be to decouple both functions - show_shareplot() and the internal ones it uses - called here util_extract_buffer.

However, in this last case, if the output data.frame was calculated based on a function (parameter fun) instead of the "raw" landscape share, a different type of plot should be done. So maybe new function would be required here, instead of show_shareplot().

So I write this issue to raise this idea - you may test it through the branch multibuffer in my fork, and we can develop from that, if you think it is worth it.

Answer 1 · 2022-02-15T16:37:55.000Z

@bitbacchus @mhesselbarth @mattfrit

Answer 2 · 2022-03-07T11:26:46.000Z

I added a PR based on your branch (it is easier to see your changes) - #36.
@bniebuhr can you create a reprex with an example or two of your new code?

Answer 3 · 2022-03-07T12:35:43.000Z

Nice. I had added a few examples under @examples in the roxygen documentation but I just found a bug, so I'll work on that and commit it, and I come back with more comprehensive examples soon.

PS: just to understand the best practices, would it be better to make a PR to start the discussion then, apart from the issue?

Answer 4 · 2022-03-07T13:19:34.000Z

I do not know the best practices. For me it is easier to understand changes when there is: a) a code comparison (PR) and b) a set of examples (reprex).

Answer 5 · 2022-03-28T08:31:54.000Z

Note that the NAMESPACE doesn't list util_extract_multibuffer() yet so examples using it fail as it's not available. Could you please run document()?

Answer 6 · 2022-03-28T09:43:11.000Z

Hi,

Sorry about the delay in woking on the examples. I made another PR with further examples and a correction of a bug - please see it here: #39
I have also included the update documentation for these two functions in the PR.

Answer 7 · 2022-03-29T11:55:23.000Z

Can we close this then?

Answer 8 · 2022-03-29T13:32:14.000Z

I think there is still room for some improvement/discussion. Not sure if this should come in a separate issue, but I'll keep it here so far.

One thing is how the output of the function looks like. Testing it now, we have the following for the calculation of amount of land use classes in a classified landscape:

library(landscapetools)

# create single point
new_point = matrix(c(75,75), ncol = 2)

# show landscape and point of interest
show_landscape(classified_landscape, discrete = TRUE) +
ggplot2::geom_point(data = data.frame(x = new_point[,1], y = new_point[,2]),
                    ggplot2::aes(x = x, y = y),
                    col = "grey", size = 3)
#> Loading required package: raster
#> Loading required package: sp

# extract frequency of each pixel value within each buffer from 10 to 50 m width
util_extract_multibuffer(classified_landscape, new_point, 10, 50)
#> # A tibble: 14 x 4
#>    id          layer  freq buffer
#>    <chr>       <fct> <int>  <dbl>
#>  1 Point ID: 1 1        80     10
#>  2 Point ID: 1 2       236     10
#>  3 Point ID: 1 1       292     20
#>  4 Point ID: 1 2       964     20
#>  5 Point ID: 1 3         8     20
#>  6 Point ID: 1 1       484     30
#>  7 Point ID: 1 2      1898     30
#>  8 Point ID: 1 3       446     30
#>  9 Point ID: 1 1       825     40
#> 10 Point ID: 1 2      2810     40
#> 11 Point ID: 1 3      1389     40
#> 12 Point ID: 1 1      1362     50
#> 13 Point ID: 1 2      3750     50
#> 14 Point ID: 1 3      2748     50

^{Created on 2022-03-29 by the reprex package (v2.0.1)}

Regarding this output, I have two comments.

First, when there is no pixels of a given class within a given buffer (e.g. no pixels of land use class 3 in the buffer with 10 m radius around this point), we do not have a new line in the output table with layer = 3 and freq = 0. Do you think having such an output would make it easier for some specific analyses (i.e. including all classes that are present in any of the buffers in the output, and filling it with 0 if it is absent from the buffer)?
Second, I think "layer" is not really the best name for this output column (even though it is the default of raster::extract() used within the function). Maybe "class" would be better?

A second thing is how the function deals when functions are used within the buffer (instead of just counting absolute or relative frequency of pixels in each class). For example, let's say I want to take the most common land use class within each buffer size (mode function taken from here). I can do the following:

getmode <- function(v) {
    uniqv <- unique(v)
    uniqv[which.max(tabulate(match(v, uniqv)))]
}

# use a given function - e.g. mode in each buffer width
util_extract_multibuffer(classified_landscape, new_point, 10, 50, fun = getmode)
#> # A tibble: 5 x 4
#>   id          layer `function (v) \n{\n    uniqv <- unique(v)\n    uniq~` buffer
#>   <chr>       <fct>                                                 <int>  <dbl>
#> 1 Point ID: 1 2                                                         1     10
#> 2 Point ID: 1 2                                                         1     20
#> 3 Point ID: 1 2                                                         1     30
#> 4 Point ID: 1 2                                                         1     40
#> 5 Point ID: 1 2                                                         1     50

^{Created on 2022-03-29 by the reprex package (v2.0.1)}

In this case, here comes a few comments:

Obviously there is an issue with the name of the column, we should correct that. The idea was to take the name of the function but it seems I did it wrong. Maybe we can just change to correctly take the name of the function? Or having a parameter for the user to determine this name?
The other point is that here the column layer is the one which really represents what we want, and the other one does not mean anything. So we need to change the output when we use a function here. My suggestion is just to have Point ID, function name (e.g. getmode), value (the real value we are interested in), and buffer. Does it make sense? That would be much more appropriate for contiuous landscape maps as well (e.g. calculating some function such as the mean or weighted mean within a buffer).

I can implement all that and make a new PR but any thoughts are welcome.

Answer 9 · 2022-03-29T14:25:26.000Z

Just to complement, I guess when I deal with the last point, it will be more meaningful to use that for continuous maps as well, see example below:

library(landscapetools)

# create single point
new_point = matrix(c(75,75), ncol = 2)

# show landscape and point of interest
show_landscape(fractal_landscape) +
ggplot2::geom_point(data = data.frame(x = new_point[,1], y = new_point[,2]),
                    ggplot2::aes(x = x, y = y),
                    col = "grey", size = 3)
#> Loading required package: raster
#> Loading required package: sp

# use a given function - e.g. median in each buffer width
util_extract_multibuffer(fractal_landscape, new_point, 10, 50, fun = "mean")
#> # A tibble: 5 x 4
#>   id          layer              mean buffer
#>   <chr>       <fct>             <int>  <dbl>
#> 1 Point ID: 1 0.466132288392125     1     10
#> 2 Point ID: 1 0.475467530484216     1     20
#> 3 Point ID: 1 0.520185362184868     1     30
#> 4 Point ID: 1 0.547060865215994     1     40
#> 5 Point ID: 1 0.561445545416456     1     50

^{Created on 2022-03-29 by the reprex package (v2.0.1)}

What we want is the value (currently in the layer column).

Answer 10 · 2022-03-30T12:23:25.000Z

Hi @bniebuhr, my thinking about your ideas:

I think it would be nice to have 0 for non-existing classes (for the given buffer)
The class name is fine (for that example)
Regarding the use of custom functions -- it is possible to use an anonymous function here. Therefore, how about just naming this column with generic fun?
Custom functions may have different outputs compared to the default fun = NULL... Could you make a list of the most common (in your opinion) use of the custom functions? Then we could try to think about the best solution...