Check if the future.apply package can be useful
This could be possible for calculate_lsm() without too much struggle.
However, I would argue for a global option with a default value of FALSE. I once read a blog post arguing against default internal parallelization in packages because we never know how users will apply the package. If we already request cores internally and the user does the same in code wrapped around our function, this could be a problem.
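To make the opt-in idea concrete, here is a minimal sketch in base R. The option name mypkg.parallel and the helpers use_parallel() and apply_metrics() are hypothetical, chosen only for illustration; landscapemetrics does not define them. Workers are only started when the user has switched the option on.

```r
# Hypothetical option name; landscapemetrics does not define it.
use_parallel <- function() {
  isTRUE(getOption("mypkg.parallel", default = FALSE))
}

# Stand-in for a package function that loops over independent tasks.
apply_metrics <- function(x, fun) {
  if (use_parallel()) {
    # Only reached when the user opted in explicitly.
    cl <- parallel::makeCluster(2)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parLapply(cl, x, fun)
  } else {
    lapply(x, fun)
  }
}

# Sequential by default: no cores are requested behind the user's back.
res_default <- apply_metrics(1:4, function(i) i^2)

# The user opts in, runs the same call, then restores the old setting.
old <- options(mypkg.parallel = TRUE)
res_parallel <- apply_metrics(1:4, function(i) i^2)
options(old)
```

With a FALSE default, a user who already parallelizes around the function does not get a second, nested set of workers unless they ask for it.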
Definitely! I plan to add this package to the next version of the landscapemetrics package (1.2?).
I think we all agree that we would rather write a vignette on how to do it in parallel (maybe using different packages, e.g. future, clustermq, etc.). That way, we don't have to add another dependency, and users can decide which parallel package they prefer. Using list_lsm(simplify = TRUE), this should be straightforward.
Hi, was there ever a vignette written covering a "how to" for parallel computing in landscapemetrics? Would be highly interested!
Hey,
unfortunately not, sorry. However, it is still on our to-do list for the near future (hopefully...). We will post any updates in issue #112.
But we can surely give hints when you are stuck somewhere. What are you trying to do?
I was trying to calculate some metrics using a moving window with window_lsm(). With a 3x3 focal matrix and a one-class raster with ~350'000 cells to actually analyze (>10 million cells in total, most of which are NA), the function took forever and I had to abort it. Thus, I was wondering if parallelization could help speed up the process. I read in the other issues that for other landscapemetrics functions this could be done using futures; however, I'm not very experienced with this backend.
You can also use any other backend with which you have more experience.
I think there is no straightforward way to parallelize the individual "windows", but if you calculate several metrics, you could parallelize across the metrics. So instead of specifying several metrics in one window_lsm() call, create a vector/list with all metrics and pass only one metric at a time to window_lsm() in parallel.
library(future)
library(future.apply)
library(landscapemetrics)
# create vector with metrics
subset_metrics <- landscapemetrics::list_lsm(level = "landscape",
                                             type = "diversity metric",
                                             simplify = TRUE)
# create window
window_mat <- matrix(1, nrow = 5, ncol = 5)
# setup future plan for parallel computing
future::plan(future::multisession)
# calculate each metric in parallel
# (landscape is the example raster shipped with landscapemetrics)
result <- future.apply::future_lapply(X = subset_metrics, FUN = function(i) {
  # run window_lsm and simplify result;
  # 1st list level: number of layers, 2nd list level: number of metrics
  window_lsm(landscape, window = window_mat, what = i)[[1]][[1]]
}, future.seed = TRUE)
#> Warning: No maximum number of classes provided: RPR = NA
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
# not sure why the future generates a random number? Maybe in raster::window()?
# set names
names(result) <- subset_metrics
Created on 2021-02-25 by the reprex package (v1.0.0)
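As a side note on the concern about requesting cores: future::plan(future::multisession) without further arguments uses all cores that future detects as available. A minimal sketch of capping the number of workers and restoring the previous plan afterwards could look like this. The cap of two workers is an arbitrary assumption for illustration, not a recommendation.

```r
library(future)

# Assumption: two workers is just an illustrative cap.
n_workers <- min(2, future::availableCores())

# plan() returns the previously active plan, so it can be restored later.
old_plan <- future::plan(future::multisession, workers = n_workers)
workers_in_use <- future::nbrOfWorkers()

# ... run future.apply::future_lapply() here ...

# restore whatever plan was active before
future::plan(old_plan)
```

Restoring the old plan matters when this code sits inside a script or function that other code calls, so the caller's own parallel setup is left untouched.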
Thanks for clarifying and whipping up some example code!
I guess I was looking more for a way of speeding up the function itself, since using only one metric already took a long time. I assume using a different format than raster would not help, since the underlying function is focal and requires a raster input, right?
Strangely enough, I ran the same analysis (3x3 window for effective mesh size) with landmetrics::focal.lmetrics (which seems to be somehow related to this package?) and it finished within 20 minutes. window_lsm(pfti_proj, window = matrix(1, nrow = 3, ncol = 3), what = c("lsm_l_mesh"), progress = TRUE) did not finish after 3 hours...
I wasn't aware of landmetrics, but if their code is better than ours, we should get in contact and maybe borrow it. I'll have a look. Feel free to join #224, @lukasbaumbach.