Check if the future.apply package can be useful

Question

Closed this issue 5 years ago · 12 comments

Nowosad commented 6 years ago

https://github.com/HenrikBengtsson/future.apply

Nowosad commented 6 years ago

#106

Answer 1 · 2019-04-12T10:50:49.000Z

Could be possible for calculate_lsm() without too much struggle.

However, I would argue for a global option with a FALSE default value. I once read a blog post arguing against a default internal parallelization in packages because we never know how the users apply the package. If we internally already request cores and the user does the same wrapped around our function, this could be a problem.
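A minimal sketch of what such an opt-in switch could look like, using only base R and the parallel package. The option name `mypkg.parallel` and the helpers `use_parallel()` and `apply_metrics()` are hypothetical and not part of landscapemetrics; the point is only that nothing requests extra cores unless the user explicitly asks for it.

```r
# Hypothetical option name; NOT an existing landscapemetrics option.
# Parallel execution is off unless the user sets it to TRUE.
use_parallel <- function() {
  isTRUE(getOption("mypkg.parallel", default = FALSE))
}

# Hypothetical wrapper: applies `fun` to each element of `x`,
# starting worker processes only when the user opted in.
apply_metrics <- function(x, fun, cores = 2L) {
  if (use_parallel()) {
    cl <- parallel::makeCluster(cores)
    # always shut the workers down again, even on error
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parLapply(cl, x, fun)
  } else {
    lapply(x, fun)
  }
}

# default behaviour: sequential, no cores requested
res_seq <- apply_metrics(1:4, function(i) i^2)

# explicit opt-in by the user, then restore the previous setting
old <- options(mypkg.parallel = TRUE)
res_par <- apply_metrics(1:4, function(i) i^2)
options(old)
```

With a default of FALSE, a user who already wraps the function in their own parallel loop does not end up with nested workers competing for the same cores.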

Answer 2 · 2019-04-12T11:00:03.000Z

Definitely! I plan to add this package to the next version of the landscapemetrics package (1.2?).

Answer 3 · 2019-07-15T05:14:59.000Z

I think we all agree that we would rather write a vignette on how to do it in parallel (maybe using different packages, e.g. future, clustermq, etc.). That way, we don't have to add another dependency, and users can decide which parallel package they prefer. With list_lsm(simplify = TRUE), this should be straightforward.

Answer 4 · 2021-02-24T13:34:51.000Z

Hi, was there ever a vignette written covering a "how to" for parallel computing in landscapemetrics? Would be highly interested!

Answer 5 · 2021-02-24T15:34:51.000Z

Hey,

unfortunately not, sorry. However, it is still on our to-do list for the near future (hopefully...). We will post any updates in issue #112.

Answer 6 · 2021-02-24T15:48:28.000Z

But we can surely give hints when you are stuck somewhere. What are you trying to do?

Answer 7 · 2021-02-25T09:18:54.000Z

I was trying to calculate some metrics using a moving window with window_lsm(). With a 3x3 focal matrix and a one-class raster with ~350'000 cells to actually analyze (total >10 million cells, of which most are NA), the function took forever and I had to abort it. Thus, I was wondering if parallelization could help speed up the process. I read in the other issues that for other landscapemetrics functions this could be done using futures; however, I'm not very experienced with this backend.

Answer 8 · 2021-02-25T12:59:18.000Z

You can also use any other backend with which you have more experience.

I think there is no straightforward way to parallelize the individual "windows", but if you calculate several metrics, you could parallelize across the metrics. So instead of specifying several metrics in a single window_lsm() call, create a vector/list with all metrics and pass only one metric at a time to window_lsm() in parallel.

Answer 9 · 2021-02-25T13:10:23.000Z

library(future)
library(future.apply)
library(landscapemetrics)


# create vector with metrics
subset_metrics <- landscapemetrics::list_lsm(level = "landscape", 
                                             type = "diversity metric", 
                                             simplify = TRUE)

# create window
window_mat <- matrix(1, nrow = 5, ncol = 5)

# setup future plan for parallel computing
future::plan(future::multisession)

# calculate each metric in parallel 
result <- future.apply::future_lapply(X = subset_metrics, FUN = function(i) {
  
  # run window_lsm and simplify result; 
  # 1st list level: number of layers, 2nd list level: number of metrics
  window_lsm(landscape, window = window_mat, what = i)[[1]][[1]]

}, future.seed = TRUE)
#> Warning: No maximum number of classes provided: RPR = NA
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf

# not sure why the future generates a random number? Maybe in raster::window()?

# set names
names(result) <- subset_metrics

Created on 2021-02-25 by the reprex package (v1.0.0)

Answer 10 · 2021-02-26T12:51:00.000Z

Thanks for clarifying and whipping up some example code!

I guess I was more looking for a way of speeding up the function itself, since only using one metric already took a long time. I assume using a different format than raster would not help, since the underlying function is focal and requires a raster input, right?

Strangely enough, I ran the same analysis (3x3 window for effective mesh size) with landmetrics::focal.lmetrics (which seems to be somehow related to this package?) and it finished within 20 minutes. window_lsm(pfti_proj, window = matrix(1, nrow = 3, ncol = 3), what = c("lsm_l_mesh"), progress = TRUE) did not finish after 3 hours...

Answer 11 · 2021-02-26T13:35:56.000Z

I wasn't aware of landmetrics, but if their code is better than ours, we should get in contact and maybe borrow their code. I'll have a look. Feel free to join #224, @lukasbaumbach.