r-spatialecology/landscapemetrics

Overuse of memory in `lsm_c_ed()`

Closed this issue · 6 comments

Hi @mhesselbarth, @bitbacchus, and the rest,
below I try to explain and delve deeper into a problem reported by a landscapemetrics user (by email).

Let's start by downloading and reading our test raster. It is quite large (227 mln cells) and has only two values 0 and 1:

library(raster)
#> Loading required package: sp
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatialecology/landscapemetrics/files/6983925/testraster.tif.zip", tf)
unzip(tf, exdir = td)
my_path = paste0(td, "/testraster.tif")
tr = raster(my_path)
tr
#> class      : RasterLayer 
#> dimensions : 23325, 9747, 227348775  (nrow, ncol, ncell)
#> resolution : 30, 30  (x, y)
#> extent     : 5301720, 5594130, -2255820, -1556070  (xmin, xmax, ymin, ymax)
#> crs        : +proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs 
#> source     : testraster.tif 
#> names      : testraster 
#> values     : 0, 1  (min, max)

If you try to use lsm_c_ed() then you would get an error:

# lsm calculations 
c_ed = lsm_c_ed(tr, FALSE, 8)
#> Error: cannot allocate vector of size 151.9 Gb

This error does not happen in most of the other similar functions.
Therefore, I went into a few rabbit holes to find where the problem starts.

Starting with lsm_c_ed(), I then moved to lsm_c_ed_calc(), next to lsm_c_te_calc(), and finally to two internal functions (a) get_patches_int() and (b) rcpp_get_coocurrence_matrix().

The get_patches_int() function works correctly and returns another matrix with values between 1 and 201906:

landscape = raster::as.matrix(tr)
patches_class = 0 #or 1
landscape_labeled = landscapemetrics:::get_patches_int(landscape,
                                                        class = patches_class,
                                                        directions = 8)[[1]]

summary(c(landscape_labeled))

The culprit here is the rcpp_get_coocurrence_matrix() function:

neighbor_matrix = landscapemetrics:::rcpp_get_coocurrence_matrix(landscape_labeled,
                                               directions = as.matrix(4))
#> Error: cannot allocate vector of size 151.9 Gb

It tries to create a huge matrix of 201906 by 201906, and fails.

Looking further at the lsm_c_te_calc() code, it seems that we do not use the entire co-occurrence matrix, but only its one row or column -

edge_ik <- (sum(neighbor_matrix[2:ncol(neighbor_matrix), 1])) * resolution_x
.

Now, the question is: how to improve this problem? Do we have an alternative function to use here, or do we need to write a new c++ function, or maybe there is a different solution? What do you think?

testraster.tif.zip

Well...We only use the first row because this includes the adjacencies between -999 (all cells not belonging to the current class) and all cells belonging to the current class (labeled 1...n, where n = number of patches).

I am not really familiar with the underlying Cpp code and if it's possible and reasonable to already exclude all non--999 adjacencies within rcpp_get_coocurrence_matrix already

Well...We only use the first row because this includes the adjacencies between -999 (all cells not belonging to the current class) and all cells belonging to the current class (labeled 1...n, where n = number of patches).

Yes, exactly. We try to create a new object neighbor_matrix of 201906 rows by 201906 columns, but then only use 201906 rows of 1 column (we use 201,906 values instead of 201,906x201,906=40,766,032,836). If I am correct, we never use the complete neighbor_matrix object in this process.

I can create a new cpp function (e.g., rcpp_get_coocurrence_matrix_single() to calculate co-occurrence between one value and all of the rest, which would solve this problem. Or do you (or @bitbacchus) have any better idea?

Yeah that should work I guess

Ok, see a new PR at #241. It should fix this overuse of memory issue in this case. Are there any other functions that could use rcpp_get_coocurrence_matrix_single() instead of rcpp_get_coocurrence_matrix()?

EDIT: lsm_p_perim() is similar

library(raster)
library(landscapemetrics)
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatialecology/landscapemetrics/files/6983925/testraster.tif.zip", tf)
unzip(tf, exdir = td)
my_path = paste0(td, "/testraster.tif")
tr = raster(my_path)

landscape = raster::as.matrix(tr)
patches_class = 0 #or 1


landscape_labeled = landscapemetrics:::get_patches_int(landscape,
                                                       class = patches_class,
                                                       directions = 8)[[1]]

edge_cells <- which(!is.na(landscape) & landscape != patches_class)

landscape_labeled[edge_cells] <- -999

neighbor_matrix = landscapemetrics:::rcpp_get_coocurrence_matrix(landscape_labeled,
                                                                 directions = as.matrix(4))
#> Error: cannot allocate vector of size 151.9 Gb

neighbor_matrix2 = landscapemetrics:::rcpp_get_coocurrence_matrix_single(landscape_labeled,
                                                                         directions = as.matrix(4),
                                                                         single_class = -999)

# works!

Created on 2021-08-16 by the reprex package (v2.0.1)

Yeah the perimeter should use the same concept. For the total edge on landscape level we actually need the full matrix though.

I also tested the two other datasets and results are identical, which is good, as well. So from my side, we can merge this!

Awesome! Merging it now.