Overuse of memory in `lsm_c_ed()`
Closed this issue · 6 comments
Hi @mhesselbarth, @bitbacchus, and the rest,
below I try to explain and delve deeper into a problem reported by a landscapemetrics user (by email).
Let's start by downloading and reading our test raster. It is quite large (227 mln cells) and has only two values 0 and 1:
library(raster)
#> Loading required package: sp
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatialecology/landscapemetrics/files/6983925/testraster.tif.zip", tf)
unzip(tf, exdir = td)
my_path = paste0(td, "/testraster.tif")
tr = raster(my_path)
tr
#> class : RasterLayer
#> dimensions : 23325, 9747, 227348775 (nrow, ncol, ncell)
#> resolution : 30, 30 (x, y)
#> extent : 5301720, 5594130, -2255820, -1556070 (xmin, xmax, ymin, ymax)
#> crs : +proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs
#> source : testraster.tif
#> names : testraster
#> values : 0, 1 (min, max)
If you try to use lsm_c_ed()
then you would get an error:
# lsm calculations
c_ed = lsm_c_ed(tr, FALSE, 8)
#> Error: cannot allocate vector of size 151.9 Gb
This error does not happen in most of the other similar functions.
Therefore, I went into a few rabbit holes to find where the problem starts.
Starting with lsm_c_ed()
, I then moved to lsm_c_ed_calc()
, next to lsm_c_te_calc()
, and finally to two internal functions (a) get_patches_int()
and (b) rcpp_get_coocurrence_matrix()
.
The get_patches_int()
function works correctly and returns another matrix with values between 1 and 201906:
landscape = raster::as.matrix(tr)
patches_class = 0 #or 1
landscape_labeled = landscapemetrics:::get_patches_int(landscape,
class = patches_class,
directions = 8)[[1]]
summary(c(landscape_labeled))
The culprit here is the rcpp_get_coocurrence_matrix()
function:
neighbor_matrix = landscapemetrics:::rcpp_get_coocurrence_matrix(landscape_labeled,
directions = as.matrix(4))
#> Error: cannot allocate vector of size 151.9 Gb
It tries to create a huge matrix of 201906 by 201906, and fails.
Looking further at the lsm_c_te_calc()
code, it seems that we do not use the entire co-occurrence matrix, but only its one row or column -
Line 142 in c198289
Now, the question is: how to improve this problem? Do we have an alternative function to use here, or do we need to write a new c++ function, or maybe there is a different solution? What do you think?
Well...We only use the first row because this includes the adjacencies between -999
(all cells not belonging to the current class) and all cells belonging to the current class (labeled 1...n
, where n
= number of patches).
I am not really familiar with the underlying Cpp
code and if it's possible and reasonable to already exclude all non--999
adjacencies within rcpp_get_coocurrence_matrix
already
Well...We only use the first row because this includes the adjacencies between
-999
(all cells not belonging to the current class) and all cells belonging to the current class (labeled1...n
, wheren
= number of patches).
Yes, exactly. We try to create a new object neighbor_matrix
of 201906 rows by 201906 columns, but then only use 201906 rows of 1 column (we use 201,906 values instead of 201,906x201,906=40,766,032,836). If I am correct, we never use the complete neighbor_matrix
object in this process.
I can create a new cpp function (e.g., rcpp_get_coocurrence_matrix_single()
to calculate co-occurrence between one value and all of the rest, which would solve this problem. Or do you (or @bitbacchus) have any better idea?
Yeah that should work I guess
Ok, see a new PR at #241. It should fix this overuse of memory issue in this case. Are there any other functions that could use rcpp_get_coocurrence_matrix_single()
instead of rcpp_get_coocurrence_matrix()
?
EDIT: lsm_p_perim()
is similar
library(raster)
library(landscapemetrics)
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatialecology/landscapemetrics/files/6983925/testraster.tif.zip", tf)
unzip(tf, exdir = td)
my_path = paste0(td, "/testraster.tif")
tr = raster(my_path)
landscape = raster::as.matrix(tr)
patches_class = 0 #or 1
landscape_labeled = landscapemetrics:::get_patches_int(landscape,
class = patches_class,
directions = 8)[[1]]
edge_cells <- which(!is.na(landscape) & landscape != patches_class)
landscape_labeled[edge_cells] <- -999
neighbor_matrix = landscapemetrics:::rcpp_get_coocurrence_matrix(landscape_labeled,
directions = as.matrix(4))
#> Error: cannot allocate vector of size 151.9 Gb
neighbor_matrix2 = landscapemetrics:::rcpp_get_coocurrence_matrix_single(landscape_labeled,
directions = as.matrix(4),
single_class = -999)
# works!
Created on 2021-08-16 by the reprex package (v2.0.1)
Yeah the perimeter should use the same concept. For the total edge on landscape level we actually need the full matrix though.
I also tested the two other datasets and results are identical, which is good, as well. So from my side, we can merge this!
Awesome! Merging it now.