morrowcj/remotePARTS

Default output of fitCor is massive

Closed this issue · 3 comments

UPDATE

See comments below


Problem

Because I elected to save the fitted nls object (as $mod) in the output of fitCor, the results can be quite large. I was unaware of how big nls objects are (they keep the fitting data in their environment).

Solution

The solution is to add a save_mod argument to fitCor(). Setting it to FALSE will save a great deal of memory for large datasets.
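A minimal sketch of what the save_mod argument could look like. The function name, the exponential correlation form, and the fit_df columns (dist, cor) are placeholders for illustration, not the actual internals of fitCor():

```r
## Sketch only: keep the (small) parameter estimates, return the (large)
## nls object only when explicitly requested.
fit_sp_cor <- function(fit_df, start = list(r = 1000), save_mod = FALSE) {
  fit <- stats::nls(cor ~ exp(-dist / r), data = fit_df, start = start)
  list(
    spcor = stats::coef(fit),          # parameter estimates: a few bytes
    mod   = if (save_mod) fit else NA  # nls objects can be very large
  )
}

## Example with simulated distances/correlations:
d  <- runif(500, 0, 3000)
df <- data.frame(dist = d, cor = exp(-d / 800) + rnorm(500, sd = 0.05))
str(fit_sp_cor(df, save_mod = FALSE))
```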

Even with save_mod = FALSE, fitCor() still fails because it tries to allocate an enormous vector:

> fitCor(resids = temporal_residuals, coords = as.matrix(data_raw[, coord_cols]),
+        distm_FUN = "distm_km", start = list(r = 1000), fit.n = 3000, save_mod = FALSE)

Error: cannot allocate vector of size 62347.8 Gb 

even though the data is not particularly large:

> format(object.size(data_raw), units = "MB")
[1] "275.9 Mb"

> format(object.size(temporal_residuals), units = "MB")
[1] "441.4 Mb"

This doesn't happen when the data are smaller (even though fit.n remains constant), so this has to be an actual bug.

Problem

I believe I've found the problem. Lines 142-147 of R/fitCor.R compute a distance matrix for the full dataset, which is impossible for very large datasets. I have no idea what I was thinking.
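For scale, a dense n × n distance matrix of doubles needs n² × 8 bytes, so a few million locations translates into tens of thousands of gigabytes. The 2.9 million figure below is an assumption for illustration (not taken from the issue), but it lands in the same range as the allocation error above:

```r
## Approximate size (in Gb) of a dense n x n double-precision distance matrix
dist_matrix_size_gb <- function(n) n^2 * 8 / 1024^3

dist_matrix_size_gb(2.9e6)  # ~62,660 Gb: the same order as the error above
```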

Solution?

Obviously, we need to remove this enormous distance matrix. It exists only to find the maximum distance between two points on the map, which is used to re-scale the spatial parameters. Perhaps it is enough to estimate the maximum distance of the full data by calculating the maximum distance among the sampled points (as determined by fit.n), as sketched below. I don't know whether that is sufficient, though.
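A minimal sketch of that idea, assuming coords is a two-column longitude/latitude matrix; geosphere::distm() is used here only as a stand-in for the package's own distance function (e.g. distm_km):

```r
## Estimate the maximum distance from the fit.n sampled points only,
## instead of building the full distance matrix.
max_dist_sampled <- function(coords, fit.n = 3000) {
  idx <- sample(nrow(coords), min(fit.n, nrow(coords)))
  d   <- geosphere::distm(coords[idx, , drop = FALSE])  # metres, fit.n x fit.n
  max(d) / 1000                                         # metres -> km
}
```

Note that a subsample will tend to underestimate the true maximum whenever it misses the extreme corners of the map, which may be why a full pairwise search (next comment) is still worth considering.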

Update:

I can build a pairwise function to find the max distance, as found in this example.
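One way such a pairwise function might look (a hedged sketch, not the example referenced above): scan the coordinates in chunks so that only a chunk × n block of distances is ever held in memory, with geosphere::distm() again standing in for the package's distance function:

```r
max_dist_chunked <- function(coords, chunk = 100) {
  n     <- nrow(coords)
  max_d <- 0
  for (s in seq(1, n, by = chunk)) {
    rows <- s:min(s + chunk - 1, n)
    ## chunk x n block of distances (metres); peak memory ~ chunk * n * 8 bytes
    d     <- geosphere::distm(coords[rows, , drop = FALSE], coords)
    max_d <- max(max_d, d)
  }
  max_d / 1000  # metres -> km
}
```

With millions of points this is still a lot of distance computations; a cheaper alternative is to take the convex hull of the coordinates first (grDevices::chull()) and compute distances only among the hull vertices, since the farthest pair must lie on the hull (at least approximately, for geographic coordinates).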