KrishnamurthyLab/spatialEpisim.foundation

Population sum total differences exist between original codebase and foundation

Closed this issue · 5 comments

Original code algorithm

The algorithm the original code follows, as I recall it from our Tuesday Sept. 3rd meeting:

  1. a spatial raster class is instantiated from a TIFF file
  2. it is aggregated, if desired
  3. it is cropped, if desired
  4. it is resampled

Foundation algorithm

The new codebase follows this algorithm, as I recall (being its author):

  1. a spatial raster class is instantiated from a TIFF file
  2. it is cropped, if desired
  3. it is aggregated, if desired
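To illustrate why the order of the steps could matter, here is a minimal sketch on a synthetic raster. It assumes terra is installed; the raster, the extent `areaOfInterest`, and the helper `total` are made up for illustration and are not taken from either codebase. The resampling step of the original algorithm is left out.

```r
library(terra)

# Synthetic 9 x 9 "population" raster: one person per cell, 81 in total.
population <- rast(nrows = 9, ncols = 9,
                   xmin = 0, xmax = 9, ymin = 0, ymax = 9,
                   vals = rep(1, 81))

# Hypothetical area of interest. Its eastern edge (x = 4) lines up with the
# original cells, but not with the 3 x 3 blocks produced by aggregation.
areaOfInterest <- ext(0, 4, 0, 9)

# Helper (an assumption of this sketch): sum of all cells as a plain number.
total <- function(x) global(x, "sum", na.rm = TRUE)[1, 1]

# Foundation order: crop, then aggregate.
cropThenAggregate <- aggregate(crop(population, areaOfInterest),
                               fact = 3, fun = "sum", na.rm = TRUE)

# Original order, without the resampling step: aggregate, then crop.
aggregateThenCrop <- crop(aggregate(population, fact = 3, fun = "sum", na.rm = TRUE),
                          areaOfInterest)

cat("crop then aggregate:", total(cropThenAggregate), "\n")
cat("aggregate then crop:", total(aggregateThenCrop), "\n")
```

In this sketch the two totals differ because cropping an already aggregated raster has to snap the crop extent to the coarser cell boundaries. Whether this is the largest cause of the difference between the real codebases remains to be checked.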

TODO

I need to determine the largest cause of the difference in total population and fix it. The new and old codebases shouldn't differ on such a detail.

This test validates that aggregating and cropping do not change the population of the area(s) of interest.

  • Remove or use the layers object in the test of aggregation and cropping of the population raster

test_that("Population remains the same after cropping and aggregation", {
  layers <- getSVEIRD.SpatRaster(subregionsSpatVector,
                                 susceptibleSpatRaster,
                                 aggregationFactor = 35)
  options("geodata_default_path" = "/tmp")
  Congo <- getCountryPopulation.SpatRaster("COD")
  expect_equal(sum(terra::global(terra::aggregate(Congo, 10, "sum", na.rm = TRUE), "sum", na.rm = TRUE), na.rm = TRUE),
               expected = sum(terra::global(Congo, "sum", na.rm = TRUE), na.rm = TRUE))
})
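As quoted, the test only calls terra::aggregate on the full raster. A variant that also crops might look like the sketch below. It uses a synthetic raster so that nothing has to be downloaded; the raster, the extent, and the helper `total` are assumptions made for illustration, and the package's own getCountryPopulation.SpatRaster is not used.

```r
library(testthat)
library(terra)

# Helper (an assumption of this sketch): sum of all cells as a plain number.
total <- function(x) global(x, "sum", na.rm = TRUE)[1, 1]

test_that("Population of the area of interest survives cropping and aggregation", {
  # Synthetic 12 x 12 raster with distinct cell values.
  population <- rast(nrows = 12, ncols = 12,
                     xmin = 0, xmax = 12, ymin = 0, ymax = 12,
                     vals = seq_len(144))

  # Hypothetical area of interest, aligned with the cell edges.
  areaOfInterest <- ext(0, 6, 0, 6)

  cropped <- crop(population, areaOfInterest)
  aggregated <- aggregate(cropped, fact = 3, fun = "sum", na.rm = TRUE)

  # Summing aggregation should leave the total of the cropped area unchanged.
  expect_equal(total(aggregated), total(cropped))
})
```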

#27 is a duplicate of item 4 in this issue.

I think I'll resolve this issue by adding the ability to toggle resampling on or off.

Various resampling methods were tried in the original codebase, so having a drop-down to select the resampling method when resampling is enabled would be useful.
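A rough sketch of what such a toggle could look like follows. The function name `resampleIfEnabled`, its arguments, and the short list of methods are hypothetical and not part of either codebase; terra::resample supports more methods than the three listed here.

```r
library(terra)

# Hypothetical helper: resample only when enabled, with a selectable method.
resampleIfEnabled <- function(x, template, enabled = FALSE,
                              method = c("bilinear", "near", "cubic")) {
  if (!enabled) {
    return(x)  # resampling disabled: hand the raster back untouched
  }
  method <- match.arg(method)
  resample(x, template, method = method)
}

# Usage on synthetic rasters: a coarse 4 x 4 grid and a finer 8 x 8 template.
coarse <- rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4,
               vals = rep(1, 16))
template <- rast(nrows = 8, ncols = 8, xmin = 0, xmax = 4, ymin = 0, ymax = 4)

unchanged <- resampleIfEnabled(coarse, template)
resampled <- resampleIfEnabled(coarse, template, enabled = TRUE, method = "near")

print(dim(unchanged))
print(dim(resampled))
```

In a user interface, the `enabled` flag would map to the toggle and `method` to the drop-down.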

https://github.com/ashokkrish/spatialEpisim/blob/1b64d4538b5d0e4db14645859a3a600a13eed9bc/R/rasterStack.R#L69

I have reviewed the old code, and it appears that terra::resample was only used as part of an effort to crop the population raster to the silhouette of the nation under study. It transfers values from a source SpatRaster to a sink SpatRaster; here the sink is the population SpatRaster, and the source is a binary rasterization (the values are either zero or one) of the provincial and national borders of the nation.

In conclusion, I don't need to worry about the fact that resample was used. It was used improperly, more or less, in the original code.