rspatial/terra

terra::union() cannot seem to correctly interpret projection, spatial envelopes, generates incorrect output

Closed this issue · 4 comments

I have two SpatVects that raster::compareCRS() tells me have the same projection. I read them into R using terra::vect() from a geodatabase (.gdb).

I want to merge/combine the two SpatVects into one SpatVect.

The issue - when I run terra::union(myspatvect1, myspatvect2) the layers end up in the wrong place on the globe.

Some additional context /sleuthing - If I read the exact same layers from the GDB's into ArcPro and merge, they show up in the correct locations. If I plot each SpatVect individually on a world map in R, they also show up in the correct locations. If I reproject using terra::reproject() and then run terra::union() on the re-projected layers, they end up in the correct locations. However this last option is not a viable path forward because R crashes trying to reproject on the largest of the layers that I need to work with in the end (outside of my test workflow).

The two test datasets can be downloaded as GDB from https://www.fws.gov/program/national-wetlands-inventory/download-state-wetlands-data - "Hawaii" and "Puerto Rico and Virgin Islands".

#read in GDBs as spatvects
hi <- terra::vect("/Users/rachel/HI_geodatabase_wetlands.gdb/", layer = "HI_Wetlands")
prvi <- terra::vect("/Users/rachel/PRVI_geodatabase_wetlands.gdb/", layer = "PRVI_Wetlands")

#check projection
raster::compareCRS(hi, prvi)

#plot - warning, slow and high mem, but HI and PRIV where they should be on globe
world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
ggplot() + 
  geom_sf(data = world) +
  geom_sf(data = prvi, fill = "purple", colour = "purple") +
  geom_sf(data = hi, fill = "blue", colour = "blue")

#combine into one spatvect
wetlands <- terra::union(hi,priv)

#look at output - HI and PRVI end up at essentially same longitude aka incorrect
plot(wetlands)

Both datasets have a CRS that is displayed as "NAD_1983_Albers" but that does not make them the same (see the code below). Most projections come with a number of parameters that are used to indicate the geographic position of the data to minimize distortion. In this case, one dataset is centered on Puerto Rico (lon_0=-66), the other on Hawaii (+lon_0=-157).

library(terra)
# using the outlines for speed
pr <- vect("PRVI_geodatabase_wetlands.gdb", "PuertoRico")
hi <- vect("HI_geodatabase_wetlands.gdb", "Hawaii")

crs(pr, proj=TRUE)
#[1] "+proj=aea +lat_0=3 +lon_0=-66 +lat_1=8 +lat_2=18 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"
crs(hi, proj=TRUE)
#[1] "+proj=aea +lat_0=3 +lon_0=-157 +lat_1=8 +lat_2=18 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"

So to combine them you need to transform them to the same CRS. Here I use lon/lat,

pr2 <- project(pr, "+proj=longlat") 
hi2 <- project(hi, "+proj=longlat") 

And now use rbind

prhi <- rbind(pr2, hi2)
plot(prhi)

this last option is not a viable path forward because R crashes trying to reproject on the largest of the layers that I need to work with in the end (outside of my test workflow).

If the problem is with terra::union then you can avoid it by using terra::rbind which has a much lower footprint.

Thanks. I suspected the issue was with the different central meridians. I seem to be stuck then - because project() crashes (with 128GB RAM) on the largest of the layers that I would need to reproject (before doing a union or rbind). ArcPro is smart enough to merge/combine the above layers correctly without any reprojecting. I was hopeful there was a way to make rbind or union behave the same.

As a potential (less streamlined/elegant) workaround, I reprojected all of my layers in ArcPro (which uses sig. less memory to reproject than terra, so no crashes) and then tried to read back into R using terra::vect() as above. Three of the layers work fine. One, the largest, now throws the error -
Error occurred in filegdbtable.cpp at line 1379 (GDAL error 1)
when I run terra::vect().

Advice on how to investigate this error may get me a step further, assuming the only way to join layers in this case is to reproject them first.

I can look into making terra::project more memory safe. Which file is it that causes the trouble? ArcXXX tends to things like reprojection automatically, but I am cautious with that, because it is often not obvious how that should be done. In this case, we would not know which output CRS you want.

Thanks for reply. Yes, makes sense about "not often obvious" what should be done if terra::union were recoded to do more decision making on its own. Although, it seems like current V of terra::union should be edited to at least print a warning - or error out - if you combine multiple layers that don't have identical projections (as I was doing. the only way I knew something was wrong was by plotting / visualizing merged layer).

Unfortunately I cannot publicly share the biggest file - that runs out of memory with terra::project - and has the "Error occurred in filegdbtable.cpp" error after reprojecting in ArcPro and trying to read back in. (File is ~30GB zipped. I can check with team that manages those data if I can share directly with you for debugging purposes. Email me directly if you'd like to go that route. rhtoczydlowski (at) gmail (dot) com.) Thanks!