r-spatial/sf

PROJ_LIB data conflicts?

Closed this issue · 11 comments

I added a new build of rwinlib/proj version 6.1.0. I noticed this version switches to a new format for the projdata files, which is now based on sqlite.

I am worried that if you use a global environment variable PROJ_LIB to set the path to the projdata, different gdal packages end up conflicting with each other, because they all set PROJ_LIB to the data that ships with that package, which might be different data from another pkg.

If one of these packages was compiled for proj4, and another one has proj6 data, the first package can probably no longer find the appropriate data after the second has been loaded? We have similar problems with other packages that require data files such as fontconfig.
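A minimal sketch of the conflict described above, assuming a POSIX system with `setenv`. The function names and the two data paths are hypothetical; they only stand in for two packages that each point the process-wide PROJ_LIB at their own bundled data when loaded.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a package load hook: it points the
   process-wide PROJ_LIB at the data directory bundled with the package. */
void package_on_load(const char *bundled_data_dir) {
    assert(bundled_data_dir != NULL);
    setenv("PROJ_LIB", bundled_data_dir, 1);  /* 1 = overwrite existing value */
}

/* Returns 1 if PROJ_LIB currently equals expected_dir, 0 otherwise. */
int proj_lib_points_to(const char *expected_dir) {
    const char *current = getenv("PROJ_LIB");
    return current != NULL && strcmp(current, expected_dir) == 0;
}

/* Loads two hypothetical packages in order and reports whether the
   first one would still find its own data directory afterwards. */
int first_package_still_sees_its_data(void) {
    package_on_load("/lib/R/pkgA/proj4");  /* made-up path, PROJ 4 data */
    package_on_load("/lib/R/pkgB/proj6");  /* made-up path, PROJ 6 data */
    return proj_lib_points_to("/lib/R/pkgA/proj4");
}
```

Because the environment is shared by the whole process, the second load silently replaces the first package's setting; nothing in the first package is notified.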

If there is a libproj C api to set the data path instead of an environment variable, that would be much better.

edzer commented

Thanks for picking this up, @jeroen!

The current approach (for both rgdal, @rsbivand, and sf; also PROJ by @mdsumner ?) is that on load, if the package comes with the datum files, PROJ_LIB is set, and it is set back to the value at startup on unload. This is consistent with the assumption that a binary package comes with everything: static libraries as well as config (datum, EPSG) files, but inconsistent with a user wanting to set up their own library. It will also be robust against other packages setting it to an out-of-date value (e.g. older rgdal install setting it to PROJ 4, sf setting it to PROJ 6, or vice versa) but will cause havoc for functions in that other (first loaded) library.
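The load/unload behaviour described above can be sketched in C as follows. This is only an illustration: the packages do this from their R load and unload hooks, the names `pkg_on_load` and `pkg_on_unload` are hypothetical, and a POSIX `setenv`/`unsetenv` is assumed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() and unsetenv() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *saved_proj_lib = NULL;  /* copy of the value seen at load time */
static int had_proj_lib = 0;         /* whether PROJ_LIB was set at load time */

/* On load: remember the current PROJ_LIB, then point it at the bundled data. */
void pkg_on_load(const char *bundled_data_dir) {
    const char *startup = getenv("PROJ_LIB");
    free(saved_proj_lib);
    saved_proj_lib = NULL;
    had_proj_lib = (startup != NULL);
    if (startup != NULL) {
        /* Copy the value: the pointer from getenv() may be invalidated by setenv(). */
        saved_proj_lib = malloc(strlen(startup) + 1);
        assert(saved_proj_lib != NULL);
        strcpy(saved_proj_lib, startup);
    }
    setenv("PROJ_LIB", bundled_data_dir, 1);
}

/* On unload: put back the value seen at load time, or unset the variable. */
void pkg_on_unload(void) {
    if (had_proj_lib) {
        setenv("PROJ_LIB", saved_proj_lib, 1);
    } else {
        unsetenv("PROJ_LIB");
    }
    free(saved_proj_lib);
    saved_proj_lib = NULL;
    had_proj_lib = 0;
}
```

The restore step only helps after unload; while the package is loaded, every other library in the process sees the substituted value.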

I don't think that PROJ has another lib setting interface, and am not so optimistic installations with 2 incompatible proj versions will work, ever. Will discuss with @rsbivand shortly.

I think what you need is to ask @rouault to add an API for specifying the location of the projdata directory, rather than having to rely on the global PROJ_LIB environment variable.

Thereby each package could use data from a package-specific location.

edzer commented

It looks like PROJ has such an interface, here: https://github.com/OSGeo/PROJ/blob/master/src/open_lib.cpp

It looks as though this has been changed, and from and including 6.0.0 it should be feasible. In 4.9.3 projects.h was needed, hence much mess in configure.ac.

OK so you can make a function that wraps that conditionally like this:

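#include <Rinternals.h>
#include <proj.h>

SEXP C_set_data_dir(SEXP data_dir){
#if PROJ_VERSION_MAJOR >= 6
  /* proj_context_set_search_paths() takes a context, a path count and an array of paths */
  const char *path = CHAR(STRING_ELT(data_dir, 0));
  proj_context_set_search_paths(PJ_DEFAULT_CTX, 1, &path);
#else
  Rf_error("Version of proj too old for proj_context_set_search_paths");
#endif
  return data_dir;
}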

And then in your .onLoad() R code you call a function like this:

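set_proj_data <- function(proj_data){
  # CPL_proj_version() returns a string such as "6.1.0", so compare it as a version
  if(package_version(sf:::CPL_proj_version()) < "6.0.0"){
    # PROJ < 6: fall back to the global environment variable
    Sys.setenv(PROJ_LIB = proj_data)
  } else {
    # PROJ >= 6: set the search path through the C API instead
    .Call(C_set_data_dir, proj_data)
  }
}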

So basically establish the convention for all R packages to only use PROJ_LIB for PROJ4 data.

... and conditionally include proj.h?

edzer commented

In sf, proj.cpp has all functions twice, first using the new API and then the old one; yes, proj.h is included conditionally. The current implementation seems to work, tested against PROJ 6.1 after manually copying the proj directory to the installed package dir.

edzer commented

@darkblue-b you can pick the released (CRAN) version for that.

Would it be possible to append the library path relevant to sf upon loading, instead of substituting whatever PROJ_LIB was originally set to? I ask because I ran into issues when calling gdal from the command line while sf is loaded from a package.
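As an illustration of what "append instead of substitute" could mean, here is a small C sketch that joins an existing PROJ_LIB value with an extra directory. The helper name `append_search_dir` is hypothetical, and the sketch assumes the PROJ build in use accepts a separator-delimited list of directories in PROJ_LIB (`:` on Unix, `;` on Windows), which may not hold for older versions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated "existing<sep>extra" string, or a copy of
   `extra` when `existing` is NULL or empty. The caller frees the result.
   Returns NULL if allocation fails. */
char *append_search_dir(const char *existing, const char *extra, char sep) {
    assert(extra != NULL);
    if (existing == NULL || existing[0] == '\0') {
        char *copy = malloc(strlen(extra) + 1);
        if (copy != NULL) strcpy(copy, extra);
        return copy;
    }
    /* existing + separator + extra + terminating NUL */
    size_t len = strlen(existing) + 1 + strlen(extra) + 1;
    char *joined = malloc(len);
    if (joined != NULL) snprintf(joined, len, "%s%c%s", existing, sep, extra);
    return joined;
}
```

The result would then be passed to `setenv("PROJ_LIB", ...)` in place of the bare package directory, so that a directory the user configured earlier stays first in the search order.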

edzer commented

I'm afraid you'll have to wait until PROJ6 is rolled out, or install & compile against that yourself.