evetion/GeoArrays.jl

Abstracting projections

rafaqz opened this issue ยท 18 comments

I chunked out your CRS code and projection conversions into a few packages. The Base package is to provide types so that CRS metadata can be a lightweight, standard addition to any spatial array, without GDAL. The main package provides your conversions as Base.convert() methods between the projection types (which is a little more Julian and modular), with a GDAL dep.

The Base package could go into the GeoArrayBase.jl but it could also be used in other contexts, so I'm trying to keep things focussed for now. Let me know if you want push rights, half of it is your code.

https://github.com/rafaqz/CoordinateReferenceSystems.jl/blob/master/src/CoordinateReferenceSystems.jl

https://github.com/rafaqz/CoordinateReferenceSystemsBase.jl/blob/master/src/CoordinateReferenceSystemsBase.jl

visr commented

Nice, I think it's a good idea to have a CRS package that can be used by different packages.

Regarding the GDAL dependency. I would say let's use PROJ instead of GDAL for all CRS matters. It's a smaller dependency, and with the release of PROJ 6 / GDAL 3 result of the GDAL SRS barn raising, PROJ is basically doing all of the CRS work. Of course GDAL still offers it in it's API, but this work is all forwarded to PROJ.

From https://gdal.org/development/rfc/rfc73_proj6_wkt2_srsbarn.html#ogrspatialreference-rewrite:

WKT1, WKT2, ESRI WKT, PROJ strings import and export is now delegated to PROJ. The same holds for import of CRS from the EPSG database, that now relies on proj.db SQLite database.

Instead of PROJ strings, it's probably best to focus on WKT2 strings (next to SRID such as EPSG): https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems

I understand you are also interested in a package without a binary dependency. I agree that this would be nice to have, but currently given the absence of any julia implementation which understands any CRS format, I still find it hard to fully picture it. We could think about some trait-based interface that any CRS implementation could implement. Perhaps we could even use GeoInterface for vector+raster+CRS interfaces? First it'd be good to focus on getting a new PROJ.jl release out of the door such that we can start building on that.

CoordinateReferenceSystemsBase.jl already has no binary deps, that's all I need :)

The binary dependency is only required for CoordinateReferenceSystems.jl, which you would load when you need convert() or any other heavy lifting we might add.

If I don't need to do conversions (as in most of my simulations work) I don't need to load CoordinateReferenceSystems.jl, just CoordinateReferenceSystemsBase.jl as a dep in whatever spatial array package I'm using.

I agree Proj4 might be better than GDAL, but I literally just cleaned up the existing code. I'm not keen to write much more functionality as I'm not really a GDAL or Proj4 user and it will take me forever. But I'm open to any pull requests, letting people have push access or whatever :)

visr commented

Yeah I had a look at CoordinateReferenceSystemsBase.jl. So your use case is mainly to have a type which holds one of various forms of CRS, as a string or integer, but don't need any conversion capability? For my understanding, what does that use case look like? Do you want to dispatch on it, or do you just want to keep track of the CRS as metadata?

Regarding depending on GDAL v PROJ, I get that you used the existing GDAL code, just a point of where I think we should be heading towards. BTW it's probably best to use PROJ instead of Proj4. Since the project is now in v6, the Proj4 name is no longer used.

We need to be able to use the same spatial data types for multiple scenarios, wether using the data like a regular array or actually worrying about the coordinates/projection.

I'm writing spatial simulation packages: Cellular.jl, Dispersal.jl, GrowthRates.jl, Microclimate.jl. Most of the time the coordinates/projection etc just don't matter, data is treated as a matrix. But when you go to plot the output with a shapefile, check the projection, transform coordinates, you don't have any metadata and have to go back to the NetCDF, or wherever the data came from. Or just keep all that organised in R and send the array back there for processing (what we actually do).

Lightweight deps mean we can always keep metadata attached to spatial data by default, and just load another package if and when we need to do something with it. But it has no startup time cost to the use cases that don't care about the spatial metadata.

I also think it's a generally good thing for a spatial data ecosystem to reduce the overheads of using shared types.

Thanks for the work on creating a AbstractCRSdefinition package ecosystem! We were already thinking about having a dedicated package and I'm happy to have the projection functionality in a separate package.

I do agree with @visr about adding WKT2 support and using PROJ. If I find some time, I will remove the projection code here and refer to your package. I will probably also create some PRs over there as well.

Relevant in this discussion r-spatial/sf#1225 which is roughly

struct CRS
input::AbstractString  # user input
wkt::AbstractString  # translated into WKT(2 if GDAL3)
end

Yeah seems like a similar problem, but having a good type system in Julia can make the solution a lot cleaner, and lazy, instead of requiring GDAL/Proj. There is a Base.convert methods in yeesian/ArchGDAL.jl#97. It can convert crs or reproject geometries using arbitrary GeoFormatTypes objects, when you need it. So I'm not sure the wkt field is necessary.

ArchGDAL may not be the best place to put the (unavoidable) type piracy compared to some other glue package, but it's a proof of concept.

Edit: It would be good if you could use either GDAL or Proj for conversion depending what you have loaded already. Maybe using Requires.jl in GeoFormatTypes and some method like Plots.jl pyplot() etc could achieve this.

visr commented

ArchGDAL may not be the best place to put the (unavoidable) type piracy compared to some other glue package, but it's a proof of concept.

Could you explain the type piracy? In yeesian/ArchGDAL#97 there are methods to go from GeoFormatTypes types to ArchGDAL types and vice versa, that doesn't mean type piracy right?

Sure. There is type piracy on Base.convert and GeoFormatTypes.WellKnownText etc in that PR, but it's not totally uncommon.

An example of this is Colors.jl pirating convert on ColorTypes.jl types. I originally had that CoordinateReferenceSystems.jl package to do that for crs, basically copying Colors.jl. But Colors.jl doesn't need the external packages, and especially doesn't have two -Proj or ArchGDAL - that can do the conversions.

But... I'm thinking now that as Requires can precompile we can just put Proj and ArchGDAL convert methods in GeoFormatTypes inside a @requires block so convert will work if you load one of them. Then there will be no piracy.

GeoFormatTypes might also need a proj() and gdal() method to explicitly choose one backend (like in Plots.jl) as the conversions are probably slightly different and you would want to be explicit if you happened to load both packages for some other reason.

Edit: this also sounds more complicated than it is. It will be very little code, it's just about choosing how it should work to get the balance of genericness and a light-weight types package.

visr commented

Ah ok, I don't think that is the kind of type piracy we have to worry about.

Regarding PROJ/GDAL, since PROJ is doing all the CRS work, it would be good to check if that supports your use cases. PROJ is a required dependency of GDAL anyway. Though it seems the GDAL API perhaps supports more formats?

We just need to be clear where it is allowed to happen.

I didnt think Proj.jl was a GDAL or ArchGDAL dep. How are you suggesting to use it?

visr commented

Indeed GDAL.jl or ArchGDAL.jl does not have a dependency on the Proj4.jl wrapper, but the PROJ library itself is a required dependency of the GDAL library. So if (like in GeoArrays) we already depend on GDAL, the costs of depending on Proj4.jl is literally only the bit of julia wrapping code, since the PROJ library is already loaded anyway (soon via PROJ_jll).

And since PROJ is the CRS workhorse, it may be good to focus on interacting with that for CRS operations rather than GDAL when possible. Not saying that we should avoid using GDAL altogether, but it may be better to interact with it directly rather than indirectly, also since it is a smaller dependency.

visr commented

Ah ok. Reproject is defined in PROJ, and in GDAL, which calls PROJ to do the work. PROJ is not lower level than GDAL I would say, but it is true that compared to Proj4.jl, ArchGDAL.jl is a more developed package in terms of API/tests/docs. It would be nice to make Proj4.jl easier to use as well.

Yeah thats roughly what I figured.

BTW I mean the reproject as defined in the pull request. It's a one liner that reprojects geomteric data (including wtk/json etc) between any projections in any formats as a generic command.
https://github.com/yeesian/ArchGDAL.jl/pull/97/files#diff-4fb86b8b73d44e4f28b222f78c20ddb1R36

Notice how much less code it takes than the tests above it that do the same thing.

visr commented

Yeah I see how it can be useful to share CRS types across the ecosystem.

That is indeed the reproject that I was thinking about (in PROJ it's proj_trans). But enough about PROJ v GDAL, I just realized we were having the same discussion half a year ago in this same issue, ending with

I agree Proj4 might be better than GDAL, but I literally just cleaned up the existing code. I'm not keen to write much more functionality as I'm not really a GDAL or Proj4 user and it will take me forever. But I'm open to any pull requests, letting people have push access or whatever :)

So it's clear that rather than offering feedback I should just roll up my sleeves and do the work ;)

The conflation of geometry formats and CRS formats still irks me though, but that is also some old discussion from JuliaGeo/meta#7. Now I see you've added traits to distinguish Geometry/CRS/Mixed in JuliaGeo/GeoFormatTypes.jl#3. Although I would probably separate them altogether.

Haha yeah we did. Unfortunately I have 10 other packages in some stage of completion so I have to avoid lower level work as much as possible. Other than that I totally support your Proj suggestion.

Finally got around to get some work done. Converting to GeoFormatTypes was quick and painless. Well done!