opengeospatial/ogc-geosparql

Decouple CRS and WKT

Opened this issue · 7 comments

Official change request: OGC CR 595

WKT is a convenient way to encode geographic geometry, but the geo:wktLiteral datatype makes GeoSPARQL hard to work with. The next version should offer an option to use a plain WKT literal and to express the CRS of a geometry in a different way. Reasons why concatenating the CRS URI and the WKT can be considered bad design:

  • GeoSPARQL deviates from the WKT standard, resulting in poor software support.
  • Allowing the CRS to be omitted, with CRS84 as the default, may be useful in North America, but is of little value for serious use in other parts of the world.
  • The proper data type for expression of CRS is an IRI. Therefore it should be defined as such, not as part of a string literal.
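To make the concatenation concrete: in GeoSPARQL 1.0 a geo:wktLiteral is an optional CRS URI in angle brackets followed by the WKT text, and CRS84 is assumed when the URI is absent. The Python sketch below shows roughly what a consumer has to do before the string can be handed to a standard WKT parser. The function name `split_wkt_literal` is made up for illustration, and the error handling is deliberately minimal.

```python
# Default CRS that GeoSPARQL 1.0 assumes when a wktLiteral carries no URI.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def split_wkt_literal(lexical):
    """Split a geo:wktLiteral lexical form into (crs_uri, plain_wkt).

    Illustrative helper, not part of any library.
    """
    text = lexical.strip()
    if text.startswith("<"):
        end = text.find(">")
        if end == -1:
            raise ValueError("unterminated CRS URI in wktLiteral")
        crs = text[1:end]
        wkt = text[end + 1:].strip()
    else:
        # No URI given: the literal is interpreted in the default CRS.
        crs, wkt = CRS84, text
    if not wkt:
        raise ValueError("wktLiteral contains no WKT text")
    return crs, wkt


crs, wkt = split_wkt_literal(
    "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(52.1 5.4)"
)
print(crs)  # http://www.opengis.net/def/crs/EPSG/0/4326
print(wkt)  # POINT(52.1 5.4)
```

Only the second element of the returned pair is WKT in the sense of the WKT standard, which is why off-the-shelf WKT tooling cannot read the literal directly.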

The CRS is not necessarily a known property, especially for non-geographical geometry. It should therefore be possible to leave out CRS data in publications without this leading to wrong interpretations.

CRS can be considered an intrinsic or fundamental aspect of geometry, but so are other properties like dimensionality or accuracy. This does not mean all of this information should be lumped together in one literal.

It seems better to introduce a new property for CRS and to let WKT literals be just WKT literals. Should a new property for indicating CRS be introduced, it would be good to allow it to be applied not only to individual geometries, but to geometry collections too.

Also note that the current specification of wktLiteral allows any valid URI to be used to designate a CRS. This has led Wikidata to use the URI for the entity "the planet Mars" to designate the CRS for all geodata on Mars, even though multiple CRSs are actually used for such Martian geodata.

This openness may have had good justification and led to some good effects, but it has also caused significant trouble, as discussed in openlink/virtuoso-opensource#295.

Nice to see that you plan to decouple it. It would be good if the new version of the ontology allowed developers to represent any CRS (e.g. ETRS89) in a simple manner.

Created MoSCoW poll

I have a few concerns with this decoupling.

  1. Other popular serializations encode CRS information in the geometry itself:
      • GML has the srsName attribute.
      • KML and GeoJSON exclusively use WGS84.
  2. A separate, decoupled specification of CRS information can contradict what is specified in the geometry encoding. Encoding a CRS URI inside a WKT geometry makes WKT consistent with GML, KML and GeoJSON and eliminates the possibility of a conflicting CRS specification.
  3. We could easily provide a query function to return the plain WKT with the CRS URI removed.
  4. Given that GeoSPARQL 1.0 used this WKT+CRS encoding, how would decoupling CRS from WKT be backwards compatible with GeoSPARQL 1.0? Wouldn't it break existing applications?

@mperry455 , thank you for sharing those concerns. Could they be somewhat alleviated by the following remarks?

  1. I don't know if GML and KML should be considered serializations of geometry comparable to the much leaner WKT. As the "L" says, GML and KML are designed as full-blown languages for spatial data. I would sooner regard them as alternatives to encoding spatial data using a graph model for the semantic web. Otherwise you would wind up with data in a graph that contains data in another graph that uses a different model: a matryoshka doll of models, which is not nice to work with.
  2. Personally, I would advise against using any format for spatial data that restricts the CRS that can be used, or regards a CRS that is much more usable in one region than the other as superior.
  3. I quite firmly believe that considerations about currently fashionable serialisations/data formats should play a much lesser role in ontology design than considerations about the logic of core modelling. An ontology for spatial data should be primarily based on being a good knowledge representation, a faithful description of how we perceive spatial data in general. That is a thing that would truly be of lasting value. In fact, I believe that a well designed, simple ontology for spatial data could do much good in reducing the large number of data formats currently in use. And that would be great for improving data interoperability.
  4. Yes, when CRS references are decoupled, inconsistencies could arise when coordinate data are serialized in formats that have their own way of specifying a CRS. Firstly, I hope that an improved GeoSPARQL specification will lessen the need for data publishers to consider using those formats (because more efficient and generally available encodings are made possible). Secondly, there are thousands of ways to publish inconsistent data on the semantic web. I don't think web ontologists should adjust optimal designs to take that into account. However, good ontology documentation should go a long way in preventing inconsistent data publication. Furthermore, to prevent occurrence of this particular inconsistency, automated data validation rules (for example using SHACL or ShEx in the case of RDF data) could be made and publicly shared.
  5. Yes, I can imagine an additional SPARQL query to return plain WKT. But I think that spatial data should be optimally workable without SPARQL or some other query language too. Much can be achieved by direct dereferencing of IRIs, or by downloading data dumps.
  6. Backward compatibility is a very important point, as rightly stated in the charter. One way of dealing with that could be to introduce alternative properties, data types and functions for WKT without CRS data, for example geo:asLeanWKT or geo:leanWKTLiteral.
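Points 4 and 6 above can be illustrated with a small Python sketch. It assumes a decoupled model in which a geometry carries plain WKT plus an optional, separately stated CRS IRI. The `LeanGeometry` class and all function names are hypothetical and not part of GeoSPARQL. The sketch converts between that form and a GeoSPARQL 1.0 geo:wktLiteral, and flags the kind of contradiction a validation rule would need to catch.

```python
from dataclasses import dataclass
from typing import Optional

# Default CRS of a GeoSPARQL 1.0 wktLiteral without a URI.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


@dataclass(frozen=True)
class LeanGeometry:
    """Hypothetical decoupled form: plain WKT plus an optional CRS IRI."""
    wkt: str
    crs: Optional[str] = None  # None means "CRS not stated"


def from_wkt_literal(lexical: str) -> LeanGeometry:
    """Read a GeoSPARQL 1.0 wktLiteral into the decoupled form."""
    text = lexical.strip()
    if text.startswith("<") and ">" in text:
        uri, _, rest = text[1:].partition(">")
        return LeanGeometry(rest.strip(), uri)
    # No URI: GeoSPARQL 1.0 semantics imply CRS84.
    return LeanGeometry(text, CRS84)


def to_wkt_literal(geom: LeanGeometry) -> str:
    """Write the decoupled form back as a GeoSPARQL 1.0 wktLiteral."""
    if geom.crs is None:
        # Design choice of this sketch: refuse, because omitting the
        # URI would silently turn "unknown CRS" into CRS84.
        raise ValueError("cannot build a wktLiteral without a CRS")
    return f"<{geom.crs}> {geom.wkt}"


def crs_conflict(lexical: str, declared_crs: str) -> bool:
    """True if the CRS embedded in a wktLiteral contradicts a separately
    declared CRS IRI."""
    return from_wkt_literal(lexical).crs != declared_crs
```

Under these assumptions, existing GeoSPARQL 1.0 data can be read without loss. Going the other way fails only for geometries whose CRS is unknown, which the 1.0 literal cannot express.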

@FransKnibbe , Thanks for your comments. There are certainly benefits for plain WKT when publishing linked geo data. However, I still much prefer having all geometry literals contain CRS information so that they are "complete", and WKT is the only serialization we are considering that doesn't already contain CRS information (either explicit or implied).
Having CRS information in the geometry literal itself makes things much cleaner from an implementation perspective. For example, at load time, we have SRID information as soon as we see the geometry, and it can be stored in some spatial database in the backend. Otherwise, we may have to hold on to the geometry somewhere until we (hopefully) see some other triple with CRS information. Similar issues can arise at query time.
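The load-time concern can be sketched as follows. The code assumes triples arrive one at a time as `(subject, predicate, object)` tuples and that a decoupled model uses two separate properties. `geo:asLeanWKT` is only the name floated earlier in this thread and `ex:hasCRS` is invented for the example; neither is an existing GeoSPARQL property.

```python
WKT_PROP = "geo:asLeanWKT"  # name floated in this thread, not a real property
CRS_PROP = "ex:hasCRS"      # hypothetical CRS property


def load_decoupled(triples):
    """Pair each geometry's plain WKT with its CRS as triples stream in.

    Returns (stored, unresolved): 'stored' maps geometry IRI to
    (crs, wkt); 'unresolved' holds WKT whose CRS triple never arrived.
    """
    waiting_wkt, waiting_crs, stored = {}, {}, {}
    for s, p, o in triples:
        if p == WKT_PROP:
            waiting_wkt[s] = o
        elif p == CRS_PROP:
            waiting_crs[s] = o
        else:
            continue
        # A geometry can only be written to the spatial backend once
        # both halves have been seen.
        if s in waiting_wkt and s in waiting_crs:
            stored[s] = (waiting_crs.pop(s), waiting_wkt.pop(s))
    return stored, waiting_wkt


stored, unresolved = load_decoupled([
    ("ex:g1", WKT_PROP, "POINT(5.4 52.1)"),
    ("ex:g2", WKT_PROP, "POINT(1 2)"),
    ("ex:g1", CRS_PROP, "http://www.opengis.net/def/crs/EPSG/0/4258"),
])
print(stored)      # ex:g1 is complete
print(unresolved)  # ex:g2 is still waiting for a CRS
```

With the CRS embedded in the literal, the two `waiting_*` buffers and the unresolved case disappear, which is the implementation simplification being argued for here.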
We had the same discussion 8 years ago, and the group settled on adding SRS information to WKT.
In my mind, this is similar to xsd:dateTime literals, which encode timezone information in the literal value itself.
Regarding your second comment, #1 proposes adding geoJSONLiteral as a serialization type. This has quite a bit of support, and GeoJSON exclusively uses WGS84 Long Lat.

@mperry455 , why should a geometry description be "complete" with coordinate data and CRS reference put together in a single literal? Firstly, next to coordinates and identification of a CRS, a geometry can have other properties, for instance spatial resolution or dimension (which is already covered in the current spec). Why glue some properties together and not others? Secondly, is there really such a risk of the CRS reference getting lost, should it be published at the geometry level? If you download a data dump, you get all the data. If you directly dereference a geometry IRI, logically you would get all data describing that geometry. And if you use SPARQL, you get what you ask for. In RDF, or graphs in general, it is common practice to link bits of related data. String concatenation should not be necessary for either the inner or the outer workings of a data manipulation system, I think.

Yes, combining coordinates and CRS in the same literal is similar to xsd:dateTime. But a design decision that was made for a data serialization language (XML) is not necessarily right for ontology design. It is also worth noting that there is no default time zone if one omits the time zone part of an xsd:dateTime literal, unlike the default CRS of the WKT literal in GeoSPARQL.
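Python's `datetime` happens to make the same distinction, so it can serve as a loose analogy (it is not an XSD implementation). A value parsed without an offset stays "naive" rather than being assigned a default zone, and ordering it against an offset-aware value is refused. The `compare` helper is just for the example.

```python
from datetime import datetime, timedelta

aware = datetime.fromisoformat("2020-05-01T12:00:00+02:00")
naive = datetime.fromisoformat("2020-05-01T12:00:00")

print(aware.utcoffset() == timedelta(hours=2))  # True
print(naive.tzinfo)                             # None: no zone is assumed


def compare(a, b):
    """Return a < b, or None when the comparison is undefined."""
    try:
        return a < b
    except TypeError:
        # Python refuses to order naive against aware datetimes.
        return None


print(compare(naive, aware))  # None
```

A GeoSPARQL 1.0 wktLiteral without a URI behaves differently: it is not left "naive" but is interpreted as CRS84.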

As for GeoJSON, I personally see little benefit in using it as a data format (I see it as a more limited alternative to GeoSPARQL as a whole, just like GML), but if it is not too much trouble, I guess it could be supported. People would have to be extra careful of course, and data store implementers would need to take precautions against producing erroneous data. Of the geometry/coordinate serialisations currently in use, my guess is that WKT will be the most useful way to express coordinates, especially if it can be plain WKT. That would be the most versatile and simple option at the moment. But that could change if a better way to deal with lists/arrays in RDF is developed, because in my mind even plain WKT puts too much data in a single literal: both the coordinate array and the geometry type identifier (e.g. POINT, POLYGON).