Better GeoSPARQL conformity
We should fully conform to the GeoSPARQL standard for the types geo:SpatialObject and geo:Feature.
In particular, a geo:Feature must have the following properties: geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox.
That is, osm:Nodes, osm:Ways, osm:Relations and osm:Areas should be of type geo:Feature and offer these properties.
As far as I understand it, all of the properties geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox must then point to an object of type geo:SpatialObject. These must implement geo:hasSize, geo:hasMetricSize, geo:hasLength, geo:hasMetricLength, geo:hasPerimeterLength, geo:hasMetricPerimeterLength, geo:hasArea, geo:hasMetricArea, geo:hasVolume and geo:hasMetricVolume.
So far, I don't see any problem with implementing this.
However, AFAIK (@lehmann-4178656ch, @Danysan1, please correct me), sfIntersects and sfContains should be properties between geo:SpatialObjects. This would mean that we cannot write queries like
SELECT ?osm_id ?hasgeometry WHERE {
osmrel:1960198 ogc:sfContains ?osm_id .
?osm_id geo:hasGeometry/geo:asWKT ?hasgeometry
}
anymore. They would then look like this:
SELECT ?osm_id ?hasgeometry WHERE {
osmrel:1960198 geo:hasGeometry ?geoma .
?osm_id geo:hasGeometry ?geomb .
?geoma ogc:sfContains ?geomb .
?geomb geo:hasGeometry/geo:asWKT ?hasgeometry
}
@hannahbast, @joka921, is that a problem?
See also ad-freiburg/qlever#678 (comment)
Can/Shall we replace the current geo:hasGeometry with geo:hasDefaultGeometry, as we only provide a single geometry? If both are needed, we would have to provide the same information with two predicates:
osmObject geo:hasGeometry ourGeoObject .
osmObject geo:hasDefaultGeometry ourGeoObject .
Regarding the properties of geo:SpatialObject: the spec allows for these to be implemented, but we are not required to add them all, and some could never be associated with any meaningful value, e.g. the area of a way without width, or the volume of a point.
Regarding the properties of geo:SpatialObject: the spec allows for these to be implemented, but we are not required to add them all, and some could never be associated with any meaningful value, e.g. the area of a way without width, or the volume of a point.
I am not so sure - according to RFC 2119, SHALL means an absolute requirement. So afaik we must provide both.
@lehmann-4178656ch and I discussed this further.
I think a sane approach would be to omit the properties we cannot fill with any meaningful value to keep the dataset size manageable. For example, it seems extremely redundant to add geo:hasArea properties with a value of 0 to each node and way in the dataset.
In this spirit, I would also not add the geo:hasDefaultGeometry triple. It's just overly redundant.
Looking at the geoSPARQL specification...
- in 6.4 I read that geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox have geo:Feature as domain and geo:Geometry as range
- in 6.2.2 I read that geo:Feature rdfs:subClassOf geo:SpatialObject
- in 6.8.1 I read that geo:Geometry rdfs:subClassOf geo:SpatialObject
- in 6.3 I read that geo:hasLength, geo:hasArea and all other properties "for associating Spatial Objects with scalar spatial measurements" have geo:SpatialObject as domain and range, NOT geo:Geometry
- in 7.2 I read that geo:sfContains, geo:sfIntersects and all other properties in the "Simple Features relation family" have geo:SpatialObject as domain and range, NOT geo:Geometry
This is visualized in this diagram at the beginning of section 6 and this other diagram from this paper
So:
a geo:Feature must have the following properties: geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox
osm:Nodes, osm:Ways, osm:Relations and osm:Areas should be of type geo:Feature and offer these properties.
all of the properties geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox must then point to an object of type geo:SpatialObject. These must implement geo:hasSize, geo:hasMetricSize, geo:hasLength, geo:hasMetricLength, geo:hasPerimeterLength, geo:hasMetricPerimeterLength, geo:hasArea, geo:hasMetricArea, geo:hasVolume and geo:hasMetricVolume.
sfIntersects and sfContains should be properties between geo:SpatialObjects
I agree with all of the above
we cannot write queries like
SELECT ?osm_id ?hasgeometry WHERE { osmrel:1960198 ogc:sfContains ?osm_id . ?osm_id geo:hasGeometry/geo:asWKT ?hasgeometry }
anymore. They would then look like this:
SELECT ?osm_id ?hasgeometry WHERE { osmrel:1960198 geo:hasGeometry ?geoma . ?osm_id geo:hasGeometry ?geomb . ?geoma ogc:sfContains ?geomb . ?geomb geo:hasGeometry/geo:asWKT ?hasgeometry }
I believe this is not the case: given that
- geo:Feature is rdfs:subClassOf geo:SpatialObject
- these relations have geo:SpatialObject as domain and range
then these relations can also link geo:Features to other geo:Features, so the old syntax is still correct.
If I understand correctly this also means that if these triples hold...
:x geo:hasGeometry :xGeom.
:y geo:hasGeometry :yGeom.
:x geo:sfContains :y
...then also these hold...
:x geo:sfContains :yGeom.
:xGeom geo:sfContains :y.
:xGeom geo:sfContains :yGeom.
This would require doing some inference (combining geo:hasGeometry with the base relation), either materialized in the triples or done dynamically at query-time.
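The materialized variant of this inference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain string tuples held in memory, the function name `materialize_contains` is hypothetical, and only relations stated at the feature level are lifted to the attached geometries (not the other way around).

```python
# Sketch only: triples are (subject, predicate, object) string tuples.
HAS_GEOMETRY = "geo:hasGeometry"
SF_CONTAINS = "geo:sfContains"


def materialize_contains(triples):
    """Return the input triples plus geo:sfContains lifted to the geometries."""
    triples = set(triples)

    # Map every feature to the set of geometries attached to it.
    geometries = {}
    for s, p, o in triples:
        if p == HAS_GEOMETRY:
            geometries.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in triples:
        if p != SF_CONTAINS:
            continue
        # The relation holds for the feature itself and for each of its geometries.
        subjects = {s} | geometries.get(s, set())
        objects = {o} | geometries.get(o, set())
        for a in subjects:
            for b in objects:
                inferred.add((a, SF_CONTAINS, b))

    return triples | inferred
```

Applied to the three triples above, this adds exactly the three derived `geo:sfContains` triples. Doing the same expansion at query time would avoid storing them, at the cost of extra joins per query.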
I think a sane approach would be to omit the properties we cannot fill with any meaningful value to keep the dataset size manageable. For example, it seems extremely redundant to add geo:hasArea properties with a value of 0 to each node and way in the dataset. In this spirit, I would also not add the geo:hasDefaultGeometry triple. It's just overly redundant.
Given what you pointed out about SHALL and that 6.3 reads "Implementations shall allow the properties ... to be used in SPARQL graph patterns" this probably would break the formal full conformity with GeoSPARQL, but still, in my opinion it is an acceptable tradeoff.
Thank you all for this discussion. One way to realize redundant predicates is to just let the SPARQL engine know about 100% equivalent predicates, have the triples in the index for exactly one and then map each equivalent predicate to this one at query time.
The situation is not new, just the scale. For example, each of the 90 M Wikidata items has exactly one rdfs:label triple and a 100% equivalent (and therefore redundant) schema:name triple. We didn't care about this so far, since it's just 90 M additional triples compared to 19 B triples overall. But if these redundant triples blow up the total size of the dataset considerably, we should care.
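A minimal sketch of the equivalent-predicate idea, assuming a toy in-memory index of string tuples. The `CANONICAL` table and the functions `canonicalize` and `match` are hypothetical names for illustration and say nothing about how QLever would actually implement this. Treating geo:hasDefaultGeometry as equivalent to geo:hasGeometry is an assumption that only holds while each feature has a single geometry.

```python
# Sketch only: alias predicate -> the one predicate actually stored in the index.
CANONICAL = {
    "schema:name": "rdfs:label",
    "geo:hasDefaultGeometry": "geo:hasGeometry",  # assumes one geometry per feature
}


def canonicalize(pattern):
    """Rewrite the predicate of a triple pattern to its stored equivalent."""
    s, p, o = pattern
    return (s, CANONICAL.get(p, p), o)


def match(index, pattern):
    """Return all stored triples matching a pattern; terms starting with '?' are variables."""
    rewritten = canonicalize(pattern)
    return [
        triple
        for triple in index
        if all(q.startswith("?") or q == v for q, v in zip(rewritten, triple))
    ]
```

With this mapping, a pattern using `schema:name` is answered from the `rdfs:label` triples, so the redundant triples never have to be stored.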
Similarly, for predicate paths <x>/<y>, where you never need the intermediate node (typically, a blank node), the index builder could just discard the blank node, internally create a simple predicate <x/y>, and then map the path <x>/<y> to <x/y> at query time. If a query asks for the blank node in between at query time, we could either create it on the fly or issue an error message.
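The path-fusing idea could look roughly like the following sketch. It assumes triples are string tuples in a list and that the intermediate node is used for nothing except the `<x>/<y>` chain. `fused_name`, `collapse_path` and `rewrite_path` are hypothetical helpers, not existing index-builder functions.

```python
def fused_name(x, y):
    """Name of the internal predicate that stands for the path x/y."""
    return f"<{x}/{y}>"


def collapse_path(triples, x, y):
    """Replace every chain s -x-> n -y-> o by (s, <x/y>, o) and drop the node n."""
    # Objects reachable through y, keyed by the intermediate node.
    targets = {}
    for s, p, o in triples:
        if p == y:
            targets.setdefault(s, []).append(o)

    fused, dropped = [], set()
    for s, p, o in triples:
        if p == x and o in targets:
            fused.extend((s, fused_name(x, y), t) for t in targets[o])
            dropped.add(o)

    # Keep every triple that was not consumed by the fusion.
    rest = [
        (s, p, o)
        for s, p, o in triples
        if not (p == x and o in dropped) and not (p == y and s in dropped)
    ]
    return fused + rest


def rewrite_path(predicates, x, y):
    """At query time, map the path (x, y) to the fused predicate; leave other paths alone."""
    if tuple(predicates) == (x, y):
        return (fused_name(x, y),)
    return tuple(predicates)
```

A query that binds the intermediate node to a variable cannot be rewritten this way, which is where the "create it on the fly or issue an error message" decision comes in.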
Hi @patrickbr @lehmann-4178656ch @Danysan1 @hannahbast @joka921,
The desire to make your OSM representation GeoSPARQL compliant is highly appreciated!
GeoSPARQL is a voluminous and complex spec.
I copy here two main GeoSPARQL experts @nicholascar @situx to correct what I write below in case I made mistakes.
- You will find the examples https://docs.ogc.org/is/22-047r1/22-047r1.html#_cd8db17e-0f99-4d58-9496-b2ad49031748 most useful:
  C.1. RDF Examples
  C.2. Example SPARQL Queries & Rules
- If you have time, I highly recommend reading "GeoSPARQL 1.1- Motivations, Details and Applications of the Decadal Update to the Most Important Geospatial LOD Standard (IJPRS 2022)"
Alternative Geometries
Currently you have
osmnode:679109323
geo:hasGeometry osm2rdfgeom:osm_node_679109323 ;
osm2rdfgeom:convex_hull "..."^^geo:wktLiteral ;
osm2rdfgeom:envelope "..."^^geo:wktLiteral ;
osm2rdfgeom:obb "..."^^geo:wktLiteral .
osm2rdfgeom:osm_node_679109323 geo:asWKT "..."^^geo:wktLiteral .
But all of these are alternative geometries, so I suggest changing it to:
osmnode:679109323
geo:hasGeometry
osmnode:679109323/geom, osmnode:679109323/convexHull, osmnode:679109323/boundingBox, osmnode:679109323/orientedBoundingBox;
geo:hasDefaultGeometry osmnode:679109323/geom;
geo:hasBoundingBox osmnode:679109323/boundingBox;
.
osmnode:679109323/geom a geo:Geometry; osm2rdf:role "geometry"; geo:asWKT "..."^^geo:wktLiteral.
osmnode:679109323/convexHull a geo:Geometry; osm2rdf:role "convexHull"; geo:asWKT "..."^^geo:wktLiteral.
osmnode:679109323/boundingBox a geo:Geometry; osm2rdf:role "boundingBox"; geo:asWKT "..."^^geo:wktLiteral.
osmnode:679109323/orientedBoundingBox a geo:Geometry; osm2rdf:role "orientedBoundingBox"; geo:asWKT "..."^^geo:wktLiteral.
Notes:
- I suggest to use hierarchical URLs, where geometry URLs of a feature use the feature URL as prefix. (Above I've shown CURIEs, which are not valid prefixed URLs, but you get the idea)
- You should use hasGeometry for all, hasDefaultGeometry for the main (detailed) geometry, hasBoundingBox for the envelope (I assume by "envelope" you mean the bounding box, right?)
- You can add geo:hasCentroid if you can compute it, but it's optional.
- For all, I've added osm2rdf:role to allow the user to distinguish between them.
- I suggest to simplify your namespace osm2rdfmember to just osm2rdf, so the same predicate can be used here and in "members"
- OGC discussed the introduction of roles and "qualified geometries" opengeospatial/ogc-geosparql#241, opengeospatial/ogc-geosparql#430 but that is not yet standardized, so you can use your own roles.
- I think it's enough for roles to be strings, not "things"
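To make the hierarchical-URL suggestion concrete, here is a small Python sketch that emits N-Triples lines with full IRIs instead of the CURIE shorthand above. Everything project-specific is an assumption: the function name `geometry_lines`, the idea of reusing the role string as the last path segment, and the IRI of the role predicate, which the caller has to pass in. Only the GeoSPARQL and RDF namespace IRIs are the standard ones. The WKT string is assumed to contain no characters that need escaping.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Roles that get a more specific GeoSPARQL predicate in addition to geo:hasGeometry.
SPECIFIC_PREDICATE = {
    "geom": "hasDefaultGeometry",
    "boundingBox": "hasBoundingBox",
}


def geometry_lines(feature_iri, wkt_by_role, role_predicate_iri):
    """Return N-Triples lines linking a feature to one geometry node per role."""
    lines = []
    for role, wkt in wkt_by_role.items():
        # Hierarchical IRI: the feature IRI is the prefix of the geometry IRI.
        geom_iri = f"{feature_iri}/{role}"
        lines.append(f"<{feature_iri}> <{GEO}hasGeometry> <{geom_iri}> .")
        if role in SPECIFIC_PREDICATE:
            lines.append(f"<{feature_iri}> <{GEO}{SPECIFIC_PREDICATE[role]}> <{geom_iri}> .")
        lines.append(f"<{geom_iri}> <{RDF_TYPE}> <{GEO}Geometry> .")
        lines.append(f'<{geom_iri}> <{role_predicate_iri}> "{role}" .')
        lines.append(f'<{geom_iri}> <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .')
    return lines
```

Roles without an entry in `SPECIFIC_PREDICATE` (for example a convex hull) are linked through geo:hasGeometry only and remain distinguishable by their role string.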
Feature class and Relations
Currently you have eg
osmnode:679109323 rdf:type osm:node
Please also add geo:Feature as type.
It's ok to keep the topological relations at the level of Features, eg:
osmrel:3766584 ogc:sfContains osmway:264339544
As you can see in C.2.3.1. All features or geometries overlapping with another feature, the relations apply at both levels of Feature and Geometry, and by keeping them at the level of Feature, you implement only the first (most efficient) branch of the UNION.
Magic Predicates
You have materialized topological relations using an unofficial namespace like this:
@prefix ogc: <http://www.opengis.net/rdf#> .
osmrel:3766584 ogc:sfContains osmway:264339544
Please consider using geo:sfContains (the official namespace). This has pros and cons:
- For Qlever, which afaik doesn't support GeoSPARQL indexing, this will be ideal since it will allow a user to make standard queries, and have them execute quickly
- But for repositories that support GeoSPARQL indexing, it would conflict with the standard "magic predicate" geo:sfContains. Eg in GraphDB, that predicate is not consulted in the database, but is passed to the geospatial index to process.
I think you should use the standard predicate geo:sfContains, but put those triples into separate dump files.
That way sem web developers can choose whether to load them to their repo, or let the repo compute the topological relations automatically.
BTW, have you implemented transitivity of sfContains?
(This section applies to all topological relations that you support, not just sfContains)
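For reference, materializing the transitivity of a containment relation amounts to computing a transitive closure. The sketch below assumes the directly stated relations are available as (container, contained) pairs in memory. `transitive_closure` is a hypothetical helper and makes no claim about what osm2rdf currently does.

```python
def transitive_closure(pairs):
    """Return all (a, c) where a contains c directly or through a chain of containments."""
    children = {}
    for a, b in pairs:
        children.setdefault(a, set()).add(b)

    closure = set()
    for start in children:
        # Depth-first walk over everything reachable from `start`.
        stack = list(children[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure
```

The closure can grow much faster than the input for deeply nested areas, which is one more argument for keeping such triples in separate dump files.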
Measures
It's a good idea to provide measures if you can.
- I think you should provide these: geo:hasMetricLength, geo:hasMetricPerimeterLength, geo:hasMetricArea
- Non-metric measures are not very useful since they don't fix the UoM system
- The other measures are a bit abstract (hasSize) or don't apply (hasVolume)
Measures should be attached to Features not Geometries. Eg the Area of a boundingBox is typically bigger than the area of the detailed geometry, and only the latter is interesting.
No additions from my side. I think @VladimirAlexiev explained it very well.
I would also be happy to see the dataset published using the GeoSPARQL vocabulary.
If you find anything you would like to express but cannot express in GeoSPARQL, we are always happy to receive a pull request or an issue in the ogc-geosparql repository.