datacommonsorg/mixer

Accept SPARQL queries using RDF prefixes

alexkreidler opened this issue · 1 comment

Well-formed SPARQL queries assume that subjects, predicates, and objects can all be identified by IRIs.

It appears that DataCommons expects SPARQL queries that simply use the keywords defined in the mapping file, e.g. typeOf, with no IRI.

It would be great to also support queries that use full IRIs or RDF prefixes.

For example, a query could then be written as:

SELECT ?name WHERE {
    ?state <https://datacommons.org/schema/typeOf> <https://datacommons.org/schema/State> .
    ?state <https://datacommons.org/schema/dcid> <https://datacommons.org/schema/geoId/06> .
    ?state <https://datacommons.org/schema/name> ?name
}

I understand that supporting relative IRIs may add some complexity to the SPARQL processing, but it doesn't seem too hard to add an additional function call here:

pred = lit

The new function would use the already-parsed prologue to resolve the IRIs correctly.
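A minimal sketch of what such a function might look like, in Go (the language the mixer is written in). The name `expandTerm`, the `map[string]string` standing in for the parsed prologue, and the `dcs:` prefix are assumptions for illustration, not the mixer's actual API. It resolves `<...>` IRI references and prefixed names, and leaves variables, literals, and bare keywords untouched; `BASE`-relative IRIs are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// expandTerm (hypothetical) rewrites one SPARQL term as an absolute IRI,
// using prefixes collected from the query prologue, e.g.
//   PREFIX dcs: <https://datacommons.org/schema/>
// Variables, quoted literals and bare keywords are returned unchanged, so
// queries written in the current keyword style keep working.
func expandTerm(term string, prefixes map[string]string) (string, error) {
	switch {
	case strings.HasPrefix(term, "?"), strings.HasPrefix(term, "$"):
		return term, nil // variable
	case strings.HasPrefix(term, `"`), strings.HasPrefix(term, "'"):
		return term, nil // literal
	case strings.HasPrefix(term, "<") && strings.HasSuffix(term, ">"):
		return term[1 : len(term)-1], nil // IRI reference: drop the brackets
	}
	if i := strings.Index(term, ":"); i >= 0 {
		// Prefixed name: replace "prefix:" with the namespace it was bound to.
		ns, ok := prefixes[term[:i]]
		if !ok {
			return "", fmt.Errorf("undeclared prefix %q in %q", term[:i], term)
		}
		return ns + term[i+1:], nil
	}
	return term, nil // bare keyword such as typeOf
}

func main() {
	prefixes := map[string]string{"dcs": "https://datacommons.org/schema/"}
	terms := []string{"?state", "dcs:typeOf", "<https://datacommons.org/schema/State>", "typeOf"}
	for _, t := range terms {
		iri, err := expandTerm(t, prefixes)
		fmt.Println(t, "=>", iri, err)
	}
}
```

Returning an error for an undeclared prefix, rather than passing the term through, keeps a typo in a prefix from silently turning into an unknown keyword.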

Alternatively, if you would rather not change the rest of the code and prefer to make the change on that one line, the function could do all the relative IRI processing, strip the Data Commons schema prefix, and pass the resulting bare string on to the rest of the code.
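A sketch of that prefix-stripping step, assuming `https://datacommons.org/schema/` (the namespace used in the example query above) is the one to strip. `toKeyword` and `dcSchemaNS` are hypothetical names; anything outside that namespace is passed through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// dcSchemaNS is assumed to be the Data Commons schema namespace, taken from
// the IRIs in the example query.
const dcSchemaNS = "https://datacommons.org/schema/"

// toKeyword (hypothetical) maps an IRI in the Data Commons namespace, with or
// without angle brackets, back to the bare keyword the existing code expects.
// Bare keywords, variables and IRIs from other namespaces pass through as-is.
func toKeyword(term string) string {
	if strings.HasPrefix(term, "<"+dcSchemaNS) && strings.HasSuffix(term, ">") {
		return term[len(dcSchemaNS)+1 : len(term)-1]
	}
	return strings.TrimPrefix(term, dcSchemaNS)
}

func main() {
	for _, t := range []string{
		"<https://datacommons.org/schema/typeOf>",
		"https://datacommons.org/schema/State",
		"typeOf",
		"?state",
	} {
		fmt.Printf("%s -> %s\n", t, toKeyword(t))
	}
}
```

Because unprefixed terms come back unchanged, queries in the current keyword style would keep working alongside IRI-based ones.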

Eventually, it might be worth throwing an error when clients don't use the proper IRI prefix, to encourage best practices.
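A sketch of such a check, gated behind a hypothetical `strict` flag so existing keyword-style queries keep working until the error is switched on. `checkTerm` and the flag are illustrative assumptions. The check only recognizes the term shapes discussed here (variables, quoted literals, a loose first-character test for numeric literals, booleans, the `a` shorthand, IRI references, and prefixed names); it is not a full SPARQL tokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// checkTerm (hypothetical) accepts the term forms a standards-compliant
// query would use and, only when strict is true, rejects bare keywords.
func checkTerm(term string, strict bool) error {
	switch {
	case !strict:
		return nil // lenient: keep accepting keyword-style queries
	case term == "":
		return fmt.Errorf("empty term")
	case strings.HasPrefix(term, "?"), strings.HasPrefix(term, "$"):
		return nil // variable
	case strings.HasPrefix(term, `"`), strings.HasPrefix(term, "'"):
		return nil // string literal
	case strings.ContainsRune("0123456789+-.", rune(term[0])):
		return nil // numeric literal (checked by first character only)
	case term == "a", term == "true", term == "false":
		return nil // rdf:type shorthand and boolean literals
	case strings.HasPrefix(term, "<") && strings.HasSuffix(term, ">"):
		return nil // IRI reference
	case strings.Contains(term, ":"):
		return nil // prefixed name, resolved against the prologue
	default:
		return fmt.Errorf("bare term %q: use an IRI such as <https://datacommons.org/schema/%s> or a declared prefix", term, term)
	}
}

func main() {
	for _, t := range []string{"?state", "<https://datacommons.org/schema/typeOf>", "dcs:State", "typeOf"} {
		fmt.Printf("%-45s %v\n", t, checkTerm(t, true))
	}
}
```

Putting the suggested IRI into the error message gives clients a direct migration path instead of a bare rejection.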

Use-cases/reasoning

Many SPARQL query-writing tools simply won't send malformed queries. For example, yasgui (which is used by Wikidata) and Stardog's SPARQL tool both refuse to.

Using prefixes would also let users recognize at a glance that a SPARQL query is meant for DataCommons. And if users federate SPARQL queries (e.g., to cache DataCommons results in a common triple store), non-compliant SPARQL would likely not be forwarded.

Thanks for your work on this project. Let me know if you need any more details!

Huh, I just saw the ontology at http://schema.datacommons.org. It's interesting that you define all those types/classes, and mention in the data model doc that they are extensions of schema.org types, but you don't include what I'd consider the "higher order" relationships (typeOf, etc.) in any published schema/ontology.

If possible, I'd love to hear a bit about how the graph is mapped to the relational tables. Maybe this would help me understand why those have to be hardcoded.