semanticarts/gist

rdfs:range on DatatypeProperties produces invalid RDF (when reasoned over)

Opened this issue · 12 comments

here we have

gist:containedText
	a owl:DatatypeProperty ;
	rdfs:range xsd:string ;

If we use that in the TBox with an RDFS reasoner and some instance data...

:~/github/gist$ cat ontologies/ct.ttl 
@prefix : <https://w3id.org/semanticarts/ontology/gistCore#> .
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

gist:containedText
        a owl:DatatypeProperty ;
        rdfs:range xsd:string .

:~/github/gist$ cat instance.ttl 
@prefix : <https://w3id.org/semanticarts/ontology/gistCore#> .
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:thing0 gist:containedText "this is my text"^^xsd:string .

# now let's run the rdfs reasoner and see the inferred triples

:~/github/gist$ ~/Downloads/apache-jena-5.0.0-rc1/bin/riot --formatted=turtle --rdfs=ontologies/ct.ttl instance.ttl 
PREFIX :     <https://w3id.org/semanticarts/ontology/gistCore#>
PREFIX gist: <https://w3id.org/semanticarts/ns/ontology/gist/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sh:   <http://www.w3.org/ns/shacl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xml:  <http://www.w3.org/XML/1998/namespace>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

"this is my text"  rdf:type  xsd:string .

:thing0  gist:containedText  "this is my text" .

# BUT notice that isn't well-formed RDF since we have a literal in the subject position

# we can confirm it is invalid RDF

:~/github/gist$ ~/Downloads/apache-jena-5.0.0-rc1/bin/riot --rdfs=ontologies/ct.ttl instance.ttl  | ~/Downloads/apache-jena-5.0.0-rc1/bin/riot --validate --syntax=turtle -
16:34:29 ERROR riot            :: [line: 2, col: 69] Subject is not a URI or blank node
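
The validator's complaint can be reproduced without Jena. What follows is a minimal sketch in plain Python, not a description of how riot works internally. The `Literal` tuple, the use of plain strings for IRIs and blank nodes, and the function name `is_valid_triple` are all assumptions made for illustration. The check itself is the RDF 1.1 rule that a subject must be an IRI or blank node and a predicate must be an IRI.

```python
from typing import NamedTuple, Tuple, Union


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (lexical form plus datatype)."""
    lexical: str
    datatype: str


# Assumption: a plain str is an IRI, or a blank node when it starts with "_:".
Term = Union[str, Literal]
Triple = Tuple[Term, Term, Term]


def is_valid_triple(triple: Triple) -> bool:
    """Check RDF 1.1 term positions: no literal subject, predicate is an IRI."""
    subject, predicate, _ = triple
    subject_ok = isinstance(subject, str)
    predicate_ok = isinstance(predicate, str) and not predicate.startswith("_:")
    return subject_ok and predicate_ok


text = Literal("this is my text", "xsd:string")
asserted = (":thing0", "gist:containedText", text)  # the instance data
inferred = (text, "rdf:type", "xsd:string")         # what the reasoner emitted

for triple in (asserted, inferred):
    print(is_valid_triple(triple), triple)
```

The asserted triple passes and the inferred one fails, which matches the "Subject is not a URI or blank node" error above.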

I think we might need to use owl:onDataRange and/or owl:onDatatype to say what we mean here?

@justin2004 -- I've seen something like this before, where a reasoner generates triples that have literals as subjects. (The rule prp-fp in OWL-RL does this--@rjyounes will remember that one.) Arguably it says more about the reasoner's rule set than the formal defs of the properties. Then again, if we can't edit the rule sets and the issue is pervasive enough, it is worth considering whether an edit to gist could keep those triples from being generated.

I think the reasoner is behaving exactly as RDFS specifies:

The triple

P rdfs:range C

states that P is an instance of the class [rdf:Property](https://www.w3.org/TR/rdf-schema/#ch_property), that C is an instance of the class [rdfs:Class](https://www.w3.org/TR/rdf-schema/#ch_class) and that the resources denoted by the objects of triples whose predicate is P are instances of the class C.

https://www.w3.org/TR/rdf-schema/#ch_range
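
Read literally, that definition is the entailment pattern RDF 1.1 Semantics calls rdfs3: from `P rdfs:range C` and `s P o`, conclude `o rdf:type C`. Here is a small Python sketch of that pattern, assuming a hypothetical `Literal` tuple for literals and plain strings for IRIs; `apply_range_rule` is an illustrative name, not a Jena API. Nothing in the rule looks at whether `o` is a literal, so a literal object ends up as a subject.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (lexical form plus datatype)."""
    lexical: str
    datatype: str


RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def apply_range_rule(schema, data):
    """Apply the rdfs3 pattern: (P rdfs:range C), (s P o) => (o rdf:type C)."""
    ranges = {}
    for subject, predicate, obj in schema:
        if predicate == RDFS_RANGE:
            ranges.setdefault(subject, set()).add(obj)

    inferred = set()
    for _, predicate, obj in data:
        for cls in ranges.get(predicate, ()):
            # No check on the kind of term: a literal object becomes a subject.
            inferred.add((obj, RDF_TYPE, cls))
    return inferred


schema = [("gist:containedText", RDFS_RANGE, "xsd:string")]
data = [(":thing0", "gist:containedText", Literal("this is my text", "xsd:string"))]

inferred = apply_range_rule(schema, data)
for triple in inferred:
    print(triple)
```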

we all know what we meant by gist:containedText rdfs:range xsd:string though. :)

This is interesting, and is certainly not ideal. However, leaving it as it is may not result in any problem in practice. This is to say, is it worth devoting energy to this problem? It may be more a problem with RDFS & OWL, not with gist.

I think there is a correct way to say "the thing hanging off this predicate is a literal of datatype X."
it might be something like this. we'll find the correct way to express it.

> it might be something like this. we'll find the correct way to express it.

This defines a Datatype, I don't see how this relates to defining the range of a datatype property to be some datatype.

@dylan-sa is correct, we went around and around with RDFox over this - their implementation of the OWL RL rules generates literal subjects as well. We contacted Ian Horrocks about this, and his response was essentially: the rules shown in the spec are suggestions, and if they result in invalid RDF they should not be implemented as such; OWL RL itself is RDF-compliant. Weird situation, IMO, but that's the way it is. (In the RDFox case it's particularly odd because Ian is one of the founders of Oxford Semantic Technologies, the company behind RDFox.) I don't think we should stop using perfectly valid OWL because of some misunderstanding between the spec authors and the tool implementers. Everyone in the world uses this locution.

There are a couple of OWL RL rule "suggestions" that generate the invalid literal subjects. The example Dylan mentioned, prp-fp, infers owl:sameAs from any functional property, not just functional object properties. The OWL DL reasoner is logically correct in inferring owl:sameAs only from functional object properties, and in declaring a set of triples logically inconsistent when a functional datatype property has more than one distinct value.
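
To make the prp-fp case concrete, here is a rough Python sketch of the rule as it is written in the OWL 2 RL rule table: if `p` is an owl:FunctionalProperty and both `x p y1` and `x p y2` hold, then `y1 owl:sameAs y2`. The `Literal` tuple, the function name `apply_prp_fp`, and the property `ex:nickname` are hypothetical and exist only for this illustration; this is not how RDFox or any other engine implements it.

```python
from itertools import product
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (lexical form plus datatype)."""
    lexical: str
    datatype: str


OWL_SAME_AS = "owl:sameAs"


def apply_prp_fp(functional_props, data):
    """Apply prp-fp literally: (x p y1), (x p y2) => (y1 owl:sameAs y2)."""
    values = {}
    for subject, predicate, obj in data:
        if predicate in functional_props:
            values.setdefault((subject, predicate), set()).add(obj)

    inferred = set()
    for objects in values.values():
        # The rule does not require y1 != y2, so reflexive pairs are included.
        for y1, y2 in product(objects, repeat=2):
            inferred.add((y1, OWL_SAME_AS, y2))
    return inferred


ann = Literal("Ann", "xsd:string")
annie = Literal("Annie", "xsd:string")

functional_props = {"ex:nickname"}  # hypothetical functional datatype property
data = [
    (":thing0", "ex:nickname", ann),
    (":thing0", "ex:nickname", annie),
]

inferred = apply_prp_fp(functional_props, data)
for triple in sorted(inferred):
    print(triple)
```

Every inferred triple here has a literal as its subject. An OWL DL reasoner would instead report the two distinct values as an inconsistency.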

> Everyone in the world uses this locution.

I think RDFS is very clear about what rdfs:range means. I think I (and lots of other people) have been misunderstanding it (and following the misunderstanding of others) when it comes to literals.

Let's see if this can clear it up.

Lots of people watch the rdf tag on Stack Overflow.

Interesting, @justin2004, you're right, I mistakenly jumped to the conclusion that it had something to do with the OWL RL reasoner issue I mentioned.

@uscholdm says:
> This is interesting, and is certainly not ideal. However, leaving it as it is may not result in any problem in practice. This is to say, is it worth devoting energy to this problem? It may be more a problem with RDFS & OWL, not with gist.

I don't agree. We've gone to great lengths recently, for example, to ensure that gist is OWL DL compliant. I don't think we want to now say that it is not RDF(S)-compliant! It's worth thinking about a better way to express this.

Update:

It turns out that OWL (re)defined rdfs:range. @sa-bpelakh pointed that out to me.
I posed a question to the Apache Jena Mailing List about this.

Let's see what they say, but at the moment it seems like we need to say that we are using rdfs:range as re-defined by OWL. That is, in gist we don't mean rdfs:range as defined in the RDFS 1.1 spec.
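
One way to picture the difference, as a rough sketch only: under the OWL 2 Direct Semantics, a range axiom on a data property requires every value of the property to lie in the stated data range. It does not call for an rdf:type triple about the literal. The Python below illustrates that reading under strong simplifying assumptions and is not a reasoner. `Literal` and `range_conflicts` are hypothetical names, and membership in a datatype is reduced to comparing datatype identifiers, which ignores overlapping value spaces (for example, xsd:integer values are also xsd:decimal values).

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (lexical form plus datatype)."""
    lexical: str
    datatype: str


def range_conflicts(data_ranges, data):
    """Return triples whose object falls outside the declared data range.

    Simplification: a literal is "in" a datatype only if its datatype
    identifier is equal to the declared range.
    """
    conflicts = []
    for subject, predicate, obj in data:
        expected = data_ranges.get(predicate)
        if expected is None:
            continue  # no range declared for this property
        if not isinstance(obj, Literal) or obj.datatype != expected:
            conflicts.append((subject, predicate, obj))
    return conflicts


data_ranges = {"gist:containedText": "xsd:string"}
good = (":thing0", "gist:containedText", Literal("this is my text", "xsd:string"))
bad = (":thing1", "gist:containedText", Literal("42", "xsd:integer"))

conflicts = range_conflicts(data_ranges, [good, bad])
print(conflicts)
```

On this reading no new triple is produced for the string literal, so nothing with a literal subject is generated. The integer value is the kind of data an OWL reasoner would flag as inconsistent with the range axiom.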

> it seems like we need to say that we are using rdfs:range as re-defined by OWL

Doesn't the fact that gist is an OWL ontology suffice without an explicit statement?