CrossRef/cayenne

What's the correct subject id for RDF representation?


Given the RDF representation returned by this query:

https://api.crossref.org/works/10.1007/978-3-319-60131-1_33/transform/application/rdf+xml

The current RDF response refers to the subject using the http://dx.doi.org resolver:

<rdf:Description rdf:about="http://dx.doi.org/10.1007/978-3-319-60131-1_33">

The same behaviour is observed when the query uses a full DOI URL with a different resolver:

http://api.crossref.org/works/https://doi.org/10.1007/978-3-319-60131-1_33/transform/application/rdf+xml

This response is also used to answer content negotiation requests, which may arrive via a variety of DOI resolvers. If content negotiation is performed against https://doi.org but the response uses http://dx.doi.org as the subject, there is a mismatch.
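As a rough illustration, the mismatch can be detected mechanically by comparing the subject of the RDF/XML response with the URI that was dereferenced. This is a minimal Python sketch using only the standard library; `SAMPLE_RDF` is a hand-trimmed stand-in for the live response (not the actual payload), and `subject_uris` and `describes_itself` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-trimmed stand-in for the live response (assumption: not the full payload).
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://dx.doi.org/10.1007/978-3-319-60131-1_33"/>
</rdf:RDF>"""


def subject_uris(rdf_xml):
    """Return the rdf:about value of every rdf:Description, nested ones included."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{RDF_NS}}}about"
    return [
        desc.get(about)
        for desc in root.iter(f"{{{RDF_NS}}}Description")
        if desc.get(about) is not None
    ]


def describes_itself(requested_uri, rdf_xml):
    """True if the dereferenced URI appears verbatim as a subject (plain string match)."""
    return requested_uri in subject_uris(rdf_xml)


if __name__ == "__main__":
    print(subject_uris(SAMPLE_RDF))
    print(describes_itself("https://doi.org/10.1007/978-3-319-60131-1_33", SAMPLE_RDF))
```

With a response shaped like the one above, the check fails for the `https://doi.org` form and passes only for `http://dx.doi.org`. The exact string comparison is deliberate: RDF compares IRIs as plain strings, so the two resolver forms are distinct nodes unless something in the data links them.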

Question: Is this behaviour incorrect? Should the RDF response always use exactly the same DOI representation as was used in the query?

I think there are two areas to consider:

  1. Being able to discover a description of a resource, e.g. if https://doi.org/10.1007/978-3-319-60131-1_33 is dereferenced, the representation should say something about that URI.

  2. Having "equivalences" (in the general sense of the word) across identifiers.

Option 1 would be ideal because the primary resource (assuming the HTTPS DOI URI is used as the PID) is describing itself. This is the most common RDF/LD-safe route.

If option 1 is not possible (which appears to be the case in the live service right now), option 2 should definitely be in place, i.e.:

<http://dx.doi.org/10.1007/978-3-319-60131-1_33>
  owl:sameAs <https://doi.org/10.1007/978-3-319-60131-1_33> .

This is so that there is at least some semantic path to equating those identifiers. Otherwise, consuming applications are forced to make "out of band" decisions (and hardcode them) on the assumption that dx.doi.org is equivalent to, and interchangeable with, doi.org, which is not particularly interoperable.
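To make the point concrete: once an `owl:sameAs` triple is present, a consumer can derive the equivalence from the data instead of hardcoding it. The sketch below is an illustration, not an OWL reasoner. It assumes the triples have already been parsed into `(subject, predicate, object)` string tuples, the function names are hypothetical, and it relies only on `owl:sameAs` being symmetric and transitive, which OWL semantics guarantee.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Illustrative identifiers taken from this issue.
DX = "http://dx.doi.org/10.1007/978-3-319-60131-1_33"
HTTPS = "https://doi.org/10.1007/978-3-319-60131-1_33"
DOI = "doi:10.1007/978-3-319-60131-1_33"


def same_as_groups(triples):
    """Partition URIs into sets linked by owl:sameAs (union-find over the triples)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for subj, pred, obj in triples:
        if pred == OWL_SAME_AS:
            root_s, root_o = find(subj), find(obj)
            if root_s != root_o:
                parent[root_o] = root_s

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


def equivalent(a, b, triples):
    """True if a and b are the same URI or joined by a chain of owl:sameAs."""
    if a == b:
        return True
    return any(a in group and b in group for group in same_as_groups(triples))


# Current data (assumption: only the doi: form is linked).
current = [(DX, OWL_SAME_AS, DOI)]
# Proposed: also link the https://doi.org form.
proposed = current + [(DX, OWL_SAME_AS, HTTPS)]

print(equivalent(DX, HTTPS, current))    # False
print(equivalent(HTTPS, DOI, proposed))  # True
```

Note that with only the existing `doi:` link, nothing in the data connects the two HTTP(S) forms; a single extra triple is enough for the consumer to equate all three.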

We already have the following:

<owl:sameAs rdf:resource="doi:10.1007/978-3-319-60131-1_33"/>

So it's not much of a stretch to extend this to include, e.g.:

<owl:sameAs rdf:resource="https://doi.org/10.1007/978-3-319-60131-1_33"/>
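Emitting the extra statement is mechanical. Here is a minimal sketch with Python's `xml.etree.ElementTree`, assuming the serializer builds each `rdf:Description` element itself; `description_with_same_as` is a hypothetical name, and the list of equivalent forms is only illustrative.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Serialize with the conventional rdf:/owl: prefixes instead of ns0:/ns1:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("owl", OWL_NS)


def description_with_same_as(subject, equivalents):
    """Build an rdf:Description for `subject` with one owl:sameAs per equivalent URI."""
    desc = ET.Element(f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject})
    for uri in equivalents:
        if uri != subject:  # a self-link adds nothing
            ET.SubElement(desc, f"{{{OWL_NS}}}sameAs", {f"{{{RDF_NS}}}resource": uri})
    return desc


doi = "10.1007/978-3-319-60131-1_33"
desc = description_with_same_as(
    f"http://dx.doi.org/{doi}",
    [f"doi:{doi}", f"https://doi.org/{doi}"],  # illustrative list of forms
)
print(ET.tostring(desc, encoding="unicode"))
```

The subject is left as-is here; the same helper works unchanged if the subject is later switched to the `https://doi.org` form and `dx.doi.org` moves into the equivalents list.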

I don't know the history of this too well, but I think that:

owl:sameAs <doi:10.1007/978-3-319-60131-1_33>

should be:
owl:sameAs <urn:doi:10.1007/978-3-319-60131-1_33>

But, yes, adding:
owl:sameAs <https://doi.org/10.1007/978-3-319-60131-1_33>

would be a good start.

Again, assuming that we want to use https://doi.org/10.1007/978-3-319-60131-1_33 both as a PID and as a machine-readable, dereferenceable URI, a request like:
curl -LH'Accept: text/turtle' https://doi.org/10.1007/978-3-319-60131-1_33

should return a response whose subject URI (the primary thing being described) uses doi.org rather than dx.doi.org.
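The `curl` check can be scripted with Python's `urllib.request`, which follows redirects and keeps non-content headers such as `Accept` on the redirected request. This is a sketch under assumptions: `build_turtle_request`, `fetch_turtle` and `mentions_subject` are hypothetical names, and `mentions_subject` is a deliberately crude heuristic that only looks for the full URI written as `<...>`, so it will miss prefixed names and is no substitute for a Turtle parser.

```python
import urllib.request

TURTLE = "text/turtle"


def build_turtle_request(doi_uri):
    """Equivalent of `curl -H 'Accept: text/turtle' <doi_uri>`."""
    return urllib.request.Request(doi_uri, headers={"Accept": TURTLE})


def fetch_turtle(doi_uri, timeout=30):
    """Dereference the DOI URI; urlopen follows HTTP redirects (e.g. 302) like `curl -L`."""
    with urllib.request.urlopen(build_turtle_request(doi_uri), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def mentions_subject(turtle_text, doi_uri):
    """Crude, hypothetical heuristic: is the full URI written as <...> anywhere?

    Misses prefixed names and relative IRIs; use a real Turtle parser for anything serious.
    """
    return f"<{doi_uri}>" in turtle_text
```

Usage: with `uri = "https://doi.org/10.1007/978-3-319-60131-1_33"`, call `mentions_subject(fetch_turtle(uri), uri)`. This needs network access, and the result depends on what the live service returns at the time.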

I'm not sure what purpose dx.doi.org currently serves: whether it is there for legacy reasons or is being phased out. Whether to keep it, or to signal that it is deprecated, is something you would know better.

Note also the current HTTP headers:

HTTP/2 302
location: https://data.crossref.org/10.1007%2F978-3-319-60131-1_33

HTTP/1.1 200 OK
Link: <http://dx.doi.org/10.1007/978-3-319-60131-1_33>; rel="canonical"

It is also worth checking whether advertising dx.doi.org as the canonical URI is still accurate.
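Checking the advertised canonical URI can also be automated. Below is a simplified `Link` header parser, assuming well-behaved headers like the one quoted in this issue; it does not handle quoted parameter values that contain `,` or `;`, so it is not a complete RFC 8288 implementation. The function names are hypothetical.

```python
import re

# <target> followed by zero or more ";param" segments, stopping at the "," between links.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def canonical_from_link(header_value):
    """Return the target of the first rel="canonical" link, or None."""
    for target, params in LINK_RE.findall(header_value):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            rels = value.strip().strip('"').lower().split()  # rel may list several types
            if name.strip().lower() == "rel" and "canonical" in rels:
                return target
    return None


def canonical_matches(header_value, requested_uri):
    """True if the advertised canonical URI is exactly the one that was requested."""
    return canonical_from_link(header_value) == requested_uri


header = '<http://dx.doi.org/10.1007/978-3-319-60131-1_33>; rel="canonical"'
print(canonical_from_link(header))
print(canonical_matches(header, "https://doi.org/10.1007/978-3-319-60131-1_33"))  # False
```

Against the header quoted above, the canonical URI is the `dx.doi.org` form, so a client that dereferenced `https://doi.org/...` gets a canonical link that does not match its request.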

Issue moved to https://gitlab.com/crossref/rest_api/-/issues/91 as part of our transition to Gitlab.