IIIF/api

clarify CC canonical URLs

Opened this issue · 13 comments

eroux commented

The new rights statement states that URL must be drawn from the set of Creative Commons license URIs, the RightsStatements.org rights statement URIs, etc.

For RightsStatements.org, there's no issue, as each statement has a canonical URI indicated on its page with "URI for this statement:", for example http://rightsstatements.org/vocab/InC/1.0/ (note the http, not https), and there's even a nice RDF file accessible at

curl -L -H "Accept: text/turtle" http://rightsstatements.org/vocab/1.0/

Now, for CC, it's a bit more confusing, as there are many URLs that point to very similar pages and no clear indication of a canonical link. When visiting an RDF version (example) or the index, though, it becomes quite clear that the canonical URIs are http, which is probably a bit counterintuitive.

I'm thinking that giving an example or pointing to the RDF index could be useful to keep users from using non-canonical URLs.

There is an example in the rights definition for the Presentation API that uses https://creativecommons.org/licenses/by/4.0/.

We should clarify that it is the "license deed" URI that needs to be used from the linked "licenses" page.

In Image, there is a rights entry in the example, but the same clarification about the "license deed" as the URI to use should be added.

eroux commented

Yes, note that I think the URL of the example should be http and not https, as iiif json is json-ld (and thus RDF), and the URIs used by CC in RDF are http. Wdyt?

Indeed in the RDF the URI uses http, whereas in the human readable HTML pages, the URI is https. I understand now, thank you!

@iiif/editors We should pick http or https and be clear which, and why. Please weigh in!

Retagging with discuss, removing editorial as there is a normative difference, even if it's only one character :)

I'm not sure if it adds any clarity, but if you request

http://rightsstatements.org/vocab/InC-NC/1.0/

... the response header asserts HSTS:

Location: https://rightsstatements.org/vocab/InC-NC/1.0/
Non-Authoritative-Reason: HSTS

But - that's still a browser interaction, experienced when, say, a user clicks the license link in a viewer. It doesn't alter the fact that the published vocab uses http.
Which suggests to me that we need to use http here, even though we'd rather not.

Editors minus Stroop (because he's not here) plus Matt believe that HTTP is the canonical form of the license URIs for software infrastructures, and as this is primarily a software-driven enumeration of values (the URIs), then the presentation API should require the HTTP form to be published.

However, for presentation to end users, if a client wants to create a link to the license itself, then it SHOULD rewrite the URI to use the HTTPS scheme, as that is the canonical form of the URI for humans (the "license deed").
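The display-time rewrite described above could look roughly like this in a client. This is only a sketch: the function name `display_uri` and the `REWRITE_HOSTS` set are made up for illustration and are not part of any IIIF API or library. It assumes the manifest carries the http form and only swaps the scheme when building a link for a human.

```python
from urllib.parse import urlsplit

# Hosts whose rights URIs are published as http but whose human-readable
# page is the same URI under https. Assumption for illustration only.
REWRITE_HOSTS = {"creativecommons.org"}


def display_uri(rights_uri: str) -> str:
    """Return the link to show to an end user for a `rights` value.

    The value stored in the manifest is left untouched; only the link
    that gets rendered uses the https scheme.
    """
    parts = urlsplit(rights_uri)
    if parts.scheme == "http" and parts.hostname in REWRITE_HOSTS:
        return parts._replace(scheme="https").geturl()
    return rights_uri


print(display_uri("http://creativecommons.org/licenses/by/4.0/"))
```

URIs on other hosts pass through unchanged, so a rightsstatements.org value would still be linked exactly as published.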

Editors will also contact creativecommons and ask if:

  • they can make the canonical software form use https
  • or, they can add owl:sameAs to the license rdf to ensure that the URIs line up

If either of these is okay and implemented before the final version of the 3.0 APIs, and creativecommons therefore recommends using HTTPS, then we will use the https URIs instead and update our examples.

The canonical URIs for CC are indeed the HTTP ones. Asking to use the HTTPS ones would be a bit like asking to use https://rightsstatements.org/page/InC-NC/1.0/ instead of http://rightsstatements.org/vocab/InC-NC/1.0/ in the case of a rightsstatements.org statement.

per @aisaac, in a rightsstatements.org context we discussed this w3c blog post when we looked at implementing HTTPS. while it is not a spec, i believe we were following Sandro Hawke's suggestion:

In short, keep writing “http:” and trust that the infrastructure will quietly switch over to TLS (https) whenever both client and server can handle it.

I'm not sure that Sandro's post is the right context for this discussion.

My understanding is that the concern there is about the ontology / namespace layer, and where there is only one thing being identified -- the term (or namespace). In this case however there are two significant differences:

  • The http URI is defined by creative commons as being a different "layer" to the https URI -- http is the machine readable license and https is the human readable deed. There is then a legal code as the third layer. So it seems that creative commons have explicitly chosen to think about these as different things, differentiated only by the URI scheme. As per Sandro's post and the comments on it, this is not the greatest pattern.

  • These are the URIs of instances, not terms. Thus the change is vastly more likely to affect user interfaces, especially as humans are bound to click on the link or icon provided by the client, and the client is thus very likely to present the URI directly from the content. Most UIs do not natively present the class and relationship URIs to end users.

But ...

Something else that has come up in looking into how HSTS works -- creativecommons.org is on the HSTS preloaded list for browsers: https://hstspreload.org/?domain=creativecommons.org

This means that browsers will automatically use https for URLs in the cc domain without making an http request, getting the UIR header, flipping to HTTPS and then getting HSTS to stay there.
So the privacy concern for the initial HTTP request is handled already.

@azaroth42 wrt the "license deed", would you happen to have found some documentation that mentions (the abandonment of) "/deed" in the URIs? I remember some time ago CC had them. But now I can't find them anymore... and the documentation on the various CC github/wiki pages is hard to track down. I found https://support.crossref.org/hc/en-us/articles/214298886-License-URIs-Technical-Details but that's not canon, I guess. Maybe this would help clarify the discussion...

Use of http for Creative Commons URIs (for now) approved by IIIF/trc#32

Closed by #1903

I think we could reverse this in 4.0, as now the RDF descriptions contain:

  <cc:License rdf:about="http://creativecommons.org/publicdomain/zero/1.0/">
    <!-- ... -->
    <owl:sameAs rdf:resource="https://creativecommons.org/publicdomain/zero/1.0/"/>
  </cc:License>

to assert that both http and https are the same. The human readable pages also list https as the canonical URI.
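For anyone who wants to check this from code, here is a minimal sketch that reads the `owl:sameAs` links out of a description shaped like the snippet above. The `SAMPLE` document is a hand-written stand-in (the `rdf:RDF` wrapper and the elided content are assumptions, not a copy of the real file), and `same_as_pairs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "cc": "http://creativecommons.org/ns#",
}

# Hand-written stand-in for the published RDF/XML, reduced to the two
# elements quoted in the comment above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:cc="http://creativecommons.org/ns#">
  <cc:License rdf:about="http://creativecommons.org/publicdomain/zero/1.0/">
    <owl:sameAs rdf:resource="https://creativecommons.org/publicdomain/zero/1.0/"/>
  </cc:License>
</rdf:RDF>"""


def same_as_pairs(rdf_xml: str) -> list[tuple[str, str]]:
    """Return (license URI, owl:sameAs target) pairs found in the document."""
    root = ET.fromstring(rdf_xml)
    rdf = "{%s}" % NS["rdf"]
    pairs = []
    for lic in root.findall("cc:License", NS):
        about = lic.get(rdf + "about")
        for same in lic.findall("owl:sameAs", NS):
            pairs.append((about, same.get(rdf + "resource")))
    return pairs


for http_uri, https_uri in same_as_pairs(SAMPLE):
    print(http_uri, "sameAs", https_uri)
```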

Hi @azaroth42 I had asked for clarification about this a while ago, which resulted in this bit of text being added: https://github.com/creativecommons/cc-legal-tools-app/blob/main/docs/rdf.md#rdf-canonical-url
I would say we can keep to the current situation. Having both https and http in use is not going to be great for the (many/all?) tools that don't do owl:sameAs inference.
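To illustrate what such a tool would have to do without owl:sameAs inference, here is a rough sketch that folds both schemes onto the http form before comparing. The names `canonical_rights_uri`, `same_rights` and the `CANONICAL_HTTP_HOSTS` set are hypothetical, chosen for this example only.

```python
from urllib.parse import urlsplit

# Hosts discussed in this thread whose published vocab URIs use http.
# Treating this set as complete is an assumption of the sketch.
CANONICAL_HTTP_HOSTS = {"creativecommons.org", "rightsstatements.org"}


def canonical_rights_uri(uri: str) -> str:
    """Map an https variant of a known rights URI back to the http form."""
    parts = urlsplit(uri)
    if parts.scheme == "https" and parts.hostname in CANONICAL_HTTP_HOSTS:
        return parts._replace(scheme="http").geturl()
    return uri


def same_rights(a: str, b: str) -> bool:
    """Compare two rights values while ignoring the http/https difference."""
    return canonical_rights_uri(a) == canonical_rights_uri(b)


print(same_rights(
    "http://creativecommons.org/publicdomain/zero/1.0/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
))
```

Every consumer that filters or facets on rights values would need a step like this if both schemes were allowed, which is the cost being pointed out above.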