URIs should be IRIs
Closed this issue · 5 comments
Backend counterpart of mmisw/orr-portal#48
(Summary of changes in RDF 1.1, which uses IRIs - https://www.w3.org/TR/rdf11-new/#identifiers)
@graybeal perhaps I haven't spent enough time yet and this would probably have to wait until some time next year. Would you be able to provide some concrete technical guidance, perhaps with assistance from your groups? I've looked at RFC 3987 (some excerpts below), but I'm still finding it confusing in terms of the various pieces involved (browser, HTTP servers, and the actual underlying orr-ont code along with the Jena and OWL-API libraries it uses...). Perhaps it doesn't have to be a complete set of adjustments at once; we could instead take some incremental steps toward fully supporting IRIs.
For example, since "every URI is by definition an IRI" (p. 11), perhaps an initial step could be a systematic renaming of all parameters and documentation from "URI" to "IRI", along with appropriate clarification in texts/tooltips and documentation that only URIs are actually supported for now (but that we are working toward fully supporting IRIs, etc.). This would be "low-hanging fruit" in terms of at least showing some progress in this regard sooner rather than later ...
from https://www.ietf.org/rfc/rfc3987.txt:
IRIs are meant to replace URIs in identifying resources for protocols, formats, and software components that use a UCS-based character repertoire. These protocols and components may never need to use URIs directly, especially when the resource identifier is used simply for identification purposes. However, when the resource identifier is used for resource retrieval, it is in many cases necessary to determine the associated URI, because currently most retrieval mechanisms are only defined for URIs. In this case, IRIs can serve as presentation elements for URI protocol elements. An example would be an address bar in a Web user agent.
[7..] This informative section provides guidelines for supporting IRIs in the same software components and operations that currently process URIs: Software interfaces that handle URIs, software that allows users to enter URIs, software that creates or generates URIs, software that displays URIs, formats and protocols that transport URIs, and software that interprets URIs. These may all require modification before functioning properly with IRIs.
Well, I've perhaps prematurely taken you up on renaming the documentation; most of it now uses the term "IRI".
I was under the impression that making it possible to use IRIs in software would be basically a matter of using IRI-compatible libraries, and that most libraries at this stage (that we'd be likely to use, anyway) would be IRI-compatible. As I understood it last time I looked (several years ago), IRIs are basically just a different definition of the legal strings allowed, in order to support internationalization.
So I'm a little surprised if there is a big cost here. Will check a little with the team here.
Agreed, hopefully no "big cost", but I'd like whatever input you can get from your team in terms of any gotchas and such. Yes, the libraries used by ORR have long supported IRIs, but I guess my concern is more at the level of how such IRIs are handled by browsers and client applications: how they are encoded for transfer to the backend, how they should be decoded in the backend, how to properly handle special IRI symbols for display purposes, etc.
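To make the encoding/decoding concern a bit more concrete, here is a minimal Python sketch of the two directions as I read RFC 3987 (sections 3.1 and 3.2). It is only an illustration, not what ORR, Jena, or OWL-API actually do: the function names `iri_to_uri` and `uri_to_display_iri` and the `example.org` IRI are made up, and it skips Unicode normalization, IDNA handling of host names, and the RFC's restrictions on which characters should be shown decoded.

```python
import re
from urllib.parse import quote

def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding each non-ASCII character
    as its UTF-8 octets. ASCII characters, including existing '%'
    escapes, are left untouched."""
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)

# Runs of percent-escapes whose octets are >= 0x80, i.e. non-ASCII bytes.
_NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")

def uri_to_display_iri(uri: str) -> str:
    """Turn a URI back into an IRI for display. Only escapes that decode
    to valid UTF-8 are replaced; ASCII escapes such as %2F are kept,
    because decoding them could change the meaning of the identifier."""
    def decode(match: re.Match) -> str:
        run = match.group(0)
        try:
            return bytes.fromhex(run.replace("%", "")).decode("utf-8")
        except UnicodeDecodeError:
            return run  # not valid UTF-8: leave the escapes as they are
    return _NON_ASCII_ESCAPES.sub(decode, uri)

iri = "http://example.org/ont/café"   # hypothetical IRI
uri = iri_to_uri(iri)                 # what would travel over HTTP
shown = uri_to_display_iri(uri)       # what a UI could display
```

The point of keeping the two functions asymmetric is that the backend can store and compare one canonical form, while anything percent-encoded that is not clearly UTF-8 text is passed through unchanged.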
(just some personal notes)
ORR uses both Apache Jena and OWL-API libraries.
Jena: Although claimed to be "incomplete", https://jena.apache.org/documentation/notes/iri.html offers an acceptable description of how to deal with IRIs. In particular, Jena seems to pay great attention to validation.
OWL-API: There's some Javadoc for their IRI class but no other specific IRI documentation AFAICT (https://github.com/owlcs/owlapi/wiki). In particular, I'm not seeing any explicit methods to validate a string as representing an IRI.
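As a rough idea of what a validation step could look like where a library doesn't offer one, here is a coarse Python pre-check. It is an assumption-laden sketch, not the RFC 3987 grammar and not an OWL-API or Jena call: the name `looks_like_absolute_iri` is made up, it only handles absolute IRIs (not relative references), and it only rejects the obvious problems (no scheme, whitespace and control characters, the ASCII characters that are excluded from both URIs and IRIs, and malformed percent-escapes).

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")
# Space, C0/C1 control characters, DEL, and < > " { } | \ ^ `
_FORBIDDEN = re.compile(r'[\x00-\x20\x7f-\x9f<>"{}|\\^`]')
# A '%' that is not followed by exactly two hex digits
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def looks_like_absolute_iri(text: str) -> bool:
    """Coarse check only: True means 'not obviously broken', not 'valid'."""
    return (
        _SCHEME.match(text) is not None
        and _FORBIDDEN.search(text) is None
        and _BAD_PERCENT.search(text) is None
    )
```

A real implementation would more likely delegate to a library's IRI checker and use something like this only as a cheap first filter on user input.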