owlcs/owlapi

Accept Headers to include all MIME types with supporting parsers

Closed this issue · 3 comments

We received the post below on the Protégé mailing list. Is there a good way to deal with this?

It seems that Protégé doesn't send an HTTP "Accept" header when loading ontologies from URLs.

I return either plain JSON or JSON-LD, depending on the HTTP "Accept" header:
https://elixir.bsc.es/tool/bio.tools:pmut/web/mmb.irbbarcelona.org

Could Protégé set appropriate headers while loading ontologies (e.g. "Accept: application/ld+json")?

The algorithm used by RDF4J to generate a weighted (using "q" values) HTTP Accept header is at:

https://github.com/eclipse/rdf4j/blob/a283e758fb79e7b62e94b669547e6e64c64106cf/core/rio/api/src/main/java/org/eclipse/rdf4j/rio/RDFFormat.java#L242

To get an equivalent function for OWLAPI, we would need to have document formats (OWLDocumentFormatFactory) specify their content types (possibly using Optional<String> or Optional<List<String>> as a return type to allow some document formats such as the in-memory types to specify an empty result). Then you would go through the list of parsers that were registered, pull out their document format and construct the Accept header in a similar way.
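As a rough illustration of the steps described above, here is a minimal sketch. It is not OWLAPI code: `FormatInfo` and `AcceptHeaderSketch` are hypothetical stand-ins for a document format that reports its content types, and the fallback to `*/*` when no format declares a type is an assumption of the sketch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a document format; not an OWLAPI type.
final class FormatInfo {
    private final String name;
    private final List<String> mimeTypes; // null when the format has no MIME types

    FormatInfo(String name, List<String> mimeTypes) {
        this.name = name;
        this.mimeTypes = mimeTypes;
    }

    String name() {
        return name;
    }

    // Empty result for formats (e.g. in-memory ones) that have no content type.
    Optional<List<String>> mimeTypes() {
        return Optional.ofNullable(mimeTypes);
    }
}

final class AcceptHeaderSketch {
    // Walks the formats of the registered parsers and joins their MIME types,
    // keeping registration order and dropping duplicates.
    static String buildAcceptHeader(List<FormatInfo> registeredFormats) {
        Set<String> types = new LinkedHashSet<>();
        for (FormatInfo format : registeredFormats) {
            format.mimeTypes().ifPresent(types::addAll);
        }
        // Assumption: accept anything if no format declares a MIME type.
        return types.isEmpty() ? "*/*" : String.join(", ", types);
    }
}
```

Formats without a content type simply contribute nothing to the header, which is the case the Optional return type is meant to cover.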

The weightings in RDF4J are based on the ability of the format to support namespaces and quads/graphs as opposed to just triples. That part may not be as simple for OWLAPI.

The MIMETypeAware interface exposes the known MIME types for formats. Not all formats have MIME types, but that can be amended simply enough.

The weight issue is more complicated. At the moment we open the remote stream once and cache its content locally, so we can't use the same weighting algorithm without re-downloading the same file whenever a parse fails. If the remote is serving the same ontology in multiple formats, or ignoring the Accept header, this could cause a big slowdown.

We could approximate the weights with a 'guess', so to speak. It would still provide most of the functionality, because as long as the formats served are among the supported ones we'll still get the data. This might be suboptimal: if we declare a preference for a format whose parser has bugs, we might end up stuck because of a bad choice.
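One way to picture such a guess, purely as a sketch: take the MIME types in some fixed preference order and hand out descending "q" values by position. The class name, the ordering, and the 0.1 step are all assumptions made for illustration; this is neither the RDF4J algorithm nor what OWLAPI does.

```java
import java.util.ArrayList;
import java.util.List;

final class GuessedWeights {
    // Assigns q values by list position: 1.0 (implicit), 0.9, 0.8, ...
    // and never goes below 0.1, so every listed type stays acceptable.
    static String weightedAcceptHeader(List<String> mimeTypesInPreferenceOrder) {
        List<String> parts = new ArrayList<>();
        int tenths = 10;
        for (String type : mimeTypesInPreferenceOrder) {
            if (tenths == 10) {
                parts.add(type); // q defaults to 1 when omitted
            } else {
                parts.add(type + ";q=0." + tenths);
            }
            tenths = Math.max(1, tenths - 1);
        }
        return String.join(", ", parts);
    }
}
```

Because the header is computed once, before the single download, a guessed order like this avoids re-fetching the document, at the cost of committing to the preference up front.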

However, there's a workaround already in the ontology source interface, where a MIME type can be specified; we could use that to override the default headers.
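A minimal sketch of that override, using only `java.net` and hypothetical names (`AcceptOverrideSketch`, `chooseAcceptHeader`, `prepare`); how the MIME type is actually read from the ontology source is not shown and is assumed to yield an `Optional<String>`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.Optional;

final class AcceptOverrideSketch {
    // Prefers the MIME type given on the source, if any; otherwise the default header.
    static String chooseAcceptHeader(Optional<String> sourceMimeType, String defaultHeader) {
        return sourceMimeType
            .filter(type -> !type.trim().isEmpty())
            .orElse(defaultHeader);
    }

    // Creates a connection with the Accept header set; nothing is sent
    // until the caller connects or reads from the connection.
    static URLConnection prepare(String url, String acceptHeader) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        connection.setRequestProperty("Accept", acceptHeader);
        return connection;
    }
}
```

With this shape, a caller who knows the server speaks JSON-LD could pass `Optional.of("application/ld+json")` and bypass whatever default preference order is in place.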

So, the good news is that none of this requires interface changes.

Tested on the ontology mentioned in the original email: the negotiated content type is now application/ld+json, as desired. Now to cherry-pick onto version 4.