ISAITB/shacl-validator

[feature request] Validator does not implement content negotiation

Closed this issue · 3 comments

When validating SHACL with an input URI, content negotiation does not seem to work. Based on some Wireshark sniffing, every HTTP(S) request is sent with the following header:

Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2

Our metadata server therefore assumes the client is a browser and returns a regular HTML page instead of machine-readable metadata.
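
To illustrate why a server would pick HTML for this request, here is a minimal Python sketch of server-side negotiation against the observed header. The function names and the two offered types are illustrative assumptions; they are not taken from the metadata server or from the validator.

```python
# Header value observed on the wire (copied from the capture above).
OBSERVED_ACCEPT = "text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2"


def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header value."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def quality(offered, ranges):
    """q-value the client assigns to one offered type; the most specific match wins."""
    best_specificity, best_q = -1, 0.0
    o_type = offered.partition("/")[0]
    for media_range, q in ranges:
        r_type, _, r_sub = media_range.partition("/")
        if media_range == offered:
            specificity = 2          # exact match, e.g. text/html
        elif r_type == o_type and r_sub == "*":
            specificity = 1          # type wildcard, e.g. text/*
        elif media_range in ("*/*", "*"):
            specificity = 0          # catch-all
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose(offered_types, header):
    """Pick the offered type with the highest q-value for this header."""
    ranges = parse_accept(header)
    return max(offered_types, key=lambda t: quality(t, ranges))
```

With this header, `text/html` is matched exactly at q=1.0 while `text/turtle` only matches the catch-all at q=0.2, so a server offering both would return HTML.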

The desired behavior would be for this header to be, for example, Accept: text/turtle when the Turtle syntax is selected, or Accept: application/ld+json for JSON-LD. Even better would be for all supported syntax types to be listed when the syntax is left at the default "Based on file extension", with the validator then parsing the response's Content-Type header.

You're right @Markus92. When specifying a remote URI, the validator currently leverages neither the Accept header (in the request) nor the Content-Type header of the response (from the remote server). Your proposal would be a nice improvement whereby (recapping as you suggested):

  • If the expected content type is specified, it is sent in the request's Accept header.
  • If not specified, the Accept header is set to all supported RDF types, and the specific content type is then determined from the response's Content-Type header.

Besides applying this for the content to validate we can extend the logic also to user-provided SHACL shapes (if a given validator instance supports them).
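
The request side of the two rules above could be sketched as follows in Python. This is only an illustration under stated assumptions: the validator's actual implementation is not shown in this thread, and the list of supported RDF media types, the function names, and the example URL are all hypothetical.

```python
from urllib.request import Request

# Assumption: an illustrative list, not the validator's actual set of supported types.
SUPPORTED_RDF_TYPES = [
    "text/turtle",
    "application/ld+json",
    "application/rdf+xml",
    "application/n-triples",
]


def build_accept_header(selected_type=None):
    """Use the selected content type if given, otherwise list every supported RDF type."""
    if selected_type:
        return selected_type
    return ", ".join(SUPPORTED_RDF_TYPES)


def prepare_request(url, selected_type=None):
    """Build (but do not send) a request carrying the negotiated Accept header."""
    return Request(url, headers={"Accept": build_accept_header(selected_type)})
```

Because no q-values are attached in this sketch, all listed types are equally acceptable and the server is free to choose which one to return.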

We'll work on this update asap. It might take a bit longer than usual due to the summer holidays but I'll ping you here as soon as the update is published.

Hi @Markus92. The validator (latest Docker image and managed service) is now updated to correctly perform content negotiation as summarised earlier. In brief, the Accept header is set to the selected content type or, if none is selected, to all supported RDF content types. The content type of the retrieved content is then determined from the response's Content-Type header (if present). The fix applies both to the content to validate and to user-provided shapes (if applicable).
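
As a rough illustration of the response side, the following Python sketch maps a Content-Type header to an RDF syntax. The mapping table and the function name are assumptions made for this example, and returning None when the header is missing or unrecognised (so that a caller could fall back to another strategy such as the file extension) is also an assumption, not confirmed behaviour of the validator.

```python
# Assumption: illustrative mapping, not the validator's actual table.
RDF_SYNTAX_BY_MEDIA_TYPE = {
    "text/turtle": "Turtle",
    "application/ld+json": "JSON-LD",
    "application/rdf+xml": "RDF/XML",
    "application/n-triples": "N-Triples",
}


def syntax_from_content_type(content_type_header):
    """Return the RDF syntax named by a Content-Type header, or None if unknown."""
    if not content_type_header:
        return None  # header absent: the caller has to decide some other way
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return RDF_SYNTAX_BY_MEDIA_TYPE.get(media_type)
```

For example, a response declaring `text/turtle; charset=UTF-8` resolves to Turtle, while an HTML response resolves to None.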

Once you've had the chance to check on your end, would you please confirm so that we close this issue? Thanks!

Hi @costas80, I checked the latest version on a few endpoints that I know support content negotiation or declare their content types, and it works flawlessly!

Endpoints I tested:
  • ORCID: https://orcid.org/0000-0002-0604-1204 (seems to negotiate Turtle)
  • FAIR Data Point: https://fdp.healthdata.nl (supports Turtle, JSON-LD and XML; judging by the validated content output, it returns the XML, which I guess is the first type in the list)
  • Molgenis EMX2: https://emx2.dev.molgenis.org/api/fdp (doesn't negotiate, but makes clear its output is text/turtle)

Thanks a lot for picking this up, it's a nice QOL upgrade for our users.