gkellogg/rdf-distiller

Handle UI input format and server Content-Type mismatch

csarven opened this issue · 1 comment

Under "Form URL", the user enters:
URI: http://csarven.ca/.ttl
Input format: rdfa, Output format: turtle

The Accept header sent with the request is: application/n-triples, text/plain;q=0.2, application/n-quads, text/x-nquads;q=0.2, application/ld+json, application/x-ld+json, application/rdf+json, text/html;q=0.5, application/xhtml+xml;q=0.7, image/svg+xml;q=0.4, text/n3, text/rdf+n3;q=0.2, application/rdf+n3;q=0.2, text/turtle, text/rdf+turtle, application/turtle;q=0.2, application/x-turtle;q=0.2, application/rdf+xml, text/csv;q=0.4, text/tab-separated-values;q=0.4, application/csvm+json, application/trig, application/x-trig;q=0.2, application/trix, */*;q=0.1
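Reading the q-values in that header suggests why a server doing content negotiation could reasonably answer with Turtle: text/turtle carries the default q=1.0, while the RDFa-capable types (text/html, application/xhtml+xml, image/svg+xml) are weighted between 0.4 and 0.7. The following is a minimal Ruby sketch, not part of the distiller, that parses a shortened excerpt of the header above. `parse_accept` is a hypothetical helper, and it simplifies the HTTP syntax by ignoring every parameter except q.

```ruby
# Hypothetical helper: parse an Accept header into [media_type, q] pairs.
# Simplified: parameters other than "q" are ignored, and a missing q means 1.0.
def parse_accept(header)
  header.split(",").map do |entry|
    type, *params = entry.split(";").map(&:strip)
    q_param = params.find { |p| p.start_with?("q=") }
    q = q_param ? q_param.delete_prefix("q=").to_f : 1.0
    [type, q]
  end
end

# Shortened excerpt of the Accept header quoted in the issue.
accept = "application/n-triples, text/plain;q=0.2, text/html;q=0.5, " \
         "application/xhtml+xml;q=0.7, text/turtle, */*;q=0.1"

weights = parse_accept(accept).to_h
puts weights["text/turtle"]            # prints 1.0
puts weights["text/html"]              # prints 0.5
puts weights["application/xhtml+xml"]  # prints 0.7
```

With the RDFa input format selected, the RDFa media types are still ranked below Turtle, so a Turtle response is a legitimate outcome of this header.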

When the server returns Content-Type: text/turtle, the distiller still tries to parse the response body as RDFa behind the scenes, and returns this error:

Errors found during processing
<http://csarven.ca/.ttl>: error parsing attribute name
Tag http: invalid
Tag https: invalid
Tag irc: invalid
Namespace prefix mailto is not defined
Tag mailto:info invalid

There should be an alert telling the user that the response is in a serialization format different from the input format selected in the UI, perhaps with a message along the lines of "Expected RDFa but parsing as Turtle". The same applies to any combination of selected input format and actual Content-Type value.
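A check like the one requested could compare the media type of the response's Content-Type against the media types belonging to the selected input format. The Ruby below is a minimal sketch under stated assumptions: the `FORMATS` table, its symbols and labels, and the helper names are hypothetical (the media types are copied from the Accept header above), and it does not use the distiller's or the rdf gem's own code. Unrecognized Content-Types produce no warning here.

```ruby
# Hypothetical table mapping UI input formats to media types. The media types
# come from the Accept header quoted in the issue; names and labels are illustrative.
FORMATS = {
  rdfa:     { label: "RDFa",      types: %w[text/html application/xhtml+xml image/svg+xml] },
  turtle:   { label: "Turtle",    types: %w[text/turtle text/rdf+turtle application/turtle application/x-turtle] },
  ntriples: { label: "N-Triples", types: %w[application/n-triples text/plain] },
  jsonld:   { label: "JSON-LD",   types: %w[application/ld+json application/x-ld+json] },
}.freeze

# Return the format symbol whose media types include the Content-Type,
# ignoring parameters such as "; charset=utf-8" and letter case.
def format_for_content_type(content_type)
  media_type = content_type.to_s.split(";").first.to_s.strip.downcase
  FORMATS.find { |_, info| info[:types].include?(media_type) }&.first
end

# Build a user-facing warning when the server's Content-Type does not match the
# input format selected in the UI. Returns nil when they agree or when the
# Content-Type is not in the table.
def format_mismatch_warning(selected, content_type)
  detected = format_for_content_type(content_type)
  return nil if detected.nil? || detected == selected

  "Expected #{FORMATS[selected][:label]} but parsing as #{FORMATS[detected][:label]}"
end

puts format_mismatch_warning(:rdfa, "text/turtle; charset=utf-8")
# prints: Expected RDFa but parsing as Turtle
p format_mismatch_warning(:turtle, "text/turtle")
# prints: nil
```

The message wording follows the issue's suggestion; whether the distiller should then actually switch parsers, or only warn, is a separate design decision.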

I think you're correct that the distiller does not reset the Accept header when a specific format is selected. Arguably, RDF::Reader.open, when given a :format option, should set Accept to only the content types defined for the reader associated with that format, rather than sending the whole list of formats, and keep */*;q=0.1 as a fallback.

This probably shouldn't simply be patched in the distiller.
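The proposal above, restricting Accept to the selected reader's content types plus a */*;q=0.1 fallback, can be sketched in plain Ruby. This is an illustration only: `READER_TYPES` and `narrowed_accept` are hypothetical, the per-format lists (with their q-values) are copied from the Accept header in the issue, and nothing here is the rdf gem's actual API. In the gem the lists would presumably come from each format's declared content types rather than a hard-coded table.

```ruby
# Hypothetical per-format media types, with q-values copied from the
# corresponding entries in the Accept header quoted in the issue.
READER_TYPES = {
  rdfa:   ["application/xhtml+xml;q=0.7", "text/html;q=0.5", "image/svg+xml;q=0.4"],
  turtle: ["text/turtle", "text/rdf+turtle", "application/turtle;q=0.2", "application/x-turtle;q=0.2"],
}.freeze

# Build an Accept header restricted to the selected format's media types,
# keeping */*;q=0.1 as the fallback. Raises KeyError for an unknown format.
def narrowed_accept(format)
  types = READER_TYPES.fetch(format)
  (types + ["*/*;q=0.1"]).join(", ")
end

puts narrowed_accept(:rdfa)
# prints: application/xhtml+xml;q=0.7, text/html;q=0.5, image/svg+xml;q=0.4, */*;q=0.1
```

With a header like this, a server that can serve HTML for the URL would be steered toward it, while the low-weight wildcard still lets it answer with something else, which is exactly the case the mismatch warning would then report.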