NatLibFi/bib-rdf-pipeline

Fails on bad URIs with Jena >3.1.1

osma opened this issue · 0 comments

As the latest Travis build demonstrates, newer Jena versions parse URIs more strictly, so the riot command that converts marc2bibframe2 output (RDF/XML) to N-Triples now fails.

This is the same bad URI problem that was already discussed on the Jena users' list in October 2016, just more severe since Jena is stricter nowadays. The solution implemented back then (filter-bad-uris.py) comes too late in the pipeline.

I think the only viable solution is to catch bad URIs (and values that will later become bad URIs, such as bad language tags in MARC records) before the BIBFRAME conversion step, preferably using Catmandu Fix scripts.
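
To illustrate the kind of check I have in mind, here is a rough Python sketch. It is not the Catmandu Fix implementation, not what filter-bad-uris.py does, and not the exact rule set Jena applies. It only flags values containing characters that RFC 3986/3987 never allow unescaped. The names is_suspect_uri and split_uris are made up for this example.

```python
import re

# Characters that are never allowed unescaped in a URI or IRI
# (RFC 3986 / RFC 3987): control characters, space, and < > " { } | \ ^ `
# Assumption: this is a conservative subset of what a strict parser rejects.
_ILLEGAL = re.compile(r'[\x00-\x20<>"{}|\\^`\x7f]')


def is_suspect_uri(value: str) -> bool:
    """Return True if value is empty or contains a character that
    would have to be percent-encoded to form a valid URI."""
    return not value or _ILLEGAL.search(value) is not None


def split_uris(values):
    """Partition candidate URI strings into (good, bad) lists."""
    good, bad = [], []
    for v in values:
        (bad if is_suspect_uri(v) else good).append(v)
    return good, bad
```

Something like split_uris(["http://example.org/a", "http://example.org/a b"]) would put the second value in the bad list because of the embedded space. The real fix would need to apply an equivalent test to the MARC fields before they reach marc2bibframe2.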

For now I will revert to Jena 3.1.1 because it still works.