CSV on the Web metadata files are not valid JSON-LD
At https://www.w3.org/ns/csvw#datatype-definitions, the URI Template data type (used, for example, by the valueUrl property) is defined as a subclass of xsd:anyURI:
csvw:uriTemplate a rdfs:Datatype;
rdfs:label "uri template"@en;
rdfs:comment """"""@en;
rdfs:subClassOf xsd:anyURI;
rdfs:isDefinedBy csvw: .
However, URI Templates can contain { and } characters, which are not syntactically valid in various places in a URI. I believe this rdfs:subClassOf statement is mistaken and renders some csvw annotations which use URI templates syntactically invalid.
I discovered this while attempting to deposit a csvw metadata file in JSON-LD format into a Fedora repository, which attempted to parse it as RDF and gave me the following stack trace:
java.lang.IllegalArgumentException: Illegal character in path at index 7: medium-{photographic_media_type}
at java.net.URI.create(URI.java:852)
at java.net.URI.resolve(URI.java:1036)
at com.github.jsonldjava.utils.JsonLdUrl.resolve(JsonLdUrl.java:274)
at com.github.jsonldjava.core.Context.expandIri(Context.java:538)
at com.github.jsonldjava.core.Context.expandValue(Context.java:1099)
at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:979)
at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:819)
at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:517)
at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:819)
at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:819)
at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:997)
at com.github.jsonldjava.core.JsonLdProcessor.expand(JsonLdProcessor.java:146)
at com.github.jsonldjava.core.JsonLdProcessor.toRDF(JsonLdProcessor.java:482)
at org.apache.jena.riot.lang.JsonLDReader.read$(JsonLDReader.java:143)
at org.apache.jena.riot.lang.JsonLDReader.read(JsonLDReader.java:83)
at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:259)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:245)
at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:69)
at org.apache.jena.rdf.model.impl.ModelCom.read(ModelCom.java:305)
at org.fcrepo.http.api.ContentExposingResource.replaceResourceWithStream(ContentExposingResource.java:627)
at org.fcrepo.http.api.FedoraLdp.createOrReplaceObjectRdf(FedoraLdp.java:364)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:522)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1095)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:672)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1504)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1460)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.URISyntaxException: Illegal character in path at index 7: medium-{photographic_media_type}
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3105)
at java.net.URI$Parser.parse(URI.java:3063)
at java.net.URI.<init>(URI.java:588)
at java.net.URI.create(URI.java:850)
... 68 more
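The "index 7" in the message is the position of the opening brace. As a minimal sketch (the function name illegal_uri_chars is hypothetical, and this is a character-set check based on RFC 3986, not a reimplementation of java.net.URI), the same position can be found like this:

```python
import string
from typing import List, Tuple

# Characters RFC 3986 allows to appear literally in a URI reference:
# unreserved, gen-delims, sub-delims, plus "%" for percent-encoding.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
ALLOWED = set(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%")


def illegal_uri_chars(value: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for characters outside the URI repertoire.

    Only the character set is checked; URI structure is not validated.
    """
    return [(i, ch) for i, ch in enumerate(value) if ch not in ALLOWED]


if __name__ == "__main__":
    print(illegal_uri_chars("medium-{photographic_media_type}"))
```

The first offender reported for the value in the trace is the { at index 7, which agrees with the exception message.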
A bit more investigation revealed that the problem is actually that the properties aboutUrl, propertyUrl, and valueUrl are declared to have a @type of @id in the CSVW JSON-LD context file. In RDF terms this makes them object properties and requires that their values be valid URIs, though in general they are not, because they typically contain URI Template markup such as {, :, } and other reserved characters.
By patching the csvw context to change these properties to have a @type of xsd:string, and changing my CSV metadata file to refer to that patched context instead of https://www.w3.org/ns/csvw, I was able to make my metadata file into valid JSON-LD, and I was still able to use it with CSV2RDF software to interpret CSV.
"aboutUrl": {
"@id": "csvw:aboutUrl",
"@type": "xsd:string"
},
"propertyUrl": {
"@id": "csvw:propertyUrl",
"@type": "xsd:string"
},
"valueUrl": {
"@id": "csvw:valueUrl",
"@type": "xsd:string"
}
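For anyone who wants to apply the same workaround to a local copy of the context, here is a rough sketch. It assumes the context file is a JSON object whose "@context" member is a dictionary of term definitions; the function name patch_context and the abbreviated sample context are hypothetical, not taken from the published file.

```python
import copy
import json

# The three properties whose values are URI templates.
TEMPLATE_PROPERTIES = ("aboutUrl", "propertyUrl", "valueUrl")


def patch_context(context_doc: dict, new_type: str = "xsd:string") -> dict:
    """Return a copy of a JSON-LD context document with the template
    properties retyped, leaving the input untouched."""
    patched = copy.deepcopy(context_doc)
    terms = patched["@context"]
    for name in TEMPLATE_PROPERTIES:
        terms[name] = {"@id": f"csvw:{name}", "@type": new_type}
    return patched


if __name__ == "__main__":
    # Abbreviated stand-in for the published context, not the real file.
    original = {
        "@context": {
            "csvw": "http://www.w3.org/ns/csvw#",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
            "valueUrl": {"@id": "csvw:valueUrl", "@type": "@id"},
        }
    }
    print(json.dumps(patch_context(original), indent=2))
```

Passing new_type="csvw:uriTemplate" instead would produce a context that keeps a dedicated datatype for these properties.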
@Conal-Tuohy Yes, you're right; not sure how we missed this. It would seem that changing the context and RDFS definitions for URI Template Properties would do the trick, as I don't see any specific references to this in the metadata document itself. However, I'd like to be sure that doing this doesn't break something else.
@iherman I'm not sure this rises to the level of an Erratum, as no recommendation will change, just the context and RDFS definitions, which aren't normative. Still, no harm in adding this to the Erratum document.
A previous version of the JSON-LD Context erroneously defined csvw:uriTemplate as being a subclass of xsd:anyURI, and specified the @type of aboutUrl, propertyUrl, and valueUrl as being @id. As a URI Template includes the characters { and }, which are not valid in a URI, the context has been changed to change the subclass of csvw:uriTemplate to xsd:string, and to define the @type of affected properties as csvw:uriTemplate. This should have no effect on CSVW Processors which treat CSVW Metadata documents as JSON, rather than RDF.
Alternatively, we could eliminate csvw:uriTemplate, and just make the values xsd:string, but it really shouldn't affect processors in any case.
@gkellogg good to check, of course, but I don't expect changing the ontology will break anything; my guess is that existing implementations of the CSV2RDF spec are ignoring the RDF semantics and processing the metadata files as JSON. Otherwise someone else would surely have raised my issue already.
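To illustrate why a JSON-based processor never trips over this, here is a simplified sketch of treating the property value as a plain string, expanding it per row, and only then resolving it as a URL. It handles only plain {name} variables (none of the other URI Template operators), and the base URL and row values in the example are made up for illustration.

```python
import re
from urllib.parse import quote, urljoin

# Matches only the simplest form of template expression: {name}
_VARIABLE = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_simple(template: str, row: dict) -> str:
    """Replace each {name} with the percent-encoded value from the row.

    Variables missing from the row expand to the empty string.
    """
    return _VARIABLE.sub(
        lambda m: quote(str(row.get(m.group(1), "")), safe=""), template
    )


def value_url(template: str, row: dict, base: str) -> str:
    """Expand the template first, then resolve the result against the base URL."""
    return urljoin(base, expand_simple(template, row))


if __name__ == "__main__":
    row = {"photographic_media_type": "glass plate"}  # hypothetical CSV row
    print(value_url("medium-{photographic_media_type}", row,
                    "http://example.org/photos/"))  # hypothetical base
```

The template itself is never parsed as a URI; only the expanded result is, by which point the braces are gone.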
I think it would be good to retain the data type csvw:uriTemplate (rather than just using xsd:string, as I did, above) so as to be able to constrain the lexical space of URI templates with a regex. This could catch unmatched { and } characters, at least.
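As a rough idea of what such a constraint might look like, the sketch below accepts a string only if every { is closed by a } with at least one non-brace character in between. The pattern and the name has_balanced_braces are illustrative assumptions; the pattern does not validate the expression syntax inside the braces.

```python
import re

# Any run of non-brace characters, interleaved with {expression} groups
# that contain at least one character and no nested braces.
_BALANCED = re.compile(r"(?:[^{}]|\{[^{}]+\})*")


def has_balanced_braces(template: str) -> bool:
    """True if the whole string matches the brace-balance pattern."""
    return _BALANCED.fullmatch(template) is not None


if __name__ == "__main__":
    for candidate in (
        "medium-{photographic_media_type}",
        "medium-{photographic_media_type",
        "medium-photographic_media_type}",
    ):
        print(candidate, has_balanced_braces(candidate))
```

The two alternatives in the pattern start with different characters, so matching stays linear in the length of the input.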
@gkellogg I think it is good to record this as an erratum; it also gives a stronger historical point why the context and ontology files have changed.
@gkellogg in your proposed erratum, the second occurrence of "subclass" should be "superclass". Cheers!
Summary: in the ontology, the csvw:uriTemplate (data)type is defined as a subclass of xsd:anyURI, although the text says that URI patterns can also be used. This is a bug in the context file and the ontology (not in the written recommendations, though). The ontology files have been modified to refer to xsd:string instead.
The files have been updated on the W3C site; see PR #850.
Thanks for the very speedy fix