rdf2patch will flush null characters when character set is larger than 4096
mhoffman-tq opened this issue · 2 comments
The underlying error has already been reported to Jena as JENA-1920, but I thought it best to record it here as well, given this is where the issue was uncovered.
Creating a patch file from a source .ttl whose content is longer than 4096 but shorter than 8192 characters results in null characters being written to the patch file. This in turn causes a RiotParseException when sending the patch file to the rdf-delta-server.
Using the attached file data.ttl.txt (uploaded as txt)
dcmd r2p data.ttl > data.rdfp
followed by
(assumes the existence of 'examples' log)
dcmd add --server=http://localhost:1066 --log examples data.rdfp
Results in
org.apache.jena.riot.RiotParseException: [line: 6, col: 1 ] Failed to find a prefix name or keyword: (0;0x0000)
at org.apache.jena.riot.tokens.TokenizerText$ErrorHandlerTokenizer.error(TokenizerText.java:65)
at org.apache.jena.riot.tokens.TokenizerText.error(TokenizerText.java:1244)
at org.apache.jena.riot.tokens.TokenizerText.readPrefixedNameOrKeyword(TokenizerText.java:536)
at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:445)
at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:99)
at org.seaborne.patch.text.RDFPatchReaderText.apply1(RDFPatchReaderText.java:77)
at org.seaborne.patch.text.RDFPatchReaderText.read(RDFPatchReaderText.java:57)
at org.seaborne.patch.text.RDFPatchReaderText.apply(RDFPatchReaderText.java:67)
at org.seaborne.patch.RDFPatchOps.read(RDFPatchOps.java:182)
at org.seaborne.delta.cmds.append.toPatch(append.java:106)
at org.seaborne.delta.cmds.append.exec1(append.java:71)
at org.seaborne.delta.cmds.append.lambda$execCmd$0(append.java:64)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.seaborne.delta.cmds.append.execCmd(append.java:64)
at org.seaborne.delta.cmds.DeltaCmd.exec(DeltaCmd.java:107)
at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
at org.seaborne.delta.cmds.append.main(append.java:47)
at org.seaborne.delta.cmds.dcmd.main(dcmd.java:125)
at dcmd.main(dcmd.java:38)
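To confirm that a generated patch file really contains null characters before sending it to the server, a small standalone check along these lines may help. This is only a sketch: the class name `NulCheck` and the helper `firstNul` are hypothetical and not part of RDF Delta or Jena, and the default file name `data.rdfp` is taken from the commands above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of RDF Delta or Jena.
public class NulCheck {

    /** Returns the offset of the first NUL (0x00) byte, or -1 if there is none. */
    static int firstNul(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the patch file is passed as the first argument,
        // falling back to the data.rdfp name used in the report.
        String file = args.length > 0 ? args[0] : "data.rdfp";
        int offset = firstNul(Files.readAllBytes(Paths.get(file)));
        if (offset >= 0) {
            System.out.println(file + ": NUL byte found at offset " + offset);
        } else {
            System.out.println(file + ": no NUL bytes found");
        }
    }
}
```

Running it as `java NulCheck data.rdfp` after the `dcmd r2p` step should show whether the patch is affected without needing a running rdf-delta-server.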
Thanks for the details - indeed, it is the bug JENA-1920 and is fixed by apache/jena#762. When that rolls through to RDF Delta, it'll be fixed - a source code fix is to lift the BufferingWriter source code out of Jena.
This is fixed in the codebase because Apache Jena has been upgraded to v3.16.0.