Wimmics/corese

[Bug] Turtle parsing of prefixed URIs triggers an error

MaillPierre opened this issue · 1 comments

Issue Description:
An exception is raised when parsing a Turtle document that contains prefixed URIs whose local part (the part after the prefix) includes a percent-encoded sequence such as %3C.

Bug Details:

The exception triggered is Lexical error at line 2, column 21. Encountered: "%" (37), after: ""

Steps to Reproduce:
Load the example:

@prefix ex: <http://example.com/> .
ex:1 ex:property ex:%3CspanStyle .

Expected Behavior:

Sequences of the form %XX (a percent sign followed by two hexadecimal digits) are common in URIs because they result from percent-encoding characters that cannot appear literally in a URL. They are therefore among the characters allowed in a URI. Furthermore, the Turtle recommendation explicitly states that they are acceptable in prefixed names (cf. the note in section 6.3). The example should therefore load without error.

Actual Behavior:

Loading is reported as completed after the exception is raised, so it is unclear whether the entire file has actually been loaded.

System:
Tested on corese-gui-4.5.0.jar and corese-server-4.5.0.jar

Adding another case: URIs containing "-" characters such as ORCiD URLs.

Example:

File test.ttl, generated by a Corese server:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ns177: <http://orcid.org/> .

<http://example.org> dc:contributor ns177:0000-0001-6938-0820 ;
  dc:contributor ns177:0000-0002-0643-3144 ;
  dc:contributor ns177:0000-0002-5711-4872 .

For any query, corese-command returns:

java -jar corese-command-4.5.0.jar sparql -i test.ttl -q query.rq -of text/csv
Error: Failed to parse RDF file. Check if file is well-formed and that the input format is correct. Encountered "-0001 -6938" at line 4, column 47.
Was expecting one of:
...
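
For reference, both local names reported in this issue can be checked against the PN_LOCAL production of the Turtle 1.1 grammar. The snippet below is only a minimal sketch, not Corese code: it approximates the production with an ASCII-only regular expression (the real grammar also admits many non-ASCII letters), and the helper name is_valid_local_name is made up for illustration.

```python
import re

# ASCII-only approximation of the Turtle 1.1 PN_LOCAL production.
# Assumption: non-ASCII letters allowed by PN_CHARS_BASE are left out for brevity.
HEX = r"[0-9A-Fa-f]"
PERCENT = rf"%{HEX}{HEX}"                    # PERCENT ::= '%' HEX HEX
PN_LOCAL_ESC = r"\\[_~.\-!$&'()*+,;=/?#@%]"  # backslash-escaped punctuation
PLX = rf"(?:{PERCENT}|{PN_LOCAL_ESC})"       # PLX ::= PERCENT | PN_LOCAL_ESC
PN_CHARS_U = r"[A-Za-z_]"
PN_CHARS = r"[A-Za-z_\-0-9]"                 # '-' and digits are allowed here

# PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX)
#              ((PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX))?
FIRST = rf"(?:{PN_CHARS_U}|[:0-9]|{PLX})"
MIDDLE = rf"(?:{PN_CHARS}|[.:]|{PLX})"
LAST = rf"(?:{PN_CHARS}|:|{PLX})"
PN_LOCAL = re.compile(rf"{FIRST}(?:{MIDDLE}*{LAST})?")


def is_valid_local_name(local: str) -> bool:
    """Return True if `local` matches the ASCII subset of PN_LOCAL."""
    return PN_LOCAL.fullmatch(local) is not None


if __name__ == "__main__":
    # Local names taken from the two examples in this issue.
    for name in ["1", "%3CspanStyle", "0000-0001-6938-0820"]:
        print(name, is_valid_local_name(name))
```

Under this approximation all three local names are accepted, which is consistent with the expectation that a Turtle parser should not reject either file.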