Invalid percent encoding
elad-shaked opened this issue · 3 comments
Line 184839 in https://downloads.dbpedia.org/repo/lts/transition/links/2019.02.01/links_domain=yago_lang=en.nt.bz2:
<http://dbpedia.org/resource/555%> <http://www.w3.org/2002/07/owl#sameAs> <http://yago-knowledge.org/resource/555%25> .
The subject IRI ends with a percent character that is not followed by two hex digits, so it is not a valid percent-encoding.
This fails at:
http://akswnc7.informatik.uni-leipzig.de:8088/
http://sparql.org/iri-validator.html
But succeeds at:
http://ttl.summerofcode.be/
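For anyone who wants to scan the dump for similar cases, here is a minimal sketch of a check, assuming that "invalid" simply means a `%` not followed by two hex digits. The function name `iris_with_stray_percent` and the regex-based IRI extraction are my own illustration, not part of any DBpedia tooling, and the extraction is naive (it would also match `<...>` inside literals).

```python
import re

# Anything between angle brackets is treated as an IRI (naive assumption).
IRI_IN_TRIPLE = re.compile(r"<([^<>]*)>")
# A '%' that is NOT followed by exactly two hex digits.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def iris_with_stray_percent(ntriples_line):
    """Return the IRIs on one N-Triples line that contain a stray '%'."""
    return [
        iri
        for iri in IRI_IN_TRIPLE.findall(ntriples_line)
        if STRAY_PERCENT.search(iri)
    ]


line = (
    "<http://dbpedia.org/resource/555%> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://yago-knowledge.org/resource/555%25> ."
)
bad = iris_with_stray_percent(line)
print(bad)
```

On the triple above this flags only the subject; the YAGO object passes because `%25` is a complete escape.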
Hi, thank you, you are right, this is not correct. For the transition artifacts there is no parsing enabled at the moment, since these are legacy artifacts kept for reference. I am leaving the issue open because it needs to be clarified whether this triple is still extracted in the new releases.
Can we consider https://en.wikipedia.org/wiki/555% a correct IRI? Or must the percent sign also be encoded as %25 in it?
Because it seems to me that this problem is fixed for dbr triples, but there are still triples with an unencoded percent sign:
http://en.wikipedia.org/wiki/555% | http://xmlns.com/foaf/0.1/primaryTopic | http://dbpedia.org/resource/555%25
http://en.wikipedia.org/wiki/555% | http://purl.org/dc/elements/1.1/language | en
http://en.wikipedia.org/wiki/555% | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://xmlns.com/foaf/0.1/Document
As I said, it is incorrect and should not be extracted like this, but escaped. The strategy is to not escape any Unicode character unless it violates the IRI standard (which it does in this case).
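A rough sketch of what that strategy could look like for the percent sign alone, assuming the only violation being repaired is a `%` that does not start a valid two-digit hex escape. The helper name `escape_stray_percent` is hypothetical and this is not the extraction framework's actual code; other characters, including non-ASCII ones, are deliberately left untouched.

```python
import re

# A '%' that is NOT followed by exactly two hex digits.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_stray_percent(iri):
    """Replace each stray '%' with '%25'; leave valid escapes and Unicode alone."""
    return STRAY_PERCENT.sub("%25", iri)


fixed = escape_stray_percent("http://en.wikipedia.org/wiki/555%")
unchanged = escape_stray_percent("http://dbpedia.org/resource/555%25")
unicode_kept = escape_stray_percent("http://dbpedia.org/resource/Zürich")
print(fixed, unchanged, unicode_kept)
```

Because existing `%25` sequences are not touched, running the function a second time on its own output changes nothing, so already-fixed dbr IRIs would not get double-encoded.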