RDFa 1.0 xmlns namespace not parsed?
tfrancart opened this issue · 3 comments
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:eli="http://data.europa.eu/eli/ontology#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" version="XHTML+RDFa 1.0" lang="fr">
<head>
<title>xxx</title>
<meta property="eli:passed_by" content="Foo" />
</head>
<body>
</body>
</html>
Parsed with the following code:
Model model = ModelFactory.createDefaultModel();
// Pipeline: input stream -> RdfaParser -> JenaSink, which writes the triples into the model
StreamProcessor streamProcessor = new StreamProcessor(RdfaParser.connect(JenaSink.connect(model)));
// Use the Validator.nu HTML parser as the SAX XMLReader
nu.validator.htmlparser.sax.HtmlParser reader = new nu.validator.htmlparser.sax.HtmlParser(XmlViolationPolicy.ALTER_INFOSET);
streamProcessor.setProperty(StreamProcessor.XML_READER_PROPERTY, reader);
// The second argument is the base URI of the document
streamProcessor.process(htmlPage.openStream(), htmlPage.toString());
return model;
Returns:
<file:/home/thomas/temp/test.html>
<eli:passed_by> "Foo"@fr .
Note that the prefix "eli" is not resolved to its namespace URI. Are prefix declarations using xmlns supported? Setting .setProperty(RdfaParser.RDFA_VERSION_PROPERTY, RDFa.VERSION_10) doesn't change the result.
Is there anything I could do in the code to parse the above HTML without changing it? If not, does anyone see which modifications need to be made to the XHTML above?
Thanks a lot!
Actually, I think the problem is in nu.validator.htmlparser.sax.HtmlParser, which does not pass on the SAX events corresponding to the xmlns: declarations. The situation is a bit confusing because HTML, strictly speaking and as far as I can see, does not allow xmlns declarations other than the one for the html namespace. So I don't know what should happen when an alternate DTD is declared, as in this case.
The same happens when preprocessing the HTML using TagSoup as suggested in #37. TagSoup removes the xmlns declarations.
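For comparison, here is a minimal sketch of what "passing the SAX events for the xmlns: declarations" looks like when the same kind of markup goes through a namespace-aware XML parser instead of an HTML parser. It uses only the JDK's built-in SAX parser; the class name PrefixMappingDemo and the helper collectPrefixes are hypothetical names made up for this illustration, and the input is a shortened version of the page above without the DOCTYPE (so that the parser does not try to fetch the DTD). It is not a fix for the Semargl pipeline, only a way to see which prefix mappings an XML parser reports.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Semargl or the Validator.nu parser.
public class PrefixMappingDemo {

    // Parses the markup with the JDK's namespace-aware XML parser and records
    // every prefix mapping reported through SAX startPrefixMapping events.
    static Map<String, String> collectPrefixes(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        final Map<String, String> prefixes = new LinkedHashMap<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                // The default namespace is reported with an empty prefix.
                prefixes.put(prefix, uri);
            }
        });
        return prefixes;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the page from this issue, without the DOCTYPE.
        String xhtml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\""
            + " xmlns:eli=\"http://data.europa.eu/eli/ontology#\" lang=\"fr\">"
            + "<head><title>xxx</title>"
            + "<meta property=\"eli:passed_by\" content=\"Foo\" />"
            + "</head><body></body></html>";
        collectPrefixes(xhtml).forEach(
            (prefix, uri) -> System.out.println("'" + prefix + "' -> " + uri));
    }
}
```

If the "eli" mapping shows up here but the triple still comes out as <eli:passed_by> with the HTML parser as the reader, that would be consistent with the prefix mappings being lost before they reach the RDFa parser.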
Hello,
I am getting an error at the "JenaSink.connect(model)" call. The error says: "The method connect(com.hp.hpl.jena.rdf.model.Model) in the type JenaSink is not applicable for the arguments (org.apache.jena.rdf.model.Model)"
Please help me with this problem.