eblondel/zen4R

Issue reading an HTML file after loading zen4R

maelle opened this issue · 7 comments

maelle commented

👋 here!

I am seeing a surprising error message when reading an HTML file with xml2, but only after loading zen4R:

> xml2::read_html("<html><body><nav>bla</nav></body></html>")
{html_document}
<html>
[1] <body><nav>bla</nav></body>
> library("zen4R")
> xml2::read_html("<html><body><nav>bla</nav></body></html>")
librdf error - HTML parser error: Tag nav invalid
{html_document}
<html>
[1] <body><nav>bla</nav></body>

Any idea what might be the reason? Thank you!

r-lib/xml2#427

Can you repeat the exercise with the atom4R package instead of zen4R, and then try it again loading only rdflib? I suspect rdflib is the right target. Thanks

I can't reproduce the issue on my side

maelle commented

> xml2::read_html("<html><body><nav>bla</nav></body></html>")
{html_document}
<html>
[1] <body><nav>bla</nav></body>
> library("atom4R")
> xml2::read_html("<html><body><nav>bla</nav></body></html>")
librdf error - HTML parser error: Tag nav invalid
{html_document}
<html>
[1] <body><nav>bla</nav></body>

and

> xml2::read_html("<html><body><nav>bla</nav></body></html>")
{html_document}
<html>
[1] <body><nav>bla</nav></body>
> library("rdflib")
> xml2::read_html("<html><body><nav>bla</nav></body></html>")
{html_document}
<html>
[1] <body><nav>bla</nav></body>

so I still see the message with atom4R but not rdflib.

maelle commented

FWIW I am on Ubuntu.

OK, thanks. So first, this has nothing to do with zen4R, nor even with atom4R itself. However, atom4R loads the DCMI vocabularies, and it does so through the rdflib function rdf_query. The call to this function is what causes the error you see. You could try running an RDF query yourself and see whether you get a similar error.
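
A minimal sketch of such a check, assuming rdflib and xml2 are installed. The triple and the SPARQL query are illustrative placeholders, not the DCMI vocabularies that atom4R actually loads, and the helper name check_rdf_query_side_effect is hypothetical. Whether the librdf message appears will likely depend on the platform and the installed librdf version.

```r
library(rdflib)

# Hypothetical helper: run one rdf_query() call, then parse the same
# HTML snippet as in the report, and return both results.
check_rdf_query_side_effect <- function() {
  # Build a tiny in-memory graph holding a single placeholder triple
  graph <- rdf()
  rdf_add(graph,
          subject   = "http://example.org/a",
          predicate = "http://purl.org/dc/elements/1.1/title",
          object    = "example")

  # Any SPARQL query will do; the point is only to call rdf_query() once
  result <- rdf_query(graph, "SELECT ?s ?p ?o WHERE { ?s ?p ?o }")

  # Watch the console here for the "librdf error" message
  doc <- xml2::read_html("<html><body><nav>bla</nav></body></html>")

  list(result = result, doc = doc)
}

out <- check_rdf_query_side_effect()
```

If the message shows up only after the rdf_query() call, and not after a bare library("rdflib"), that would be consistent with the transcripts above.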

maelle commented

Ah, indeed, thank you! Over to the rdflib repo I go. 😸