HTML5 Validation
Hi @lddubeau,
does salve allow validating HTML files against the official W3C HTML5 .rng files?
If I read it correctly, salve does not support parsing the HTML file itself. Is there a recommended parser that works with salve out of the box?
It should be fine, with some caveats:
- salve-convert is used to convert the schema to something salve understands internally. The input must be a Relax NG schema in XML (.rng) rather than in the compact notation (.rnc). I see all the files there are in the compact notation. However, since there's a one-to-one equivalence between Relax NG in XML and the compact notation, it should just be a matter of using a tool like trang to convert the .rnc files to .rng files.
- As the comment at the top of html5.rnc states, the HTML needs to be converted to XML first. This must also be taken care of when trying to validate with salve. It may be possible to avoid doing an actual conversion of the HTML to XML. Instead, whatever parses the HTML could just emit events as if the HTML were XML, which would make the HTML look like XML as far as salve is concerned. For instance, upon encountering the HTML <input ...> it could emit enterStartTag, the events for the attributes, leaveStartTag, and then immediately emit endTag to close the element. This would effectively replicate the sequence of events that would be emitted for the XML equivalent <input .../>.
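The <input> case above can be sketched as follows. This is only an illustration under stated assumptions: eventsForStartTag is a hypothetical helper, the [name, ...params] array shape is chosen for this example and is not salve's own event type, and placing elements in the XHTML namespace with attributes in no namespace is an assumption about the converted HTML5 schema. The void element list is the one from the HTML standard.

```javascript
// HTML void elements: they never have an end tag in HTML source.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
]);

// Assumption: the schema expects elements in the XHTML namespace.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: returns the events to emit for one HTML start tag,
// as arrays of the form [eventName, ...params].
function eventsForStartTag(name, attributes) {
  const events = [["enterStartTag", XHTML_NS, name]];
  for (const [attrName, attrValue] of Object.entries(attributes)) {
    // Assumption: plain HTML attributes are in no namespace.
    events.push(["attributeName", "", attrName]);
    events.push(["attributeValue", attrValue]);
  }
  events.push(["leaveStartTag"]);
  if (VOID_ELEMENTS.has(name)) {
    // Close the element right away, as <input .../> would in XML.
    events.push(["endTag", XHTML_NS, name]);
  }
  return events;
}
```

For a non-void element such as div, the helper stops after leaveStartTag, and the endTag event would be emitted when the parser reports the closing tag.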
As far as recommended parsers go, I've used sax for the test suite. The main examples of its use are:
- https://github.com/mangalam-research/salve/blob/develop/lib/salve/parse.js
- https://github.com/mangalam-research/salve/blob/develop/test/validation.js
If you happen to have HTML/XML in a DOM tree you can also walk the tree and emit appropriate events. That's how it is done for wed.
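Walking a DOM tree to produce events might look roughly like the sketch below. It is not how wed or salve-dom actually do it. eventsForNode is a hypothetical generator, the [name, ...params] array shape is an assumption made for this example, and only element and text nodes are handled.

```javascript
// Standard DOM nodeType constants.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical generator: yields [eventName, ...params] arrays for a node
// and all of its descendants, in document order.
function* eventsForNode(node) {
  if (node.nodeType === TEXT_NODE) {
    yield ["text", node.nodeValue];
    return;
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return; // Comments, processing instructions, etc. are skipped here.
  }
  const uri = node.namespaceURI || "";
  yield ["enterStartTag", uri, node.localName];
  for (const attr of Array.from(node.attributes)) {
    yield ["attributeName", attr.namespaceURI || "", attr.localName];
    yield ["attributeValue", attr.value];
  }
  yield ["leaveStartTag"];
  for (const child of Array.from(node.childNodes)) {
    yield* eventsForNode(child);
  }
  yield ["endTag", uri, node.localName];
}
```

A real walker would probably also need to decide what to do with namespace declaration attributes (xmlns and xmlns:*), which this sketch passes through like any other attribute.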
Wow cool thanks for your quick response and all the insights 👍
I managed to convert the .rnc file to .rng using rng2srng:
java -jar rng2srng.jar -c validator/schema/html5/html5.rnc > html5.rng
As a parser I would like to use https://github.com/fb55/htmlparser2, as it is already used by https://github.com/htmllint/htmllint, but I have to find out whether it is able to parse <input> as <input/> out of the box.
I'm going to close this, but questions may still be posted if necessary.
Two things that have changed since my initial answer:
- For people who want to validate XML documents represented as DOM trees, there's salve-dom.
- Salve 6.0.0 no longer requires that schemas be processed ahead of time with the command line tool salve-convert. You can load salve and call convertRNGToPattern to convert the Relax NG schema in XML form (salve still does not read the compact form directly) into salve's internal form, and then use the result to get a walker on which to fire events.
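Once a walker is available, firing events on it could be wrapped as in the sketch below. fireAll is a hypothetical helper. The sketch assumes that walker.fireEvent(name, params) returns a falsy value when the event is accepted and an array of errors otherwise; the exact signature has differed between salve releases, so check the documentation of the version in use. How the walker is obtained from the result of convertRNGToPattern is deliberately left out for the same reason.

```javascript
// Hypothetical helper: fires each [eventName, ...params] array on the walker
// and collects every error the walker reports.
// Assumption: fireEvent(name, params) returns a falsy value on success
// and an array of error objects on failure.
function fireAll(walker, events) {
  const errors = [];
  for (const [name, ...params] of events) {
    const result = walker.fireEvent(name, params);
    if (result) {
      errors.push(...result);
    }
  }
  return errors;
}
```

An empty array returned by fireAll would mean that none of the events was rejected. It does not by itself show that the document is complete, since that is a separate check made once all events have been fired.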