servo/html5ever

Possible to disable sanitizing?

himat opened this issue · 3 comments

himat commented

Hi, according to the W3C spec, you're not allowed to have a <div> inside of a <p>, but I came across a website that had exactly that. When I parse it with html5ever, it turns <p><div></div></p> into <p></p> <div></div> <p></p>.

But for my use case, I need the DOM to be exactly as it was originally without any changes.

Is it possible to disable the sanitization or checking step that handles such cases, so that no fixing is done?

jdm commented

The html5ever parsing algorithm is intentionally forgiving of invalid input when parsing web pages. We are not interested in maintaining a stricter mode of operation that Servo would not use.

himat commented

@jdm But I'm asking for a less strict mode?
I just want the DOM to be parsed without following the W3C spec exactly.

jdm commented

Sorry, I did mean less strict rather than stricter. The point is that it's not a mode that the Servo web browser would use, and it's easier for us to maintain a parser that matches the specification as closely as possible.
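
For readers wondering why the rewrite in the first comment happens: it is not a separate sanitizer but part of tree construction in the HTML parsing specification. In the "in body" insertion mode, a `<div>` start tag first closes any open `<p>`, and a `</p>` end tag with no open `<p>` behaves as if a `<p>` start tag had just been seen. The toy below is only a sketch of those two rules using the standard library. It is not html5ever's code, the names `Token`, `build`, and `close_p` are made up for illustration, and the real "button scope" check is simplified to "any open `<p>`".

```rust
/// A token in a drastically simplified HTML token stream (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Start(&'static str),
    End(&'static str),
}

/// Close the current <p>, if one is open, popping anything nested inside it.
/// Simplification: the spec checks "button scope"; this checks the whole stack.
fn close_p(open: &mut Vec<&'static str>, out: &mut String) {
    if open.contains(&"p") {
        while let Some(name) = open.pop() {
            out.push_str(&format!("</{}>", name));
            if name == "p" {
                break;
            }
        }
    }
}

/// Toy tree builder that serializes the tree it would have built.
/// It models only two "in body" rules and ignores everything else.
fn build(tokens: &[Token]) -> String {
    let mut open: Vec<&'static str> = Vec::new(); // stack of open elements
    let mut out = String::new();

    for &tok in tokens {
        match tok {
            Token::Start(name) => {
                // Rule 1: a <div> start tag closes an open <p> first.
                if name == "div" {
                    close_p(&mut open, &mut out);
                }
                out.push_str(&format!("<{}>", name));
                open.push(name);
            }
            Token::End(name) => {
                // Rule 2: a stray </p> implies a <p> right before it.
                if name == "p" && !open.contains(&"p") {
                    out.push_str("<p>");
                    open.push("p");
                }
                // Pop up to and including the matching open element, if any.
                if let Some(pos) = open.iter().rposition(|&n| n == name) {
                    while open.len() > pos {
                        let n = open.pop().unwrap();
                        out.push_str(&format!("</{}>", n));
                    }
                }
            }
        }
    }

    // Close whatever is still open at end of input.
    while let Some(n) = open.pop() {
        out.push_str(&format!("</{}>", n));
    }
    out
}

fn main() {
    // Token stream for <p><div></div></p>
    let input = [
        Token::Start("p"),
        Token::Start("div"),
        Token::End("div"),
        Token::End("p"),
    ];
    println!("{}", build(&input)); // prints <p></p><div></div><p></p>
}
```

Because both rules live in the tree-construction stage, there is no single "fix-up" pass to switch off, which is consistent with the maintainers' answer above.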