Allow < or > characters in attribute values
tbranyen opened this issue · 1 comment
tbranyen commented
This bug was found in Nokogiri via @nbianca in Discourse. I tried it out in the diffHTML parser, which also broke and produced incorrect results.
Here is how it works in the browser:
const parser = new DOMParser();
const result = parser.parseFromString(`<img src="<>">`, 'text/html');
console.log(result.querySelector('img').attributes.src.value); // <>
tbranyen commented
Notes on a potential fix: inside the HTML parsing code where attributesEx is used, detect when we are inside an attribute value and keep looping instead of using a single pass, building up the markup until we reach the logical end of the attribute. Also worth noting that the browser does the best job it can, so if you provide markup like src=<> without quotes, it will only parse up to <, treating > as closing.
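A rough sketch of the looping idea above, in case it helps. This is not diffHTML's actual code: `findTagEnd` is a hypothetical helper, and it assumes the parser can scan a tag character by character rather than relying on a single regex pass. It tracks whether we are inside a quoted attribute value and only accepts `>` as the end of the tag when we are not.

```javascript
// Hypothetical helper (not part of diffHTML): return the index of the `>`
// that closes the tag starting at `start`, ignoring any `<` or `>` that
// appears inside a quoted attribute value. Returns -1 if the tag never closes.
function findTagEnd(markup, start = 0) {
  let quote = null;        // the open quote character, or null outside a value
  let afterEquals = false; // true when the last non-whitespace char was `=`

  for (let i = start + 1; i < markup.length; i++) {
    const char = markup[i];

    if (quote) {
      // Inside a quoted value: only the matching quote ends it.
      if (char === quote) quote = null;
      continue;
    }

    // Outside a quoted value, the first `>` closes the tag. This also covers
    // unquoted values such as src=<>, where the first `>` ends the tag.
    if (char === '>') return i;

    if ((char === '"' || char === "'") && afterEquals) {
      // A quote directly after `=` opens a quoted attribute value.
      quote = char;
      afterEquals = false;
    } else if (char === '=') {
      afterEquals = true;
    } else if (!/\s/.test(char)) {
      afterEquals = false;
    }
  }

  return -1;
}

const quoted = '<img src="<>">';
console.log(quoted.slice(0, findTagEnd(quoted) + 1)); // <img src="<>">

const unquoted = '<img src=<>>';
console.log(unquoted.slice(0, findTagEnd(unquoted) + 1)); // <img src=<>
```

With the tag boundary found this way, the attribute matching could then run only on the text between `<` and the returned index, so a `>` inside a quoted value would no longer cut the tag short.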