fb55/htmlparser2

Is it possible to prevent certain symbols from getting encoded?

joeosburn opened this issue · 2 comments

I am using htmlparser2 with cheerio. Here is sample code to reproduce the issue I'm running into:

const cheerio = require('cheerio');
const htmlparser2 = require('htmlparser2');
    
const template = `
<html>
    <body>
    {{@button { label: vals.map((n)=>n) }}}
    </body>
</html>
`;
  
const dom = htmlparser2.parseDocument(template, {
  xmlMode: false,
  decodeEntities: true
});

// Hand the htmlparser2 document to Cheerio
const doc = cheerio.load(dom);

// Serialize the document back to an HTML string
console.log(doc.html());

The output of this is:

"\n<html>\n    <body>\n    {{@button { label: vals.map((n)=&gt;n) }}}\n    </body>\n</html>\n"

The > character gets encoded as &gt;. Is this expected behavior? Is there any way to stop this from happening? I've tried it with decodeEntities set to false and that does not seem to make any difference.

fb55 commented

You'll have a much easier time if you use Cheerio directly:

// The `xml` object tells Cheerio to use htmlparser2; xmlMode: false keeps HTML parsing rules
const doc = cheerio.load(template, { xml: { xmlMode: false, decodeEntities: false } });

There should be a better way of telling Cheerio to use htmlparser2, but this works. If Cheerio isn't told to use htmlparser2, it defaults to parse5 (in your example, that means parse5's serializer), which always follows the HTML spec and therefore escapes > in text as &gt;, no matter what options you passed to htmlparser2.
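To illustrate the difference, here is a minimal JavaScript sketch. `escapeTextLikeSpec` follows the HTML spec's escaping rule for text nodes (&, U+00A0, <, >), which is what parse5's serializer applies regardless of parser options. `serializeText` is a hypothetical, simplified model of the htmlparser2 path, assuming that text which was never entity-decoded is written back verbatim; neither function is part of the real Cheerio or dom-serializer API.

```javascript
// Sketch of the HTML spec's escaping step for text nodes (not attribute
// values). A spec-following serializer such as parse5's always applies
// this, so a literal ">" in text comes out as "&gt;".
function escapeTextLikeSpec(text) {
  return text
    .replace(/&/g, "&amp;") // first, so the entities added below aren't re-escaped
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical, simplified model of serializing a text node on the
// htmlparser2 path: if entities were never decoded (decodeEntities: false),
// the text is written back unchanged; otherwise it is escaped.
function serializeText(text, { decodeEntities } = {}) {
  return decodeEntities === false ? text : escapeTextLikeSpec(text);
}

const text = "{{@button { label: vals.map((n)=>n) }}}";
console.log(serializeText(text, { decodeEntities: true }));
// {{@button { label: vals.map((n)=&gt;n) }}}
console.log(serializeText(text, { decodeEntities: false }));
// {{@button { label: vals.map((n)=>n) }}}
```

Under this model only the second call reproduces the template text unchanged, which is the effect decodeEntities: false has once Cheerio is actually using htmlparser2 instead of parse5.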

Thank you, that solved it.