cheeriojs/cheerio

Possible bug around .text() and <noscript> tags

jacobrosenthal opened this issue · 3 comments

I would have expected this to log

test div1
test div2

import * as cheerio from 'cheerio';

const html = `<noscript>
<img src="https://www.abc123.com/blah.jpg"
srcset="https://www.abc123.com/blah.jpg"
alt="ABC" />
<div>test div1</div>
</noscript>
<body>
<img src="https://www.abc123.com/blah.jpg"
srcset="https://www.abc123.com/blah.jpg"
alt="ABC" />
<div>test div2</div>
</body>`;

const $ = cheerio.load(html);
$('*').each((_i, elem) => {
  // Clone the element and drop its child elements so that only
  // the element's own text nodes contribute to .text().
  const content = $(elem).clone().children().remove().end().text().trim();
  console.log(content);
});

Instead it logs

<img src="https://www.abc123.com/blah.jpg"
srcset="https://www.abc123.com/blah.jpg"
alt="ABC" />
<div>test div1</div>

test div2

Oh, thanks. Are there security concerns or downsides with scriptingEnabled?

In the case of these calls specifically: $el.text(), $el.prop("tagName"), $el.find("img[alt]").attr("alt"), $el.attr("href")
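
For reference, a minimal sketch of what scriptingEnabled changes for calls like these, assuming cheerio 1.x with its default parse5 parser. The sample markup and the variable names ($default, $noScript) are hypothetical and not taken from the issue. It only compares how <noscript> is parsed under the two settings.

```typescript
import * as cheerio from 'cheerio';

// Hypothetical sample markup (an assumption, not the markup from the issue).
const sample = `<body>
<noscript><a href="/fallback"><img src="a.jpg" alt="ABC" /></a><div>test div1</div></noscript>
<div>test div2</div>
</body>`;

// Default: scripting is treated as enabled, so everything inside
// <noscript> is kept as one raw text node.
const $default = cheerio.load(sample);

// scriptingEnabled: false is passed through to parse5, which then parses
// the contents of <noscript> as ordinary elements.
const $noScript = cheerio.load(sample, { scriptingEnabled: false });

// .text() returns the raw markup in the default case and only the
// text content when scripting is disabled.
console.log($default('noscript').text());
console.log($noScript('noscript').text());

// Selectors can only reach inside <noscript> when its contents were parsed.
console.log($default('noscript').find('img[alt]').attr('alt'));  // undefined
console.log($noScript('noscript').find('img[alt]').attr('alt')); // "ABC"
console.log($noScript('noscript a').attr('href'));               // "/fallback"

// The tag name of the <noscript> element itself does not depend on the option.
console.log($default('noscript').prop('tagName'));
console.log($noScript('noscript').prop('tagName'));
```

With the option set to false, attribute values and text inside <noscript> come back like those of any other element, so they should be handled with the same care as the rest of the scraped document.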