Exception at findTagContent in getHeadAndBodyContent.js

Question

Closed this issue 8 years ago · 2 comments

This function has severe limitations and causes breakage on some sites (null reference exception at line 8: Cannot read property '0' of null).

It doesn't account for whitespace in tags, e.g. < body>, < / body >, and similar variants won't match.
It is case-sensitive, while HTML tag names are case-insensitive.
It doesn't necessarily match the correct closing tag. That shouldn't be a problem for the head/body use case, but then the function's name should probably be changed to reflect that, e.g. findUniqueTagContent.
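
To make the three points above concrete, here is a minimal sketch of a more tolerant string-based lookup. It is not the project's actual code: the name findUniqueTagContent is only the rename suggested above, and the sketch assumes tagName is a plain name such as "head" or "body" that appears at most once. It returns null instead of throwing when a tag is missing.

```javascript
// Hypothetical sketch, not the existing findTagContent implementation.
// Assumes tagName is a simple tag name (no regex metacharacters) and is unique in the document.
function findUniqueTagContent(html, tagName) {
  // Opening tag: optional whitespace after "<", then the name, then optional attributes.
  const open = new RegExp("<\\s*" + tagName + "(?:\\s[^>]*)?>", "i");
  // Closing tag: optional whitespace around "/" and before ">".
  const close = new RegExp("<\\s*/\\s*" + tagName + "\\s*>", "i");

  const openMatch = open.exec(html);
  if (openMatch === null) return null; // no opening tag: report it instead of throwing

  const start = openMatch.index + openMatch[0].length;
  const rest = html.slice(start);

  // Takes the first closing tag after the opening tag, which is only safe for unique tags.
  const closeMatch = close.exec(rest);
  if (closeMatch === null) return null;

  return rest.slice(0, closeMatch.index);
}
```

The "i" flag handles case-insensitivity, and the null checks avoid the "Cannot read property '0' of null" failure. An attribute value containing ">" would still confuse this approach, which is one reason a real parser may be the better fix.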

I can submit a PR later if you want.

Answer 1 · 2016-10-24T15:36:17.000Z

Those are all good points! I think another issue with that function is that it will mistake a commented-out <body> tag for the real one, since it just operates on a string.
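
One way to sidestep the commented-out tag problem while staying string-based is sketched below. It is only an illustration under assumptions: the helper names maskComments and findBodyIgnoringComments are made up for this example, and it only recognizes well-formed <!-- ... --> comments. The idea is to blank out comments with spaces of equal length, search the masked copy, and slice the result from the original string so the returned content is unchanged.

```javascript
// Hypothetical helpers, named for this example only.
function maskComments(html) {
  // Replace each comment with the same number of spaces so that
  // indices in the masked string still line up with the original.
  return html.replace(/<!--[\s\S]*?-->/g, (comment) => " ".repeat(comment.length));
}

function findBodyIgnoringComments(html) {
  const masked = maskComments(html);

  // Search the masked copy, so tags inside comments can never match.
  const open = /<body(?:\s[^>]*)?>/i.exec(masked);
  if (open === null) return null;

  const start = open.index + open[0].length;
  const close = /<\/body\s*>/i.exec(masked.slice(start));
  if (close === null) return null;

  // Slice from the original string so comments inside the body are preserved.
  return html.slice(start, start + close.index);
}
```

This does not handle unterminated comments or comment-like text inside script and style elements, so it is a stopgap rather than a substitute for parsing.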

I'm using cheerio to parse HTML in other parts of the code, so I think that's the way to go here as well. However, if we just run

cheerio.load("<body><div ></div></body>")("body").html()

The result will be <div></div>, because cheerio re-serializes the parsed markup, but findTagContent should return the original text, <div ></div> (note the extra space at the end of the opening div tag).

Thanks for offering to submit a PR! Let me know if what I wrote here makes sense :)

Answer 2 · 2016-10-26T23:01:56.000Z

Let me know if you're still interested in looking into this. Otherwise I'll try and fix it later this week.

Thanks again for reporting!