mathiasbynens/he

Edge case: does not decode example string on w3 spec

youming-lin opened this issue · 4 comments

I was testing encode/decode via https://mothereff.in/html-entities while cross-referencing the spec, and I noticed that he does not decode certain named references correctly. The w3 spec page gives the example string I'm &notit; I tell you, which should be parsed into I'm ¬it; I tell you, with a parse error. he returns the string undecoded. It appears that he fails to decode a legacy named reference when it is followed by one or more alphanumeric characters and then a semicolon (;). he decodes correctly if the run of alphanumeric characters ends with a character other than a semicolon.
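To make the expected behavior concrete, here is a minimal sketch of longest-prefix matching for named references in text content. It is not he's implementation: the ENTITIES table is a tiny hypothetical subset, decodeText is a made-up name, and attribute values (where the spec treats legacy references differently) and parse-error reporting are left out.

```javascript
// Tiny, hypothetical subset of the named character reference table.
// Keys without a trailing semicolon are the legacy references.
const ENTITIES = {
  'amp;': '&',
  'amp': '&',
  'not;': '\u00AC',   // ¬
  'not': '\u00AC',    // ¬ (legacy form, no semicolon)
  'notin;': '\u2209', // ∉
};

// Longest key length, so we know how far to look ahead after '&'.
const MAX_LEN = Math.max(...Object.keys(ENTITIES).map((k) => k.length));

// Decode named references in text content (not attribute values).
function decodeText(input) {
  let out = '';
  let i = 0;
  while (i < input.length) {
    if (input[i] !== '&') {
      out += input[i];
      i += 1;
      continue;
    }
    // Try the longest candidate first, then shrink until a name matches.
    let matched = null;
    const limit = Math.min(MAX_LEN, input.length - i - 1);
    for (let len = limit; len > 0; len -= 1) {
      const candidate = input.slice(i + 1, i + 1 + len);
      if (Object.prototype.hasOwnProperty.call(ENTITIES, candidate)) {
        matched = candidate;
        break;
      }
    }
    if (matched === null) {
      // No known reference here: keep the ampersand as-is.
      out += '&';
      i += 1;
      continue;
    }
    // A match without a trailing ';' is a parse error in the spec,
    // but in text content it is still decoded.
    out += ENTITIES[matched];
    i += 1 + matched.length;
  }
  return out;
}

console.log(decodeText("I'm &notit; I tell you"));
console.log(decodeText("I'm &notin; I tell you"));
```

With this table, &notit; falls back to the legacy match not and leaves it; in place, while &notin; matches the longer name in full.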

Good catch! Thanks for the excellent bug report.

Got bitten by this too, but I can't find a way to fix it in he...

Surely this has been fixed by now...

Character code 128, which looks like a small square when printed with alert(String.fromCharCode(128));, is not being encoded, while the next character, code 129, is encoded as .
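For comparison, a minimal sketch of an encoder that treats both code points the same way, escaping every character above U+007F as a hexadecimal numeric reference. The function name escapeNonAscii is hypothetical, and this does not reproduce he's output or its option handling; it only illustrates the consistent behavior the comment above seems to expect.

```javascript
// Escape every code point above U+007F as a hexadecimal character reference.
// ASCII characters, including '&' and '<', are deliberately left untouched
// to keep the sketch focused on the non-ASCII case.
function escapeNonAscii(input) {
  return input.replace(/[\u0080-\u{10FFFF}]/gu, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    return '&#x' + hex + ';';
  });
}

console.log(escapeNonAscii(String.fromCharCode(128)));
console.log(escapeNonAscii(String.fromCharCode(129)));
```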