Using strip_tags for real-world HTML sanitization is vulnerable to circumvention.
scryptonite opened this issue · 1 comment
Description
As the title says, it is possible to defeat the purpose of `strip_tags` by crafting a string so that the final output still contains live HTML tags. This differs from the behavior of the same function in PHP-land, which seems to always guarantee that unpermitted HTML tags are removed.
Example:
```javascript
const strip_tags = require("locutus/php/strings/strip_tags");
let treat = strip_tags('<script>console.log("everything is fine")</script>');
console.assert(treat == 'console.log("everything is fine")');
// > true
// Worked exactly as intended...
let trick = strip_tags('<<foo>script>console.log("all your base are belong to us")<</foo>/script>');
console.assert(trick == 'console.log("all your base are belong to us")');
// > false!
// It would be dangerous and unwise to put the contents of `trick` in browser-land without
// doing something else to the string.
```
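To see why the nested input gets through, here is a minimal sketch of a single-pass, regex-based stripper. The `stripTagsOnce` name and its regex are assumptions for illustration, not locutus's actual source. One `replace()` pass removes the inner `<foo>` and `</foo>` tags, and in doing so splices the leftover `<` characters onto `script>` and `/script>`, producing brand-new `<script>` and `</script>` tags.

```javascript
// Hypothetical single-pass stripper (NOT locutus's real code): removes every
// substring that looks like an opening or closing tag, exactly once.
function stripTagsOnce(input) {
  return String(input).replace(/<\/?[a-z][a-z0-9]*\b[^>]*>/gi, '');
}

const plain = stripTagsOnce('<script>console.log("everything is fine")</script>');
console.log(plain); // console.log("everything is fine")

// "<<foo>": the first "<" is not followed by a letter, so only "<foo>" matches.
// Removing it leaves "<" directly in front of "script>".
const nested = stripTagsOnce('<<foo>script>console.log("all your base are belong to us")<</foo>/script>');
console.log(nested); // <script>console.log("all your base are belong to us")</script>
```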
I actually discovered and (ab)used this technique against a Twitch overlay that was attempting to sanitize chat messages before displaying them on stream... Their 'fix' at the time was to remove all `<` characters from chat messages being displayed. 💔
I think that, to reach parity with how `strip_tags` works in PHP, the function will need to strip tags repeatedly until there is nothing left to remove. I would also recommend adding a comment or remark somewhere that educates unwitting users that the `htmlentities` function might be better suited to their sanitization needs in browser-land.
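Both suggestions can be sketched in a few lines. The function names `stripTagsUntilStable` and `escapeHtml` are hypothetical, and the tag regex is an assumption rather than the fix locutus actually shipped. The first function reruns the same replacement until a pass changes nothing, so tags reassembled by one pass are removed by the next. The second escapes only the five characters that matter for HTML injection, which is closer to PHP's `htmlspecialchars` than to the full `htmlentities` (which also converts characters such as accented letters).

```javascript
// Hypothetical fixed-point stripper: repeat until a pass removes nothing.
// Each changing pass makes the string strictly shorter, so the loop ends.
function stripTagsUntilStable(input) {
  const tag = /<\/?[a-z][a-z0-9]*\b[^>]*>/gi;
  let before;
  let after = String(input);
  do {
    before = after;
    after = before.replace(tag, '');
  } while (after !== before);
  return after;
}

// Hypothetical htmlspecialchars-style escaper: instead of deleting markup,
// make the browser render it as text. "&" is escaped first so the entities
// produced by the later replacements are not escaped a second time.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

const trick = '<<foo>script>console.log("all your base are belong to us")<</foo>/script>';
console.log(stripTagsUntilStable(trick)); // console.log("all your base are belong to us")
console.log(escapeHtml('<script>alert(1)</script>')); // &lt;script&gt;alert(1)&lt;/script&gt;
```

Even the looped stripper can leave behind a bare `<` whose tag never closes (the regex only matches tags that end in `>`), which is one more reason escaping is the safer default when the goal is to display untrusted text.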
Thanks for reporting the issue. I'll try to fix it. Currently the function relies a little too much on regular expressions, which are not well suited to parsing HTML markup.
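One way to avoid regular expressions entirely is a small character scanner. The sketch below is hypothetical (the `stripTagsScanner` name is made up) and deliberately blunt: it is not an HTML parser and does not reproduce PHP's output. It drops everything from a `<` up to the next `>`, and discards an unterminated `<...` run at the end, so its output can never contain a `<` character, no matter how tags are nested.

```javascript
// Hypothetical non-regex stripper: a two-state scanner (text / inside tag).
function stripTagsScanner(input) {
  let out = '';
  let inTag = false;
  for (const ch of String(input)) {
    if (inTag) {
      // Inside a tag: skip characters until the first closing '>'.
      if (ch === '>') inTag = false;
    } else if (ch === '<') {
      // Any '<' starts a tag, including the outer '<' of "<<foo>".
      inTag = true;
    } else {
      out += ch;
    }
  }
  // If the input ended while still inside a tag, that tail was never emitted.
  return out;
}

const scanned = stripTagsScanner('<<foo>script>console.log("all your base are belong to us")<</foo>/script>');
console.log(scanned); // script>console.log("all your base are belong to us")/script>
```

The trade-off is that it is lossy: plain text such as `1 < 2` loses everything after the `<`, so it suits "never emit markup" use cases rather than faithful text extraction.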