Using strip_tags for real-world html sanitization is vulnerable to circumvention.

Question

scryptonite opened this issue 7 years ago · 1 comment

Description

As the title says, it is possible to circumvent the purpose of strip_tags by crafting a string so that the final output still contains unstripped HTML. This differs from the behavior of the same function in PHP-land, which seems to always guarantee that unpermitted HTML tags are removed.

Example:

const strip_tags = require("locutus/php/strings/strip_tags");


let treat = strip_tags('<script>console.log("everything is fine")</script>');
console.assert(treat === 'console.log("everything is fine")');
// Assertion passes.
// Worked exactly as intended...


let trick = strip_tags('<<foo>script>console.log("all your base are belong to us")<</foo>/script>');
console.assert(trick === 'console.log("all your base are belong to us")');
// Assertion fails! `trick` still contains markup.
// It would be dangerous and unwise to put the contents of `trick` in browser-land without
//   doing something else to the string.

I actually discovered and (ab)used this technique against a Twitch overlay that was attempting to sanitize chat messages before displaying them on stream... Their 'fix' at the time was to remove all < characters from the chat messages being displayed. 💔

I think that, to reach parity with how strip_tags works in PHP, the function will need to recursively strip tags until there is nothing left to remove. I might also recommend adding a comment or remark somewhere that tells unwitting users that the htmlentities function might be better suited to their sanitization needs in browser-land.
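
To make the suggestion concrete, here is a rough sketch of both ideas: re-running a stripper until its output stops changing, and escaping instead of stripping. Everything in it is an assumption made for illustration: `naiveStripTags` is a hypothetical single-pass regex stripper standing in for the library function (it is not the locutus source), and `stripUntilStable` and `escapeHtml` are made-up helper names. It also does not claim to reproduce how PHP implements strip_tags internally.

```javascript
// Hypothetical stand-in for a single-pass, regex-based tag stripper.
// NOT the locutus implementation; it only models "one pass, then stop".
function naiveStripTags(input) {
  return String(input).replace(/<\/?[a-z][a-z0-9]*\b[^>]*>/gi, '');
}

// Re-apply the stripper until the output no longer changes (a fixed point),
// so tags that only appear after an earlier removal get stripped as well.
function stripUntilStable(input, strip = naiveStripTags) {
  let previous;
  let current = String(input);
  do {
    previous = current;
    current = strip(previous);
  } while (current !== previous);
  return current;
}

// Alternative: escape the HTML-significant characters instead of stripping,
// so the browser renders the text literally rather than parsing it as markup.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const trick = '<<foo>script>console.log("all your base are belong to us")<</foo>/script>';

const onePass = naiveStripTags(trick);   // a <script> element survives the single pass
const stable = stripUntilStable(trick);  // no tags left after repeated passes
const escaped = escapeHtml(trick);       // no raw < or > left at all

console.log(onePass);
console.log(stable);
console.log(escaped);
```

Repeated stripping only fixes the nesting trick shown in this issue; for text that is going to be displayed in a page, escaping is usually the safer default because it does not depend on recognizing every tag shape.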

🎃

Answer 1 · 2017-11-01T06:41:30.000Z

Thanks for reporting the issue. I'll try to fix it. Currently the function relies a little too much on regular expressions, which are not well suited to parsing HTML markup.