`strip_tags(input).html_safe? # => false to true` ?

Question

`strip_tags(input).html_safe? # => false to true` ?

dorianmariecom opened this issue 4 years ago · 8 comments

It seems like strip_tags is escaping the input, shouldn't it be html_safe? then?

I don't like adding .html_safe in my code but seems like strip_tags already escapes the input for strip_tags(input).html_safe should be safe

what do you think?

Answer 1 · 2022-01-18T15:29:29.000Z

This seems reasonable to me, but I'd like to hear from @kaspth in case there's a reason this wasn't done before now.

Answer 2 · 2022-01-19T16:02:15.000Z

I'd say no, because we've stripped all the HTML tags and thus there's no HTML left to declare safe. And since the tags are gone <%= strip_tags(content) %>, which is really short for <%= html_escape strip_tags(content) %> wouldn't be any different regardless of html_safe? or not.

Unless there's something I'm missing here?

Answer 3 · 2022-01-19T21:19:01.000Z

My issue is that if I have an input like "Meet & Crush", I do:

> helper.strip_tags("Meet & Crush")
=> "Meet &amp; Crush"

So it escapes the & correctly, except that if I do:

> helper.strip_tags("Meet & Crush").html_safe?
=> false

So it will be escaped twice:

> helper.send(:h, helper.strip_tags("Meet & Crush"))
=> "Meet &amp;amp; Crush"

Which is what rails does in the views when the string is not html_safe?.

Which is why I think strip_tags(input).html_safe? should be true

Answer 4 · 2022-04-09T15:52:58.000Z

Sorry I've taken so long to respond - I've queued this up for a closer look this weekend.

Answer 5 · 2022-05-29T15:31:29.000Z

Sorry for the delay. Let me explain my mental models to help readers understand how I'm making the decision about this behavior.

Here's a quick demonstration of the difference between plain text serialization and html-safe plain text serialization:

frag = "<div>hello <!-- comment --><br>bye < then</div>" # note this has broken markup in it - unescaped `<`

# note that the broken markup gets fixed from `<` to `&lt;`
Nokogiri::HTML::DocumentFragment.parse(frag).to_html
# => "<div>hello <!-- comment --><br>bye &lt; then</div>"

# the naïve implementation of "strip tags" is to ask Nokogiri only to give us back the plaintext content of the HTML
# note that this is NOT html-safe because it contains a bare `<` character
Nokogiri::HTML::DocumentFragment.parse(frag).content
# => "hello bye < then"

# these two invocations are equivalent
# note that the `<` is escaped properly, meaning it's html-safe
>> Loofah.fragment(frag).text
=> "hello bye &lt; then"
>> strip_tags(frag)
=> "hello bye &lt; then"

The vulnerability that we're avoiding by using Loofah::DocumentFragment#text instead of Nokogiri::XML::Node#content is demonstrated here:

frag = "<div>&lt;<span>script</span>&gt;alert()&lt;<span>/script</span>&gt;</div>"

# this is why we cannot use Nokogiri::XML::Node#content for sanitization
Nokogiri::HTML::DocumentFragment.parse(frag).content
# => "<script>alert()</script>" # this is an XSS attack

Loofah.fragment(frag).text
# => "&lt;script&gt;alert()&lt;/script&gt;" # html-safe!
strip_tags(frag)
# => "&lt;script&gt;alert()&lt;/script&gt;" # html-safe!

So, what I hope I've shown above is that the content returned by strip_tags is indeed html-safe and the string should be marked as such.

Answer 6 · 2022-05-29T15:32:16.000Z

I'll work on fixing this for an upcoming release.

Answer 7 · 2022-05-30T15:23:41.000Z

See rails/rails#45218 for a PR that fixes this behavior.

Answer 8 · 2022-05-31T16:46:49.000Z

Note that the rails PR is being backported to 7-0-stable. Thanks again for pointing this out!