rails/rails-html-sanitizer

Issue w/ upgrade to rails 4.2.0

mattt416 opened this issue · 33 comments

Apologies if I'm reporting this in the wrong place, or if I'm doing something incorrectly, but I noticed the following changed behaviour upon upgrading to rails 4.2.0. If I put this in an ERB template:

<p><%= strip_tags("&") %></p>

... and then view page source, I see:

<p>&amp;amp;</p>

If I add rails-deprecated_sanitizer to my Gemfile and try again, I see:

<p>&amp;</p>

Please let me know if I can provide any further information.

Thanks!

--Matt

The internal parsing is done by Nokogiri, so the version affect your results.

Here's a test using Nokogiri 1.6.5, the code is equivalent to strip_tags:

Rails::Html::FullSanitizer.new.sanitize('&') # => &amp;

What Nokogiri version are you using?

Oh, and thanks for filing ❤️

Hi @kaspth,

I can't seem to replicate this via the console with rails-deprecated_sanitizer commented in my Gemfile:

irb(main):001:0> helper.strip_tags('&')
=> "&amp;"
irb(main):002:0> Rails::Html::FullSanitizer.new.sanitize('&')
=> "&amp;"
irb(main):003:0>

According to my Gemfile.lock, I am using nokogiri 1.6.5.

--Matt

Ok, try leaving your bundle out of it for now. Can you try requiring Rails Html Sanitizer in irb and then running the code?

Hi @kaspth,

Apologies for the delay. I just spun up a new Ubuntu 14.10 VM and installed ruby (2.1.2p95), rails 4.2.0 (which pulled in nokogiri 1.6.5 somewhere along the way), etc. to rule out my local environment being the cause here. Unfortunately, I still see the correct behaviour when popping into irb and doing require 'rails-html-sanitizer', but when I create a generic view w/ <%= strip_tags('&') %> in it I still see &amp;amp; in the page source.

--Matt

Hold on a second, when you're viewing page source you see it escaped, but what does it look like when you view it on the page?

Signs like '&' should be escaped, be '&', in the source code.

Kasper

Den 15/01/2015 kl. 09.14 skrev Matt Thompson notifications@github.com:

Hi @kaspth,

Apologies for the delay. I just spun up a new Ubuntu 14.10 VM and installed ruby (2.1.2p95), rails 4.2.0 (which pulled in nokogiri 1.6.5 somewhere along the way), etc. to rule out my local environment being the cause here. Unfortunately, I still see the same thing when popping into irb and doing require 'rails-html-sanitizer', but when I create a generic view w/ <%= strip_tags('&') %>' in it I still see&` in the page source.

--Matt


Reply to this email directly or view it on GitHub.

Hi @kaspth,

If I drop this into a template:

<p>This is a test <%= strip_tags('&') %></p>

I see this in the page source:

<p>This is a test &amp;amp;</p>

... and this on the page itself:

This is a test &amp;

Hope that helps. :)

--Matt

Does the double escaping happen on the Linux machine you set up as well?

Kasper

Den 15/01/2015 kl. 09.34 skrev Matt Thompson notifications@github.com:

Hi @kaspth,

If I drop this into a template:

This is a test <%= strip_tags('&') %>

I see this in the page source:

This is a test &amp;

... and this on the page itself:

This is a test &

Hope that helps. :)

--Matt


Reply to this email directly or view it on GitHub.

Hi @kaspth,

I see the same behaviour on all. When I reported this bug I was seeing this issue on my dev laptop (Mac OS X) and production (Ubuntu) both running ruby 1.9.3. I then created a test Ubuntu VM (ruby 2.1.2) and created a dummy rails app to eliminate it being something in my app itself. The issue appears the same across all three.

--Matt

A bit off-topic, but why does strip_tags convert special chars to html entities - & to &amp;, quotes to &quot; etc? This is not the documented behaviour, which just states:

Strips all HTML tags from the html, including comments.

@mtarnovan this is what rails/rails#18527 was about, which got closed as a duplicate of this. I've noted there what I use a workaround for now.

@kommen Thanks. If I understood the issues correctly they don't seem to be duplicates.

Not sure if this refers to @mtarnovan or @mattt416 problem, but strip_tags is calling loofag's text method which encodes special chars by default:

Loofah.fragment("H&M").text
#=> "H&amp;M"
Loofah.fragment("H&M").text(encode_special_chars: false)
#=> "H&M"

Is this something that should be added to https://github.com/rails/rails-html-sanitizer/blob/master/lib/rails/html/sanitizer.rb#L25 ? Not sure about security implications....

We're closer to a solution on #31, so I'll close this in favor of that. Thanks for your help and sorry it took so long ❤️

thanks @kaspth!

Just to be clear, does #31 fix ampersand encoding? It seems to from the description, but the problem is stated only in terms of newlines.

As far as I see #31 is still open. But as soon as encode_special_chars: false is used internally, it will also fix ampersand encoding, at least it did work for me.

hi, I still have this bug. How to get rid of it ? I need to strip html tag, but to also to keep & and normal < and > which are not html tag.
thanks

Hi @kaspth and @lacco
After testing, #31 does not actually fix this issue. Here is me using Rails 4.2.5.2
and ampersand still gets escaped.
Can you confirm and reopen this issue please? :P

Loading development environment (Rails 4.2.5.2)
irb(main):001:0> ActionController::Base.helpers.strip_tags("test\r\n\r\ntest")
=> "test\r\n\r\ntest"
irb(main):002:0> ActionController::Base.helpers.strip_tags("a & b")
=> "a &amp; b"
irb(main):003:0>

We don't bump rails-html-sanitizer versions on Rails upgrades. You're required to upgrade that yourselves. So are you on the latest version of this library?

@kaspth I am on 1.0.3, so should be the latest right?

Loading development environment (Rails 4.2.5.2)
irb(main):001:0> ActionController::Base.helpers.strip_tags("test\r\n\r\ntest")
=> "test\r\n\r\ntest"
irb(main):002:0> ActionController::Base.helpers.strip_tags("a & b")
=> "a &amp; b"
irb(main):003:0> Rails::Html::Sanitizer::VERSION
=> "1.0.3"

@lulalala why are you calling this on Action Controller?

It works fine on Action View:

kaspers-macbook-pro:~ kasperhansen$ irb -r action_view
irb(main):001:0> include ActionView::Helpers::SanitizeHelper
=> Object
irb(main):002:0> strip_tags('a & b')
=> "a & b"
irb(main):003:0> 

However, it could also have to do with your Nokogiri version or even your Libxml version.

Because the other issue used ActionController::Base.helpers.strip_tags("test\r\n\r\ntest") as the example :)

irb(main):001:0> include ActionView::Helpers::SanitizeHelper
=> Object
irb(main):002:0> strip_tags('a & b')
=> "a &amp; b"
irb(main):003:0> Nokogiri::VERSION
=> "1.6.7.2"

My libxml is:

xsltproc --version
Using libxml 20900, libxslt 10128 and libexslt 817

I believe those would be libxml2 2.9.0, libxslt 1.1.28, and libexslt 0.8.17.

What are yours?

I remember now. This was removed because it introduced a security issue. Read more here: 49dfc15

cc @rafaelfranca

Hi @kaspth , so what exactly is the solution here? :) I thought this problem went away but now I'm seeing this &amp;amp; again. Any info appreciated!

--Matt

Nothing more to say than my comment just above did :)

So would you consider reopen this issue, so more people can discussion how to deal with it? Because otherwise this method remains unusable.

Sure

there is no way to fix it. It will generate escaped HTML that will be displayed correctly in the HTML page. If you escaping &amp; it will be &amp;amp;. If you are escaping & it will be &amp;.

@rafaelfranca then can it be marked as html_safe??

That's looks like a major issue!

I've thought as per best practice a function does one thing and one thing only and used strip_tags to remove tags from user input BEFORE saving it into DB

Now I have DB full of stuff like "Bank Name" = "JPMorgan Chase &amp; Co"

Moreover, if you do

link_to(strip_tags("JPMorgan Chase & Co"))

You'll get link to JPMorgan Chase &amp; Co

How come it's not a major bug

IMO it should either JUST strip tags (without escaping it, without replacing new lines with <br> or I don't know replacing * ... * with <b> ... </b>)
Or perhaps to deprecate it and call new function strip_tags_and_escape

Or AT LEAST we need to put warnings in docs so people will know that it can be used only in narrow use case (only direct output into HTML, not even in link_to etc)

@dutgcom Lots to unpack here, I'll try to help.

First: this issue has been closed for almost six years. My advice is that you should open a new issue if you'd like to discuss something. I'll be locking this issue after this response, but I'd be happy to continue the conversation in a new issue!

This library is intended to be used in the view layer at page render time, and not by Active Record to sanitize database records. Though I can point you at Loofah::ActiveRecord which is intended precisely for that.

Finally, though, if you've html-sanitized a string before putting it into your database, I'm not sure why you're surprised to see HTML entities present in the string. Replacing < with &lt;, as an example, is an important part of HTML sanitization.

If you want finer-grained control over sanitization (say, stripping but not replacing entities) then please take a look at Loofah which provides a feature that does this (see https://github.com/flavorjones/loofah/blob/main/lib/loofah/instance_methods.rb#L84-L87).

I'll update the README today with some of this info as well.