Issue w/ upgrade to rails 4.2.0

Question

Issue w/ upgrade to rails 4.2.0

mattt416 opened this issue 11 years ago · 33 comments

Apologies if I'm reporting this in the wrong place, or if I'm doing something incorrectly, but I noticed the following changed behaviour upon upgrading to rails 4.2.0. If I put this in an ERB template:

<p><%= strip_tags("&") %></p>

... and then view page source, I see:

<p>&amp;amp;</p>

If I add rails-deprecated_sanitizer to my Gemfile and try again, I see:

<p>&amp;</p>

Please let me know if I can provide any further information.

Thanks!

--Matt

kaspth commented 10 years ago

Sure

Answer 1 · 2015-01-12T15:32:30.000Z

The internal parsing is done by Nokogiri, so the version affect your results.

Here's a test using Nokogiri 1.6.5, the code is equivalent to strip_tags:

Rails::Html::FullSanitizer.new.sanitize('&') # => &amp;

What Nokogiri version are you using?

Answer 2 · 2015-01-12T15:32:59.000Z

Oh, and thanks for filing ❤️

Answer 3 · 2015-01-12T16:47:26.000Z

Hi @kaspth,

I can't seem to replicate this via the console with rails-deprecated_sanitizer commented in my Gemfile:

irb(main):001:0> helper.strip_tags('&')
=> "&amp;"
irb(main):002:0> Rails::Html::FullSanitizer.new.sanitize('&')
=> "&amp;"
irb(main):003:0>

According to my Gemfile.lock, I am using nokogiri 1.6.5.

--Matt

Answer 4 · 2015-01-12T16:49:53.000Z

Ok, try leaving your bundle out of it for now. Can you try requiring Rails Html Sanitizer in irb and then running the code?

Answer 5 · 2015-01-15T08:14:39.000Z

Hi @kaspth,

Apologies for the delay. I just spun up a new Ubuntu 14.10 VM and installed ruby (2.1.2p95), rails 4.2.0 (which pulled in nokogiri 1.6.5 somewhere along the way), etc. to rule out my local environment being the cause here. Unfortunately, I still see the correct behaviour when popping into irb and doing require 'rails-html-sanitizer', but when I create a generic view w/ <%= strip_tags('&') %> in it I still see &amp; in the page source.

--Matt

Answer 6 · 2015-01-15T08:18:54.000Z

Hold on a second, when you're viewing page source you see it escaped, but what does it look like when you view it on the page?

Signs like '&' should be escaped, be '&', in the source code.

Kasper

Den 15/01/2015 kl. 09.14 skrev Matt Thompson notifications@github.com:

Hi @kaspth,

Apologies for the delay. I just spun up a new Ubuntu 14.10 VM and installed ruby (2.1.2p95), rails 4.2.0 (which pulled in nokogiri 1.6.5 somewhere along the way), etc. to rule out my local environment being the cause here. Unfortunately, I still see the same thing when popping into irb and doing require 'rails-html-sanitizer', but when I create a generic view w/ <%= strip_tags('&') %>' in it I still see&` in the page source.

--Matt

—
Reply to this email directly or view it on GitHub.

Answer 7 · 2015-01-15T08:34:16.000Z

Hi @kaspth,

If I drop this into a template:

This is a test <%= strip_tags('&') %>

I see this in the page source:

This is a test &amp;

... and this on the page itself:

This is a test &

Hope that helps. :)

--Matt

Answer 8 · 2015-01-15T08:48:36.000Z

Does the double escaping happen on the Linux machine you set up as well?

Kasper

Den 15/01/2015 kl. 09.34 skrev Matt Thompson notifications@github.com:

Hi @kaspth,

If I drop this into a template:

This is a test <%= strip_tags('&') %>

I see this in the page source:

This is a test &

... and this on the page itself:

This is a test &

Hope that helps. :)

--Matt

—
Reply to this email directly or view it on GitHub.

Answer 9 · 2015-01-15T09:21:44.000Z

Hi @kaspth,

I see the same behaviour on all. When I reported this bug I was seeing this issue on my dev laptop (Mac OS X) and production (Ubuntu) both running ruby 1.9.3. I then created a test Ubuntu VM (ruby 2.1.2) and created a dummy rails app to eliminate it being something in my app itself. The issue appears the same across all three.

--Matt

Answer 10 · 2015-02-03T08:10:23.000Z

A bit off-topic, but why does strip_tags convert special chars to html entities - & to &, quotes to " etc? This is not the documented behaviour, which just states:

Strips all HTML tags from the html, including comments.

Answer 11 · 2015-02-03T09:40:14.000Z

@mtarnovan this is what rails/rails#18527 was about, which got closed as a duplicate of this. I've noted there what I use a workaround for now.

Answer 12 · 2015-02-03T10:00:01.000Z

@kommen Thanks. If I understood the issues correctly they don't seem to be duplicates.

Answer 13 · 2015-02-16T15:50:31.000Z

Not sure if this refers to @mtarnovan or @mattt416 problem, but strip_tags is calling loofag's text method which encodes special chars by default:

Loofah.fragment("H&M").text
#=> "H&amp;M"
Loofah.fragment("H&M").text(encode_special_chars: false)
#=> "H&M"

Is this something that should be added to https://github.com/rails/rails-html-sanitizer/blob/master/lib/rails/html/sanitizer.rb#L25 ? Not sure about security implications....

Answer 14 · 2015-02-24T11:58:13.000Z

We're closer to a solution on #31, so I'll close this in favor of that. Thanks for your help and sorry it took so long ❤️

Answer 15 · 2015-02-24T20:17:45.000Z

thanks @kaspth!

Answer 16 · 2015-03-02T09:02:27.000Z

Just to be clear, does #31 fix ampersand encoding? It seems to from the description, but the problem is stated only in terms of newlines.

Answer 17 · 2015-03-02T09:06:22.000Z

As far as I see #31 is still open. But as soon as encode_special_chars: false is used internally, it will also fix ampersand encoding, at least it did work for me.

Answer 18 · 2016-02-26T16:12:45.000Z

hi, I still have this bug. How to get rid of it ? I need to strip html tag, but to also to keep & and normal < and > which are not html tag.
thanks

Answer 19 · 2016-03-04T07:16:07.000Z

Hi @kaspth and @lacco
After testing, #31 does not actually fix this issue. Here is me using Rails 4.2.5.2
and ampersand still gets escaped.
Can you confirm and reopen this issue please? :P

Loading development environment (Rails 4.2.5.2)
irb(main):001:0> ActionController::Base.helpers.strip_tags("test\r\n\r\ntest")
=> "test\r\n\r\ntest"
irb(main):002:0> ActionController::Base.helpers.strip_tags("a & b")
=> "a &amp; b"
irb(main):003:0>

Answer 20 · 2016-03-04T08:29:55.000Z

We don't bump rails-html-sanitizer versions on Rails upgrades. You're required to upgrade that yourselves. So are you on the latest version of this library?

Answer 21 · 2016-03-04T08:39:12.000Z

@kaspth I am on 1.0.3, so should be the latest right?

Loading development environment (Rails 4.2.5.2)
irb(main):001:0> ActionController::Base.helpers.strip_tags("test\r\n\r\ntest")
=> "test\r\n\r\ntest"
irb(main):002:0> ActionController::Base.helpers.strip_tags("a & b")
=> "a &amp; b"
irb(main):003:0> Rails::Html::Sanitizer::VERSION
=> "1.0.3"

Answer 22 · 2016-03-09T20:12:38.000Z

@lulalala why are you calling this on Action Controller?

It works fine on Action View:

kaspers-macbook-pro:~ kasperhansen$ irb -r action_view
irb(main):001:0> include ActionView::Helpers::SanitizeHelper
=> Object
irb(main):002:0> strip_tags('a & b')
=> "a & b"
irb(main):003:0>

However, it could also have to do with your Nokogiri version or even your Libxml version.

Answer 23 · 2016-03-10T01:31:06.000Z

Because the other issue used ActionController::Base.helpers.strip_tags("test\r\n\r\ntest") as the example :)

irb(main):001:0> include ActionView::Helpers::SanitizeHelper
=> Object
irb(main):002:0> strip_tags('a & b')
=> "a &amp; b"
irb(main):003:0> Nokogiri::VERSION
=> "1.6.7.2"

My libxml is:

xsltproc --version
Using libxml 20900, libxslt 10128 and libexslt 817

I believe those would be libxml2 2.9.0, libxslt 1.1.28, and libexslt 0.8.17.

What are yours?

Answer 24 · 2016-03-10T20:17:48.000Z

I remember now. This was removed because it introduced a security issue. Read more here: 49dfc15

cc @rafaelfranca

Answer 25 · 2016-04-12T07:43:08.000Z

Hi @kaspth , so what exactly is the solution here? :) I thought this problem went away but now I'm seeing this &amp; again. Any info appreciated!

--Matt

Answer 26 · 2016-04-15T08:36:12.000Z

Nothing more to say than my comment just above did :)

Answer 27 · 2016-04-15T15:23:32.000Z

So would you consider reopen this issue, so more people can discussion how to deal with it? Because otherwise this method remains unusable.

Answer 28 · 2016-04-15T17:58:51.000Z

there is no way to fix it. It will generate escaped HTML that will be displayed correctly in the HTML page. If you escaping & it will be &amp;. If you are escaping & it will be &.

Answer 29 · 2016-04-16T01:04:06.000Z

@rafaelfranca then can it be marked as html_safe??

Answer 30 · 2021-12-08T06:24:24.000Z

That's looks like a major issue!

I've thought as per best practice a function does one thing and one thing only and used strip_tags to remove tags from user input BEFORE saving it into DB

Now I have DB full of stuff like "Bank Name" = "JPMorgan Chase & Co"

Moreover, if you do

link_to(strip_tags("JPMorgan Chase & Co"))

You'll get link to JPMorgan Chase & Co

How come it's not a major bug

IMO it should either JUST strip tags (without escaping it, without replacing new lines with   or I don't know replacing * ... * with  ... )
Or perhaps to deprecate it and call new function strip_tags_and_escape

Or AT LEAST we need to put warnings in docs so people will know that it can be used only in narrow use case (only direct output into HTML, not even in link_to etc)

Answer 31 · 2021-12-08T14:23:15.000Z

@dutgcom Lots to unpack here, I'll try to help.

First: this issue has been closed for almost six years. My advice is that you should open a new issue if you'd like to discuss something. I'll be locking this issue after this response, but I'd be happy to continue the conversation in a new issue!

This library is intended to be used in the view layer at page render time, and not by Active Record to sanitize database records. Though I can point you at Loofah::ActiveRecord which is intended precisely for that.

Finally, though, if you've html-sanitized a string before putting it into your database, I'm not sure why you're surprised to see HTML entities present in the string. Replacing < with <, as an example, is an important part of HTML sanitization.

If you want finer-grained control over sanitization (say, stripping but not replacing entities) then please take a look at Loofah which provides a feature that does this (see https://github.com/flavorjones/loofah/blob/main/lib/loofah/instance_methods.rb#L84-L87).

Answer 32 · 2021-12-08T15:33:48.000Z

I'll update the README today with some of this info as well.