rails/rails-html-sanitizer

Environment-based sanitizer difference with open lt tags

jackphelps opened this issue · 6 comments

On heroku:

Rails::Html::WhiteListSanitizer.new.sanitize("stuff < things")
=> "stuff "

Locally (all rails environments)

Rails::Html::WhiteListSanitizer.new.sanitize("stuff < things")
"stuff &lt; things"

I'd expect the local version to be correct, though I'm not sure what the developer intended.

We are using ruby 2.5.0 with the following locked dependencies shared across environments:
rails (4.2.10)
rails-html-sanitizer (1.0.3)
loofah (2.2.2)
crass (1.0.3)
nokogiri (1.8.2)

We all use macbooks on sierra or high sierra, it's the same in both.

It seems like the production case must affect more people than just us, so thought I'd report it. Please let me know if there's any other info I can provide, and thanks for all your hard work.

Which version of libxml is being used locally and in production. That is the cause for the difference in behavior.

Hi @rafaelfranca -- sorry for the delay, I missed the GH notification :(

On heroku (the borked one)

~ $ apt-cache policy libxml2
libxml2:
  Installed: 2.9.1+dfsg1-3ubuntu4.12
  Candidate: 2.9.1+dfsg1-3ubuntu4.12
  Version table:
 *** 2.9.1+dfsg1-3ubuntu4.12 0
        100 /var/lib/dpkg/status

And looks like I'm on 2.2 on my mac

That is the reason. The behavior of the sanitizer depends on the version of libxml. If you want consistent behavior, make sure you are in the same version.

Yes, I understand, and thanks for identifying libxml as the cause here; I guess my further question is: I thought the output when based on libxml2.9.1 was probably not the intended behavior; is that not the case?

I haven't dug through your source and don't know too much about libxml or the other dependencies, so not sure if it's something you can do anything about, but if it's a bug I could help notify other parties.

We don't know if < things can be transformed later to a tag by the next character in the next line so we consider it an unclosed tag, because of this it is removed from the output. How libxml consider that snippet changed to be more aggressive. I consider the behavior on libxml2.9.1 safer.

Understood, thanks for the clarity. We'll probably use the most stable older version going forward, we have separate sanitization methods for different cases and are confident our use here is safe.

Thanks again!