rails/rails-html-sanitizer

Unfinished open tag being escaped

mariovisic opened this issue · 2 comments

Hi there

Moving from the built-in Rails 3 sanitizer to the one supplied with 4.2 changes the way strings containing the less-than character are escaped. An example:

# Pre-Rails 4.2: 
> sanitize("good<better") # => "good&lt;better" 

# Rails 4.2:
> sanitize("good<better") # => "good" 

To me, the old behavior seems more correct: there are plenty of valid uses for the less-than symbol touching another character. For example, a simple ASCII emoji:

# Pre-Rails 4.2: 
> sanitize("<:)") # => "&lt;:)" 

# Rails 4.2:
> sanitize("<:)") # => "" 

I'd write a failing test, but there seem to be tests already for the opposite behavior: https://github.com/rails/rails-html-sanitizer/blob/master/test/sanitizer_test.rb#L131-L133

Looking at the code, it seems like it may not even be possible to implement this behavior anymore, because we're now using Nokogiri instead of regexp matching to detect tags.

Is that the case? What's the go here?

I suppose a bigger problem is that everything after the open tag character is removed: an innocent smiley face means that the rest of your message is deleted:

sanitize("<:) foo bar there hello") # => ""

... ice cream cones are fine though:

sanitize("<3)") #=> "&lt;3)" 

If the character following the less-than character is a digit, then it's not removed.

Yeah, that behavior is not possible anymore. Removing open tags is by design. What you can do is escape your content instead of sanitizing it.
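
For anyone landing here, a minimal sketch of the escaping approach, assuming the content is plain text and no markup needs to survive. The helper name escape_user_text is made up for illustration and is not part of rails-html-sanitizer; it only wraps ERB::Util.html_escape from Ruby's standard library, which is also available in views as h.

```ruby
require "erb"

# Hypothetical helper (not part of rails-html-sanitizer): escape the text
# instead of sanitizing it, so a stray "<" is kept as "&lt;" rather than
# being treated as the start of a tag and dropped.
def escape_user_text(text)
  ERB::Util.html_escape(text)
end

puts escape_user_text("good<better")             # => good&lt;better
puts escape_user_text("<:) foo bar there hello") # => &lt;:) foo bar there hello
puts escape_user_text("<3)")                     # => &lt;3)
```

The trade-off is that escaping keeps every character, so any real tags in the input are shown literally instead of being rendered or stripped. It only fits fields where no HTML is allowed at all.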