rails/rails-html-sanitizer

In the `sanitize` method, the value of the `multiple` attribute of the HTML tag is missing.

naitoh opened this issue · 2 comments

Description

In the `sanitize` method, the value of the `multiple` attribute on a `<select>` tag is dropped, even when `multiple` is listed in `attributes:`.

Steps to Reproduce

$ ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
$ gem list rails-html-sanitizer loofah nokogiri crass

*** LOCAL GEMS ***

rails-html-sanitizer (1.5.0)
loofah (2.20.0)
nokogiri (1.14.2 arm64-darwin)
crass (1.0.6)

Working case

> Rails::Html::SafeListSanitizer.new.sanitize('<select multiplea="bar"></select>', tags: %w(select), attributes: %w(multiplea))
=> "<select multiplea=\"bar\"></select>"

Problem case

> Rails::Html::SafeListSanitizer.new.sanitize('<select multiple="bar"></select>', tags: %w(select), attributes: %w(multiple))
=> "<select multiple></select>"

I expected the response to be `<select multiple="bar"></select>`.

Hi @naitoh, thanks for asking this question.

What you're seeing is behavior from libxml2 (Nokogiri's HTML4 parser):

Nokogiri::HTML4('<select multiplea="bar"></select>').to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n" +
#    "<html><body><select multiplea=\"bar\"></select></body></html>\n"

Nokogiri::HTML4('<select multiple="bar"></select>').to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n" +
#    "<html><body><select multiple></select></body></html>\n"

This is because, in HTML4, the `multiple` attribute on a `select` element is a "boolean" attribute: it is either present or absent, and it does not carry a value:

https://www.w3.org/TR/html401/interact/forms.html#h-17.6
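To make the HTML4 rule concrete, here is a minimal Ruby sketch of how a serializer that follows it treats attributes. It is not libxml2's code: `HTML4_BOOLEAN_ATTRS` is a hand-written list of the boolean attributes defined in HTML 4.01, and `serialize_attr` is a hypothetical helper. A boolean attribute is written in its minimized form (name only), so whatever value it carried is discarded, which matches the `<select multiple>` output above.

```ruby
# Boolean attributes defined in HTML 4.01 (hand-written list; an assumption,
# not copied from libxml2's source).
HTML4_BOOLEAN_ATTRS = %w[
  checked compact declare defer disabled ismap multiple
  nohref noresize noshade nowrap readonly selected
].freeze

# Hypothetical helper: serialize one attribute the way an HTML4-style
# serializer would. Boolean attributes are emitted minimized (name only),
# so any value they carried is dropped.
def serialize_attr(name, value)
  if HTML4_BOOLEAN_ATTRS.include?(name.downcase)
    name
  else
    # Escape the characters that would break a double-quoted attribute value.
    escaped = value.to_s.gsub("&", "&amp;").gsub('"', "&quot;")
    %(#{name}="#{escaped}")
  end
end

puts serialize_attr("multiple", "bar")   # prints: multiple
puts serialize_attr("multiplea", "bar")  # prints: multiplea="bar"
```

Because the value is discarded at serialization time, allowing `multiple` in the sanitizer's `attributes:` list cannot bring `"bar"` back.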

It's worth noting, though, that HTML5 and Nokogiri's HTML5 parser (libgumbo) have slightly different behavior:

Nokogiri::HTML5('<select multiplea="bar"></select>').to_html
# => "<html><head></head><body><select multiplea=\"bar\"></select></body></html>"

Nokogiri::HTML5('<select multiple="bar"></select>').to_html
# => "<html><head></head><body><select multiple=\"bar\"></select></body></html>"

However, that attribute is still boolean in HTML5, so it should only have an empty value or the value "multiple".
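The HTML Living Standard states the rule precisely: if a boolean attribute has a value, it must be the empty string or an ASCII case-insensitive match for the attribute's own name. A minimal Ruby sketch of that check follows; the method name `valid_boolean_attr_value?` is hypothetical, and `nil` is used here (an assumption of this sketch) to stand for a bare attribute with no value. It shows why `multiple="bar"` is non-conforming even though the HTML5 parser preserves it.

```ruby
# Hypothetical helper: is `value` a conforming value for the HTML5 boolean
# attribute `name`? Only the empty string or an ASCII case-insensitive
# match for the attribute's name is allowed.
def valid_boolean_attr_value?(name, value)
  # nil stands for a bare attribute (e.g. <select multiple>).
  return true if value.nil? || value.empty?
  # String#casecmp folds only A-Z/a-z, i.e. an ASCII case-insensitive compare.
  value.casecmp(name) == 0
end

p valid_boolean_attr_value?("multiple", "")         # => true
p valid_boolean_attr_value?("multiple", "Multiple") # => true
p valid_boolean_attr_value?("multiple", "bar")      # => false
```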

Relatedly, I'm working on updating rails-html-sanitizer to use Nokogiri's HTML5 parser; see Release 2.21.0.rc1 / 2023-04-02 · flavorjones/loofah and #133 for some work in progress.

@flavorjones
I now understand the reason for this.
Thanks!