In the sanitize method, the value of the `multiple` attribute of the html tag is missing.
naitoh opened this issue · 2 comments
Description
In the sanitize method, the value of the multiple attribute of the html tag is missing.
Steps to Reproduce
$ ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
$ gem list rails-html-sanitizer loofah nokogiri crass
*** LOCAL GEMS ***
rails-html-sanitizer (1.5.0)
loofah (2.20.0)
nokogiri (1.14.2 arm64-darwin)
crass (1.0.6)
Working case
> Rails::Html::SafeListSanitizer.new.sanitize('<select multiplea="bar"></select>', tags: %w(select), attributes: %w(multiplea))
=> "<select multiplea=\"bar\"></select>"
Problem case
> Rails::Html::SafeListSanitizer.new.sanitize('<select multiple="bar"></select>', tags: %w(select), attributes: %w(multiple))
=> "<select multiple></select>"
I would expect the response to be <select multiple="bar"></select>.
Hi @naitoh, thanks for asking this question.
What you're seeing is behavior from libxml2 (Nokogiri's HTML4 parser):
Nokogiri::HTML4('<select multiplea="bar"></select>').to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n" +
# "<html><body><select multiplea=\"bar\"></select></body></html>\n"
Nokogiri::HTML4('<select multiple="bar"></select>').to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n" +
# "<html><body><select multiple></select></body></html>\n"
This is because, in HTML4, the multiple attribute on a select element is considered to be a "boolean" attribute, which means it's either present or absent and does not have a value:
https://www.w3.org/TR/html401/interact/forms.html#h-17.6
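To illustrate why the value disappears, here is a minimal Ruby sketch of how an HTML4-style serializer might treat boolean attributes. This is not libxml2's actual code: `BOOLEAN_ATTRIBUTES` and `serialize_attribute` are hypothetical names, and the attribute list is only a small illustrative subset.

```ruby
require "set"

# Illustrative subset of HTML4 boolean attributes (an assumption for this
# sketch, not the list libxml2 uses internally).
BOOLEAN_ATTRIBUTES = Set["checked", "disabled", "multiple", "readonly", "selected"].freeze

# Hypothetical helper: serialize one attribute the way an HTML4-style
# serializer would, dropping the value of boolean attributes.
def serialize_attribute(name, value)
  # Boolean attributes are written in minimized form: name only, no value.
  return name if BOOLEAN_ATTRIBUTES.include?(name.downcase)

  # Other attributes keep their value, with basic escaping.
  escaped = value.gsub("&", "&amp;").gsub('"', "&quot;")
  "#{name}=\"#{escaped}\""
end

puts serialize_attribute("multiplea", "bar") # prints: multiplea="bar"
puts serialize_attribute("multiple", "bar")  # prints: multiple
```

Under this model, `multiplea` is an unknown attribute and keeps its value, while `multiple` is minimized, which matches the two outputs above.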
It's worth noting, though, that HTML5 and Nokogiri's HTML5 parser (libgumbo) have slightly different behavior:
Nokogiri::HTML5('<select multiplea="bar"></select>').to_html
# => "<html><head></head><body><select multiplea=\"bar\"></select></body></html>"
Nokogiri::HTML5('<select multiple="bar"></select>').to_html
# => "<html><head></head><body><select multiple=\"bar\"></select></body></html>"
However, that attribute is still considered boolean in HTML5 and should only have an empty value or the value "multiple".
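The HTML5 rule for boolean attributes (an empty value, or a value matching the attribute's own name) can be sketched in Ruby as follows. `valid_boolean_attribute_value?` is a hypothetical helper written for illustration, not part of Nokogiri or rails-html-sanitizer, and it assumes the name comparison is ASCII case-insensitive.

```ruby
# Hypothetical helper: checks whether a value is allowed for a boolean
# attribute under the HTML5 rule: empty, or equal to the attribute name.
def valid_boolean_attribute_value?(name, value)
  # String#casecmp compares case-insensitively for A-Z/a-z only.
  value.empty? || value.casecmp(name) == 0
end

puts valid_boolean_attribute_value?("multiple", "")         # prints: true
puts valid_boolean_attribute_value?("multiple", "multiple") # prints: true
puts valid_boolean_attribute_value?("multiple", "bar")      # prints: false
```

So although the HTML5 parser preserves multiple="bar", that markup is not conforming HTML5.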
Related, I'm working on updating rails-html-sanitizer to use Nokogiri's HTML5 parser, see Release 2.21.0.rc1 / 2023-04-02 · flavorjones/loofah and #133 for some work in progress.
@flavorjones
I now understand the reason for this.
Thanks!