coerce(str) results in Text node rather than html
dan42 opened this issue · 8 comments
I ran into a surprise when trying to insert content into a document using element.before("<div/>")
For regular elements it works, but if it's a script element, "<div/>"
is escaped and inserted as text.
Apparently due to coerce
behaving weirdly for this case?
Nokogiri.HTML('<script>').at('//script').send(:coerce, '<div/>')
#=> [#<Nokogiri::XML::Element:0x2adb350529dc name="div">]
Nokogiri.HTML5('<script>').at('//script').send(:coerce, '<div/>')
#=> [#<Nokogiri::XML::Text:0x2adb3505bb2c "<div>">]
Nokogiri.HTML5('<span>').at('//span').send(:coerce, '<div/>')
#=> [#<Nokogiri::XML::Element:0x2adb3505b014 name="div">]
I think the source of this bug is in Nokogiri but not in a way that matters for Nokogiri.
What's happening is that when #before
is called, it ends up calling #add_sibling
which calls #coerce
, as you've discovered. #coerce
sees that the input is a string and then calls #fragment
on it.
Here's the documentation for #fragment
.
###
# Create a DocumentFragment containing +tags+ that is relative to _this_
# context node.
And this explains what's happening. self
is a script
element. When Nokogumbo parses an HTML fragment with a script
element as the context element, it does what the standard says to do in this case, namely to start tokenizing in the script data state
. Inside a script
element, <div/>
has no special meaning, so it returns a text node.
I think what should happen is the parent element's #coerce
should be called rather than the current element. That is the #add_sibling
should be changed from
def add_sibling(next_or_previous, node_or_tags)
impl = (next_or_previous == :next) ? :add_next_sibling_node : :add_previous_sibling_node
iter = (next_or_previous == :next) ? :reverse_each : :each
node_or_tags = coerce node_or_tags
...
to
def add_sibling(next_or_previous, node_or_tags)
impl = (next_or_previous == :next) ? :add_next_sibling_node : :add_previous_sibling_node
iter = (next_or_previous == :next) ? :reverse_each : :each
node_or_tags = parent.coerce node_or_tags
...
I verified that passing the parent element as the context element for the DocumentFragment
works in this case.
irb(main):001:0> doc = Nokogiri.HTML5('<script>')
=> #<Nokogiri::HTML5::Document:0x4c4 name="document" children=[#<Nokogiri::XML::Element:0x4b0 name="html" children=[#<Nokogiri::XML::Element:0x488 name="he...
irb(main):002:0> doc.at('//script').before('<div/>')
=> #<Nokogiri::XML::Element:0x474 name="script">
irb(main):003:0> doc.to_s
=> "<html><head><div></div><script></script></head><body></body></html>"
irb(main):004:0> doc
=> #<Nokogiri::HTML5::Document:0x4c4 name="document" children=[#<Nokogiri::XML::Element:0x4b0 name="html" children=[#<Nokogiri::XML::Element:0x488 name="head" children=[#<Nokogiri::XML::Element:0x4d8 name="div">, #<Nokogiri::XML::Element:0x474 name="script">]>, #<Nokogiri::XML::Element:0x49c name="body">]>]>
but I did that by modifying Nokogiri::HTML5::Node#fragment
which is going to be incorrect in general.
I'm not sure why this doesn't affect Nokogiri. Maybe the rules are different for earlier HTML or maybe there's some bug in xmlParseInNodeContext
.
@flavorjones Do you have any thoughts on this? Should I create a PR with this change for Nokogiri? I'm not really sure how I would test it though.
Hey, @stevecheckoway. I agree with your take, that Nokogiri should be calling #coerce
on the parent node when adding siblings.
I'm looking at how to test this by examining what's going wrong. You're right that xmlParseInNodeContext
does not differentiate between a script
tag and any other element when it parses in-context, which is why we haven't caught this bug before now.
I'm going to spike on a testing approach that will assert that the method is called on the parent, and see how I feel about it.
@stevecheckoway PR sparklemotion/nokogiri#2116 is making its way through CI now. If it goes green, and I think it will, it'll be in v1.11.0.rc4, hopefully within the next week.
@flavorjones Great! Thanks so much for looking into this and getting it fixed so quickly.
@stevecheckoway I'm going to spend some time today getting nokogiri master in shape to integration-test this with nokogumbo. I'm going to reopen this until I can do that, if that's OK?
EDIT: the work I'm referring to is at sparklemotion/nokogiri#1788
By all means!
Thanks to everyone for the very fast fix!
OK, I'm going to close this since Nokogiri master is once again compatible with Nokogumbo, and I've also got a draft PR at #163 to further improve the integration. Thanks all for your patience!