sparklemotion/nokogiri

:first-child broken with libxml 2.9.0

betelgeuse opened this issue · 27 comments

A test suite for my application started failing and I think this is due to upgrading libxml to 2.9.0. It seems nokogiri is not able to do first-child properly with it atm:

1.9.3-p327 :007 > Nokogiri::XML('<foo><bar/></foo>').css('bar:first-child')
 => [] 
1.9.3-p327 :008 > Nokogiri::XML('<foo><bar/></foo>').css('bar')
 => [#<Nokogiri::XML::Element:0x11826430 name="bar">] 

verified by downgrading:

WARNING: Nokogiri was built against LibXML version 2.9.0, but has dynamically loaded 2.8.0
1.9.3p327 :001 > Nokogiri::XML('<foo><bar/></foo>').css('bar:first-child')
 => [#<Nokogiri::XML::Element:0x5ed69c6 name="bar">] 

How did you downgrade libxml?

@lucasgertel using the package manager of my Linux distribution (emerge on Gentoo)

+1 for me. Also, nth-child(n) seems to be broken with libxml 2.9.0.

Please note that Team Nokogiri hasn't made any effort yet to support libxml 2.9.0. It's next on my list after we release 1.5.7.

The problem seems odd. On another machine running 2.8.0, the same selectors work normally. In 2.9.0 it appears that a blank node is found for each odd :nth-child. The debugger output here shows the issue:

https://gist.github.com/Themitchell/5135296

So 1st, 3rd and 5th nodes work which should be 1st, 2nd and 3rd (and indeed are correctly returned as 1st, 2nd, 3rd on a 2.8.0 system)

Hope that gives a little more context on the issue.

This appears to be a bug in libxml 2.9.0. I will be submitting a bug report to Daniel Veillard today. In the meantime, downgrade to 2.8.0 if you can.

To be specific, it appears that the xpath function position() is off-by-one. Keeping in mind that Nokogiri transforms CSS to XPath:

puts Nokogiri::CSS.xpath_for("foo[2]")           # => "//*[position() = 2 and self::foo]"
puts Nokogiri::CSS.xpath_for("foo:nth-child(2)") # => "//*[position() = 2 and self::foo]"

We can run the following script:

xml = <<EOXML
<root>
  <foo>1</foo>
  <foo>2</foo>
  <foo>3</foo>
  <foo>4</foo>
</root>
EOXML

doc = Nokogiri::XML xml

puts Nokogiri::VERSION_INFO["libxml"]

%w{foo[1] foo[2] foo[3] foo[4]}.each do |css|
  puts "---- #{css}"
  puts (doc.at_css(css) || "")
end

and for 2.8.0 and 2.9.0 get the following (different) outputs:

{"binding"=>"extension", "compiled"=>"2.8.0", "loaded"=>"2.8.0"}
---- foo[1]
<foo>1</foo>
---- foo[2]
<foo>2</foo>
---- foo[3]
<foo>3</foo>
---- foo[4]
<foo>4</foo>

and

{"binding"=>"extension", "compiled"=>"2.9.0", "loaded"=>"2.9.0"}
---- foo[1]

---- foo[2]
<foo>1</foo>
---- foo[3]
<foo>2</foo>
---- foo[4]
<foo>3</foo>
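
The two hashes printed above differ only in their "compiled" and "loaded" entries, so an affected setup can be flagged from that hash alone. Below is a minimal sketch, assuming the hash has the shape shown above. The helper name `libxml_report` is hypothetical (it is not part of Nokogiri), and treating only an exact 2.9.0 as affected is an assumption based on what this thread reports.

```ruby
require "rubygems" # provides Gem::Version; loaded by default in modern Ruby

# Assumption: only libxml 2.9.0 is known (from this thread) to misbehave.
AFFECTED_LIBXML = Gem::Version.new("2.9.0")

# Hypothetical helper: takes a hash shaped like Nokogiri::VERSION_INFO["libxml"]
# and reports whether the compiled and loaded versions differ, and whether
# the loaded version is the one reported as broken.
def libxml_report(info)
  compiled = Gem::Version.new(info.fetch("compiled"))
  loaded   = Gem::Version.new(info.fetch("loaded"))
  {
    mismatch: compiled != loaded,        # the case behind the load-time WARNING
    affected: loaded == AFFECTED_LIBXML  # the case that breaks position()
  }
end
```

With Nokogiri loaded, this could be called as `libxml_report(Nokogiri::VERSION_INFO["libxml"])`, for example at the top of a test suite to fail early with a clear message.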

The bug reported on libxml2 is resolved as NOTABUG.

OK, so the libxml2 team is saying that the XPath queries in 2.8.x and earlier releases of libxml2 are buggy, and 2.9.0 is doing the right thing:

https://bugzilla.gnome.org/show_bug.cgi?id=695699#c3

So I'm going to (gag) put in a libxml2-version-specific workaround (gag).

Hmm. I'm pushing back against the libxml2 list on this one. Libxml 2.9.0 is the only implementation I can find with this behavior, which makes it suspect. Hang tight and I'll let you know how it progresses.

Please note that Daniel has confirmed that this behavior appears to be a bug in libxml 2.9.0.

Possibly relevant to followers of this issue: Nokogiri 1.6.0.rc1 (just released) includes libxml 2.8.0, which is compiled and installed at gem-install time. So, if you've got libxml 2.9.0 installed on your system and can't work around it, please try Nokogiri 1.6.0.rc1 and let me know how you get on.

@flavorjones is that intended as a temporary workaround? I hope so to avoid bundled libraries down the road.

@flavorjones Nokogiri 1.6.0.rc1 works fine for me with libxml 2.9.0 - thanks for the hint

@betelgeuse Bundling libxml is probably the future. Read the release notes for info on how to use system libraries.

@flavorjones ok that at least makes sure Linux distributions can continue using system libraries. Are you open to having that info also in the README?

To all Mac users wondering how to solve this issue: it turns out I had libxml 2.9.0 installed through Homebrew. You can check for the directory /usr/local/opt/libxml2. If it exists, then that's the case. The issue was solved for me after uninstalling it (brew uninstall libxml2) and re-installing the nokogiri gem.

I can also confirm that using 1.6.0.rc1 fixed the problem for me.

Here is my test case: https://gist.github.com/mkwiatkowski/5612006

For anyone having this problem (or the nth-child problem) on Mavericks: I found that 1.6.0 apparently didn't solve it, but it turned out (with help from http://blog.planetargon.com/entries/2013/10/24/os-x-mavericks-failing-specs-and-libxml and #742) that the solution was to list Nokogiri explicitly very early in my Gemfile, before other dependencies load the system libxml2 (pg was probably the culprit).

BTW, I tried using brew to install libxml2 2.8.0 and the corresponding libxslt. It worked in a limited fashion but caused other problems elsewhere (too many Apple-supplied libraries depend on libxml2 2.9+). Even changing DYLD_LIBRARY_PATH just for test runs was a problem (save_and_open_page would fail, for example).

Moving the nokogiri entry in my Gemfile above all other gems solved the problem for me as well.
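
The Gemfile ordering described in the last few comments might look like the sketch below. The gem list is an assumption for illustration: pg appears only because one commenter guessed it was the culprit. The point is simply that nokogiri is listed before anything that may load the system libxml2.

```ruby
# Gemfile (illustrative sketch; adjust the gem list to your app)
source "https://rubygems.org"

# Listed first so it is required before any gem that may pull in
# the system libxml2.
gem "nokogiri"

# Everything else follows. "pg" is named in the thread only as a
# probable culprit, not a confirmed one.
gem "pg"
```

This relies on gems being required in the order they appear in the Gemfile, which is why moving the entry up changed the outcome for the commenters above.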

I had the same problem using nokogiri 1.5.10 (to be used in ree)

I fixed it by manually linking nokogiri against libxml2 v2.8.0 on install:

gem install nokogiri -v 1.5.10 -- \
  --with-xml2-include=/usr/local/Cellar/libxml2/2.8.0/include/libxml2 \
  --with-xml2-lib=/usr/local/Cellar/libxml2/2.8.0/lib \
  --with-xslt-dir=/usr/local/Cellar/libxslt/1.1.28 \
  --with-iconv-include=/usr/local/Cellar/libiconv/1.13.1/include \
  --with-iconv-lib=/usr/local/Cellar/libiconv/1.13.1/lib

libxml2 and libxslt installed with homebrew, libiconv was installed from source.

Solved my problem by upgrading the Nokogiri gem to 1.6.1

Same problem on Mavericks. Currently unresolved.

nokogiri -v

# Nokogiri (1.6.3.1)
    ---
    warnings: []
    nokogiri: 1.6.3.1
    ruby:
      version: 2.1.2
      platform: x86_64-darwin13.0
      description: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
      engine: ruby
    libxml:
      binding: extension
      source: system
      compiled: 2.9.0
      loaded: 2.9.0

This is fixed in 1.6.4 and later, which is when we upgraded to libxml 2.9.2.
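
To avoid drifting back onto an affected release, a project could check the installed version at boot or in its test setup. This is a rough sketch: the helper name `nokogiri_has_fix?` is hypothetical, and the 1.6.4 threshold is taken directly from the comment above.

```ruby
require "rubygems" # provides Gem::Version; loaded by default in modern Ruby

# Threshold taken from the maintainer's comment in this thread.
FIXED_IN = Gem::Version.new("1.6.4")

# Hypothetical helper: true when the given Nokogiri version string is
# at or above the release said to contain the fix.
def nokogiri_has_fix?(version_string)
  Gem::Version.new(version_string) >= FIXED_IN
end
```

With Nokogiri loaded, this could be called as `nokogiri_has_fix?(Nokogiri::VERSION)`. Note that it says nothing about builds made against a system libxml2 (the "source: system" case in the output above); there, the loaded libxml version is presumably what matters.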