sparklemotion/nokogiri

Segfault with readability

julien-duponchelle opened this issue · 11 comments

Hi,

I get a segfault in nokogiri when I use it with ruby-readability:
require 'rubygems'
require 'readability'
require 'open-uri'

source = open('http://lab.arc90.com/experiments/readability/').read
puts Readability::Document.new(source).content

/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:809: [BUG] Segmentation fault
ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-darwin10.6.0]

-- control frame ----------
c:0044 p:---- s:0170 b:0170 l:000169 d:000169 CFUNC :native_write_to
c:0043 p:0189 s:0163 b:0163 l:000162 d:000162 METHOD /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:809
c:0042 p:0181 s:0153 b:0153 l:000152 d:000152 METHOD /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:728
c:0041 p:0143 s:0144 b:0144 l:000143 d:000143 METHOD /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:748
c:0040 p:0014 s:0140 b:0140 l:001b60 d:000139 BLOCK /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:617
c:0039 p:---- s:0137 b:0137 l:000136 d:000136 FINISH
c:0038 p:---- s:0135 b:0135 l:001c20 d:000134 IFUNC
c:0037 p:0015 s:0133 b:0132 l:000122 d:000131 BLOCK /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:239
c:0036 p:---- s:0129 b:0129 l:000128 d:000128 FINISH
c:0035 p:---- s:0127 b:0127 l:000126 d:000126 CFUNC :upto
c:0034 p:0023 s:0123 b:0123 l:000122 d:000122 METHOD /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:238
c:0033 p:---- s:0119 b:0119 l:000118 d:000118 FINISH
c:0032 p:---- s:0117 b:0117 l:001c20 d:001c20 CFUNC :map
c:0031 p:0017 s:0114 b:0114 l:001b60 d:001b60 METHOD /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:617
c:0030 p:0034 s:0110 b:0110 l:000bf8 d:000109 BLOCK /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/ruby-readability-0.2.3/lib/readability.rb:216
c:0029 p:0015 s:0107 b:0107 l:000097 d:000106 BLOCK /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:239
c:0028 p:---- s:0104 b:0104 l:000103 d:000103 FINISH
c:0027 p:---- s:0102 b:0102 l:000101 d:000101 CFUNC :upto
c:0026 p:0023 s:0098 b:0098 l:000097 d:000097 METHOD /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:238
c:0025 p:0021 s:0094 b:0094 l:000bf8 d:000bf8 METHOD /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/ruby-readability-0.2.3/lib/readability.rb:213
c:0024 p:0065 s:0091 b:0091 l:000e00 d:000e00 METHOD /Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/ruby-readability-0.2.3/lib/readability.rb:49
c:0023 p:0027 s:0083 b:0082 l:000c18 d:000081 EVAL test.rb:6
c:0022 p:---- s:0080 b:0080 l:000079 d:000079 FINISH
c:0021 p:---- s:0078 b:0078 l:000077 d:000077 CFUNC :eval
c:0020 p:0028 s:0071 b:0071 l:000070 d:000070 METHOD /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/workspace.rb:80
c:0019 p:0033 s:0064 b:0063 l:000062 d:000062 METHOD /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/context.rb:254
c:0018 p:0031 s:0058 b:0058 l:0013d8 d:000057 BLOCK /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:159
c:0017 p:0042 s:0050 b:0050 l:000049 d:000049 METHOD /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:273
c:0016 p:0011 s:0045 b:0045 l:0013d8 d:000044 BLOCK /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:156
c:0015 p:0144 s:0041 b:0041 l:000024 d:000040 BLOCK /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:243
c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH
c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :loop
c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:229
c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC :catch
c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:228
c:0008 p:0046 s:0022 b:0022 l:0013d8 d:0013d8 METHOD /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:155
c:0007 p:0011 s:0019 b:0019 l:001808 d:000018 BLOCK /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:70
c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH
c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC :catch
c:0004 p:0183 s:0011 b:0011 l:001808 d:001808 METHOD /Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:69
c:0003 p:0142 s:0006 b:0006 l:0014e8 d:002148 EVAL /Users/jd/.rvm/rubies/ruby-1.9.2-p136/bin/irb:16
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH

c:0001 p:0000 s:0002 b:0002 l:0014e8 d:0014e8 TOP

-- Ruby level backtrace information ----------------------------------------
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/bin/irb:16:in `<main>'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:69:in `start'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:69:in `catch'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:70:in `block in start'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:155:in `eval_input'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `each_top_level_statement'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `catch'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `block in each_top_level_statement'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `loop'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/ruby-lex.rb:243:in `block (2 levels) in each_top_level_statement'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:156:in `block in eval_input'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:273:in `signal_status'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb.rb:159:in `block (2 levels) in eval_input'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/context.rb:254:in `evaluate'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/workspace.rb:80:in `evaluate'
/Users/jd/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/irb/workspace.rb:80:in `eval'
test.rb:6:in `irb_binding'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/ruby-readability-0.2.3/lib/readability.rb:49:in `content'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/ruby-readability-0.2.3/lib/readability.rb:213:in `transform_misused_divs_into_paragraphs!'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:238:in `each'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:238:in `upto'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:239:in `block in each'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/ruby-readability-0.2.3/lib/readability.rb:216:in `block in transform_misused_divs_into_paragraphs!'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:617:in `inner_html'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:617:in `map'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:238:in `each'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:238:in `upto'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node_set.rb:239:in `block in each'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:617:in `block in inner_html'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:748:in `to_html'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:728:in `serialize'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:809:in `write_to'
/Users/jd/.rvm/gems/ruby-1.9.2-p136@xxx-xxx/gems/nokogiri-1.4.4/lib/nokogiri/xml/node.rb:809:in `native_write_to'

-- C level backtrace information -------------------------------------------

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

Thanks

Hi, can you provide us with the output of nokogiri -v? Thanks!

Yes of course:

jd$ nokogiri -v

warnings: []
nokogiri: 1.4.4
ruby:
  version: 1.9.2
  platform: x86_64-darwin10.6.0
  engine: ruby
libxml:
  binding: extension
  compiled: 2.7.3
  loaded: 2.7.3

In the interest of eliminating libxml2 as the problem, could you upgrade to libxml2 version 2.7.8 and see if the problem still occurs?

To upgrade, you should be able to install with MacPorts:

$ sudo port install libxml2 libxslt

and reinstall nokogiri:

$ sudo gem install nokogiri

You are right, libxml2 is the problem.

I just rebuilt with Homebrew and it works:
gem install nokogiri -- --with-xml2-include=/usr/local/Cellar/libxml2/2.7.7/include/libxml2 --with-xml2-lib=/usr/local/Cellar/libxml2/2.7.7/lib --with-xslt-dir=/usr/local/Cellar/libxslt/1.1.26

I had this issue, but I had 2.7.7. I upgraded to 2.7.8 and it worked. Perhaps something to do with the system installed version on Macs?

Thanks for the discussion here. I'll leave a comment about it in the Readability readme.

Commenting here: I hit the same issue, and upgrading to 2.7.8 did indeed solve it.

Documented here: https://gist.github.com/1342913
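For anyone wanting to script the "is my libxml2 older than 2.7.8?" check, here is a minimal sketch. It only compares version strings; the 2.7.8 threshold is taken from the reports in this thread, not from any official statement, and `libxml2_too_old?` is a hypothetical helper name. Note that one commenter above reported success with 2.7.7, so treat the threshold as a rule of thumb.

```ruby
require 'rubygems' # provides Gem::Version on older Rubies; harmless on newer ones

# Assumption: 2.7.8 is the threshold only because several commenters in this
# thread reported that upgrading to it stopped the segfault.
MINIMUM_LIBXML2 = Gem::Version.new('2.7.8')

# Hypothetical helper: true when the given libxml2 version string is older
# than the threshold above.
def libxml2_too_old?(version_string)
  Gem::Version.new(version_string) < MINIMUM_LIBXML2
end

# Versions mentioned in this thread.
%w[2.7.3 2.7.6 2.7.7 2.7.8 2.8.0].each do |version|
  status = libxml2_too_old?(version) ? 'older than 2.7.8' : 'ok'
  puts "libxml2 #{version}: #{status}"
end
```

Feed it the `loaded` value from `nokogiri -v`, since that is the library actually in use at runtime.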

I've encountered this problem and the app is hosted on Heroku, which is running 2.7.6. I opened a support ticket with them, and they sent me the following instructions:

I did all this, but I still can't seem to get nokogiri built against my vendored copy of libxml2. (When I 'heroku run bash' and 'nokogiri -v', I still see libxml2 2.7.6.) Is there a way to configure bundler to build nokogiri against libxml2 in a specific folder?

@patgannon You shouldn't have to roll your own libxml2 on Heroku anymore. I tried posting two apps using ruby_readability to their cedar stack and they worked without installing anything extra.

Hi,

Could it be that this issue is back? I am getting segmentation faults again:

/Users/wieland/.rvm/gems/ruby-1.9.3-p194@newstral/gems/nokogiri-1.5.5/lib/nokogiri/xml/node.rb:830: [BUG] Segmentation fault
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.2]

I am on Mac OS X Lion, so I have an old, outdated libxml2 and libxslt by default. This is why I followed the instructions on how to install Nokogiri with Homebrew 0.9: http://nokogiri.org/tutorials/installing_nokogiri.html

I simply changed the gem installation instructions to use libxml2 version 2.8.0, which is the current version that comes with brew install libxml2:

gem install nokogiri -- --with-xml2-include=/usr/local/Cellar/libxml2/2.8.0/include/libxml2 --with-xml2-lib=/usr/local/Cellar/libxml2/2.8.0/lib --with-xslt-dir=/usr/local/Cellar/libxslt/1.1.26  --with-iconv-include=/usr/local/Cellar/libiconv/1.13.1/include  --with-iconv-lib=/usr/local/Cellar/libiconv/1.13.1/lib

Now nokogiri -v gives me:

# Nokogiri (1.5.5)
---
warnings: []
nokogiri: 1.5.5
ruby:
  version: 1.9.3
  platform: x86_64-darwin11.4.2
  description: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.2]
  engine: ruby
libxml:
  binding: extension
  compiled: 2.8.0
  loaded: 2.8.0

So this all looks fine. But what is weird is that I still get this message when launching Rails:

WARNING: Nokogiri was built against LibXML version 2.8.0, but has dynamically loaded 2.7.3
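The two observations above (matching versions on the command line, a mismatch inside Rails) can be compared mechanically. The following is a rough sketch that parses `nokogiri -v` output as YAML and flags a compiled/loaded mismatch. The helper names `libxml_versions` and `libxml_mismatch?` are hypothetical, and the sample text is an abridged copy of the output pasted above.

```ruby
require 'yaml'

# Abridged output of `nokogiri -v`, as pasted earlier in this comment.
NOKOGIRI_V = <<~OUTPUT
  # Nokogiri (1.5.5)
  ---
  warnings: []
  nokogiri: 1.5.5
  ruby:
    version: 1.9.3
    platform: x86_64-darwin11.4.2
    engine: ruby
  libxml:
    binding: extension
    compiled: 2.8.0
    loaded: 2.8.0
OUTPUT

# Hypothetical helper: returns [compiled, loaded] libxml2 version strings
# from the text printed by `nokogiri -v`.
def libxml_versions(nokogiri_v_output)
  libxml = YAML.load(nokogiri_v_output).fetch('libxml')
  [libxml.fetch('compiled').to_s, libxml.fetch('loaded').to_s]
end

# Hypothetical helper: true when the library loaded at runtime differs from
# the one the extension was compiled against.
def libxml_mismatch?(compiled, loaded)
  compiled != loaded
end

compiled, loaded = libxml_versions(NOKOGIRI_V)
puts "command line: compiled=#{compiled} loaded=#{loaded} " \
     "mismatch=#{libxml_mismatch?(compiled, loaded)}"

# Versions taken from the Rails warning quoted above.
puts "inside Rails: mismatch=#{libxml_mismatch?('2.8.0', '2.7.3')}"
```

The point of the comparison is that the check has to run inside the same process as the app: the standalone `nokogiri -v` loads only Nokogiri, so it cannot see what other gems loaded first. As far as I know, Nokogiri exposes the same data in-process as `Nokogiri::VERSION_INFO`, but treat that as an assumption and verify it against your installed version.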

brew doctor is telling me this (and I think that, apart from the MacGPG2 part, everything looks as expected):

Warning: "config" scripts exist outside your system or Homebrew directories.
`./configure` scripts often look for *-config scripts to determine if
software packages are installed, and what additional flags to use when
compiling and linking.

Having additional scripts in your path can confuse software installed via
Homebrew if the config script overrides a system or Homebrew provided
script of the same name. We found the following "config" scripts:

/Users/wieland/.rvm/gems/ruby-1.9.3-p194@newstral/bin/passenger-config
/usr/local/MacGPG2/bin/gpg-error-config
/usr/local/MacGPG2/bin/ksba-config
/usr/local/MacGPG2/bin/libassuan-config
/usr/local/MacGPG2/bin/libgcrypt-config
/usr/local/MacGPG2/bin/libusb-config
/usr/local/MacGPG2/bin/pth-config

Warning: Some keg-only formula are linked into the Cellar.
Linking a keg-only formula, such as gettext, into the cellar with
`brew link f` will cause other formulae to detect them during the
`./configure` step. This may cause problems when compiling those
other formulae.

Binaries provided by keg-only formulae may override system binaries
with other strange results.

You may wish to `brew unlink` these brews:

libxml2
libxslt
Warning: You may have installed MacGPG2 via the package installer.
Several other checks in this script will turn up problems, such as stray
dylibs in /usr/local and permissions issues with share and man in /usr/local/.
Warning: You have unlinked kegs in your Cellar
Leaving kegs unlinked can lead to build-trouble and cause brews that depend on
those kegs to fail to run properly once built. Run `brew link` on these:

libiconv

Any ideas, anybody? Which important step am I missing? Or do I simply need to buy a new Mac that can run Mountain Lion? By the way, my code works on Debian.

As a final step, I had to move nokogiri closer to the top of my Gemfile.

OK, I just found the solution here:

http://art-of-fine-code.redbubble.com/blog/2012/06/20/nokogiri-goes-bump-or-segfaults-in-the-night-dot-dot-dot/

It turns out that other gems in our Gemfile were also using libxml2, and they were loading the version supplied by the operating system. When Nokogiri loaded afterwards, it would just use the already loaded library. Listing Nokogiri at the top of our Gemfile ensures the newer library is loaded before the older one has a chance to be, and thus (hopefully) fixes the issue.
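To make the ordering concrete, here is a small sketch of what such a Gemfile might look like, plus a check that nokogiri is the first gem declared. This is illustrative only: `libxml-ruby` merely stands in for "some other gem that binds libxml2" (the gems involved in our case are not named here), and `declared_gems` and `nokogiri_first?` are hypothetical helpers that do a naive text scan rather than evaluating the Gemfile.

```ruby
# Hypothetical Gemfile contents. `libxml-ruby` is an assumed example of
# another gem that links against libxml2.
GEMFILE = <<~GEMFILE
  source 'https://rubygems.org'

  gem 'nokogiri'      # listed first so its libxml2 gets loaded first
  gem 'libxml-ruby'
  gem 'ruby-readability', :require => 'readability'
GEMFILE

# Hypothetical helper: gem names in the order they are declared.
# Naive scan; it does not handle groups, conditionals, or gemspec lines.
def declared_gems(gemfile_text)
  gemfile_text.scan(/^[ \t]*gem[ \t]+['"]([^'"]+)['"]/).flatten
end

# Hypothetical helper: true when nokogiri is the first declared gem.
def nokogiri_first?(gemfile_text)
  declared_gems(gemfile_text).first == 'nokogiri'
end

puts "declared order: #{declared_gems(GEMFILE).join(', ')}"
puts "nokogiri first: #{nokogiri_first?(GEMFILE)}"
```

This relies on gems being required in the order they are declared, which is how `Bundler.require` behaves as far as I know. If you require gems manually, the equivalent is to `require 'nokogiri'` before anything else that might pull in libxml2.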