postmodern/spidr

#<NoMethodError: undefined method `closed?' for nil:NilClass>

ethicalhack3r opened this issue · 1 comment

Hi.

I got the following error while spidering a site. I suspect the remote site dropped the connection, but I am not sure.

#<NoMethodError: undefined method `closed?' for nil:NilClass>

/usr/lib/ruby/1.8/net/http.rb:1060:in `request'
/usr/lib/ruby/1.8/net/http.rb:772:in `get'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:521:in `get_page'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:693:in `prepare_request'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:520:in `get_page'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:586:in `visit_page'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:256:in `run'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:238:in `start_at'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:209:in `site'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:136:in `initialize'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:206:in `new'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/agent.rb:206:in `site'
/usr/lib/ruby/gems/1.8/gems/spidr-0.3.1/lib/spidr/spidr.rb:96:in `site'
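The top frames show the exception being raised inside net/http itself: closed? is called on nil, presumably a socket or connection object that has already been torn down (an assumption; the net/http source is not quoted in this thread). The short, self-contained check below is not spidr code, and the variable names are illustrative only. It just confirms that this kind of failure is an ordinary NoMethodError, which is a StandardError subclass, so a bare "rescue => e" around the request would catch it.

```ruby
# Illustrative stand-in for a connection that has already been torn down.
socket = nil

begin
  socket.closed?
rescue => e
  # A bare rescue catches StandardError and its subclasses;
  # NoMethodError (via NameError) is one of them.
  caught = e
end

puts caught.class                                  # NoMethodError
puts caught.is_a?(StandardError)                   # true
puts NoMethodError.ancestors.include?(NameError)   # true
```

The flip side is that the same bare rescue also swallows genuine programming mistakes that raise NoMethodError, so it hides bugs as well as network hiccups.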

Would one possible solution be to wrap the body of the get_page method in agent.rb in a begin/rescue block that reports the error and carries on, or perhaps retries the request? Something like:

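def get_page(url)
  url = URI(url.to_s)

  begin
    prepare_request(url) do |session,path,headers|
      new_page = Page.new(url,session.get(path,headers))

      # save any new cookies
      @cookies.from_page(new_page)

      yield new_page if block_given?
      return new_page
    end
  rescue => e
    # report the error on stderr, but let the crawl carry on
    # (note: this also rescues errors raised inside the caller's block)
    warn "+++++ ERROR IN SPIDR GEM #{e.inspect}"
    # return nil rather than '' so callers that test the result see "no page"
    return nil
  end
end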
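The "or maybe try again?" option could look roughly like the sketch below. It is a hypothetical helper (with_retries is not part of spidr) written for modern Ruby rather than 1.8: it runs a block, retries it a fixed number of times when one of the listed exception classes is raised, and then gives up and returns nil so the crawl can move on. The default error list and the delay parameter are assumptions; which exceptions are worth retrying depends on the site and the Ruby version.

```ruby
# Hypothetical helper, not part of spidr: run a block, retrying on the
# given exception classes up to `attempts` times in total.
def with_retries(attempts: 3, errors: [StandardError], delay: 0.0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors => e
    if tries < attempts
      sleep(delay) if delay > 0
      retry # re-runs the begin block; `tries` keeps its value
    end
    # Out of attempts: report and return nil so the caller can carry on.
    warn "giving up after #{tries} attempts: #{e.class}: #{e.message}"
    nil
  end
end

# Stand-in for a flaky request: fails twice, then succeeds.
calls = 0
result = with_retries(attempts: 3) do |attempt|
  calls += 1
  raise IOError, "connection dropped" if attempt < 3
  "page body"
end
puts result  # page body
puts calls   # 3

# A request that never succeeds yields nil instead of raising.
failed = with_retries(attempts: 2, errors: [IOError]) { raise IOError, "down" }
p failed     # nil
```

In get_page, the prepare_request call could be wrapped in such a helper. Listing specific network errors instead of StandardError avoids quietly retrying real bugs; whether NoMethodError belongs in that list is debatable, since in this trace it comes from inside net/http rather than from the remote site.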

Closing out older issues. It looks like the bug was coming from Ruby's own net/http. Also, I dropped support for Ruby 1.8 long ago.