felipecsl/wombat

local file support

neves opened this issue · 3 comments

neves commented

Wombat can't parse local files:
/.gem/ruby/2.3.1/gems/wombat-2.5.1/lib/wombat/processing/parser.rb:33:in `block (2 levels) in initialize': undefined method `content_type' for #<Mechanize::FileResponse:0x007fe856a62d90> (NoMethodError)

I have the same issue. In principle it should work either by using the "file://" protocol with Mechanize, or by creating a Mechanize page or file by hand and setting it via page (https://github.com/felipecsl/wombat/wiki). In practice, however, this did not work for me. I would really like to see that work.

http://stackoverflow.com/questions/7586627/read-a-local-html-file-with-mechanize lists a FakeWeb workaround (another could be a dead simple proxy server), but I'd really like to see this working in wombat directly.
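For reference, here is a rough sketch of the "dead simple server" idea using only the Ruby standard library. It is a plain local HTTP server rather than a real proxy. The helper name `serve_local_file` and the temporary `page.html` content are made up for illustration and are not part of Wombat or Mechanize. I have not tried it with Wombat itself; the assumption is that pointing `base_url` at the printed address would behave like any other HTTP site.

```ruby
require 'socket'
require 'net/http'
require 'tempfile'

# Hypothetical helper (not part of Wombat or Mechanize): serves a single
# local file over HTTP on a free loopback port.
def serve_local_file(path)
  server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
  port = server.addr[1]
  thread = Thread.new do
    loop do
      client = server.accept
      # Read and discard the request line and headers up to the blank line.
      loop do
        line = client.gets
        break if line.nil? || line == "\r\n"
      end
      body = File.binread(path)
      client.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Type: text/html\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n")
      client.write(body)
      client.close
    end
  end
  [server, thread, port]
end

# Example: write a throwaway HTML file and fetch it back over HTTP.
page = Tempfile.new(['page', '.html'])
page.write('<html><body><h1>Hello</h1></body></html>')
page.flush

server, thread, port = serve_local_file(page.path)
response = Net::HTTP.get_response(URI("http://127.0.0.1:#{port}/"))
puts "Served #{page.path} at http://127.0.0.1:#{port}/ (status #{response.code})"

# Stop the accept loop first, then release the listening socket.
thread.kill
server.close
```

The response carries an explicit text/html Content-Type, which a file:// response does not provide. Whether that is enough to avoid the NoMethodError above is an assumption on my part.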

I would really prefer to use the page mechanism, but I have no time to look into why Mechanize fails there.

For prototyping I now use webmock like this:

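require 'wombat'
require 'webmock'
include WebMock::API

# Intercept outgoing HTTP requests and answer the one for www.example.com
# with the contents of the local file page.html.
WebMock.enable!
stub_request(:get, "www.example.com").to_return(body: File.read("page.html"))

# Crawl the stubbed URL as if it were a real site.
result = Wombat.crawl do
  base_url "http://www.example.com"
  path "/"
  # ...
end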

Thanks @fwolfst! Your approach with webmock worked for me.