sathish316/scrapify

Support multiple html pages

Closed this issue · 3 comments

Support multiple HTML pages when content is spread across several pages.

Most lists, like the Kindle Top 100 books or the Time Top 100 books, are spread across 5 pages of 20 books each.

http://www.amazon.com/Best-Sellers-Kindle-Store/zgbs/digital-text

It would be much easier if `html` accepted an array of pages and crawled all of them, for example:

```ruby
class KindleTop100
  include Scrapify::Base
  html "http://amazon.com/kindle/1-25",
       "http://amazon.com/kindle/26-50",
       "http://amazon.com/kindle/51-75",
       "http://amazon.com/kindle/76-100"
end
```
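One way the requested behaviour could work is sketched below. This is a hypothetical, minimal DSL and not Scrapify's actual API: the names `MultiPageScraper`, `attribute`, `page_urls`, `fetcher` and `KindleTop100Sketch` are illustrative assumptions. `html` takes any number of URLs, and `all` fetches each page once, runs every attribute extractor on it, and concatenates the rows from all pages in URL order. The fetcher is injectable so the example runs without network access.

```ruby
require "net/http"

# Hypothetical sketch (not Scrapify's API): a class-level DSL whose `html`
# accepts several page URLs and whose `all` merges the rows from every page.
module MultiPageScraper
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Accepts one or more URLs: html "page-1", "page-2", ...
    def html(*urls)
      @page_urls = urls.flatten
    end

    def page_urls
      @page_urls || []
    end

    # Registers an extractor block that receives a page body and returns
    # the values of this attribute found on that page, as an array.
    def attribute(name, &extractor)
      attribute_extractors[name] = extractor
    end

    def attribute_extractors
      @attribute_extractors ||= {}
    end

    # The fetcher is injectable so the sketch can run without network access.
    attr_writer :fetcher

    def fetcher
      @fetcher ||= ->(url) { Net::HTTP.get(URI(url)) }
    end

    # Fetches each page once, runs every extractor on it, and zips the
    # per-attribute arrays into one hash per row; rows from all pages are
    # concatenated in URL order.
    def all
      page_urls.flat_map do |url|
        body = fetcher.call(url)
        columns = attribute_extractors.transform_values { |ex| ex.call(body) }
        row_count = columns.values.map(&:size).max || 0
        Array.new(row_count) do |i|
          columns.transform_values { |values| values[i] }
        end
      end
    end
  end
end

# Usage with a stubbed fetcher standing in for the real pages.
class KindleTop100Sketch
  include MultiPageScraper
  html "page-1", "page-2"
  attribute(:title) { |body| body.scan(%r{<h2>(.*?)</h2>}).flatten }
end

KindleTop100Sketch.fetcher = lambda do |url|
  url == "page-1" ? "<h2>A</h2><h2>B</h2>" : "<h2>C</h2>"
end
KindleTop100Sketch.all # => [{title: "A"}, {title: "B"}, {title: "C"}]
```

Fetching each page once and running all extractors against the same body is a deliberate choice here; it avoids issuing one request per attribute.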

What would you expect `KindleTop100.url` to return in this case? I'm thinking of renaming it to `urls` so that it holds all the values. Let me know if you think otherwise.

Duplicate of #29

This is still in an experimental branch (`nextpage`). The implementation isn't optimal: there is no caching, so each page is fetched N times when the class has N attributes.
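The repeated-fetch problem described here is commonly addressed by memoizing page bodies by URL, so that every attribute reuses a single request per page. The class below is a hypothetical sketch, not code from the `nextpage` branch; `CachedPageFetcher`, `get`, `fetch_count` and `clear` are illustrative names, and the fetch block is a stub instead of a real HTTP call.

```ruby
# Hypothetical sketch: memoize page bodies by URL so that N attribute
# extractors share one fetch per page instead of making N fetches.
class CachedPageFetcher
  attr_reader :fetch_count

  # `fetch` is any block that takes a URL and returns the page body.
  def initialize(&fetch)
    @fetch = fetch
    @cache = {}
    @fetch_count = 0
  end

  # Returns the cached body, calling the fetch block only on first access.
  def get(url)
    @cache.fetch(url) do
      @fetch_count += 1
      @cache[url] = @fetch.call(url)
    end
  end

  # Drops all cached pages, e.g. before a fresh crawl.
  def clear
    @cache.clear
  end
end

fetcher = CachedPageFetcher.new { |url| "<html>#{url}</html>" }
attributes = [:title, :author, :price] # N = 3 attributes
urls = ["page-1", "page-2"]

# Each attribute asks for each page, but the stub is called once per URL.
urls.each do |url|
  attributes.each { |_attr| fetcher.get(url) }
end
fetcher.fetch_count # => 2 (one per page, not 6)
```

Using `Hash#fetch` with a block, rather than `||=`, means a page whose body is `nil` or `false` is still cached instead of being fetched again.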

Whether it's called `#url` or `#urls` doesn't matter, because it isn't meant to be a public method like `find` and `all`.
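If the URL list is meant to stay internal, Ruby's `private_class_method` can hide the reader while `find` and `all` stay public. The `BookList` class below is a hypothetical illustration, not Scrapify code, and its `all` just returns placeholder strings instead of scraping.

```ruby
# Hypothetical sketch: the class-level URL reader is private, while
# `all` and `find` remain the public query methods.
class BookList
  def self.html(*urls)
    @urls = urls
  end

  def self.all
    # A private class method can still be called with an implicit receiver.
    urls.map { |url| "scraped #{url}" }
  end

  def self.find(index)
    all[index]
  end

  def self.urls
    @urls || []
  end
  private_class_method :urls

  html "page-1", "page-2"
end

BookList.all     # => ["scraped page-1", "scraped page-2"]
BookList.find(1) # => "scraped page-2"
# BookList.urls  # raises NoMethodError (private method called)
```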

https://github.com/sathish316/scrapify/blob/nextpage/spec/models/magazine.rb