Support multiple html pages
Closed this issue · 3 comments
Support multiple HTML pages when the content is spread across several pages.
Most lists, like the Kindle Top 100 books or Time Top 100 books, are spread across 5 pages with 20 books per page.
http://www.amazon.com/Best-Sellers-Kindle-Store/zgbs/digital-text
It would be really helpful if html accepted an array of pages and crawled all of them:
class KindleTop100
  include Scrapify::Base
  html "http://amazon.com/kindle/1-25", "http://amazon.com/kindle/26-50", "http://amazon.com/kindle/51-75", "http://amazon.com/kindle/76-100"
end
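A minimal sketch of how such a multi-page html macro could behave, written as a standalone module rather than Scrapify's actual internals. MultiPageScraper, its urls reader, the fetcher hook and the fake fetcher are all hypothetical names for illustration. The fetcher is injected as a lambda so the sketch runs without network access; a real version would download and parse each page.

```ruby
# Hypothetical sketch: NOT Scrapify's real implementation.
module MultiPageScraper
  # Make the macros below available as class methods on the including class.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Accept one or more page URLs (or an array of them).
    def html(*urls)
      @urls = urls.flatten
    end

    # Every page URL registered via html, in declaration order.
    def urls
      @urls || []
    end

    # Hypothetical hook: a callable taking a URL and returning the rows
    # (e.g. book titles) scraped from that single page.
    attr_writer :fetcher

    # Crawl every page and concatenate the rows in page order.
    def all
      urls.flat_map { |url| @fetcher.call(url) }
    end
  end
end

class KindleTop100
  include MultiPageScraper
  html "http://amazon.com/kindle/1-25", "http://amazon.com/kindle/26-50",
       "http://amazon.com/kindle/51-75", "http://amazon.com/kindle/76-100"
end

# Fake fetcher standing in for an HTTP download plus HTML parsing.
KindleTop100.fetcher = lambda do |url|
  page = url.split("/").last
  ["#{page}: book A", "#{page}: book B"]
end

p KindleTop100.all.first(2)  # => ["1-25: book A", "1-25: book B"]
```

Because all uses flat_map, rows from later pages simply follow rows from earlier ones, which preserves the ranking order of a list split across pages.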
What would you expect KindleTop100.url to return in this case? I'm thinking of renaming it to urls and having it hold all the values. Let me know if you think otherwise.
This is still in an experimental branch (nextpage). The implementation is not optimal: there is no caching, so if a model has N attributes, each page is fetched N times.
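A minimal sketch of the missing caching, assuming a fetched page can simply be memoized by URL for the duration of one crawl. PageCache, page and fetch_count are hypothetical names, not part of Scrapify, and the block passed to PageCache.new fakes the download.

```ruby
# Hypothetical sketch of per-URL page caching; names are illustrative only.
class PageCache
  attr_reader :fetch_count

  def initialize(&fetcher)
    @fetcher = fetcher  # block that downloads (here: fakes) a page body
    @pages = {}         # url => fetched page body
    @fetch_count = 0    # how many real fetches happened
  end

  # Return the cached page for url, fetching it only on the first request.
  def page(url)
    @pages.fetch(url) do
      @fetch_count += 1
      @pages[url] = @fetcher.call(url)
    end
  end
end

urls = ["http://amazon.com/kindle/1-25", "http://amazon.com/kindle/26-50"]
attributes = [:title, :author, :price]  # N = 3 hypothetical attributes

cache = PageCache.new { |url| "<html>#{url}</html>" }

# Each attribute still asks for every page, as in the uncached version...
attributes.each do |_attribute|
  urls.each { |url| cache.page(url) }
end

# ...but each page is downloaded once instead of N times.
puts cache.fetch_count  # => 2 (uncached it would be 3 * 2 = 6)
```

Keeping the cache scoped to a single crawl avoids serving stale pages on the next call to all or find.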
Whether it's #url or #urls won't matter, because it isn't meant to be a public method like find and all.
https://github.com/sathish316/scrapify/blob/nextpage/spec/models/magazine.rb