Helper methods for scraping one page and for scraping multiple

Question

Opened this issue 11 years ago · 5 comments

It is confusing that Scraper.new takes EITHER a URL and a selector OR an array of URLs. We should keep both forms on new for backwards compatibility, but add a helper method for each pattern, and use those helper methods in the README.

This will hopefully allay some of the confusion in #30 and address the API problems that were mentioned in #5 without such a dramatic refactor.

Answer 1 · 2014-02-15T22:21:33.000Z

Scraper.index will return a Scraper instance (with the actual fetching perhaps deferred until later) on which a #scrape call will fetch the links that the selector expression matches on the index page. Scraper.instances will return a Scraper instance on which a #scrape call will fetch the URLs given in the argument to instances.
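The two helpers described above could be thin wrappers around the existing constructor. This is a minimal sketch under stated assumptions, not the library's code: SketchScraper and its mode method are hypothetical names, and no fetching is modelled.

```ruby
# Hypothetical stand-in for the real Scraper class; illustration only.
class SketchScraper
  attr_reader :index_url, :selector, :instance_urls

  # Mirrors the dual-purpose constructor described in the issue:
  # either (url, selector) or an array of URLs.
  def initialize(index_or_urls, selector = nil)
    if index_or_urls.is_a?(Array)
      @instance_urls = index_or_urls
    else
      @index_url = index_or_urls
      @selector = selector
    end
  end

  # Helper for the "one index page plus a selector" pattern.
  def self.index(index_url, selector)
    new(index_url, selector)
  end

  # Helper for the "explicit list of URLs" pattern.
  def self.instances(urls)
    new(urls)
  end

  # Reports which of the two patterns this scraper was built for.
  def mode
    @instance_urls ? :instances : :index
  end
end
```

Because both helpers delegate to new, existing callers of new keep working while the README can show the more explicit names.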

Answer 2 · 2014-02-15T22:26:47.000Z

I think for 1.0.0 the Scraper returned by "index" will immediately fetch the index page, so that the Scraper can be added to other scrapers (see #35). For now, it'll still only be fetched on #scrape.

Answer 3 · 2014-02-15T22:58:39.000Z

I changed my mind in the last 31 minutes.

For 0.4.0 the semantics of #initialize will change. The index page will be scraped immediately. However, the syntax will not change.
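To make the difference concrete, here is a rough sketch contrasting deferred and immediate fetching of the index page. Both class names are hypothetical, and the fetcher lambda is an assumed stand-in for a real HTTP request so the example runs offline.

```ruby
# Hypothetical classes for illustration; not the library's implementation.
class LazyIndexScraper
  def initialize(index_url, fetcher)
    @index_url = index_url
    @fetcher = fetcher
    @index_html = nil
  end

  # The index page is requested only when scraping starts, and only once.
  def scrape
    @index_html ||= @fetcher.call(@index_url)
  end
end

class EagerIndexScraper
  def initialize(index_url, fetcher)
    @index_url = index_url
    # The index page is requested as soon as the object is built.
    @index_html = fetcher.call(index_url)
  end

  # Returns the page that was already fetched in the constructor.
  def scrape
    @index_html
  end
end

calls = []
fake_fetch = ->(url) { calls << url; "<html>#{url}</html>" }

lazy = LazyIndexScraper.new("http://example.com/lazy", fake_fetch)
calls_after_lazy_new = calls.dup   # nothing has been fetched yet

eager = EagerIndexScraper.new("http://example.com/eager", fake_fetch)
calls_after_eager_new = calls.dup  # only the eager URL has been fetched
```

The call syntax is identical in both cases; only the moment of the request changes, which matches the point that the semantics change while the syntax does not.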

Answer 4 · 2014-02-16T00:14:07.000Z

Hmm, if it makes requests on the first call (e.g. Scraper.new, Scraper.index), when are options set? I guess as a hash on that first call? That'll be a breaking change, so I'll queue that up for 1.0.0.
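One way the "hash on that first call" idea might look is sketched below. Everything here is an assumption for illustration: the class name, the option names delay_seconds and verbose, their defaults, and the fetcher lambda that replaces a real HTTP request.

```ruby
# Hypothetical sketch; option names and defaults are invented for illustration.
class OptionedScraper
  DEFAULTS = { delay_seconds: 30, verbose: false }.freeze

  attr_reader :options, :index_html

  def initialize(index_url, selector, fetcher, options = {})
    # Options are merged before any request, so the eager fetch honours them.
    @options = DEFAULTS.merge(options)
    @selector = selector
    @index_html = fetcher.call(index_url, @options)
  end
end

seen_delays = []
fake_fetch = ->(url, opts) { seen_delays << opts[:delay_seconds]; "<html></html>" }

scraper = OptionedScraper.new("http://example.com/list", "a.story", fake_fetch,
                              { delay_seconds: 5 })
```

If options were instead assigned through setters after construction, an eager constructor would already have made its first request with the defaults, which is presumably why moving them into the first call counts as a breaking change.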

Answer 5 · 2014-02-16T18:53:53.000Z

Mostly implemented in future (1.0.0) at a25e84e

Partially implemented for 0.4.0 at 24cb65e