Create ScrapedPage object

Question


jeremybmerrill opened this issue 11 years ago · 1 comments

A ScrapedPage object is what Scraper#scrape would yield, instead of the HTML, the URL, the instance page's index, etc. as separate arguments.

This ScrapedPage object -- which might inherit from Nokogiri::HTML -- would contain the raw HTML, the parsed HTML, the URL, the URL of the index page from which the instance page was linked (if present), a reference to that index page's ScrapedPage object, and the instance page's index (i.e. its ordinal position among the pages linked to from the index page).
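To make the proposed shape concrete, here is a minimal sketch of the fields listed above. The field and method names (`raw_html`, `parsed_html`, `index_page`, `instance_index`, `index_url`) are illustrative assumptions, not Upton's actual API, and a plain Struct is used instead of inheriting from Nokogiri so the snippet runs with the standard library alone.

```ruby
# Hypothetical sketch: names are assumptions, not Upton's real interface.
ScrapedPage = Struct.new(
  :raw_html,       # the unparsed HTML string
  :parsed_html,    # the parsed document (left nil here; would be a Nokogiri document)
  :url,            # URL of this page
  :index_page,     # ScrapedPage of the index page that linked here, or nil
  :instance_index, # ordinal position among the pages linked from the index page
  keyword_init: true
) do
  # URL of the index page this page was linked from, if there is one.
  def index_url
    index_page && index_page.url
  end
end

index = ScrapedPage.new(raw_html: "<a href='/a'>A</a>", url: "http://example.com/")
page  = ScrapedPage.new(raw_html: "<h1>A</h1>", url: "http://example.com/a",
                        index_page: index, instance_index: 0)
```

An index page has no `index_page` of its own, so `index.index_url` is simply `nil`.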

This would be a breaking change, so it is further away from landing in stable Upton.

Answer 1 · 2014-02-16T22:45:50.000Z

Implemented in future (for 1.0.0) in 31cbf41

Will be minimally breaking, since missing methods on Page are passed through to Nokogiri::HTML.
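For anyone wondering what "passed through" means here, this is a rough sketch of the usual Ruby delegation pattern via `method_missing`. It is not the code from the commit; the `Page` class below is a simplified stand-in, and a plain String takes the place of the parsed Nokogiri document so the example runs without any gems.

```ruby
# Simplified stand-in for the Page class; not the actual implementation.
class Page
  attr_reader :url, :document

  def initialize(url, document)
    @url = url
    @document = document
  end

  # Forward any method Page itself lacks to the wrapped document.
  def method_missing(name, *args, &block)
    if @document.respond_to?(name)
      @document.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with the forwarding above.
  def respond_to_missing?(name, include_private = false)
    @document.respond_to?(name, include_private) || super
  end
end

# A String stands in for the parsed document (assumption for this sketch).
page = Page.new("http://example.com/a", "<h1>Title</h1>")
page.url     # defined on Page itself
page.upcase  # not defined on Page, so it is forwarded to the wrapped object
```

Existing code that calls document methods on the yielded object keeps working, while methods that neither object defines still raise NoMethodError.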

Maybe I should implement this even-less-breakingly in 0.4.0 by still passing the instance_index, instance_url, etc. attrs through to blk.call?
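A sketch of what that could look like, assuming the page object goes first and the old attributes follow as extra block arguments. `SimplePage` and `scrape_each` are hypothetical names for illustration, not Upton methods. It relies on the fact that ordinary Ruby blocks silently ignore extra arguments, so both block styles work against the same call.

```ruby
# Hypothetical names; a Struct stands in for the real page object.
SimplePage = Struct.new(:html, :url, :instance_index)

# Yield the page object first, then the legacy attributes as extra arguments.
def scrape_each(pages, &blk)
  pages.map do |page|
    blk.call(page, page.url, page.instance_index)
  end
end

pages = [SimplePage.new("<h1>A</h1>", "http://example.com/a", 0)]

# New-style block: takes only the page object and ignores the rest.
new_style = scrape_each(pages) { |page| page.url }

# Old-style block: still receives the URL and index positionally.
old_style = scrape_each(pages) { |_page, url, index| [url, index] }
```

One caveat: a lambda passed with `&` keeps strict arity, so a one-argument lambda would raise ArgumentError here, whereas a regular block would not.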