propublica/upton

Handle pagination out-of-the-box

Closed this issue · 2 comments

bxjx commented

It would be nice if upton handled common implementations of pagination with minimal configuration.

As the docs point out, you've already made it super easy to handle paginated indexes by overriding next_index_page_url, but I think it would be nice to have it implemented neatly as part of the library. It could be enabled with an instance variable like propubscraper.paginate = true. There could also be options to set the query string parameter name (defaulting to page or p) and the maximum number of results to scrape.
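To make the idea concrete, here is a rough sketch of the URL-building part, using only Ruby's standard library. The helper name next_page_url and its param and max_pages options are hypothetical and not part of Upton. It also caps the number of pages rather than results, purely to keep the sketch simple. Something like this could be called from an overridden next_index_page_url.

```ruby
require "uri"

# Hypothetical helper (not part of Upton): returns the URL for the given
# page number by setting a page-number query string parameter.
# `param` and `max_pages` are illustrative option names only.
def next_page_url(url, page_number, param: "page", max_pages: nil)
  # In this sketch an empty string means "no more pages".
  return "" if max_pages && page_number > max_pages

  uri = URI.parse(url)
  # Split the existing query string into [key, value] pairs.
  query = URI.decode_www_form(uri.query || "")
  # Drop any existing page parameter, then append the new one.
  query.reject! { |key, _| key == param }
  query << [param, page_number.to_s]
  uri.query = URI.encode_www_form(query)
  uri.to_s
end

puts next_page_url("http://example.com/articles?sort=new&page=1", 2)
puts next_page_url("http://example.com/articles", 3, param: "p")
```

This only covers sites that paginate with a numeric query string parameter; sites that use path segments or "next" links would still need a custom override.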

I'm happy to give you a pull request if you think it's worth doing. Thanks for the useful gem btw!

Hey @bxjx, that sounds awesome. I'm not sure exactly how to implement that, but I'd love to hear your proposed solution and would definitely accept a pull request.

Closed ages ago.