elacuesta/scrapy-pyppeteer

Retrying request download causes new pages to be opened

nichoi opened this issue · 1 comment

Hi, thanks for the useful project.

I have a middleware that retries a request up to X times, each time using a new proxy. This means _download_request is called X times, and as a result, X pages are created.
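For context, the middleware is roughly shaped like this (a minimal sketch; the proxy pool, `get_fresh_proxy()` and the retry trigger are placeholders for illustration, not actual scrapy-rotating-proxies or scrapy-pyppeteer code):

```python
import random

# Placeholder proxy pool and picker, purely for illustration.
PROXY_POOL = ["http://proxy1:8080", "http://proxy2:8080"]


def get_fresh_proxy():
    return random.choice(PROXY_POOL)


class ProxyRetryMiddleware:
    MAX_PROXY_RETRIES = 5

    def process_response(self, request, response, spider):
        retries = request.meta.get("proxy_retry_times", 0)
        if response.status in (403, 429) and retries < self.MAX_PROXY_RETRIES:
            retry_request = request.replace(dont_filter=True)
            retry_request.meta["proxy_retry_times"] = retries + 1
            retry_request.meta["proxy"] = get_fresh_proxy()
            # The re-scheduled request goes through the download handler again,
            # so _download_request() runs once per attempt and opens a new page.
            return retry_request
        return response
```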

I am using a forked version of your project that closes all pages before opening a new one. What do you think of this approach, and would you be interested in a contribution?
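Roughly, the change amounts to something like this (a standalone sketch of the idea, not the actual diff against the handler's _download_request):

```python
import asyncio

from pyppeteer import launch


async def open_fresh_page(browser):
    # Close every page that is still open, then start from a clean one.
    for page in await browser.pages():
        await page.close()
    return await browser.newPage()


async def main():
    browser = await launch()
    page = await open_fresh_page(browser)
    await page.goto("https://example.org")
    print(await page.title())
    await browser.close()


asyncio.get_event_loop().run_until_complete(main())
```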

I'm also wondering whether it would be possible to reuse the same page? I'm new to pyppeteer.

https://github.com/TeamHG-Memex/scrapy-rotating-proxies

Hi, thanks for the interest in this project.
Is this an issue of performance, memory usage, or both? Or something else? To be honest, I didn't think this would be a problem: pages are relatively short-lived, since they're closed right after the response content is read.
Closing all pages before opening a new one doesn't sound right from a concurrency standpoint; I'd be more inclined to add a way to ask the handler to reuse a certain page. I'd be interested in seeing how you modified the handler, though.
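For example, something along these lines could work from the spider side (the `pyppeteer_page` meta key here is purely hypothetical; nothing like it exists in the handler yet):

```python
import scrapy


class ReusePageSpider(scrapy.Spider):
    name = "reuse_page"
    start_urls = ["https://example.org"]

    def parse(self, response):
        # Suppose the handler exposed the page it used for this response:
        page = response.meta.get("pyppeteer_page")
        yield scrapy.Request(
            "https://example.org/next",
            callback=self.parse_next,
            meta={"pyppeteer_page": page},  # ask the handler to reuse this page
            dont_filter=True,
        )

    def parse_next(self, response):
        yield {"title": response.css("title::text").get()}
```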
Thanks again!