TeamHG-Memex/scrapy-rotating-proxies

Default middleware priority 610 is too close to downloader for compressed responses.

Granitosaurus opened this issue · 0 comments

The readme suggests these settings:

DOWNLOADER_MIDDLEWARES = {
    # ...
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    # ...
}

The problem is that, on the response path, BanDetectionMiddleware (620) runs before the base downloader middleware scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware (it's on 590), so the response is still compressed when it reaches the ban policy. If you try to access response.body or response.body_as_unicode() in your ban policy, you'd get an error on compressed responses:

raise NotSupported("Response content isn't text")
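To make the ordering concrete, here is a minimal sketch in plain Python (no Scrapy import) of the order in which the middlewares would see a response. It relies on the fact that Scrapy hands responses to downloader middlewares in decreasing priority order, since a higher number means closer to the downloader. The `response_order` helper is hypothetical and only for illustration. The priorities are the ones quoted in this issue.

```python
# Priorities as quoted in this issue: 590 is Scrapy's default for
# HttpCompressionMiddleware, 610/620 are the values from the readme.
PRIORITIES = {
    "HttpCompressionMiddleware": 590,
    "RotatingProxyMiddleware": 610,
    "BanDetectionMiddleware": 620,
}


def response_order(priorities):
    """Return middleware names in the order they would see a response.

    Hypothetical helper: responses travel from the downloader back to the
    engine, so the highest priority number handles the response first.
    """
    return sorted(priorities, key=priorities.get, reverse=True)


order = response_order(PRIORITIES)
print(order)
# BanDetectionMiddleware comes first, i.e. before the response is decompressed.
```

With the readme values, BanDetectionMiddleware is listed ahead of HttpCompressionMiddleware, which is why the ban policy receives the still-compressed body.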

So, to avoid this issue, the recommended priorities for the rotating_proxies middlewares should be below 590, and probably above RetryMiddleware (550).
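As a sketch of what that could look like in settings.py: the values 565 and 575 below are my own picks, not something the readme specifies. They are chosen only because they fall between 550 and 590 and keep RotatingProxyMiddleware ahead of BanDetectionMiddleware, as in the readme. Other built-in middlewares also sit in that range, so check the defaults for your Scrapy version, and note that I have not verified how this placement interacts with redirects or retries.

```python
# settings.py (sketch; 565 and 575 are assumed values, not from the readme)
DOWNLOADER_MIDDLEWARES = {
    # Both values are below 590, so HttpCompressionMiddleware has already
    # decompressed the response by the time the ban policy inspects it.
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 565,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 575,
}
```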