TeamHG-Memex/scrapy-rotating-proxies

DOWNLOAD_DELAY does not work when using a proxy.

dodoflyy opened this issue · 1 comment

Hello, it seems that Scrapy's DOWNLOAD_DELAY setting has no effect when I use this proxy middleware.
In my spider I set DOWNLOAD_DELAY=8 and enabled RANDOMIZE_DOWNLOAD_DELAY:

custom_settings = {
    "RETRY_TIMES": 7,
    "DOWNLOAD_DELAY": 8,
    "RANDOMIZE_DOWNLOAD_DELAY": True,
    "ROBOTSTXT_OBEY": False,
}

But the spider still crawls too fast:

INFO: Crawled 40 pages (at 40 pages/min), scraped 0 items (at 0 items/min)

kmike commented

This is by design; see https://github.com/TeamHG-Memex/scrapy-rotating-proxies#concurrency and this comment in the code:

# FIXME: an option to use website address as a part of slot as well?

So DOWNLOAD_DELAY works, but in a different way. I think it'd be a good feature to have to allow disabling that; pull requests are welcome.
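
For readers who want the delay applied per website in the meantime, one possible workaround is sketched below. It is not part of scrapy-rotating-proxies. It assumes the middleware keys Scrapy's `download_slot` request meta value by proxy, as the FIXME above suggests, so that DOWNLOAD_DELAY is enforced per proxy and not per site. The class name `PerSiteDownloadSlotMiddleware` and the module path `myproject.middlewares` are hypothetical, and the priority 615 is an assumption chosen so that the override runs after RotatingProxyMiddleware (610 in the project README).

```python
from urllib.parse import urlsplit


class PerSiteDownloadSlotMiddleware:
    """Hypothetical downloader middleware (not part of scrapy-rotating-proxies).

    Re-keys the download slot by the target website's hostname, so that
    DOWNLOAD_DELAY throttles requests per site even when proxies rotate.
    """

    def process_request(self, request, spider):
        # Only touch requests that already have a proxy assigned.
        if "proxy" in request.meta:
            # "download_slot" is the Scrapy meta key that selects the
            # throttling slot; here it is set to the site's hostname.
            request.meta["download_slot"] = urlsplit(request.url).hostname
        # Returning None lets Scrapy continue processing the request.
        return None


# Assumed settings fragment: 610 and 620 are the priorities from the
# project README; 615 and the module path are illustrative choices.
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "myproject.middlewares.PerSiteDownloadSlotMiddleware": 615,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```

With this in place, all proxied requests to the same host would share one slot, so a DOWNLOAD_DELAY of 8 should limit the crawl to roughly one request every 8 seconds per site. Whether this interacts cleanly with the middleware's own per-proxy bookkeeping has not been verified here.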