Karmenzind/EasyGoSpider

Came here from your blog; a question about overriding RetryMiddleware in Scrapy

lvouran opened this issue · 2 comments

Once retry_times is set in settings, the process_exception function (which deletes the bad proxy) only runs after the retry count is exhausted. When the request is then retried again, the retry limit has already been reached, so the whole spider shuts down. I'd like to ask how this should be written properly. The code is below, thanks.

import requests
from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message


class ProxyRetryMiddleware(RetryMiddleware):
    # Subclasses the built-in RetryMiddleware; the class name is a placeholder.
    # self.url (the proxy pool's delete endpoint) and self.logger are set up
    # elsewhere in this class.

    def delete_proxy(self, proxy):
        if proxy:
            # strip the scheme before passing the proxy to the delete API
            res = requests.get(self.url.format(proxy.replace('https://', '').replace('http://', '')))
            if 'success' in res.text:
                self.logger.info('Deleted proxy {}'.format(proxy))
            else:
                self.logger.info('Failed to delete proxy: {}'.format(proxy))

    def process_response(self, request, response, spider):
        if request.meta.get('dont_retry', False):
            return response
        if response.status in self.retry_http_codes:
            reason = response_status_message(response.status)
            # drop this proxy before retrying
            self.delete_proxy(request.meta.get('proxy', False))
            return self._retry(request, reason, spider) or response
        return response

    def process_exception(self, request, exception, spider):
        if isinstance(exception, self.EXCEPTIONS_TO_RETRY) \
                and not request.meta.get('dont_retry', False):
            # drop this proxy before retrying
            self.delete_proxy(request.meta.get('proxy', False))
            return self._retry(request, exception, spider)
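
One possible direction (a minimal sketch, not the poster's code): if proxy failures shouldn't consume the retry budget, re-schedule the request yourself instead of relying on self._retry, resetting the retry_times meta counter that Scrapy's built-in RetryMiddleware uses to count retries. _retry_with_fresh_proxy is a hypothetical helper name:

def _retry_with_fresh_proxy(self, request, spider):
    # Hypothetical helper: re-schedule the request without counting the
    # failure against RETRY_TIMES. retry_times is the meta key Scrapy's
    # built-in RetryMiddleware uses to track the retry count.
    retryreq = request.copy()
    retryreq.meta['retry_times'] = 0
    retryreq.meta.pop('proxy', None)  # let the proxy middleware pick a new one
    retryreq.dont_filter = True       # bypass the duplicate filter on retry
    return retryreq                   # caution: can loop forever if every proxy fails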

OK, solved it myself: disabling the built-in retry middleware in the middleware settings fixed it.
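
For anyone landing here, a minimal sketch of that fix in settings.py, assuming the custom middleware lives at myproject.middlewares.ProxyRetryMiddleware (an illustrative path); 550 is the priority the built-in RetryMiddleware uses by default:

DOWNLOADER_MIDDLEWARES = {
    # disable the built-in retry middleware so it doesn't run alongside ours
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    # register the custom subclass at the slot the built-in one normally uses
    'myproject.middlewares.ProxyRetryMiddleware': 550,
}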

hmmmm, pull requests welcome