Came here from your blog; a question about overriding RetryMiddleware in Scrapy
lvouran opened this issue · 2 comments
lvouran commented
Once retry_times is set in settings, the proxy only gets deleted in process_exception after all the retries have been used up. By the time the request is re-issued, the retry count has already hit the limit, so the whole spider shuts down. How should this be written properly? The code is below. Thanks.
```python
import requests

from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message


class ProxyRetryMiddleware(RetryMiddleware):  # class name assumed
    # self.url (the proxy-pool delete endpoint) and self.logger are
    # set elsewhere in the class.

    def delete_proxy(self, proxy):
        if proxy:
            res = requests.get(
                self.url.format(proxy.replace('https://', '').replace('http://', '')))
            if 'success' in res.text:
                self.logger.info('Proxy deleted: {}'.format(proxy))
            else:
                self.logger.info('Failed to delete proxy: {}'.format(proxy))

    def process_response(self, request, response, spider):
        if request.meta.get('dont_retry', False):
            return response
        if response.status in self.retry_http_codes:
            reason = response_status_message(response.status)
            # delete this proxy
            self.delete_proxy(request.meta.get('proxy', False))
            return self._retry(request, reason, spider) or response
        return response

    def process_exception(self, request, exception, spider):
        if isinstance(exception, self.EXCEPTIONS_TO_RETRY) \
                and not request.meta.get('dont_retry', False):
            # delete this proxy
            self.delete_proxy(request.meta.get('proxy', False))
            return self._retry(request, exception, spider)
```
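One way to keep the request alive is sketched below (a minimal sketch, not a confirmed fix): reset the retry counter before handing the request back to `_retry`, so the retry budget effectively applies per proxy rather than per request. This assumes the `delete_proxy` helper from the snippet above and a separate proxy middleware that attaches a fresh proxy on the next attempt:

```python
from scrapy.downloadermiddlewares.retry import RetryMiddleware


class PerProxyRetryMiddleware(RetryMiddleware):  # name is illustrative
    def process_exception(self, request, exception, spider):
        if isinstance(exception, self.EXCEPTIONS_TO_RETRY) \
                and not request.meta.get('dont_retry', False):
            # delete_proxy() is the helper from the snippet above
            self.delete_proxy(request.meta.get('proxy', False))
            # _retry() drops the request (returns None) once
            # meta['retry_times'] reaches max_retry_times; resetting the
            # counter lets the request go around again with a new proxy.
            request.meta['retry_times'] = 0
            return self._retry(request, exception, spider)
```

Note that this removes the global retry cap, so a request that keeps failing would loop indefinitely; a separate counter in `request.meta` capping total attempts would guard against that.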
lvouran commented
Never mind, solved it myself: disabling the built-in middleware in the middleware settings did the trick.
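For reference, disabling the stock RetryMiddleware and registering a custom one looks roughly like this in settings.py (the path 'myproject.middlewares.ProxyRetryMiddleware' is a placeholder; point it at your own subclass):

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    # Setting a middleware to None disables it.
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    # Placeholder path; 550 is the default priority of RetryMiddleware.
    'myproject.middlewares.ProxyRetryMiddleware': 550,
}
```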
Karmenzind commented
hmmmm, pull requests welcome