User timeout caused connection failure.
Opened this issue · 1 comment
I don't know whether this is because Amazon has detected our bot and blocked the IP.

That said, https://www.amazon.com/best-sellers-video-games/zgbs/videogames/?ajax=1&pg=3 indeed doesn't exist: there is no page 3 there.

https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2 is correct; I can open it in the Chrome browser.

How can I set up the proxy?

Also, because of this error, will all the data be lost, even the data already collected from previous pages?
twisted.internet.error.TimeoutError: User timeout caused connection failure.
2018-11-19 23:40:32 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/best-sellers-video-games/zgbs/videogames/?ajax=1&pg=3>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
scrapy.core.downloader.handlers.http11.TunnelError: Could not open CONNECT tunnel with proxy 46.38.52.36:8081 [{'status': 400, 'reason': b'Bad Request'}]
2018-11-19 23:40:36 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 320, in _cb_timeout
    raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2 took longer than 30.0 seconds..
2018-11-19 23:41:51 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/best-sellers-software/zgbs/software/?ajax=1&pg=2>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.internet.error.TimeoutError: User timeout caused connection failure.
(1030, 'Got error 168 from storage engine')
total spent: 0:52:23.652052
done
Add a proxy.json file under the amazon/amazon directory, like this:
["198.52.39.104:3128", "31.207.5.155:3128"]
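In case it helps, here is a minimal sketch of how a proxy list like that could be wired into Scrapy via a custom downloader middleware. This is an assumption about how the project might consume proxy.json, not its actual code: the `RandomProxyMiddleware` class name and the `PROXY_LIST_FILE` setting are hypothetical. Scrapy's built-in `HttpProxyMiddleware` honours `request.meta['proxy']`, so setting that key per request is enough.

```python
import json
import random


class RandomProxyMiddleware:
    """Hypothetical downloader middleware: assigns a random proxy
    from a proxy.json list to each outgoing request."""

    def __init__(self, proxies):
        # proxies is a list like ["198.52.39.104:3128", "31.207.5.155:3128"]
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST_FILE is a made-up setting name; point it at proxy.json
        path = crawler.settings.get("PROXY_LIST_FILE", "proxy.json")
        with open(path) as f:
            return cls(json.load(f))

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware reads request.meta['proxy']
        request.meta["proxy"] = "http://" + random.choice(self.proxies)
```

It would then be enabled in settings.py via `DOWNLOADER_MIDDLEWARES`. Note that a failing proxy (like the CONNECT tunnel error above) still causes that request to fail; combining this with Scrapy's retry middleware (`RETRY_TIMES`) lets a retried request pick a different proxy.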