When I trigger the download event and get the file response, an exception is thrown because of the wrong Content-Encoding
Closed · 1 comment
ma-pony commented
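When a download request returns a file response, Scrapy's HttpCompressionMiddleware raises an exception because the response carries the wrong Content-Encoding. The middleware tries to gunzip the body, but the body is a PDF, not gzip data (the magic bytes reported in the error are b'%P').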
2024-10-08 15:26:39 [scrapy.core.scraper] ERROR: Error downloading <GET http://www.yanan.gov.cn/gk/fdzdgknr/zdxm/sphzbaxx/1833020334224224258.html>
Traceback (most recent call last):
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration: <200 http://www.yanan.gov.cn/upload/yanan/2024/09/09/202409091350252175.pdf>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/core/downloader/middleware.py", line 68, in process_response
    method(request=request, response=response, spider=spider)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 90, in process_response
    decoded_body = self._decode(
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 130, in _decode
    return gunzip(body, max_size=max_size)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/utils/gz.py", line 21, in gunzip
    chunk = f.read1(_CHUNK_SIZE)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 314, in read1
    return self._buffer.read1(size)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 488, in read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 436, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'%P')
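
The failure at the bottom of the traceback can be reproduced with the standard library alone, which may help when narrowing it down. The sketch below is not Scrapy's code: `looks_like_gzip` and `decode_if_gzip` are hypothetical helper names, and `pdf_body` is a made-up stand-in for the real response body. It only assumes that the server labelled an uncompressed PDF as gzip, which is what the `b'%P'` magic bytes suggest.

```python
import gzip

# Every gzip stream starts with these two magic bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(body: bytes) -> bool:
    """Hypothetical helper: check the magic bytes instead of trusting a header."""
    return body[:2] == GZIP_MAGIC


def decode_if_gzip(body: bytes) -> bytes:
    """Hypothetical helper: decompress only when the body really is gzip."""
    if looks_like_gzip(body):
        return gzip.decompress(body)
    # Mislabelled body (e.g. a plain PDF): return it unchanged.
    return body


# Made-up stand-in for the PDF bytes returned by the server.
pdf_body = b"%PDF-1.4 example bytes"

# Decompressing it blindly fails the same way as in the traceback.
try:
    gzip.decompress(pdf_body)
except gzip.BadGzipFile as exc:
    error_message = str(exc)

# The guarded version passes the PDF through and still handles real gzip data.
passthrough = decode_if_gzip(pdf_body)
roundtrip = decode_if_gzip(gzip.compress(b"hello"))
```

As a possible project-side workaround, Scrapy's `COMPRESSION_ENABLED = False` setting disables HttpCompressionMiddleware entirely. That would also stop decompression of responses that are correctly gzip-encoded, so it is only a stopgap under the assumption above, not a confirmed fix.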