http.client.IncompleteRead crash during extract
chfoo commented
Traceback (most recent call last):
  File "/0/home/waxy/usr/local/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/0/home/waxy/usr/local/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/0/home/waxy/usr/local/lib/python3.4/site-packages/warcat/__main__.py", line 154, in <module>
    main()
  File "/0/home/waxy/usr/local/lib/python3.4/site-packages/warcat/__main__.py", line 70, in main
    command_info[1](args)
  File "/0/home/waxy/usr/local/lib/python3.4/site-packages/warcat/__main__.py", line 131, in extract_command
    tool.process()
  File "/0/home/waxy/usr/local/lib/python3.4/site-packages/warcat/tool.py", line 112, in process
    raise e
  File "/0/home/waxy/usr/local/lib/python3.4/site-packages/warcat/tool.py", line 106, in process
    self.action(record)
  File "/0/home/waxy/usr/local/lib/python3.4/site-packages/warcat/tool.py", line 229, in action
    shutil.copyfileobj(response, f)
  File "/0/home/waxy/usr/local/lib/python3.4/shutil.py", line 66, in copyfileobj
    buf = fsrc.read(length)
  File "/0/home/waxy/usr/local/lib/python3.4/http/client.py", line 500, in read
    return super(HTTPResponse, self).read(amt)
  File "/0/home/waxy/usr/local/lib/python3.4/http/client.py", line 529, in readinto
    return self._readinto_chunked(b)
  File "/0/home/waxy/usr/local/lib/python3.4/http/client.py", line 621, in _readinto_chunked
    n = self._safe_readinto(mvb)
  File "/0/home/waxy/usr/local/lib/python3.4/http/client.py", line 680, in _safe_readinto
    raise IncompleteRead(bytes(mvb[0:total_bytes]), len(b))
http.client.IncompleteRead: IncompleteRead(7052 bytes read, 16384 more expected)
waxpancake commented
A little more information to reproduce this crash: I was running warcat on this 25 GB megawarc using this command:
python3 -m warcat extract ~/archives/incoming/upcoming_20130420095943.megawarc.warc.gz --output-dir expanded/ --verbose --progress
It dies right after extracting this file:
INFO:warcat.tool:Extracted <urn:uuid:6eebc1d1-cdda-4e1a-b499-184e9681f1e6> to expanded/upcoming.yahoo.com/event/2715307/LA/New-Orleans/The-Louisiana-State-Museum-Jazz-Collection/Louisiana-State-Museum/_index_da39a3
Hope that helps.
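
For anyone hitting the same traceback: it shows `shutil.copyfileobj(response, f)` failing because the stored HTTP response uses chunked encoding and the body ends before the announced chunk length is reached. Below is a minimal sketch of one possible way to tolerate that, not warcat's actual fix. The names `copy_response_tolerant` and `response_from_bytes` are hypothetical helpers invented for this illustration, and wrapping the raw bytes in a fake socket is an assumption about how such a response could be built. The copy loop catches `http.client.IncompleteRead` and writes whatever bytes the exception carries in its `partial` attribute. How much of the truncated chunk ends up in `partial` may differ between Python versions.

```python
import http.client
import io


class _FakeSocket:
    """Hypothetical stand-in for a socket; HTTPResponse only needs makefile()."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def response_from_bytes(raw):
    """Parse raw HTTP response bytes (status line, headers, body)."""
    response = http.client.HTTPResponse(_FakeSocket(raw))
    response.begin()  # reads the status line and headers
    return response


def copy_response_tolerant(response, dest, chunk_size=16 * 1024):
    """Copy the response body to dest.

    Returns True if the body was read completely, False if it was
    truncated. On truncation, the bytes attached to the exception
    are still written to dest.
    """
    while True:
        try:
            buf = response.read(chunk_size)
        except http.client.IncompleteRead as error:
            dest.write(error.partial)
            return False
        if not buf:
            return True
        dest.write(buf)


if __name__ == "__main__":
    # Chunked body whose second chunk announces 0x10 bytes but holds only 2.
    truncated = (
        b"HTTP/1.1 200 OK\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"
        b"3\r\nabc\r\n"
        b"10\r\nde"
    )
    out = io.BytesIO()
    complete = copy_response_tolerant(response_from_bytes(truncated), out)
    print("complete:", complete)
    print("bytes saved:", len(out.getvalue()))
```

Returning a flag instead of raising lets the caller log the truncated record and carry on with the next one, which seems preferable to aborting the extraction of a 25 GB archive part of the way through.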