The Unpacker fails to retrieve and unpack all the data when streaming large data.
MasahiroYasumoto opened this issue · 2 comments
The Unpacker fails to retrieve and unpack all the data when streaming a large amount of data (e.g. 10 GiB).
td-client-python uses msgpack-python internally to unpack the received data while streaming.
https://github.com/treasure-data/td-client-python/blob/1.2.1/tdclient/job_api.py#L220-L244
When the size of the streamed data is 10 GiB or more, I occasionally hit a problem where the Unpacker fails to retrieve and unpack all the data, which results in premature termination without raising an error.
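For context, my understanding is that the linked code hands the streaming HTTP response to the Unpacker as a file-like object, roughly like the sketch below (a simplified, hypothetical illustration; iter_rows and res are placeholder names, not the actual identifiers in job_api.py):

import msgpack

def iter_rows(res):
    # Hypothetical simplification: treat the streaming response as a file-like
    # object and iterate over the decoded msgpack objects.
    unpacker = msgpack.Unpacker(res, raw=False)
    for row in unpacker:
        yield row  # occasionally stops early, without an error, on >10 GiB results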
As a workaround, I rewrote the code as follows to first receive all the data, save it to a file, and then unpack it from that file, which seems to have solved the problem. I therefore suspect this is a bug in the Unpacker's handling of streaming input.
import msgpack

# Workaround: write the whole stream to a temporary file first, then unpack from it.
with open("temp.mpack", "wb") as output_file:
    for chunk in res.stream(1024 * 1024 * 1024):
        if chunk:
            output_file.write(chunk)

with open("temp.mpack", "rb") as input_file:
    unpacker = msgpack.Unpacker(input_file, raw=False)
    for row in unpacker:
        yield row
The fact that the Unpacker can handle the file means the Unpacker can handle more than 10 GiB of data.
Without a reproducer, I cannot fix your issue.
Maybe the res object in your code has some non-file-like behavior. (I don't know what self.get() and res are in your code.)
I recommend using the Unpacker.feed() method. It frees you from "file-like" edge cases.
(See msgpack-python/msgpack/_unpacker.pyx, lines 291 to 300 at 1408642.)
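A rough sketch of the feed()-based approach (the res.stream() call and chunk size come from your workaround above, not from msgpack; iter_rows is a placeholder name):

import msgpack

def iter_rows(res, chunk_size=1024 * 1024):
    # Create an Unpacker without a file-like object and push raw bytes with feed().
    unpacker = msgpack.Unpacker(raw=False)
    for chunk in res.stream(chunk_size):
        if chunk:
            unpacker.feed(chunk)
            # Iterating the Unpacker yields every object that is complete so far.
            for row in unpacker:
                yield row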
Thank you for your quick response! I'll try Unpacker.feed() and see if it can fix the problem.