dask/hdfs3

Cannot load a pickle file with python 2.7

aabadie opened this issue · 0 comments

Here is a small snippet that reproduces the issue:

import pickle
from hdfs3 import HDFileSystem

hdfs = HDFileSystem(host='localhost', port=8020)

# Dump works
with hdfs.open("/user/aabadie/test.pkl", "wb") as f:
    pickle.dump('test', f)

# Load fails (see stacktrace below)
with hdfs.open("/user/aabadie/test.pkl", "rb") as f:
    print(pickle.load(f))

Error:

---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
<ipython-input-5-97136b6bb693> in <module>()
      1 with hdfs.open("/user/aabadie/test.pkl", "rb") as f:
----> 2     print(pickle.load(f))
      3 

/home/aabadie/conda3/envs/py27/lib/python2.7/pickle.pyc in load(file)
   1382 
   1383 def load(file):
-> 1384     return Unpickler(file).load()
   1385 
   1386 def loads(str):

/home/aabadie/conda3/envs/py27/lib/python2.7/pickle.pyc in load(self)
    862             while 1:
    863                 key = read(1)
--> 864                 dispatch[key](self)
    865         except _Stop, stopinst:
    866             return stopinst.value

/home/aabadie/conda3/envs/py27/lib/python2.7/pickle.pyc in load_eof(self)
    884 
    885     def load_eof(self):
--> 886         raise EOFError
    887     dispatch[''] = load_eof
    888 

If I download the pickle file with the hadoop CLI, pickle loads the local copy without error, so the dump itself worked and the problem is on the read side.

$ hadoop fs -get hdfs://localhost/user/aabadie/test.pkl

The same code works in Python 3.5. Maybe Python 2.7 is not supported; if so, feel free to close.
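A possible workaround, untested against hdfs3 on Python 2.7: since the traceback shows the unpickler hitting EOF on read(1), reading the whole file into memory and calling pickle.loads on the bytes might sidestep the problem. This is only a sketch. The helper name load_pickle_via_bytes is hypothetical, io.BytesIO stands in for the object returned by hdfs.open(path, "rb"), and it assumes that read() with no argument returns the full file contents.

```python
import io
import pickle


def load_pickle_via_bytes(fileobj):
    """Read the whole stream into memory, then unpickle from the bytes.

    This keeps pickle from calling read(1) / readline() on the
    file object itself (hypothetical helper, not part of hdfs3).
    """
    data = fileobj.read()
    return pickle.loads(data)


# Stand-in for hdfs.open("/user/aabadie/test.pkl", "rb"), so the
# sketch runs without a cluster.
buf = io.BytesIO()
pickle.dump('test', buf)
buf.seek(0)

result = load_pickle_via_bytes(buf)
print(result)
```

With a real cluster, the stand-in buffer would be replaced by the file object from the "rb" hdfs.open call in the snippet above.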