uqfoundation/dill

Error trying to serialize a binary file handle

neucer opened this issue · 3 comments

neucer commented

With fmode set to FILE_FMODE, the following code

import dill
dill.settings["fmode"] = dill.FILE_FMODE
with open('some_binary_file', 'rb') as file_handle:
    with open('some_binary_file.pkl', 'wb') as pkl_file:
        dill.dump(file_handle, pkl_file)

gives this error:

Traceback (most recent call last):
  File "C:\Users\neucer\PycharmProjects\pythonProject\tst.py", line 23, in <module>
    dill.dump(file_handle, pkl_file)
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 235, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 394, in dump
    StockPickler.dump(self, obj)
  File "C:\Program Files\Python310\lib\pickle.py", line 487, in dump
    self.save(obj)
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "C:\Program Files\Python310\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 1336, in save_file
    f = _save_file(pickler, obj, open)
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 1313, in _save_file
    fdata = f.read()
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1133: character maps to <undefined>

because _save_file reads the binary file back as text, so its bytes are pushed through a text codec (cp1252 here).
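A minimal sketch of the same failure without dill, assuming the pickling path re-opens the file by name in text mode. The temporary file and the explicit encoding='cp1252' are assumptions chosen to mirror the Windows default codec seen in the traceback:

```python
import os
import tempfile

# Write a single byte that has no mapping in cp1252.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x81')
    path = tmp.name

try:
    # Binary mode returns the raw bytes unchanged.
    with open(path, 'rb') as fh:
        binary_data = fh.read()

    # Text mode has to decode the bytes, and that is where it fails.
    try:
        with open(path, 'r', encoding='cp1252') as fh:
            fh.read()
        text_error = None
    except UnicodeDecodeError as exc:
        text_error = exc
finally:
    os.remove(path)

print(binary_data)
print(type(text_error).__name__)
```

Reading in binary mode succeeds, while the text-mode read raises UnicodeDecodeError, which matches the last frames of the traceback.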

The following seems to work for me, given Python 3.8 and the latest dill:

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.settings['fmode'] = dill.FILE_FMODE
>>> f = open('xxx.db', 'wb')
>>> f.write(b'hello world')
11
>>> f.write(b'goodbye')
7
>>> f.close()
>>> f = open('xxx.db', 'rb')
>>> dill.dumps(f)
b'\x80\x04\x95a\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x12_create_filehandle\x94\x93\x94(\x8c\x06xxx.db\x94\x8c\x02rb\x94K\x00\x89\x8c\x02io\x94\x8c\x04open\x94\x93\x94\x89K\x02\x8c\x12hello worldgoodbye\x94t\x94R\x94.'
>>> dill.__version__
'0.3.8.dev0'

This also works:

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.settings['fmode'] = dill.FILE_FMODE
>>> with open('xxx.db', 'rb') as file_handle:
...   dill.dumps(file_handle)
... 
b'\x80\x04\x95a\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x12_create_filehandle\x94\x93\x94(\x8c\x06xxx.db\x94\x8c\x02rb\x94K\x00\x89\x8c\x02io\x94\x8c\x04open\x94\x93\x94\x89K\x02\x8c\x12hello worldgoodbye\x94t\x94R\x94.'
>>>

and this also works:

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.settings['fmode'] = dill.FILE_FMODE
>>> with open('xxx.db', 'rb') as file_handle:
...   with open('xxx.pkl', 'wb') as pkl_file:
...     dill.dump(file_handle, pkl_file)
... 
>>> with open('xxx.pkl', 'rb') as pkl_file:
...   dill.load(pkl_file)
... 
<_io.BufferedReader name='xxx.db'>

What is your version of dill? It looks like you are using Python 3.10 on Windows. Can you confirm that and give any further details? If my test code succeeds for you, can you provide an example binary file that fails so I can test it?

neucer commented

Yes, I confirm. But the difference is probably the file contents. Try this:

import dill
dill.settings['fmode'] = dill.FILE_FMODE
f = open('xxx.db', 'wb')
f.write(b'\x81')
f.close()
f = open('xxx.db', 'rb')
dill.dumps(f)

I can reproduce the error with that code, thanks.
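A small sketch of why the earlier test file pickled cleanly while this one does not: the first file held only ASCII bytes, which decode under either codec seen in the tracebacks, whereas b'\x81' decodes under neither. The helper name decodes is hypothetical and used only for illustration:

```python
def decodes(data, encoding):
    """Return True if data can be decoded with the given codec."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

ascii_payload = b'hello worldgoodbye'   # contents of the first test file
binary_payload = b'\x81'                # contents of the failing file

# Compare both payloads under the codecs that appear in the two tracebacks.
for encoding in ('utf-8', 'cp1252'):
    print(encoding, decodes(ascii_payload, encoding), decodes(binary_payload, encoding))
```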

The error appears to be self-contained: pickling the file handle fails immediately in the registered function _save_file, at the f.read() call, where the file contents are decoded as text.

...
>>> dill.detect.trace(True)
>>> dill.dumps(f)
┬ Fi: <_io.BufferedReader name='xxx.db'>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 278, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 250, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 418, in dump
    StockPickler.dump(self, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pickle.py", line 487, in dump
    self.save(obj)
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 412, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 1362, in save_file
    f = _save_file(pickler, obj, open)
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 1339, in _save_file
    fdata = f.read()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
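One possible direction, sketched under assumptions rather than taken from dill itself: when capturing the file contents, re-open the file in binary mode whenever the original handle is binary. The helper read_contents_like is hypothetical and not part of dill:

```python
import os
import tempfile

def read_contents_like(handle):
    """Re-open handle.name and read it, in binary mode if the handle is binary.

    Hypothetical helper for illustration; not part of dill.
    """
    mode = 'rb' if 'b' in handle.mode else 'r'
    with open(handle.name, mode) as f:
        return f.read()

# Build the same one-byte file used in the reproducer above.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x81')
    path = tmp.name

try:
    with open(path, 'rb') as file_handle:
        contents = read_contents_like(file_handle)
finally:
    os.remove(path)

print(contents)
```

Because the handle's own mode decides how the contents are read, no codec is involved for binary handles, so a byte such as 0x81 is returned as-is.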