Test-input image saved with pickle cannot be read in the web-python environment
Closed this issue · 6 comments
I got an error while trying to read some test-input (.npy) files in a JupyterLite (Pyodide) environment.
It seems to be caused by pickle being used when the files were saved. I think this may also cause problems when reading npy files with JavaScript libraries such as npy.js.
Here are some details:
Image URLs used in this demo:
- normal image: https://zenodo.org/api/records/6647674/files/test_input_0.npy/content
- error image: https://zenodo.org/api/records/7781091/files/test-input.npy/content
Perhaps it should be explicitly stated in the spec that test input/output should not be saved using pickle?
The odd thing, in a sense, is that the saved array contained Python objects:
Following the header comes the array data. If the dtype contains Python objects (i.e. dtype.hasobject is True), then the data is a Python pickle of the array. Otherwise the data is the contiguous (either C- or Fortran-, depending on fortran_order) bytes of the array. Consumers can figure out the number of bytes by multiplying the number of elements given by the shape (noting that shape=() means there is 1 element) by dtype.itemsize.
From: https://numpy.org/doc/1.13/neps/npy-format.html#format-specification-version-1-0
(I know these docs are for a slightly older version, but I don't think this aspect has changed.)
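To make the quoted format rule concrete, here is a minimal sketch that reads only the header of a .npy stream and reports whether the payload is a pickle or raw bytes. The helper name inspect_npy is hypothetical, and the sketch assumes the file uses format version 1.0 or 2.0; it relies on the header readers in numpy.lib.format.

```python
import io
import math

import numpy as np
from numpy.lib import format as npy_format


def inspect_npy(fp):
    """Read only the .npy header; return (has_objects, raw_payload_bytes or None)."""
    version = npy_format.read_magic(fp)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(fp)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(fp)
    else:
        # Assumption: other format versions are out of scope for this sketch.
        raise ValueError(f"unhandled .npy format version: {version}")
    if dtype.hasobject:
        # The data section is a Python pickle, so its size is not shape * itemsize.
        return True, None
    # math.prod(()) is 1, matching the "shape=() means 1 element" rule.
    return False, math.prod(shape) * dtype.itemsize


# Demo on in-memory files: one numeric array, one object array.
numeric = io.BytesIO()
np.save(numeric, np.zeros((2, 3), dtype=np.float32))
numeric.seek(0)

pickled = io.BytesIO()
np.save(pickled, np.array([{"a": 1}, None], dtype=object))
pickled.seek(0)

print(inspect_npy(numeric))
print(inspect_npy(pickled))
```

Because nothing is unpickled, a check like this could be run on an uploaded test input before anyone tries to load it.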
The user would not have been necessarily aware that pickle was used - it would probably have been turned on automatically when non-pure-numeric data was detected upon saving.
Given that pickle is a standard built-in module, I'm a bit surprised that this error occurred - is this a limitation of pyodide's port of cpython?
Also - did you try setting encoding='bytes' in the load call? That gets passed on to the pickle load call...
I think we should enforce that all numpy arrays are saved with allow_pickle=False, since pickled files are a security risk for all the model consumers.
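As a rough illustration of what enforcing this on the producer side could look like, the sketch below saves through a small wrapper that always passes allow_pickle=False to np.save. The wrapper name save_plain_npy is hypothetical, and the cast to float32 assumes the object array really holds plain numbers.

```python
import io

import numpy as np


def save_plain_npy(target, array):
    """Save without pickle; NumPy raises ValueError if the dtype holds Python objects."""
    np.save(target, np.asarray(array), allow_pickle=False)


# An object-dtype array that happens to contain only numbers.
object_array = np.array([1.0, 2.0, 3.0], dtype=object)

try:
    save_plain_npy(io.BytesIO(), object_array)
    refused = False
except ValueError as err:
    refused = True
    print("refused:", err)

# Casting to a numeric dtype first lets the same data be stored as raw bytes.
buffer = io.BytesIO()
save_plain_npy(buffer, object_array.astype(np.float32))
buffer.seek(0)
restored = np.load(buffer, allow_pickle=False)
print(restored.dtype, restored.tolist())
```

With the default allow_pickle=True, the first save would have succeeded silently and produced a pickled file, which is presumably how the problematic test input came about.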
Loading files that contain object arrays uses the pickle module, which is not secure against erroneous or maliciously constructed data. Consider passing allow_pickle=False to load data that is known not to contain object arrays for the safer handling of untrusted sources.
Changed in version 1.16.3: Made default False in response to CVE-2019-6446.
see: https://numpy.org/doc/stable/reference/generated/numpy.load.html#numpy.load
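On the consumer side, the documented behaviour can be checked with a short sketch like the following. The helper name can_load_without_pickle is hypothetical, and the demo uses in-memory files rather than the Zenodo URLs above.

```python
import io

import numpy as np


def can_load_without_pickle(source):
    """Return True if np.load succeeds with allow_pickle=False."""
    try:
        np.load(source, allow_pickle=False)
        return True
    except ValueError:
        # NumPy refuses object arrays here; a malformed file may also end up in this branch.
        return False


# A numeric array is stored as raw bytes and loads fine.
plain = io.BytesIO()
np.save(plain, np.arange(4, dtype=np.uint8))
plain.seek(0)

# An object array is stored as a pickle and is rejected.
pickled = io.BytesIO()
np.save(pickled, np.array([{"a": 1}], dtype=object))
pickled.seek(0)

print(can_load_without_pickle(plain))
print(can_load_without_pickle(pickled))
```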
Also - did you try setting encoding='bytes' in the load call? That gets passed on to the pickle load call...
Tried it, no difference.
pickle=False added [here] for next release
fix is released