bioimage-io/spec-bioimage-io

The test-input image caused by pickle cannot be read in the web-python environment

Closed this issue · 6 comments

I got an error while trying to read some test-input(.npy) files in jupyterlite(pyodide) environment.
It seems to be caused by using pickle during saving. I think this may also lead to some problems when using JavaScript libraries such as npy.js to read npy files.

Here are some detail information:

image

Image URLs used in this demo:

Perhaps it should be explicitly stated in the spec that test input/output should not be saved using pickle?

jmetz commented

The odd thing in a sense, is that the saved array contained Python objects:

Following the header comes the array data. If the dtype contains Python objects (i.e. dtype.hasobject is True), then the data is a Python pickle of the array. Otherwise the data is the contiguous (either C- or Fortran-, depending on fortran_order) bytes of the array. Consumers can figure out the number of bytes by multiplying the number of elements given by the shape (noting that shape=() means there is 1 element) by dtype.itemsize.

From: https://numpy.org/doc/1.13/neps/npy-format.html#format-specification-version-1-0
(I know this is slightly older version docs, but I don't think this aspect changed).

The user would not have been necessarily aware that pickle was used - it would probably have been turned on automatically when non-pure-numeric data was detected upon saving.

Given that pickle is a standard built-in module, I'm a bit surprised that this error occurred - is this a limitation of pyodide's port of cpython?

jmetz commented

Also - did you try setting encoding='bytes' in the load call? That gets passed on to the pickle load call...

oeway commented

I think we should enforce that all the numpy arrays should be saved with pickle=False, since it's a security risk for all the model consumers.

Loading files that contain object arrays uses the pickle module, which is not secure against erroneous or maliciously constructed data. Consider passing allow_pickle=False to load data that is known not to contain object arrays for the safer handling of untrusted sources.

Changed in version 1.16.3: Made default False in response to CVE-2019-6446.

see: https://numpy.org/doc/stable/reference/generated/numpy.load.html#numpy.load

Also - did you try setting encoding='bytes' in the load call? That gets passed on to the pickle load call...

Tried it, no difference.

FynnBe commented

pickle=False added [here] for next release

fix is released