uqfoundation/dill

`_is_builtin_module` is wrong for environments managed by Spack

Closed · 1 comment

When I invoke Parsl (even in the simplest possible case), Parsl uses dill to serialize the function and its arguments, and that serialization fails. According to the Dill trace, serializing the function and arguments leads to serializing collections.abc, which in turn leads to serializing bytes_iterator. That step fails with the following stderr:

...snip...
  File "/home/sam/Downloads/test/.spack-env/view/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/home/sam/Downloads/test/.spack-env/view/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/sam/Downloads/test/.spack-env/view/lib/python3.10/site-packages/dill/_dill.py", line 1711, in save_type
    StockPickler.save_global(pickler, obj, name=obj_name)
  File "/home/sam/Downloads/test/.spack-env/view/lib/python3.10/pickle.py", line 1071, in save_global
    raise PicklingError(
_pickle.PicklingError: Can't pickle <class 'bytes_iterator'>: it's not found as builtins.bytes_iterator

(full stderr)

bytes_iterator is indeed a member of collections.abc, but the bigger problem is: why is Dill trying to serialize builtin modules? In fact, dill._dill._is_builtin_module(collections) returns False instead of True when Python and Dill are installed by Spack.

>>> import dill, collections, sys, os
>>> dill._dill._is_builtin_module(collections)
False
>>> # This is incorrect; collections **is** builtin.
>>> collections.__file__
'/home/sam/Downloads/test/.spack-env/view/lib/python3.10/collections/__init__.py'
>>> sys.prefix
'/home/sam/Downloads/test/.spack-env/view'
>>> # So far, so good. collections.__file__.startswith(sys.prefix)
>>> os.path.realpath(collections.__file__)
'/home/sam/.local/share/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-11.3.0/python-3.10.8-wobwcruhfbzy5noyhl4vmvi2tuygyw6k/lib/python3.10/collections/__init__.py'
>>> os.path.realpath(sys.prefix)
'/home/sam/Downloads/test/.spack-env/._view/y3klaw6vrkdxyp23swulxprknwvfpsn6'

While collections.__file__ is within sys.prefix, the realpath is not. This is because Spack manages Python environments by symlinking packages into a "view".
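The mismatch can be reproduced without Spack. The following is a minimal sketch, assuming the check compares symlink-resolved paths by prefix, as the realpath output above suggests. The helper names (`literal_prefix_check`, `realpath_prefix_check`, `build_fake_view`) and the directory layout are made up for illustration and are not dill's actual code or Spack's exact layout.

```python
import os
import tempfile


def literal_prefix_check(path, prefix):
    """Hypothetical check: compare the paths as written, without resolving symlinks."""
    return os.path.abspath(path).startswith(os.path.abspath(prefix))


def realpath_prefix_check(path, prefix):
    """Hypothetical check: compare the paths after resolving every symlink."""
    return os.path.realpath(path).startswith(os.path.realpath(prefix))


def build_fake_view(root):
    """Mimic a view-style layout: the real file lives in a store directory,
    a hidden view holds a symlink to that file, and 'view' is a symlink
    to the hidden view. Returns (module_file, prefix)."""
    store_pkg = os.path.join(root, "store", "python-3.10", "lib", "collections")
    hidden_pkg = os.path.join(root, "env", "._view", "hash", "lib", "collections")
    os.makedirs(store_pkg)
    os.makedirs(hidden_pkg)

    real_file = os.path.join(store_pkg, "__init__.py")
    open(real_file, "w").close()

    # The file itself is symlinked into the hidden view.
    os.symlink(real_file, os.path.join(hidden_pkg, "__init__.py"))

    # The visible prefix is a symlink to the hidden view directory.
    view = os.path.join(root, "env", "view")
    os.symlink(os.path.join(root, "env", "._view", "hash"), view)

    return os.path.join(view, "lib", "collections", "__init__.py"), view


with tempfile.TemporaryDirectory() as root:
    module_file, prefix = build_fake_view(root)
    literal = literal_prefix_check(module_file, prefix)
    resolved = realpath_prefix_check(module_file, prefix)

print("literal prefix match: ", literal)
print("realpath prefix match:", resolved)
```

In this layout the resolved file path points into the store while the resolved prefix points into the hidden view, so the two no longer share a prefix even though the unresolved paths do.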

Here is a minimum working example:

apt update && apt install -y build-essential ca-certificates coreutils curl environment-modules gfortran git gpg lsb-release python3 python3-distutils python3-venv unzip zip
git clone -c feature.manyFiles=true https://github.com/spack/spack.git
source spack/share/spack/setup-env.sh
spack install python@3.10
spack install py-dill@0.3.5.1
# Note this also fails in Dill 0.3.6, but that is not yet in Spack's default package repo.
python3.10 -c 'import collections, dill; print(dill._dill._is_builtin_module(collections))'
# Prints False
python3.10 -c 'import collections, dill; dill.dumps(collections)'
# Errors with the above traceback.

Thanks for reporting. See #567. That bit of code is particularly bad for unusual installs or platforms, and needs a better solution.