uqfoundation/dill

TypeError: can't pickle _abc_data objects

Closed this issue · 33 comments

On Python 3.7, I'm getting this error:

Traceback (most recent call last):
  File "../scripts/make_preprocessor.py", line 101, in <module>
    pickle.dump(preprocessor, f)
  File "/home/ubuntu/repo/.venv/lib/python3.7/site-packages/dill/_dill.py", line 287, in dump
    pik.dump(obj)
  File "/usr/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/repo/.venv/lib/python3.7/site-packages/dill/_dill.py", line 910, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.7/pickle.py", line 856, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.7/pickle.py", line 882, in _batch_setitems
    save(v)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 816, in save_list
    self._batch_appends(obj)
  File "/usr/lib/python3.7/pickle.py", line 840, in _batch_appends
    save(x)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 771, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.7/pickle.py", line 633, in save_reduce
    save(cls)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/repo/.venv/lib/python3.7/site-packages/dill/_dill.py", line 1323, in save_type
    obj.__bases__, _dict), obj=obj)
  File "/usr/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 786, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/repo/.venv/lib/python3.7/site-packages/dill/_dill.py", line 910, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.7/pickle.py", line 856, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.7/pickle.py", line 882, in _batch_setitems
    save(v)
  File "/usr/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: can't pickle _abc_data objects

I believe it may be related to this issue: cloudpipe/cloudpickle#180

@mmckerns here is a minimal, reproducible example:

import dill as pickle

from abc import ABCMeta

class TestClass(metaclass=ABCMeta):
    pass

with open('test.pickle', 'wb') as f:
    pickle.dump(TestClass, f)

And a more real-world example:

from sklearn.compose import ColumnTransformer

class DataFrameTransformer(ColumnTransformer):

    def get_feature_names(self):
        """
        enable get_feature_names when using remainder passthrough
        """
        feature_names = []
        for name, transformer, columns in self.transformers_:
            if hasattr(transformer, "get_feature_names"):
                feature_names += [f"{name}_{x}" for x in transformer.get_feature_names()]
            elif transformer == "drop":
                continue
            elif name == "remainder" and transformer == "passthrough":
                feature_names += self._df_columns.take(columns).tolist()
        return feature_names

with open('test.pickle', 'wb') as f:
    pickle.dump(DataFrameTransformer, f)

I think this is a blocking issue. Cloudpickle handles it fine. I'm not able to migrate production code using dill to the newer python versions since I'm using abstract classes, hence I will need to consider switching to cloudpickle.

I still have the exact same problem. Python 3.8.5, multiprocess 0.70.11.1, dill 0.3.3.
Is there any planned fix? It is also blocking in our situation.

Update: The above minimal example runs through with Python 3.6 but throws the above error in Python 3.7, 3.8 and 3.9.

+1

And it's not even easy to debug which object is causing the problem.

Silly workaround (but it worked for me): remove ABC from the bases of every class that inherits from it.

Any thoughts on how to fix this... or where to start?
I would be willing to contribute but need some pointers to get started.

I did not realize this was still an issue... Basically, the cause is the _abc_impl attribute, which is an _abc_data object implemented in C. There's a function, _abc._get_dump, that provides the serialization for the object. I'm not sure how the load works yet... I'll look into it.
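
To make the object in question concrete, here is a minimal sketch that uses only CPython's standard library (3.7+) to look at _abc_impl and at what _abc._get_dump returns. _abc is a private module, so the tuple layout shown is an implementation detail that may change; Shape and Square are made-up names for illustration.

```python
import _abc  # private CPython module (3.7+); an implementation detail
import pickle
from abc import ABC


class Shape(ABC):  # hypothetical ABC, for illustration only
    pass


class Square:  # hypothetical class registered as a virtual subclass
    pass


Shape.register(Square)

# Every ABC carries its bookkeeping in a C-level _abc_data object.
impl = Shape.__dict__["_abc_impl"]
print(type(impl).__name__)  # _abc_data

# The object has no pickle support of its own.
try:
    pickle.dumps(impl)
    picklable = True
except TypeError as exc:
    picklable = False
    print(exc)

# _get_dump returns (registry, cache, negative_cache, negative_cache_version).
# The registry holds weak references to classes added via register().
registry, cache, negative_cache, version = _abc._get_dump(Shape)
registered = {ref() for ref in registry}
print(Square in registered)
```

Nothing here restores the state on load; it only shows where the state lives and that it can be read back out.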

Would it be something like registering those methods you reference similar to the save_XXX methods in dill/_dill.py?
The dill code is a bit hard to jump into... but I would be glad to help get this done if it isn't more trouble for you than I am worth.

+1 long running convo it seems.

I will take a crack at this this week.
Anybody interested in jumping in with me?

@mmckerns I was able to hack in a solution that solves my broken tests...
emfdavid@770c5a0
But I am still a bit lost. I looked at what cloudpickle does, which also works with /clsdict/_dict/ in _dill, but once I realized that the built-in pickle seems to handle ABC objects, it seemed simpler to kick it back to StockPickler and more in line with your approach elsewhere.

Anyway, not sure I have this in the right place... but it does fix my test cases. Further input would be welcome!

I think what you have looks reasonable, but should probably be tested a bit more extensively. I think if it's possible to register the ABCMeta classes, that would be a preferable approach over catching them as you have with an if.

Yup - @mmckerns it just works with

@register(abc.ABCMeta)
def save_abc(pickler, obj):
    StockPickler.save_type(pickler, obj)

Amazing stuff - I even added a lambda as an instance variable to the test class and dill/pickle still handles both properly!
Please let me know what other angles I should test and whether you want me to keep the tests in test_objects or make a new test module for ABC. I will clean up and rebase before publishing the PR for review. Thanks for the hand-holding to get this done!

Feel free to create your own test file for abc, especially if there's a lot to test.

Also having this issue; thanks @emfdavid for taking the time to create a fix.

Any progress on this?

@gabyx: Yes, issue #450.

import dill as pickle

from abc import ABCMeta

class TestClass(metaclass=ABCMeta):
    pass

with open('test.pickle', 'wb') as f:
    pickle.dump(TestClass, f)

Hello, the code above doesn't work with the latest version. Did I miss something?

Python 3.7 and later implement the ABC mechanism in C rather than in pure Python, so ABC classes cannot be pickled the same way as other classes. #450 fixes it, but it wasn't included in 0.3.5 because the feature was not 100% complete by the release date.

If the class inheriting from ABC is imported from another file, dill works just fine. It only fails if the class is defined in the script that executes the dill call (Python 3.8.10, dill 0.3.5.1).

So one workaround until the inclusion of #450 is to put anything inheriting from ABC or using ABCMeta in a separate file.

pickling.py (the script to execute):

import dill
import pickle


class A:
    pass

pickle.loads(pickle.dumps(A()))  # Succeeds
dill.loads(dill.dumps(A())) # Succeeds

from abc import ABC


class Virtual(ABC):
    pass


pickle.loads(pickle.dumps(Virtual()))  # Succeeds
dill.loads(dill.dumps(Virtual()))  # Fails

from a_module import VirtualInAnotherFile

pickle.loads(pickle.dumps(VirtualInAnotherFile()))  # Succeeds
dill.loads(dill.dumps(VirtualInAnotherFile()))  # Succeeds

a_module.py:

from abc import ABC


class VirtualInAnotherFile(ABC):
    pass

With dill.detect.trace(True), the failing call shows this:

T2: <class '__main__.Virtual'>
F2: <function _create_type at 0x0000018C7F3D9CA0>
# F2
T4: <class 'abc.ABCMeta'>
# T4
T4: <class 'abc.ABC'>
# T4
D2: <dict object at 0x0000018C7F5EA5C0>

at which point you get a TypeError: cannot pickle '_abc_data' object.
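
The trace stops at the class dict without naming the entry that fails. One way to narrow it down, sketched here with the standard library only, is to try each value of the class namespace separately. unpicklable_entries is a hypothetical helper, not part of dill, and stdlib pickle is used as a stand-in for dill's pickler.

```python
import pickle
from abc import ABC


class Virtual(ABC):
    pass


def unpicklable_entries(namespace):
    """Return {name: error} for every value that stdlib pickle rejects."""
    failures = {}
    for name, value in namespace.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle can raise several exception types
            failures[name] = exc
    return failures


# vars(Virtual) is the class dict that gets walked when a class is
# serialized by value rather than by reference.
failures = unpicklable_entries(vars(Virtual))
for name, exc in failures.items():
    print(f"{name}: {exc}")
```

The _abc_impl entry shows up in the result. Other entries may be listed as well, since stdlib pickle rejects some descriptors that dill handles itself.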

The trace for the ABC in the other file only shows this:

T4: <class 'a_module.VirtualInAnotherFile'>
# T4

Without changes to the dill codebase, it is not possible to pickle every ABC class completely. This short snippet, however, covers most cases:

import dill, _abc
@dill.register(_abc._abc_data)
def save_abc_impl(pickler, obj):
  pickler.save(None)

Unfortunately, this workaround throws away all metadata Python keeps about the ABC class, including registered subclasses, so don't use it if the ABCs you are pickling rely on that feature. Doing so will lead to bizarre behavior:

>>> import abc
>>> import dill
>>>
>>> class A(abc.ABC):
...   @abc.abstractmethod
...   def f(self):
...     pass
...
>>> class B:
...   def f(self):
...     print('h')
...
>>> A.register(B)
>>>
>>> issubclass(B, dill.copy(A))
False
>>> issubclass(B, A)
True

I hadn't experienced this issue before, but since the release of dill 0.3.6 I also get this error. I can't quite pinpoint it to a reproducible snippet yet, but my code base uses abstract base classes, which are also part of the serialized objects in various places, and that worked without issues until now.

[ ... ]
  File "/Users/niklas.rosenstein/git/terraform-projects/azure-ad/build/.kraken/venv/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/Users/niklas.rosenstein/.pyenv/versions/3.10.2/lib/python3.10/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/Users/niklas.rosenstein/git/terraform-projects/azure-ad/build/.kraken/venv/lib/python3.10/site-packages/dill/_dill.py", line 1186, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/Users/niklas.rosenstein/.pyenv/versions/3.10.2/lib/python3.10/pickle.py", line 972, in save_dict
    self._batch_setitems(obj.items())
  File "/Users/niklas.rosenstein/.pyenv/versions/3.10.2/lib/python3.10/pickle.py", line 998, in _batch_setitems
    save(v)
  File "/Users/niklas.rosenstein/git/terraform-projects/azure-ad/build/.kraken/venv/lib/python3.10/site-packages/dill/_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/Users/niklas.rosenstein/.pyenv/versions/3.10.2/lib/python3.10/pickle.py", line 578, in save
    rv = reduce(self.proto)
TypeError: cannot pickle '_abc._abc_data' object

Downgrading to Dill 0.3.5.1 fixes it for me.

@NiklasRosenstein: It's my understanding (as demonstrated in this and other issues) that dill could never serialize _abc_data objects. So, I'm assuming that you are experiencing a change in the serialization of some other object that is now failing upon hitting an _abc_data object... if that is indeed the case, you probably should open a new issue.

as @NiklasRosenstein writes, this worked for me as well.

Downgrading to Dill 0.3.5.1 fixes it for me.

@mmckerns it is interesting that downgrading dill alone fixes the issue, hinting that something in dill 0.3.6 changed that surfaces this issue

@miraculixx: I agree that some recent change may have brought this issue to the surface. My point is that the _abc_data object was never serializable, so indeed something brought that to the surface for you. I'd like to help you, so if you or @NiklasRosenstein can open a new ticket with minimal test code that reproduces your particular issue, that would be a huge help. Or, if you can't do that, then maybe you could isolate the commit that causes the change in behavior you are seeing.

Should this issue be closed now that PRs #577 and #580 have been merged?

It appears like it was left dangling, thanks.

>>> import dill
>>> from abc import ABCMeta
>>> 
>>> class TestClass(metaclass=ABCMeta):
...     pass
... 
>>> dill.dumps(TestClass)
b'\x80\x04\x95\xd0\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x0c_create_type\x94\x93\x94(\x8c\x03abc\x94\x8c\x07ABCMeta\x94\x93\x94\x8c\tTestClass\x94h\x00\x8c\n_load_type\x94\x93\x94\x8c\x06object\x94\x85\x94R\x94\x85\x94}\x94(\x8c\n__module__\x94\x8c\x08__main__\x94\x8c\x07__doc__\x94N\x8c\x13__abstractmethods__\x94(\x91\x94ut\x94R\x94\x8c\x08builtins\x94\x8c\x07setattr\x94\x93\x94h\x14\x8c\x0c__qualname__\x94h\x06\x87\x94R0.'

I still have this issue with Python 3.10

I'm not seeing an issue with 3.10. Can you provide an example that demonstrates the issue?

Still facing this issue with Python 3.8 and dill 0.3.6

The following example still returns "TypeError: cannot pickle '_abc_data' object"

import dill
from abc import ABCMeta

class AbstractClass(metaclass=ABCMeta):
    pass

class ConcreteClass(AbstractClass):
    pass

dill.dumps(ConcreteClass)

Here's your example using dill master:

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> from abc import ABCMeta
>>> 
>>> class AbstractClass(metaclass=ABCMeta):
...   pass
... 
>>> class ConcreteClass(AbstractClass):
...   pass
... 
>>> dill.dumps(ConcreteClass)
b'\x80\x04\x95\x03\x01\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x0c_create_type\x94\x93\x94(\x8c\x03abc\x94\x8c\x07ABCMeta\x94\x93\x94\x8c\rConcreteClass\x94h\x02(h\x05\x8c\rAbstractClass\x94h\x00\x8c\n_load_type\x94\x93\x94\x8c\x06object\x94\x85\x94R\x94\x85\x94}\x94(\x8c\n__module__\x94\x8c\x08__main__\x94\x8c\x07__doc__\x94N\x8c\x13__abstractmethods__\x94(\x91\x94ut\x94R\x94\x8c\x08builtins\x94\x8c\x07setattr\x94\x93\x94h\x15\x8c\x0c__qualname__\x94h\x07\x87\x94R0\x85\x94}\x94h\x12(\x91\x94st\x94R\x94h\x18h\x1fh\x19h\x06\x87\x94R0.'
>>> dill.__version__
'0.3.7.dev0'

If you update to master do you still see the issue?

any updates on a fix? 👉👈🥺