Pickling / Unpickling leads to inconsistent hash value
Closed this issue · 0 comments
matthiasdiener commented
Running the following program twice can produce inconsistent hash values, because the cached hash is restored from the pickle file and may differ from the hash computed when the same immutabledict is created from scratch:
from pickle import dump, load

from immutabledict import immutabledict

i1 = immutabledict({"a": "1", "b": "2", "c": "3"})
hash(i1)  # Force creating a cached hash value

try:
    # Second run: load pickle file
    with open("pickle_imm.pkl", "rb") as f:
        i2 = load(f)
except FileNotFoundError:
    # First run: create pickle file
    i2 = immutabledict({"a": "1", "b": "2", "c": "3"})
    with open("pickle_imm.pkl", "wb") as f:
        dump(i1, f)

assert i1 == i2
print(hash(i1), hash(i2))

# Fails on the second run: immutabledicts compare equal, but their hash values are different!
assert hash(i1) == hash(i2)
$ python pickletest.py
1128449412753896495 1128449412753896495
$ python pickletest.py
-5549929714807656641 1128449412753896495
Traceback (most recent call last):
  File "/Users/mdiener/Work/orderedsets/pickletest.py", line 24, in <module>
    assert hash(i1) == hash(i2)
           ^^^^^^^^^^^^^^^^^^^^
AssertionError
This mostly affects situations where the immutabledict stores strings, class types, etc., which do not have stable hash values across Python invocations (string hashing, for example, is randomized per process unless PYTHONHASHSEED is set).
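To illustrate the instability, here is a minimal sketch that computes the hash of the same string in fresh interpreter processes. The helper name hash_in_fresh_interpreter is made up for this example, and the seeds "1" and "2" are arbitrary; the only thing relied on is the standard PYTHONHASHSEED environment variable.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(seed: str) -> int:
    """Return hash("a") as computed by a new Python process.

    `seed` is passed to the child process as PYTHONHASHSEED
    (hypothetical helper, for illustration only).
    """
    result = subprocess.run(
        [sys.executable, "-c", 'print(hash("a"))'],
        env={**os.environ, "PYTHONHASHSEED": seed},
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same seed: the string hash is reproducible across processes.
print(hash_in_fresh_interpreter("1") == hash_in_fresh_interpreter("1"))
# Different seeds (like two ordinary runs with random seeds): the hash changes.
print(hash_in_fresh_interpreter("1") == hash_in_fresh_interpreter("2"))
```

A hash cached in one process and restored from a pickle in another is therefore stale whenever the keys or values hash differently in the new process.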
There are a few potential solutions:
- Exclude the cached hash from __getstate__/__setstate__
- Use pytools.memoize_method to cache the hash value (similar to what I did in matthiasdiener/orderedsets#28)
- Use something like https://stackoverflow.com/a/71663059/1250282 to cache the hash value
- Don't cache the hash value at all
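As a rough sketch of the first option, the class below drops the cached hash when pickling and resets it when unpickling. FrozenMap is a hypothetical stand-in written for this example, not immutabledict's actual implementation; the attribute names _dict and _hash are assumptions as well.

```python
import pickle
from collections.abc import Mapping


class FrozenMap(Mapping):
    """Hypothetical immutable mapping with a lazily cached hash."""

    def __init__(self, *args, **kwargs):
        self._dict = dict(*args, **kwargs)
        self._hash = None  # computed on the first hash() call

    def __getitem__(self, key):
        return self._dict[key]

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(frozenset(self._dict.items()))
        return self._hash

    def __getstate__(self):
        # Pickle only the contents; the cached hash is process-specific.
        return {"_dict": self._dict}

    def __setstate__(self, state):
        self._dict = state["_dict"]
        self._hash = None  # recompute lazily in the loading process


original = FrozenMap({"a": "1", "b": "2", "c": "3"})
hash(original)  # force the cached hash, as in the reproducer above

restored = pickle.loads(pickle.dumps(original))
assert restored == original
assert restored._hash is None  # the cache did not travel with the pickle
assert hash(restored) == hash(original)
```

Because the hash is recomputed after loading, an unpickled object and a freshly created one agree on their hash within the same process, at the cost of one extra hash computation per unpickled object.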