Implementing __hash__ properly
Closed this issue · 1 comments
The current __hash__
implementation is not good. It simply filters the fields of the instance it's being applied to, ignores fields that aren't hashable and creates a tuple which it then hashes. This is slow and also provides no warning if a field that should be hashable is not, potentially causing hash collisions. The fact that no one has complained so far is likely testament to the fact that it's almost always better to write a __hash__
manually anyway. But this should be improved.
There are at least two ways to do this:
- Running
typing.get_type_hints
at class initialisation time, to work out which fields are hashable. This is slow and wouldn't give the flexibility of being able to determine which fields to include in the hash - Adding a new "type wrapper", like
Internal
, calledHashed
which would include the field in hash. I think this is better. The only caveat is that I'm pretty sure this breaks tools like mypy. Maybe there is a way to tell it to treat the "wrapped" type as the significant one, like withOptional
.
I have merged the new, more flexible, Hint
implementation using Union
and the new Hashed
type into master. On second thought, I don't believe it will affect mypy because I presume it is either naive enough to read external annotations as strings or smart enough to be able to evaluate the __class_getitem__
, since that is itself fully type hinted.
These changes will be included in the next bugfix release, v0.7.3
. My plan for v0.8
is to include a new __hash__
implementation using Hashed
, since this will be a breaking change.