biqqles/dataclassy

Implementing __hash__ properly

Closed this issue · 1 comments

The current __hash__ implementation is not good. It simply filters the fields of the instance it's being applied to, ignores fields that aren't hashable and creates a tuple which it then hashes. This is slow and also provides no warning if a field that should be hashable is not, potentially causing hash collisions. The fact that no one has complained so far is likely testament to the fact that it's almost always better to write a __hash__ manually anyway. But this should be improved.

There are at least two ways to do this:

  • Running typing.get_type_hints at class initialisation time, to work out which fields are hashable. This is slow and wouldn't give the flexibility of being able to determine which fields to include in the hash
  • Adding a new "type wrapper", like Internal, called Hashed which would include the field in hash. I think this is better. The only caveat is that I'm pretty sure this breaks tools like mypy. Maybe there is a way to tell it to treat the "wrapped" type as the significant one, like with Optional.

I have merged the new, more flexible, Hint implementation using Union and the new Hashed type into master. On second thought, I don't believe it will affect mypy because I presume it is either naive enough to read external annotations as strings or smart enough to be able to evaluate the __class_getitem__, since that is itself fully type hinted.

These changes will be included in the next bugfix release, v0.7.3. My plan for v0.8 is to include a new __hash__ implementation using Hashed, since this will be a breaking change.