Equality testing and hashing
Opened this issue · 5 comments
Hi there! I am interested in using spectra
as a colour library and was hoping that it could perform equality testing and hashing. Currently, this is the state of equality testing:
spectra.html('black') == spectra.html('black') # False
spectra.rgb(1, 0, 0) == spectra.rgb(1, 0, 0) # False
hash(spectra.rgb(1, 0, 0)) == hash(spectra.rgb(1, 0, 0)) # False
Would you be open to a PR that implemented equality testing through the __eq__
and __hash__
methods?
Hi @connorferster, and thanks — that seems like a reasonable PR. I'm curious what the plan is for comparisons such as:
spectra.html('black') == spectra.rgb(0, 0, 0)
On the one hand, I can see returning True
, but then that sets up a similar expectation for all inter-color-space comparisons, some of which are not as precisely comparable as others. Or perhaps the comparison above raises a custom error?
Hi @jsvine,
I asked a seemingly simple question but one that raises so many other questions. While I may be approaching this too naively, here are my thoughts.
It seems that, regardless of the colour space used, the Color
class always creates an RGB conversion and stores it in the .rgb
attributes as a tuple. It is quoted in the README.md that spectra uses the colormath convention that allows RGB conversion to go out of gamut which suggests to me that "any" color created will always have a comparable RGB representation stored, regardless of whether it is valid or not.
My thoughts on comparison possibilities:
- Use this stored
.rgb
attribute as the equality comparison between colour spaces allowing comparison across all colour spaces (is such a concept valid between colour spaces?) - Allow only equality comparison between instances in the same colour space or limited sets of colour spaces (e.g. rgb <-> html). If comparisons are attempted between invalid colour spaces, a
ValueError
would be triggered explaining that two different colour spaces cannot be compared (or that the these two colour spaces cannot be compared). If comparing only within the same colour space, the.value
attribute can be compared. If across colour spaces then the.rgb
attribute would be compared.
Problems that I can currently see:
- Doing comparisons between
float
s in a tuple is difficult. Is it a good assumption to say thatrgb(0.499999999999, 0.499999999999 0.499999999999)
andrgb(0.5, 0.5, 0.5)
are equal? I do not know. - Currently
Color
objects are not frozen or immutable however, the convention is that when aColor
object is "modified" by using one of the methods like.brighten
, a newColor
object is returned. This implies thatColor
objects are immutable or at least should be thought of this way. To makeColor
frozen would be good practice for hashing and could be handled easily by makingColor
a dataclass with thefrozen
attribute set toTrue
. Would makingColor
a frozen dataclass break anything? I do not know. - If comparisons across colour spaces are allowed, then that implies that
rgb(255, 0, 0)
== html(red)
which in turn implies thathash(rgb(255, 0, 0)) == hash(html(red))
. On one hand, "red is red" so that makes intuitive sense. On the other hand, aColor(html(red))
and aColor(rgb(255, 0, 0))
are different objects represented by different attributes so they should absolutely have different hashes.
Applications that I had in mind:
- If
Color
s are hashable (and comparable for equality), they can then be used in set operations - In documents, text and annotations are sometimes assigned a colour value for different purposes. It is useful to be able to write tests like, "if the colour of this component is equal to x, then do y".
- Additionally, if the common colours are used ( e.g.
rgb(1, 0, 0)
) it is also nice to be able sayhtml(red)
in the code to make it more human-readable (possible by comparing across (some) colour spaces). Like, "give me the list of words that are pink highlighted" when they are highlighted asrgb(255, 192, 203)
.
Let me know what your thoughts are on all this. If this is beyond the scope of what you think spectra should be used for then maybe it's best not to open the can of worms that might result from this. However, I really like the spectra API and how user-friendly it is. I thought this could be a good feature, if it is handled in an intuitive, reasoned, and well-documented manner.
Thank you for the thorough set of thoughts, @connorferster! Given your set of interests and concerns, I'd propose something like the following:
def __eq__(self, other):
if self.space == other.space:
return self.values == other.values
else:
return self.rgb == other.rgb
... so if the two colors use the same space, they are compared strictly based on their values. But if they do not use the same space, they are compared via RGB conversion. (spectra
's conversions use https://python-colormath.readthedocs.io/en/latest/conversions.html.)
The downside of this approach is that you could theoretically get a situation where A == B
and A == C
but B != C
, if A
is in one space, while B
and C
are in another. But I think it's worth it (given competing demands), and can be handled with clear documentation. And if someone wants to do strict equality testing, they can do a.space == b.space and a.values == b.values
.
For a similar reason, I'm not terribly concerned about the float comparisons; it should not affect the most common types of comparisons (e.g., those in the same colorspace or with simple conversions such as "red"->(1, 0, 0)) and the fact that conversions may result is hard-to-compare values can be documented.
@jsvine That sounds good to me. The final question is, "How would you like to hash?" Since, we are basing equality testing, ultimately, on self.rgb
then perhaps the implementation should be:
def __hash__(self):
return hash(self.rgb)
Then, in most cases, it will hold true that Color(...) == Color(...) # True
and hash(Color(...)) == hash(Color(...)) # True
but some may consider the behaviour surprising between different color spaces, even though it is consistent with the Python guidelines for hash.
However, in order to implement __hash__
, the instances should be immutable. What are your thoughts on using dataclasses.dataclass
to implement the Color
object as an immutable object?
Thanks for raising these questions. One bigger question: What would the goal of implementing __hash__
be here?
As I understand it, for example, __hash__
is necessary for calling set(...)
on a list of items. Take this expression:
set([
spectra.html('black'),
spectra.rgb(0, 0, 0),
spectra.hsb(0, 0, 0),
])
... would you want it to return {spectra.html('black')}
? To me, that seems somewhat strange / unexpected, given that these are values in different color spaces. But what do you think?
And are there other more pertinent use-cases you have in mind?