jsvine/spectra

Equality testing and hashing

Opened this issue · 5 comments

Hi there! I am interested in using spectra as a colour library and was hoping that it could perform equality testing and hashing. Currently, this is the state of equality testing:

spectra.html('black') == spectra.html('black') # False
spectra.rgb(1, 0, 0) == spectra.rgb(1, 0, 0) # False
hash(spectra.rgb(1, 0, 0)) == hash(spectra.rgb(1, 0, 0)) # False

Would you be open to a PR that implemented equality testing through the __eq__ and __hash__ methods?

Hi @connorferster, and thanks — that seems like a reasonable PR. I'm curious what the plan is for comparisons such as:

spectra.html('black') == spectra.rgb(0, 0, 0)

On the one hand, I can see returning True, but then that sets up a similar expectation for all inter-color-space comparisons, some of which are not as precisely comparable as others. Or perhaps the comparison above raises a custom error?

Hi @jsvine,

I asked a seemingly simple question but one that raises so many other questions. While I may be approaching this too naively, here are my thoughts.

It seems that, regardless of the colour space used, the Color class always creates an RGB conversion and stores it in the .rgb attributes as a tuple. It is quoted in the README.md that spectra uses the colormath convention that allows RGB conversion to go out of gamut which suggests to me that "any" color created will always have a comparable RGB representation stored, regardless of whether it is valid or not.

My thoughts on comparison possibilities:

  • Use this stored .rgb attribute as the equality comparison between colour spaces allowing comparison across all colour spaces (is such a concept valid between colour spaces?)
  • Allow only equality comparison between instances in the same colour space or limited sets of colour spaces (e.g. rgb <-> html). If comparisons are attempted between invalid colour spaces, a ValueError would be triggered explaining that two different colour spaces cannot be compared (or that the these two colour spaces cannot be compared). If comparing only within the same colour space, the .value attribute can be compared. If across colour spaces then the .rgb attribute would be compared.

Problems that I can currently see:

  1. Doing comparisons between floats in a tuple is difficult. Is it a good assumption to say that rgb(0.499999999999, 0.499999999999 0.499999999999) and rgb(0.5, 0.5, 0.5) are equal? I do not know.
  2. Currently Color objects are not frozen or immutable however, the convention is that when a Color object is "modified" by using one of the methods like .brighten, a new Color object is returned. This implies that Color objects are immutable or at least should be thought of this way. To make Color frozen would be good practice for hashing and could be handled easily by making Color a dataclass with the frozen attribute set to True. Would making Color a frozen dataclass break anything? I do not know.
  3. If comparisons across colour spaces are allowed, then that implies that rgb(255, 0, 0) == html(red) which in turn implies that hash(rgb(255, 0, 0)) == hash(html(red)). On one hand, "red is red" so that makes intuitive sense. On the other hand, a Color(html(red)) and a Color(rgb(255, 0, 0)) are different objects represented by different attributes so they should absolutely have different hashes.

Applications that I had in mind:

  1. If Colors are hashable (and comparable for equality), they can then be used in set operations
  2. In documents, text and annotations are sometimes assigned a colour value for different purposes. It is useful to be able to write tests like, "if the colour of this component is equal to x, then do y".
  3. Additionally, if the common colours are used ( e.g. rgb(1, 0, 0)) it is also nice to be able say html(red) in the code to make it more human-readable (possible by comparing across (some) colour spaces). Like, "give me the list of words that are pink highlighted" when they are highlighted as rgb(255, 192, 203).

Let me know what your thoughts are on all this. If this is beyond the scope of what you think spectra should be used for then maybe it's best not to open the can of worms that might result from this. However, I really like the spectra API and how user-friendly it is. I thought this could be a good feature, if it is handled in an intuitive, reasoned, and well-documented manner.

Thank you for the thorough set of thoughts, @connorferster! Given your set of interests and concerns, I'd propose something like the following:

def __eq__(self, other):
    if self.space == other.space:
        return self.values == other.values
    else:
        return self.rgb == other.rgb

... so if the two colors use the same space, they are compared strictly based on their values. But if they do not use the same space, they are compared via RGB conversion. (spectra's conversions use https://python-colormath.readthedocs.io/en/latest/conversions.html.)

The downside of this approach is that you could theoretically get a situation where A == B and A == C but B != C, if A is in one space, while B and C are in another. But I think it's worth it (given competing demands), and can be handled with clear documentation. And if someone wants to do strict equality testing, they can do a.space == b.space and a.values == b.values.

For a similar reason, I'm not terribly concerned about the float comparisons; it should not affect the most common types of comparisons (e.g., those in the same colorspace or with simple conversions such as "red"->(1, 0, 0)) and the fact that conversions may result is hard-to-compare values can be documented.

@jsvine That sounds good to me. The final question is, "How would you like to hash?" Since, we are basing equality testing, ultimately, on self.rgb then perhaps the implementation should be:

def __hash__(self):
    return hash(self.rgb)

Then, in most cases, it will hold true that Color(...) == Color(...) # True and hash(Color(...)) == hash(Color(...)) # True but some may consider the behaviour surprising between different color spaces, even though it is consistent with the Python guidelines for hash.

However, in order to implement __hash__, the instances should be immutable. What are your thoughts on using dataclasses.dataclass to implement the Color object as an immutable object?

Thanks for raising these questions. One bigger question: What would the goal of implementing __hash__ be here?

As I understand it, for example, __hash__ is necessary for calling set(...) on a list of items. Take this expression:

set([
  spectra.html('black'),
  spectra.rgb(0, 0, 0),
  spectra.hsb(0, 0, 0),
])

... would you want it to return {spectra.html('black')}? To me, that seems somewhat strange / unexpected, given that these are values in different color spaces. But what do you think?

And are there other more pertinent use-cases you have in mind?