How to deal with coloring datapoints in plots?
mkaze opened this issue · 3 comments
This is a place to discuss how to properly let the user assign color to the datapoints using the plot API. Currently, there are two different behaviors which are supported:
- In our plot methods backed by matplotlib, the `color` argument is expected to take the same format as supported by matplotlib (documentation).
- In our plot methods backed by Altair, we expect the `color` argument to be a string referring to an "attribute" of embeddings in the embeddingset. The values of that attribute are not important, as they are automatically mapped to colors by Altair (also note that if the value of the attribute for an embedding is "red", this does not mean that its corresponding datapoint would be shown in red).
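To make the second behavior concrete, here is a toy sketch of category-to-color mapping. It is not Altair's actual scale logic: the `PALETTE` values and the `assign_category_colors` helper are hypothetical, and the only point is that an attribute value such as "red" is treated as a category label, not as a color.

```python
# Hypothetical palette, chosen only for illustration.
PALETTE = ["#4c78a8", "#f58518", "#54a24b", "#e45756"]


def assign_category_colors(values, palette=PALETTE):
    """Map each distinct value to a palette entry, in sorted category order."""
    categories = sorted(set(values))
    lookup = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
    return [lookup[v] for v in values]


# The attribute values happen to be color names, but they are only labels.
values = ["red", "blue", "red"]
colors = assign_category_colors(values)
print(colors)  # "red" is drawn with whatever palette entry its category gets
```

With this toy palette, the datapoints whose attribute value is "red" do not end up red, which is the surprise described above.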
So now the question is: how can we have a consistent approach in all plot methods regarding the color of datapoints? Should we use only one of the above behaviors or support both? To make it more concrete, let's provide some examples:
- `embset.the_plot_method(..., color="red")`, should we use:
  - the color red for all the datapoints?
  - the value of the "red" property of embeddings for all datapoints? In this case, should we interpret the attribute values as colors, or let the underlying library decide? (I don't think matplotlib understands non-color strings).
- `embset.the_plot_method(..., color="intent_class")`, should we use:
  - the color "intent_class" for all the datapoints? There is no such color!
  - the value of the "intent_class" property of embeddings for all datapoints?
- `embset.the_plot_method(..., color=a_list_of_numbers_or_color_strings__one_for_each_embedding)`, should we allow this as well (like matplotlib)?
- Should the final approach we use be consistent/unified across all plot methods and different visualization backends? Or should we set different constraints for different backends/methods?
One more point which might be worth mentioning: matplotlib uses the `c` argument for the color; however, the documentation also contains the following note:
If you wish to specify a single color for all points prefer the color keyword argument.
We can also keep this in mind for the approach we want to take, especially if we decide to support additional keyword arguments for plot methods as I suggested here in response to point 1.
@mkaze might it work if we use `embset.plot_method(..., color="string")` with the following:
- If `color` is a property set on the embeddings in the embeddingset, we'll use that. If the property is a float, we'll leave the numeric value as is; if the property contains categories, we'll preprocess it so that matplotlib can handle it.
- If `color` is not a property in the set, we'll set the color based on a color list. We'll need to worry a bit here because `color="red"` in matplotlib might be different from Altair's.
- If a `color` is passed that is neither a known color nor a property, we throw an error.
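The three rules above could be sketched roughly as follows. This is only an illustration under assumptions: `resolve_color`, the `KNOWN_COLORS` set, and the list-of-dicts stand-in for embedding properties are all hypothetical and not part of the library. A real version would probably ask the backend whether a string is a color (for matplotlib, `matplotlib.colors.is_color_like` exists for that purpose) instead of using a hard-coded set.

```python
from numbers import Real

# Hypothetical stand-in for the "color list" mentioned above.
KNOWN_COLORS = {"red", "green", "blue", "black", "orange", "purple"}


def resolve_color(color, embeddings):
    """Return (kind, values) for a `color` string.

    `embeddings` is a list of dicts standing in for each embedding's
    properties (an assumption, not the library's real data model).
    """
    # Rule 1: a property set on every embedding takes precedence.
    if embeddings and all(color in emb for emb in embeddings):
        values = [emb[color] for emb in embeddings]
        # Numeric properties are passed through untouched.
        if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
            return "numeric", values
        # Categorical properties are encoded as integer codes so that a
        # numeric-only backend can map them through a colormap.
        categories = sorted({str(v) for v in values})
        codes = {cat: i for i, cat in enumerate(categories)}
        return "categorical", [codes[str(v)] for v in values]
    # Rule 2: otherwise fall back to a known color, applied to every point.
    if color in KNOWN_COLORS:
        return "constant", [color] * len(embeddings)
    # Rule 3: neither a property nor a known color.
    raise ValueError(f"{color!r} is neither a known color nor a property")


embeddings = [
    {"intent_class": "greet", "score": 0.2},
    {"intent_class": "bye", "score": 0.9},
    {"intent_class": "greet", "score": 0.5},
]
print(resolve_color("score", embeddings))
print(resolve_color("intent_class", embeddings))
print(resolve_color("red", embeddings))
```

Note that in this sketch a property named "red" would win over the color red, because the property check comes first; whether that precedence is desirable is exactly the open question in this thread.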
I'm closing issues because ever since the project moved to my personal account it's been more in maintenance mode than in "active work" mode.