Warning for axis_metric
koaning opened this issue · 5 comments
Let's set up a small EmbeddingSet.
from whatlies.language import SpacyLanguage
from whatlies.transformers import Pca
words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman",
"cousin", "niece", "king", "queen", "dude", "guy", "gal", "fire",
"dog", "cat", "mouse", "red", "blue", "green", "yellow", "water",
"person", "family", "brother", "sister"]
lang = SpacyLanguage("en_core_web_md")
emb = lang[words]
Let's now make some charts.
emb.transform(Pca(2)).plot(kind='scatter', annot=True)
emb.transform(Pca(2)).plot(kind='scatter', annot=True, axis_metric='cosine')
emb.transform(Pca(2)).plot(kind='scatter', annot=True, axis_metric='euclidean')
At the moment, they all seem to return this chart.
It took me a while to realise that this was not working as expected because I'm not using "king" as an x_axis/y_axis. Maybe we should introduce a warning here. It might be more user friendly.
Well, I have mentioned in the docstrings that a custom metric is only effective when the axis is a string or an Embedding (hence it is ignored when an integer axis is provided, which is the default).
But we can also add a warning for it if you consider it useful.
It's certainly in the docs, and this was certainly a moment of "lack of attention" on my part, but I can imagine that these small changes can help beginning users. I'll add a small warning.
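A minimal sketch of what such a warning could look like, in plain Python. The helper name `warn_on_ignored_metric`, its signature and its `0`/`1` defaults are hypothetical and not part of whatlies. The only behaviour taken from this thread is that a metric is ignored for an axis given as an integer.

```python
import warnings


def warn_on_ignored_metric(x_axis=0, y_axis=1, axis_metric=None):
    """Warn when a metric is passed but an axis is an integer index.

    Hypothetical helper for illustration, not the whatlies implementation.
    Returns the names of the axes for which the metric would be ignored.
    """
    if axis_metric is None:
        return []
    # An integer axis reads a vector component directly, so there is
    # no axis embedding for the metric to be computed against.
    ignored = [
        name
        for name, axis in (("x_axis", x_axis), ("y_axis", y_axis))
        if isinstance(axis, int)
    ]
    if ignored:
        warnings.warn(
            f"axis_metric={axis_metric!r} is ignored for {', '.join(ignored)} "
            "because an integer axis was given; pass a string or an "
            "Embedding as the axis to make the metric take effect.",
            UserWarning,
            stacklevel=2,
        )
    return ignored
```

With integer axes and `axis_metric='cosine'`, as in the charts above, this would emit a single warning. With `x_axis='king'` and `y_axis='queen'` it would stay silent.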
Further, I would like to resolve a recurring... I don't know what to call it, because I am sure you are fully aware of this, but let's call it a "confusion": pca_0, pca_1 and the like were only unit indicator vectors that kept the plot API consistent and made its implementation and usage easier (before integer axis support was introduced, of course). Hence, they should not be confused with the principal component vectors, or thought of as representations of the principal components, which would not be correct at all. In other words, they encoded no useful information and were not essential, which is why I would call them "helper vectors" (in the same sense as "helper functions").
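To make the "helper vector" point concrete, here is a small sketch in plain Python. It assumes that the axis value of a point is its scalar projection onto the axis vector, normalised by the axis vector's squared length. That formula and the names `scalar_projection`, `helper_0` and `helper_1` are assumptions for illustration, not taken from the library.

```python
def scalar_projection(vector, axis):
    """Project `vector` onto `axis`, normalised by the squared length of `axis`."""
    dot = sum(v * a for v, a in zip(vector, axis))
    norm_sq = sum(a * a for a in axis)
    return dot / norm_sq


# A made-up point after reduction to two dimensions.
point = [0.7, -1.2]

# Unit indicator vectors: a one in a single position, zeros elsewhere.
helper_0 = [1.0, 0.0]
helper_1 = [0.0, 1.0]

# Projecting onto a helper vector just picks out one component.
x = scalar_projection(point, helper_0)
y = scalar_projection(point, helper_1)

assert x == point[0]
assert y == point[1]
```

Projecting onto a unit indicator vector returns the component the point already had, so an integer axis can read that component directly and the helper vectors carry no information of their own.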
Good to point out. I'll try to refer to the phenomenon as "helper vectors" from here on.
I'm closing issues because, ever since the project moved to my personal account, it's been more in maintenance mode than in "active work" mode.